hfs: prevent MDB and bitmap buffer_head aliasing

[PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Yue Sun 1 week, 3 days ago

hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)->mdb_bh is
locked. A crafted image can set drVBMSt so that the bitmap block resolves
to the same buffer_head as the MDB. When writeback later calls
lock_buffer() for that bitmap block, the task tries to lock mdb_bh again
and self-deadlocks in __lock_buffer().

Reject images whose volume bitmap starts at or before the MDB during
mount. Also guard the bitmap writeback path itself: if the bitmap block
would resolve to mdb_bh, force the filesystem read-only and stop bitmap
writeback before taking the buffer lock. This keeps the deadlock fix in
the MDB commit path and reuses the existing bitmap size/writeback logic.

Reported-by: Yue Sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/all/CAEkJfYMB47v1yOWHB8q2dc8kf=uj-rLO=+yMyudwPguJ8Kd3jA@mail.gmail.com/
Signed-off-by: Yue Sun <samsun1006219@gmail.com>
---
Changes in v2:
- Add a commit-time guard before locking bitmap buffer_heads.
- Replace the mount-time byte-range check with a simple drVBMSt check.
- Reuse the existing bitmap writeback size calculation.

 fs/hfs/mdb.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index a97cea35ca2e..53cd137892b5 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
 		sb->s_flags |= SB_RDONLY;
 	}
 
+	if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
+		pr_err("volume bitmap overlaps MDB\n");
+		return -EIO;
+	}
+
 	/* TRY to get the alternate (backup) MDB. */
 	sect = part_start + part_size - 2;
 	bh = sb_bread512(sb, sect, mdb2);
@@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block *sb)
 		size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
 		ptr = (u8 *)HFS_SB(sb)->bitmap;
 		while (size) {
+			if (unlikely(block == HFS_SB(sb)->mdb_bh->b_blocknr)) {
+				pr_err("volume bitmap overlaps MDB, forcing read-only\n");
+				sb->s_flags |= SB_RDONLY;
+				break;
+			}
 			bh = sb_bread(sb, block);
 			if (!bh) {
 				pr_err("unable to read volume bitmap\n");
-- 
2.34.1

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Viacheslav Dubeyko 1 week, 2 days ago

On Fri, 2026-05-29 at 18:34 +0800, Yue Sun wrote:
> hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)->mdb_bh is
> locked. A crafted image can set drVBMSt so that the bitmap block resolves
> to the same buffer_head as the MDB. When writeback later calls
> lock_buffer() for that bitmap block, the task tries to lock mdb_bh again
> and self-deadlocks in __lock_buffer().
> 
> Reject images whose volume bitmap starts at or before the MDB during
> mount. Also guard the bitmap writeback path itself: if the bitmap block
> would resolve to mdb_bh, force the filesystem read-only and stop bitmap
> writeback before taking the buffer lock. This keeps the deadlock fix in
> the MDB commit path and reuses the existing bitmap size/writeback logic.
> 
> Reported-by: Yue Sun <samsun1006219@gmail.com>
> Closes: https://lore.kernel.org/all/CAEkJfYMB47v1yOWHB8q2dc8kf=uj-rLO=+yMyudwPguJ8Kd3jA@mail.gmail.com/
> Signed-off-by: Yue Sun <samsun1006219@gmail.com>
> ---
> Changes in v2:
> - Add a commit-time guard before locking bitmap buffer_heads.
> - Replace the mount-time byte-range check with a simple drVBMSt check.
> - Reuse the existing bitmap writeback size calculation.
> 
>  fs/hfs/mdb.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index a97cea35ca2e..53cd137892b5 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
>  		sb->s_flags |= SB_RDONLY;
>  	}
>  
> +	if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {

Technically speaking, if we are trying to check the overlapping of volume bitmap
with the main MDB record, then we need to check the overlapping with alternative
MDB record, and with Catalog File and Extents Overflow File. However, it sounds
like we are trying to add some FSCK logic here. :)

> +		pr_err("volume bitmap overlaps MDB\n");

This situation means volume corruption. It makes sense to recommend to run FSCK.

> +		return -EIO;

This code error is wrong because the read operation was OK. But we have
corrupted volume. Even if we have overlapping of volume bitmap with MDB record,
then we cannot reject mount operation. We must mount in READ-ONLY mode because,
potentially, the rest of metadata could be completely OK. We simply cannot mount
in RW mode.

> +	}
> +
>  	/* TRY to get the alternate (backup) MDB. */
>  	sect = part_start + part_size - 2;
>  	bh = sb_bread512(sb, sect, mdb2);
> @@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block *sb)
>  		size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
>  		ptr = (u8 *)HFS_SB(sb)->bitmap;
>  		while (size) {
> +			if (unlikely(block == HFS_SB(sb)->mdb_bh->b_blocknr)) {
> +				pr_err("volume bitmap overlaps MDB, forcing read-only\n");
> +				sb->s_flags |= SB_RDONLY;
> +				break;
> +			}

At this point, we already wrote main MDB and alternative MDB to the volume.
Theoretically, it is possible to imagine that if size of volume bitmap is big
enough, then we could partially process the bitmap too. Probably, we need to
check the overlapping at the beginning of the method and reject the whole
superblock commit.

Initial issue was the deadlock. Could we implement some check that buffer_heads
don't overlap before trying to lock? Does it make sense to you?

Thanks,
Slava.

>  			bh = sb_bread(sb, block);
>  			if (!bh) {
>  				pr_err("unable to read volume bitmap\n");

Re: [EXTERNAL] Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Viacheslav Dubeyko 1 week, 2 days ago

On Fri, 2026-05-29 at 14:29 -0700, Viacheslav Dubeyko wrote:
> On Fri, 2026-05-29 at 18:34 +0800, Yue Sun wrote:
> > hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)->mdb_bh is
> > locked. A crafted image can set drVBMSt so that the bitmap block resolves
> > to the same buffer_head as the MDB. When writeback later calls
> > lock_buffer() for that bitmap block, the task tries to lock mdb_bh again
> > and self-deadlocks in __lock_buffer().
> > 
> > Reject images whose volume bitmap starts at or before the MDB during
> > mount. Also guard the bitmap writeback path itself: if the bitmap block
> > would resolve to mdb_bh, force the filesystem read-only and stop bitmap
> > writeback before taking the buffer lock. This keeps the deadlock fix in
> > the MDB commit path and reuses the existing bitmap size/writeback logic.
> > 
> > Reported-by: Yue Sun <samsun1006219@gmail.com>
> > Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_CAEkJfYMB47v1yOWHB8q2dc8kf-3Duj-2DrLO-3D-2ByMyudwPguJ8Kd3jA-40mail.gmail.com_&d=DwIFaQ&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=2ATrdgpvRdYnJfz2T53Ih3iQv6Y9wogU2Ba19prTYgchh1mh4KI7lo9TYLQnlu3X&s=S8XdWVW0SpxB05CfctrqCJtJEXqtWDfLNdtzPHIZAY8&e= 
> > Signed-off-by: Yue Sun <samsun1006219@gmail.com>
> > ---
> > Changes in v2:
> > - Add a commit-time guard before locking bitmap buffer_heads.
> > - Replace the mount-time byte-range check with a simple drVBMSt check.
> > - Reuse the existing bitmap writeback size calculation.
> > 
> >  fs/hfs/mdb.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index a97cea35ca2e..53cd137892b5 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
> >  		sb->s_flags |= SB_RDONLY;
> >  	}
> >  
> > +	if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
> 
> Technically speaking, if we are trying to check the overlapping of volume bitmap
> with the main MDB record, then we need to check the overlapping with alternative
> MDB record, and with Catalog File and Extents Overflow File. However, it sounds
> like we are trying to add some FSCK logic here. :)
> 
> > +		pr_err("volume bitmap overlaps MDB\n");
> 
> This situation means volume corruption. It makes sense to recommend to run FSCK.
> 
> > +		return -EIO;
> 
> This code error is wrong because the read operation was OK. But we have
> corrupted volume. Even if we have overlapping of volume bitmap with MDB record,
> then we cannot reject mount operation. We must mount in READ-ONLY mode because,
> potentially, the rest of metadata could be completely OK. We simply cannot mount
> in RW mode.
> 
> > +	}
> > +
> >  	/* TRY to get the alternate (backup) MDB. */
> >  	sect = part_start + part_size - 2;
> >  	bh = sb_bread512(sb, sect, mdb2);
> > @@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block *sb)
> >  		size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
> >  		ptr = (u8 *)HFS_SB(sb)->bitmap;
> >  		while (size) {
> > +			if (unlikely(block == HFS_SB(sb)->mdb_bh->b_blocknr)) {
> > +				pr_err("volume bitmap overlaps MDB, forcing read-only\n");
> > +				sb->s_flags |= SB_RDONLY;
> > +				break;
> > +			}
> 
> At this point, we already wrote main MDB and alternative MDB to the volume.
> Theoretically, it is possible to imagine that if size of volume bitmap is big
> enough, then we could partially process the bitmap too. Probably, we need to
> check the overlapping at the beginning of the method and reject the whole
> superblock commit.
> 
> Initial issue was the deadlock. Could we implement some check that buffer_heads
> don't overlap before trying to lock? Does it make sense to you?
> 

By the way, I can see the deadlock in hfs_mdb_commit() even for completely valid
HFS volume during generic/013 test run. This issue takes place because folio
lock related issue. I remember that somebody has sent the patch related to this
issue but we haven't finished the discussion with some reasonable solution. I
think we need to rework the locking scheme:

diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index 97e8d1f96d6d..919798eda0f8 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -64,6 +64,7 @@ struct hfs_inode_info {
  * The HFS-specific part of a Linux (struct super_block)
  */
 struct hfs_sb_info {
+	struct mutex mdb_lock;			/* MDB operations lock */
 	struct buffer_head *mdb_bh;		/* The hfs_buffer
 						   holding the real
 						   superblock (aka VIB
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index a97cea35ca2e..94497c155d29 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -291,9 +291,9 @@ void hfs_mdb_commit(struct super_block *sb)
 	if (sb_rdonly(sb))
 		return;
 
-	lock_buffer(HFS_SB(sb)->mdb_bh);
 	if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
 		/* These parameters may have been modified, so write them back
*/
+		lock_buffer(HFS_SB(sb)->mdb_bh);
 		mdb->drLsMod = hfs_mtime();
 		mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
 		mdb->drNxtCNID =
@@ -304,6 +304,7 @@ void hfs_mdb_commit(struct super_block *sb)
 			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
>file_count));
 		mdb->drDirCnt =
 			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
>folder_count));
+		unlock_buffer(HFS_SB(sb)->mdb_bh);
 
 		/* write MDB to disk */
 		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
@@ -360,7 +361,6 @@ void hfs_mdb_commit(struct super_block *sb)
 			size -= len;
 		}
 	}
-	unlock_buffer(HFS_SB(sb)->mdb_bh);
 }
 
 void hfs_mdb_close(struct super_block *sb)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index a466c401f6bb..5d4caf3ddda6 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
 static int hfs_sync_fs(struct super_block *sb, int wait)
 {
 	is_hfs_cnid_counts_valid(sb);
+
+	mutex_lock(&HFS_SB(sb)->mdb_lock);
 	hfs_mdb_commit(sb);
+	mutex_unlock(&HFS_SB(sb)->mdb_lock);
+
 	return 0;
 }
 
@@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
 
 	is_hfs_cnid_counts_valid(sb);
 
+	mutex_lock(&sbi->mdb_lock);
 	hfs_mdb_commit(sb);
+	mutex_unlock(&sbi->mdb_lock);
 }
 
 void hfs_mark_mdb_dirty(struct super_block *sb)
@@ -339,9 +345,13 @@ static int hfs_fill_super(struct super_block *sb, struct
fs_context *fc)
 	sb->s_op = &hfs_super_operations;
 	sb->s_xattr = hfs_xattr_handlers;
 	sb->s_flags |= SB_NOATIME | SB_NODIRATIME;
+	mutex_init(&sbi->mdb_lock);
 	mutex_init(&sbi->bitmap_lock);
 
+	mutex_lock(&sbi->mdb_lock);
 	res = hfs_mdb_get(sb);
+	mutex_unlock(&sbi->mdb_lock);
+
 	if (res) {
 		if (!silent)
 			pr_warn("can't find a HFS filesystem on dev %s\n",

What do you think?

Thanks,
Slava.

Re: [EXTERNAL] Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Sam Sun 6 days, 20 hours ago

On Sat, May 30, 2026 at 7:23 AM Viacheslav Dubeyko <vdubeyko@redhat.com> wrote:
>
> On Fri, 2026-05-29 at 14:29 -0700, Viacheslav Dubeyko wrote:
> > On Fri, 2026-05-29 at 18:34 +0800, Yue Sun wrote:
> > > hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)->mdb_bh is
> > > locked. A crafted image can set drVBMSt so that the bitmap block resolves
> > > to the same buffer_head as the MDB. When writeback later calls
> > > lock_buffer() for that bitmap block, the task tries to lock mdb_bh again
> > > and self-deadlocks in __lock_buffer().
> > >
> > > Reject images whose volume bitmap starts at or before the MDB during
> > > mount. Also guard the bitmap writeback path itself: if the bitmap block
> > > would resolve to mdb_bh, force the filesystem read-only and stop bitmap
> > > writeback before taking the buffer lock. This keeps the deadlock fix in
> > > the MDB commit path and reuses the existing bitmap size/writeback logic.
> > >
> > > Reported-by: Yue Sun <samsun1006219@gmail.com>
> > > Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_CAEkJfYMB47v1yOWHB8q2dc8kf-3Duj-2DrLO-3D-2ByMyudwPguJ8Kd3jA-40mail.gmail.com_&d=DwIFaQ&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=2ATrdgpvRdYnJfz2T53Ih3iQv6Y9wogU2Ba19prTYgchh1mh4KI7lo9TYLQnlu3X&s=S8XdWVW0SpxB05CfctrqCJtJEXqtWDfLNdtzPHIZAY8&e=
> > > Signed-off-by: Yue Sun <samsun1006219@gmail.com>
> > > ---
> > > Changes in v2:
> > > - Add a commit-time guard before locking bitmap buffer_heads.
> > > - Replace the mount-time byte-range check with a simple drVBMSt check.
> > > - Reuse the existing bitmap writeback size calculation.
> > >
> > >  fs/hfs/mdb.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > index a97cea35ca2e..53cd137892b5 100644
> > > --- a/fs/hfs/mdb.c
> > > +++ b/fs/hfs/mdb.c
> > > @@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
> > >             sb->s_flags |= SB_RDONLY;
> > >     }
> > >
> > > +   if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
> >
> > Technically speaking, if we are trying to check the overlapping of volume bitmap
> > with the main MDB record, then we need to check the overlapping with alternative
> > MDB record, and with Catalog File and Extents Overflow File. However, it sounds
> > like we are trying to add some FSCK logic here. :)
> >
> > > +           pr_err("volume bitmap overlaps MDB\n");
> >
> > This situation means volume corruption. It makes sense to recommend to run FSCK.
> >
> > > +           return -EIO;
> >
> > This code error is wrong because the read operation was OK. But we have
> > corrupted volume. Even if we have overlapping of volume bitmap with MDB record,
> > then we cannot reject mount operation. We must mount in READ-ONLY mode because,
> > potentially, the rest of metadata could be completely OK. We simply cannot mount
> > in RW mode.
> >
> > > +   }
> > > +
> > >     /* TRY to get the alternate (backup) MDB. */
> > >     sect = part_start + part_size - 2;
> > >     bh = sb_bread512(sb, sect, mdb2);
> > > @@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block *sb)
> > >             size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
> > >             ptr = (u8 *)HFS_SB(sb)->bitmap;
> > >             while (size) {
> > > +                   if (unlikely(block == HFS_SB(sb)->mdb_bh->b_blocknr)) {
> > > +                           pr_err("volume bitmap overlaps MDB, forcing read-only\n");
> > > +                           sb->s_flags |= SB_RDONLY;
> > > +                           break;
> > > +                   }
> >
> > At this point, we already wrote main MDB and alternative MDB to the volume.
> > Theoretically, it is possible to imagine that if size of volume bitmap is big
> > enough, then we could partially process the bitmap too. Probably, we need to
> > check the overlapping at the beginning of the method and reject the whole
> > superblock commit.
> >
> > Initial issue was the deadlock. Could we implement some check that buffer_heads
> > don't overlap before trying to lock? Does it make sense to you?
> >
>
> By the way, I can see the deadlock in hfs_mdb_commit() even for completely valid
> HFS volume during generic/013 test run. This issue takes place because folio
> lock related issue. I remember that somebody has sent the patch related to this
> issue but we haven't finished the discussion with some reasonable solution. I
> think we need to rework the locking scheme:
>
> diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> index 97e8d1f96d6d..919798eda0f8 100644
> --- a/fs/hfs/hfs_fs.h
> +++ b/fs/hfs/hfs_fs.h
> @@ -64,6 +64,7 @@ struct hfs_inode_info {
>   * The HFS-specific part of a Linux (struct super_block)
>   */
>  struct hfs_sb_info {
> +       struct mutex mdb_lock;                  /* MDB operations lock */
>         struct buffer_head *mdb_bh;             /* The hfs_buffer
>                                                    holding the real
>                                                    superblock (aka VIB
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index a97cea35ca2e..94497c155d29 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -291,9 +291,9 @@ void hfs_mdb_commit(struct super_block *sb)
>         if (sb_rdonly(sb))
>                 return;
>
> -       lock_buffer(HFS_SB(sb)->mdb_bh);
>         if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
>                 /* These parameters may have been modified, so write them back
> */
> +               lock_buffer(HFS_SB(sb)->mdb_bh);
>                 mdb->drLsMod = hfs_mtime();
>                 mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
>                 mdb->drNxtCNID =
> @@ -304,6 +304,7 @@ void hfs_mdb_commit(struct super_block *sb)
>                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >file_count));
>                 mdb->drDirCnt =
>                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >folder_count));
> +               unlock_buffer(HFS_SB(sb)->mdb_bh);
>
>                 /* write MDB to disk */
>                 mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> @@ -360,7 +361,6 @@ void hfs_mdb_commit(struct super_block *sb)
>                         size -= len;
>                 }
>         }
> -       unlock_buffer(HFS_SB(sb)->mdb_bh);
>  }
>
>  void hfs_mdb_close(struct super_block *sb)
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index a466c401f6bb..5d4caf3ddda6 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
>  static int hfs_sync_fs(struct super_block *sb, int wait)
>  {
>         is_hfs_cnid_counts_valid(sb);
> +
> +       mutex_lock(&HFS_SB(sb)->mdb_lock);
>         hfs_mdb_commit(sb);
> +       mutex_unlock(&HFS_SB(sb)->mdb_lock);
> +
>         return 0;
>  }
>
> @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
>
>         is_hfs_cnid_counts_valid(sb);
>
> +       mutex_lock(&sbi->mdb_lock);
>         hfs_mdb_commit(sb);
> +       mutex_unlock(&sbi->mdb_lock);
>  }
>
>  void hfs_mark_mdb_dirty(struct super_block *sb)
> @@ -339,9 +345,13 @@ static int hfs_fill_super(struct super_block *sb, struct
> fs_context *fc)
>         sb->s_op = &hfs_super_operations;
>         sb->s_xattr = hfs_xattr_handlers;
>         sb->s_flags |= SB_NOATIME | SB_NODIRATIME;
> +       mutex_init(&sbi->mdb_lock);
>         mutex_init(&sbi->bitmap_lock);
>
> +       mutex_lock(&sbi->mdb_lock);
>         res = hfs_mdb_get(sb);
> +       mutex_unlock(&sbi->mdb_lock);
> +
>         if (res) {
>                 if (!silent)
>                         pr_warn("can't find a HFS filesystem on dev %s\n",
>
> What do you think?
>
> Thanks,
> Slava.
>

Thanks for the detailed comments. I agree that my patch was too
narrow. It only handled the corrupted-layout case without fixing the
more general locking problem in hfs_mdb_commit(), including the case
you mentioned where a valid HFS volume can still deadlock during
xfstests.

I also agree with your main direction that the MDB buffer lock should
not be used as a transaction lock for the whole MDB commit. However,
one detail in the locking sketch worries me, though. After narrowing
the mdb_bh lock, the HFS_FLG_ALT_MDB_DIRTY path still modifies the
primary MDB buffer through hfs_inode_write_fork():
hfs_inode_write_fork(..., mdb->drXTExtRec, &mdb->drXTFlSize, NULL);
hfs_inode_write_fork(..., mdb->drCTExtRec, &mdb->drCTFlSize, NULL);

hfs_inode_write_fork() writes to the extent record and size pointers that
are passed to it, so these are still direct writes into mdb_bh->b_data.
The new HFS-level mdb_lock serializes HFS MDB commits, but generic
buffer writeback does not take that filesystem-private mutex. It only
synchronizes with the buffer_head lock. So I think these primary MDB
updates still need to be covered by lock_buffer(mdb_bh), even if
HFS_FLG_MDB_DIRTY itself is not set.

The approach I am considering is to split the fix into two parts:

1. Rework the locking in hfs_mdb_commit().
   Add an HFS-level mdb_lock and take it inside hfs_mdb_commit(), so all
   MDB commits are serialized at the HFS layer. Then narrow the mdb_bh
   buffer lock so it only covers direct updates of the primary MDB
   buffer, including the hfs_inode_write_fork() calls from the
   HFS_FLG_ALT_MDB_DIRTY path. mark_buffer_dirty(mdb_bh) is done in that
   short critical section. The alternate MDB sync and bitmap writeback
   happen after mdb_bh has been unlocked.

2. Handle the corrupted volume-bitmap layout separately.
   Instead of trying to do broader fsck-style validation, only check the
   condition relevant to this report: whether the volume bitmap's HFS
   512-byte sector range overlaps the main MDB sector. If this is found
   during mount, warn and keep the filesystem read-only, rather than
   returning -EIO. In hfs_mdb_commit(), check the same condition before
   clearing dirty bits or doing any writeback, force read-only, and
   return. I also keep a flag for this condition so a later remount-rw
   attempt leaves the filesystem read-only as well.

Draft patches are attached to this email. What do you think?

Thanks,
Yue
--- draft patch 1: hfs: serialize MDB commits and narrow mdb_bh locking ---
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..e4ed9071c498 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -124,6 +124,7 @@ struct hfs_sb_info {
 
 	int session, part;
 	struct nls_table *nls_io, *nls_disk;
+	struct mutex mdb_lock;			/* serializes MDB updates */
 	struct mutex bitmap_lock;
 	unsigned long flags;
 	u16 blockoffset;
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index a97cea35ca2e..991013ae0c33 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -286,60 +286,74 @@ int hfs_mdb_get(struct super_block *sb)
  */
 void hfs_mdb_commit(struct super_block *sb)
 {
-	struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
+	struct hfs_sb_info *sbi = HFS_SB(sb);
+	struct hfs_mdb *mdb = sbi->mdb;
+	bool mdb_dirty, alt_dirty, bitmap_dirty;
 
 	if (sb_rdonly(sb))
 		return;
 
-	lock_buffer(HFS_SB(sb)->mdb_bh);
-	if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
+	mutex_lock(&sbi->mdb_lock);
+	if (sb_rdonly(sb))
+		goto out;
+
+	mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
+	alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi->flags) &&
+		    sbi->alt_mdb;
+	bitmap_dirty = test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &sbi->flags);
+
+	if (mdb_dirty || alt_dirty)
+		lock_buffer(sbi->mdb_bh);
+	if (mdb_dirty) {
 		/* These parameters may have been modified, so write them back */
 		mdb->drLsMod = hfs_mtime();
-		mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
+		mdb->drFreeBks = cpu_to_be16(sbi->free_ablocks);
 		mdb->drNxtCNID =
-			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->next_id));
-		mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
-		mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)->root_dirs);
+			cpu_to_be32((u32)atomic64_read(&sbi->next_id));
+		mdb->drNmFls = cpu_to_be16(sbi->root_files);
+		mdb->drNmRtDirs = cpu_to_be16(sbi->root_dirs);
 		mdb->drFilCnt =
-			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->file_count));
+			cpu_to_be32((u32)atomic64_read(&sbi->file_count));
 		mdb->drDirCnt =
-			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->folder_count));
-
-		/* write MDB to disk */
-		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
+			cpu_to_be32((u32)atomic64_read(&sbi->folder_count));
 	}
 
 	/* write the backup MDB, not returning until it is written.
 	 * we only do this when either the catalog or extents overflow
 	 * files grow. */
-	if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)->flags) &&
-	    HFS_SB(sb)->alt_mdb) {
-		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode, mdb->drXTExtRec,
+	if (alt_dirty) {
+		hfs_inode_write_fork(sbi->ext_tree->inode, mdb->drXTExtRec,
 				     &mdb->drXTFlSize, NULL);
-		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode, mdb->drCTExtRec,
+		hfs_inode_write_fork(sbi->cat_tree->inode, mdb->drCTExtRec,
 				     &mdb->drCTFlSize, NULL);
+	}
+	if (mdb_dirty || alt_dirty) {
+		mark_buffer_dirty(sbi->mdb_bh);
+		unlock_buffer(sbi->mdb_bh);
+	}
 
-		lock_buffer(HFS_SB(sb)->alt_mdb_bh);
-		memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb, HFS_SECTOR_SIZE);
-		HFS_SB(sb)->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
-		HFS_SB(sb)->alt_mdb->drAtrb &= cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
-		unlock_buffer(HFS_SB(sb)->alt_mdb_bh);
+	if (alt_dirty) {
+		lock_buffer(sbi->alt_mdb_bh);
+		memcpy(sbi->alt_mdb, sbi->mdb, HFS_SECTOR_SIZE);
+		sbi->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
+		sbi->alt_mdb->drAtrb &= cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
+		unlock_buffer(sbi->alt_mdb_bh);
 
-		mark_buffer_dirty(HFS_SB(sb)->alt_mdb_bh);
-		sync_dirty_buffer(HFS_SB(sb)->alt_mdb_bh);
+		mark_buffer_dirty(sbi->alt_mdb_bh);
+		sync_dirty_buffer(sbi->alt_mdb_bh);
 	}
 
-	if (test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &HFS_SB(sb)->flags)) {
+	if (bitmap_dirty) {
 		struct buffer_head *bh;
 		sector_t block;
 		char *ptr;
 		int off, size, len;
 
-		block = be16_to_cpu(HFS_SB(sb)->mdb->drVBMSt) + HFS_SB(sb)->part_start;
+		block = be16_to_cpu(sbi->mdb->drVBMSt) + sbi->part_start;
 		off = (block << HFS_SECTOR_SIZE_BITS) & (sb->s_blocksize - 1);
 		block >>= sb->s_blocksize_bits - HFS_SECTOR_SIZE_BITS;
-		size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
-		ptr = (u8 *)HFS_SB(sb)->bitmap;
+		size = (sbi->fs_ablocks + 7) / 8;
+		ptr = (u8 *)sbi->bitmap;
 		while (size) {
 			bh = sb_bread(sb, block);
 			if (!bh) {
@@ -360,7 +374,9 @@ void hfs_mdb_commit(struct super_block *sb)
 			size -= len;
 		}
 	}
-	unlock_buffer(HFS_SB(sb)->mdb_bh);
+
+out:
+	mutex_unlock(&sbi->mdb_lock);
 }
 
 void hfs_mdb_close(struct super_block *sb)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index a4f2a2bfa6d3..60a93c31b3ff 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -339,6 +339,7 @@ static int hfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_op = &hfs_super_operations;
 	sb->s_xattr = hfs_xattr_handlers;
 	sb->s_flags |= SB_NODIRATIME;
+	mutex_init(&sbi->mdb_lock);
 	mutex_init(&sbi->bitmap_lock);
 
 	res = hfs_mdb_get(sb);

--- draft patch 2: hfs: detect volume bitmap overlap with MDB ---
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index e4ed9071c498..df6ed1298151 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -138,6 +138,7 @@ struct hfs_sb_info {
 #define HFS_FLG_BITMAP_DIRTY	0
 #define HFS_FLG_MDB_DIRTY	1
 #define HFS_FLG_ALT_MDB_DIRTY	2
+#define HFS_FLG_VBM_OVERLAP	3
 
 /* bitmap.c */
 extern u32 hfs_vbm_search_free(struct super_block *sb, u32 goal, u32 *num_bits);
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index 991013ae0c33..4b6f1df5e4a8 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -85,6 +85,24 @@ bool is_hfs_cnid_counts_valid(struct super_block *sb)
 	return !corrupted;
 }
 
+static bool hfs_vbm_overlaps_mdb(struct super_block *sb)
+{
+	struct hfs_sb_info *sbi = HFS_SB(sb);
+	sector_t mdb_start, vbm_start, vbm_end;
+	u32 vbm_bytes, vbm_sectors;
+
+	vbm_bytes = (sbi->fs_ablocks + 7) / 8;
+	if (!vbm_bytes)
+		return false;
+
+	mdb_start = sbi->part_start + HFS_MDB_BLK;
+	vbm_start = sbi->part_start + be16_to_cpu(sbi->mdb->drVBMSt);
+	vbm_sectors = DIV_ROUND_UP(vbm_bytes, HFS_SECTOR_SIZE);
+	vbm_end = vbm_start + vbm_sectors;
+
+	return vbm_start < mdb_start + 1 && mdb_start < vbm_end;
+}
+
 /*
  * hfs_mdb_get()
  *
@@ -185,6 +203,12 @@ int hfs_mdb_get(struct super_block *sb)
 		sb->s_flags |= SB_RDONLY;
 	}
 
+	if (hfs_vbm_overlaps_mdb(sb)) {
+		pr_warn("volume bitmap overlaps MDB, running fsck.hfs is recommended. Mounting read-only.\n");
+		set_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags);
+		sb->s_flags |= SB_RDONLY;
+	}
+
 	/* TRY to get the alternate (backup) MDB. */
 	sect = part_start + part_size - 2;
 	bh = sb_bread512(sb, sect, mdb2);
@@ -296,6 +320,13 @@ void hfs_mdb_commit(struct super_block *sb)
 	mutex_lock(&sbi->mdb_lock);
 	if (sb_rdonly(sb))
 		goto out;
+	if (test_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags) ||
+	    hfs_vbm_overlaps_mdb(sb)) {
+		pr_err("volume bitmap overlaps MDB, forcing read-only\n");
+		set_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags);
+		sb->s_flags |= SB_RDONLY;
+		goto out;
+	}
 
 	mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
 	alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi->flags) &&
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 60a93c31b3ff..81de9607ab24 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -133,6 +133,10 @@ static int hfs_reconfigure(struct fs_context *fc)
 			pr_warn("filesystem is marked locked, leaving read-only.\n");
 			sb->s_flags |= SB_RDONLY;
 			fc->sb_flags |= SB_RDONLY;
+		} else if (test_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags)) {
+			pr_warn("volume bitmap overlaps MDB, running fsck.hfs is recommended.  leaving read-only.\n");
+			sb->s_flags |= SB_RDONLY;
+			fc->sb_flags |= SB_RDONLY;
 		}
 	}
 	return 0;

Re: [EXTERNAL] Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by slava@dubeyko.com 6 days, 16 hours ago

On Mon, 2026-06-01 at 22:01 +0800, Sam Sun wrote:
> On Sat, May 30, 2026 at 7:23 AM Viacheslav Dubeyko
> <vdubeyko@redhat.com> wrote:
> > 
> > On Fri, 2026-05-29 at 14:29 -0700, Viacheslav Dubeyko wrote:
> > > On Fri, 2026-05-29 at 18:34 +0800, Yue Sun wrote:
> > > > hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)-
> > > > >mdb_bh is
> > > > locked. A crafted image can set drVBMSt so that the bitmap
> > > > block resolves
> > > > to the same buffer_head as the MDB. When writeback later calls
> > > > lock_buffer() for that bitmap block, the task tries to lock
> > > > mdb_bh again
> > > > and self-deadlocks in __lock_buffer().
> > > > 
> > > > Reject images whose volume bitmap starts at or before the MDB
> > > > during
> > > > mount. Also guard the bitmap writeback path itself: if the
> > > > bitmap block
> > > > would resolve to mdb_bh, force the filesystem read-only and
> > > > stop bitmap
> > > > writeback before taking the buffer lock. This keeps the
> > > > deadlock fix in
> > > > the MDB commit path and reuses the existing bitmap
> > > > size/writeback logic.
> > > > 
> > > > Reported-by: Yue Sun <samsun1006219@gmail.com>
> > > > Closes:
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_CAEkJfYMB47v1yOWHB8q2dc8kf-3Duj-2DrLO-3D-2ByMyudwPguJ8Kd3jA-40mail.gmail.com_&d=DwIFaQ&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=2ATrdgpvRdYnJfz2T53Ih3iQv6Y9wogU2Ba19prTYgchh1mh4KI7lo9TYLQnlu3X&s=S8XdWVW0SpxB05CfctrqCJtJEXqtWDfLNdtzPHIZAY8&e=
> > > > Signed-off-by: Yue Sun <samsun1006219@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > > - Add a commit-time guard before locking bitmap buffer_heads.
> > > > - Replace the mount-time byte-range check with a simple drVBMSt
> > > > check.
> > > > - Reuse the existing bitmap writeback size calculation.
> > > > 
> > > >  fs/hfs/mdb.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > > index a97cea35ca2e..53cd137892b5 100644
> > > > --- a/fs/hfs/mdb.c
> > > > +++ b/fs/hfs/mdb.c
> > > > @@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
> > > >             sb->s_flags |= SB_RDONLY;
> > > >     }
> > > > 
> > > > +   if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
> > > 
> > > Technically speaking, if we are trying to check the overlapping
> > > of volume bitmap
> > > with the main MDB record, then we need to check the overlapping
> > > with alternative
> > > MDB record, and with Catalog File and Extents Overflow File.
> > > However, it sounds
> > > like we are trying to add some FSCK logic here. :)
> > > 
> > > > +           pr_err("volume bitmap overlaps MDB\n");
> > > 
> > > This situation means volume corruption. It makes sense to
> > > recommend to run FSCK.
> > > 
> > > > +           return -EIO;
> > > 
> > > This code error is wrong because the read operation was OK. But
> > > we have
> > > corrupted volume. Even if we have overlapping of volume bitmap
> > > with MDB record,
> > > then we cannot reject mount operation. We must mount in READ-ONLY
> > > mode because,
> > > potentially, the rest of metadata could be completely OK. We
> > > simply cannot mount
> > > in RW mode.
> > > 
> > > > +   }
> > > > +
> > > >     /* TRY to get the alternate (backup) MDB. */
> > > >     sect = part_start + part_size - 2;
> > > >     bh = sb_bread512(sb, sect, mdb2);
> > > > @@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block
> > > > *sb)
> > > >             size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
> > > >             ptr = (u8 *)HFS_SB(sb)->bitmap;
> > > >             while (size) {
> > > > +                   if (unlikely(block == HFS_SB(sb)->mdb_bh-
> > > > >b_blocknr)) {
> > > > +                           pr_err("volume bitmap overlaps MDB,
> > > > forcing read-only\n");
> > > > +                           sb->s_flags |= SB_RDONLY;
> > > > +                           break;
> > > > +                   }
> > > 
> > > At this point, we already wrote main MDB and alternative MDB to
> > > the volume.
> > > Theoretically, it is possible to imagine that if size of volume
> > > bitmap is big
> > > enough, then we could partially process the bitmap too. Probably,
> > > we need to
> > > check the overlapping at the beginning of the method and reject
> > > the whole
> > > superblock commit.
> > > 
> > > Initial issue was the deadlock. Could we implement some check
> > > that buffer_heads
> > > don't overlap before trying to lock? Does it make sense to you?
> > > 
> > 
> > By the way, I can see the deadlock in hfs_mdb_commit() even for
> > completely valid
> > HFS volume during generic/013 test run. This issue takes place
> > because folio
> > lock related issue. I remember that somebody has sent the patch
> > related to this
> > issue but we haven't finished the discussion with some reasonable
> > solution. I
> > think we need to rework the locking scheme:
> > 
> > diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> > index 97e8d1f96d6d..919798eda0f8 100644
> > --- a/fs/hfs/hfs_fs.h
> > +++ b/fs/hfs/hfs_fs.h
> > @@ -64,6 +64,7 @@ struct hfs_inode_info {
> >   * The HFS-specific part of a Linux (struct super_block)
> >   */
> >  struct hfs_sb_info {
> > +       struct mutex mdb_lock;                  /* MDB operations
> > lock */
> >         struct buffer_head *mdb_bh;             /* The hfs_buffer
> >                                                    holding the real
> >                                                    superblock (aka
> > VIB
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index a97cea35ca2e..94497c155d29 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -291,9 +291,9 @@ void hfs_mdb_commit(struct super_block *sb)
> >         if (sb_rdonly(sb))
> >                 return;
> > 
> > -       lock_buffer(HFS_SB(sb)->mdb_bh);
> >         if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)-
> > >flags)) {
> >                 /* These parameters may have been modified, so
> > write them back
> > */
> > +               lock_buffer(HFS_SB(sb)->mdb_bh);
> >                 mdb->drLsMod = hfs_mtime();
> >                 mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> > >free_ablocks);
> >                 mdb->drNxtCNID =
> > @@ -304,6 +304,7 @@ void hfs_mdb_commit(struct super_block *sb)
> >                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > file_count));
> >                 mdb->drDirCnt =
> >                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > folder_count));
> > +               unlock_buffer(HFS_SB(sb)->mdb_bh);
> > 
> >                 /* write MDB to disk */
> >                 mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > @@ -360,7 +361,6 @@ void hfs_mdb_commit(struct super_block *sb)
> >                         size -= len;
> >                 }
> >         }
> > -       unlock_buffer(HFS_SB(sb)->mdb_bh);
> >  }
> > 
> >  void hfs_mdb_close(struct super_block *sb)
> > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > index a466c401f6bb..5d4caf3ddda6 100644
> > --- a/fs/hfs/super.c
> > +++ b/fs/hfs/super.c
> > @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> >  static int hfs_sync_fs(struct super_block *sb, int wait)
> >  {
> >         is_hfs_cnid_counts_valid(sb);
> > +
> > +       mutex_lock(&HFS_SB(sb)->mdb_lock);
> >         hfs_mdb_commit(sb);
> > +       mutex_unlock(&HFS_SB(sb)->mdb_lock);
> > +
> >         return 0;
> >  }
> > 
> > @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
> > 
> >         is_hfs_cnid_counts_valid(sb);
> > 
> > +       mutex_lock(&sbi->mdb_lock);
> >         hfs_mdb_commit(sb);
> > +       mutex_unlock(&sbi->mdb_lock);
> >  }
> > 
> >  void hfs_mark_mdb_dirty(struct super_block *sb)
> > @@ -339,9 +345,13 @@ static int hfs_fill_super(struct super_block
> > *sb, struct
> > fs_context *fc)
> >         sb->s_op = &hfs_super_operations;
> >         sb->s_xattr = hfs_xattr_handlers;
> >         sb->s_flags |= SB_NOATIME | SB_NODIRATIME;
> > +       mutex_init(&sbi->mdb_lock);
> >         mutex_init(&sbi->bitmap_lock);
> > 
> > +       mutex_lock(&sbi->mdb_lock);
> >         res = hfs_mdb_get(sb);
> > +       mutex_unlock(&sbi->mdb_lock);
> > +
> >         if (res) {
> >                 if (!silent)
> >                         pr_warn("can't find a HFS filesystem on dev
> > %s\n",
> > 
> > What do you think?
> > 
> > Thanks,
> > Slava.
> > 
> 
> Thanks for the detailed comments. I agree that my patch was too
> narrow. It only handled the corrupted-layout case without fixing the
> more general locking problem in hfs_mdb_commit(), including the case
> you mentioned where a valid HFS volume can still deadlock during
> xfstests.
> 
> I also agree with your main direction that the MDB buffer lock should
> not be used as a transaction lock for the whole MDB commit. However,
> one detail in the locking sketch worries me, though. After narrowing
> the mdb_bh lock, the HFS_FLG_ALT_MDB_DIRTY path still modifies the
> primary MDB buffer through hfs_inode_write_fork():
> hfs_inode_write_fork(..., mdb->drXTExtRec, &mdb->drXTFlSize, NULL);
> hfs_inode_write_fork(..., mdb->drCTExtRec, &mdb->drCTFlSize, NULL);
> 
> hfs_inode_write_fork() writes to the extent record and size pointers
> that
> are passed to it, so these are still direct writes into mdb_bh-
> >b_data.
> The new HFS-level mdb_lock serializes HFS MDB commits, but generic
> buffer writeback does not take that filesystem-private mutex. It only
> synchronizes with the buffer_head lock. So I think these primary MDB
> updates still need to be covered by lock_buffer(mdb_bh), even if
> HFS_FLG_MDB_DIRTY itself is not set.

I don't quite follow what you mean here. If you take a look into the
hfs_inode_write_fork():

void hfs_inode_write_fork(struct inode *inode, struct hfs_extent *ext,
			  __be32 *log_size, __be32 *phys_size)
{
	memcpy(ext, HFS_I(inode)->first_extents,
sizeof(hfs_extent_rec));

	if (log_size)
		*log_size = cpu_to_be32(inode->i_size);
	if (phys_size)
		*phys_size = cpu_to_be32(HFS_I(inode)->alloc_blocks *
					 HFS_SB(inode->i_sb)-
>alloc_blksz);
}

You can see that it is simply extracting values into buffers and
nothing more. No mdb_bh->b_data operations are involved.

> 
> The approach I am considering is to split the fix into two parts:
> 
> 1. Rework the locking in hfs_mdb_commit().
>    Add an HFS-level mdb_lock and take it inside hfs_mdb_commit(), so
> all
>    MDB commits are serialized at the HFS layer. Then narrow the
> mdb_bh
>    buffer lock so it only covers direct updates of the primary MDB
>    buffer, including the hfs_inode_write_fork() calls from the
>    HFS_FLG_ALT_MDB_DIRTY path. mark_buffer_dirty(mdb_bh) is done in
> that
>    short critical section. The alternate MDB sync and bitmap
> writeback
>    happen after mdb_bh has been unlocked.
> 
> 2. Handle the corrupted volume-bitmap layout separately.
>    Instead of trying to do broader fsck-style validation, only check
> the
>    condition relevant to this report: whether the volume bitmap's HFS
>    512-byte sector range overlaps the main MDB sector. If this is
> found
>    during mount, warn and keep the filesystem read-only, rather than
>    returning -EIO. In hfs_mdb_commit(), check the same condition
> before
>    clearing dirty bits or doing any writeback, force read-only, and
>    return. I also keep a flag for this condition so a later remount-
> rw
>    attempt leaves the filesystem read-only as well.
> 
> Draft patches are attached to this email. What do you think?
> 
> 

I don't know how to review attachments. If you would like to share some
code for review, then, please, include the diff into the body of email.

Thanks,
Slava.

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Yue Sun 4 days, 19 hours ago

On Tue, Jun 2, 2026 at 2:46 AM <slava@dubeyko.com> wrote:
>
> I don't quite follow what you mean here. If you take a look into the
> hfs_inode_write_fork():
>
> void hfs_inode_write_fork(struct inode *inode, struct hfs_extent *ext,
>                           __be32 *log_size, __be32 *phys_size)
> {
>         memcpy(ext, HFS_I(inode)->first_extents,
> sizeof(hfs_extent_rec));
>
>         if (log_size)
>                 *log_size = cpu_to_be32(inode->i_size);
>         if (phys_size)
>                 *phys_size = cpu_to_be32(HFS_I(inode)->alloc_blocks *
>                                          HFS_SB(inode->i_sb)-
> >alloc_blksz);
> }
>
> You can see that it is simply extracting values into buffers and
> nothing more. No mdb_bh->b_data operations are involved.

Thanks, I explained this part poorly.

I agree that hfs_inode_write_fork() itself does not know anything about
mdb_bh. My concern is about the actual buffers passed by
hfs_mdb_commit().

In hfs_mdb_get(), the primary MDB is set up by sb_bread512():

        bh = sb_bread512(sb, part_start + HFS_MDB_BLK, mdb);
        ...
        HFS_SB(sb)->mdb_bh = bh;
        HFS_SB(sb)->mdb = mdb;

and sb_bread512() sets the data pointer like this:

        data = (void *)(__bh->b_data + __offset);

So after mount:

        sbi->mdb == sbi->mdb_bh->b_data + __offset

Then in the HFS_FLG_ALT_MDB_DIRTY path we pass MDB fields to
hfs_inode_write_fork():

        hfs_inode_write_fork(sbi->ext_tree->inode, mdb->drXTExtRec,
                             &mdb->drXTFlSize, NULL);
        hfs_inode_write_fork(sbi->cat_tree->inode, mdb->drCTExtRec,
                             &mdb->drCTFlSize, NULL);

In this call site, the ext and log_size arguments point into the primary
MDB buffer. The memcpy() writes drXTExtRec/drCTExtRec, and the
*log_size assignment writes drXTFlSize/drCTFlSize. So the function is
only writing to the buffers passed to it, but these buffers are fields
inside sbi->mdb_bh->b_data.

The race I am worried about is with generic buffer writeback, not with
another hfs_mdb_commit() caller. The new mdb_lock serializes HFS MDB
commits, but generic writeback does not take this filesystem-private
mutex. It synchronizes with buffer users through the buffer_head lock.

One possible interleaving with the shortened mdb_bh lock is:

CPU0: hfs_mdb_commit()                 CPU1: generic buffer writeback
--------------------------------       ------------------------------

mutex_lock(&sbi->mdb_lock)

if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, ...)) {
        lock_buffer(sbi->mdb_bh)
        update primary MDB fields
        unlock_buffer(sbi->mdb_bh)

        mark_buffer_dirty(sbi->mdb_bh)
}

/* about to handle HFS_FLG_ALT_MDB_DIRTY */
<preempted here>
                                       write_dirty_buffer(sbi->mdb_bh)

                                         lock_buffer(sbi->mdb_bh)
                                         test_clear_buffer_dirty(sbi->mdb_bh)

                                         submit_bh()

                                           submit_bh_wbc()

                                             bio_add_folio_nofail(
                                               bio, bh->b_folio,
                                               bh->b_size, bh_offset(bh))

                                             blk_crypto_submit_bio(bio)

                                       /*
                                        * BH_Lock stays held until I/O
                                        * completion unlocks the buffer.
                                        */

<resumed>
if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, ...) && sbi->alt_mdb) {
        hfs_inode_write_fork(..., mdb->drXTExtRec,
                             &mdb->drXTFlSize, NULL)

          memcpy(ext, ...)
                 ^ ext points inside sbi->mdb_bh->b_data

          *log_size = ...
                 ^ log_size points inside sbi->mdb_bh->b_data

        hfs_inode_write_fork(..., mdb->drCTExtRec,
                             &mdb->drCTFlSize, NULL)

          memcpy(ext, ...)
                 ^ ext points inside sbi->mdb_bh->b_data

          *log_size = ...
                 ^ log_size points inside sbi->mdb_bh->b_data
}

At this point CPU1 holds the buffer lock for I/O, but CPU0 does not try
to take that lock before modifying the same buffer contents. The HFS
mutex does not help here because CPU1 does not take it. The result could
be that the submitted write observes a partially updated primary MDB, or
that the dirty bit is cleared before the ALT_MDB_DIRTY updates are made.

This is why I think the mdb_bh lock still needs to cover both the
HFS_FLG_MDB_DIRTY field updates and the HFS_FLG_ALT_MDB_DIRTY
hfs_inode_write_fork() calls. The lock does not need to cover the whole
commit anymore; it only needs to cover direct modifications of the
primary MDB buffer and mark_buffer_dirty(mdb_bh).

>
> I don't know how to review attachments. If you would like to share some
> code for review, then, please, include the diff into the body of email.

Sure. Sorry about the attachment. The draft diffs are included inline
below. They are still for discussion, not a formal v3 submission yet.

The first diff reworks the locking. The second diff handles the
corrupted volume-bitmap layout separately by forcing read-only instead
of rejecting the mount with -EIO, and by checking before consuming dirty
bits in hfs_mdb_commit().

Thanks,
Yue

--- draft diff 1: hfs: serialize MDB commits and narrow mdb_bh locking ---
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..e4ed9071c498 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -124,6 +124,7 @@ struct hfs_sb_info {
 
 	int session, part;
 	struct nls_table *nls_io, *nls_disk;
+	struct mutex mdb_lock;			/* serializes MDB updates */
 	struct mutex bitmap_lock;
 	unsigned long flags;
 	u16 blockoffset;
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index a97cea35ca2e..991013ae0c33 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -286,60 +286,74 @@ int hfs_mdb_get(struct super_block *sb)
  */
 void hfs_mdb_commit(struct super_block *sb)
 {
-	struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
+	struct hfs_sb_info *sbi = HFS_SB(sb);
+	struct hfs_mdb *mdb = sbi->mdb;
+	bool mdb_dirty, alt_dirty, bitmap_dirty;
 
 	if (sb_rdonly(sb))
 		return;
 
-	lock_buffer(HFS_SB(sb)->mdb_bh);
-	if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
+	mutex_lock(&sbi->mdb_lock);
+	if (sb_rdonly(sb))
+		goto out;
+
+	mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
+	alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi->flags) &&
+		    sbi->alt_mdb;
+	bitmap_dirty = test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &sbi->flags);
+
+	if (mdb_dirty || alt_dirty)
+		lock_buffer(sbi->mdb_bh);
+	if (mdb_dirty) {
 		/* These parameters may have been modified, so write them back */
 		mdb->drLsMod = hfs_mtime();
-		mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)->free_ablocks);
+		mdb->drFreeBks = cpu_to_be16(sbi->free_ablocks);
 		mdb->drNxtCNID =
-			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->next_id));
-		mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
-		mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)->root_dirs);
+			cpu_to_be32((u32)atomic64_read(&sbi->next_id));
+		mdb->drNmFls = cpu_to_be16(sbi->root_files);
+		mdb->drNmRtDirs = cpu_to_be16(sbi->root_dirs);
 		mdb->drFilCnt =
-			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->file_count));
+			cpu_to_be32((u32)atomic64_read(&sbi->file_count));
 		mdb->drDirCnt =
-			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)->folder_count));
-
-		/* write MDB to disk */
-		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
+			cpu_to_be32((u32)atomic64_read(&sbi->folder_count));
 	}
 
 	/* write the backup MDB, not returning until it is written.
 	 * we only do this when either the catalog or extents overflow
 	 * files grow. */
-	if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)->flags) &&
-	    HFS_SB(sb)->alt_mdb) {
-		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode, mdb->drXTExtRec,
+	if (alt_dirty) {
+		hfs_inode_write_fork(sbi->ext_tree->inode, mdb->drXTExtRec,
 				     &mdb->drXTFlSize, NULL);
-		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode, mdb->drCTExtRec,
+		hfs_inode_write_fork(sbi->cat_tree->inode, mdb->drCTExtRec,
 				     &mdb->drCTFlSize, NULL);
+	}
+	if (mdb_dirty || alt_dirty) {
+		mark_buffer_dirty(sbi->mdb_bh);
+		unlock_buffer(sbi->mdb_bh);
+	}
 
-		lock_buffer(HFS_SB(sb)->alt_mdb_bh);
-		memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb, HFS_SECTOR_SIZE);
-		HFS_SB(sb)->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
-		HFS_SB(sb)->alt_mdb->drAtrb &= cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
-		unlock_buffer(HFS_SB(sb)->alt_mdb_bh);
+	if (alt_dirty) {
+		lock_buffer(sbi->alt_mdb_bh);
+		memcpy(sbi->alt_mdb, sbi->mdb, HFS_SECTOR_SIZE);
+		sbi->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
+		sbi->alt_mdb->drAtrb &= cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
+		unlock_buffer(sbi->alt_mdb_bh);
 
-		mark_buffer_dirty(HFS_SB(sb)->alt_mdb_bh);
-		sync_dirty_buffer(HFS_SB(sb)->alt_mdb_bh);
+		mark_buffer_dirty(sbi->alt_mdb_bh);
+		sync_dirty_buffer(sbi->alt_mdb_bh);
 	}
 
-	if (test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &HFS_SB(sb)->flags)) {
+	if (bitmap_dirty) {
 		struct buffer_head *bh;
 		sector_t block;
 		char *ptr;
 		int off, size, len;
 
-		block = be16_to_cpu(HFS_SB(sb)->mdb->drVBMSt) + HFS_SB(sb)->part_start;
+		block = be16_to_cpu(sbi->mdb->drVBMSt) + sbi->part_start;
 		off = (block << HFS_SECTOR_SIZE_BITS) & (sb->s_blocksize - 1);
 		block >>= sb->s_blocksize_bits - HFS_SECTOR_SIZE_BITS;
-		size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
-		ptr = (u8 *)HFS_SB(sb)->bitmap;
+		size = (sbi->fs_ablocks + 7) / 8;
+		ptr = (u8 *)sbi->bitmap;
 		while (size) {
 			bh = sb_bread(sb, block);
 			if (!bh) {
@@ -360,7 +374,9 @@ void hfs_mdb_commit(struct super_block *sb)
 			size -= len;
 		}
 	}
-	unlock_buffer(HFS_SB(sb)->mdb_bh);
+
+out:
+	mutex_unlock(&sbi->mdb_lock);
 }
 
 void hfs_mdb_close(struct super_block *sb)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index a4f2a2bfa6d3..60a93c31b3ff 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -339,6 +339,7 @@ static int hfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_op = &hfs_super_operations;
 	sb->s_xattr = hfs_xattr_handlers;
 	sb->s_flags |= SB_NODIRATIME;
+	mutex_init(&sbi->mdb_lock);
 	mutex_init(&sbi->bitmap_lock);
 
 	res = hfs_mdb_get(sb);

--- draft diff 2: hfs: detect volume bitmap overlap with MDB ---
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index e4ed9071c498..df6ed1298151 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -138,6 +138,7 @@ struct hfs_sb_info {
 #define HFS_FLG_BITMAP_DIRTY	0
 #define HFS_FLG_MDB_DIRTY	1
 #define HFS_FLG_ALT_MDB_DIRTY	2
+#define HFS_FLG_VBM_OVERLAP	3
 
 /* bitmap.c */
 extern u32 hfs_vbm_search_free(struct super_block *sb, u32 goal, u32 *num_bits);
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index 991013ae0c33..4b6f1df5e4a8 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -85,6 +85,24 @@ bool is_hfs_cnid_counts_valid(struct super_block *sb)
 	return !corrupted;
 }
 
+static bool hfs_vbm_overlaps_mdb(struct super_block *sb)
+{
+	struct hfs_sb_info *sbi = HFS_SB(sb);
+	sector_t mdb_start, vbm_start, vbm_end;
+	u32 vbm_bytes, vbm_sectors;
+
+	vbm_bytes = (sbi->fs_ablocks + 7) / 8;
+	if (!vbm_bytes)
+		return false;
+
+	mdb_start = sbi->part_start + HFS_MDB_BLK;
+	vbm_start = sbi->part_start + be16_to_cpu(sbi->mdb->drVBMSt);
+	vbm_sectors = DIV_ROUND_UP(vbm_bytes, HFS_SECTOR_SIZE);
+	vbm_end = vbm_start + vbm_sectors;
+
+	return vbm_start < mdb_start + 1 && mdb_start < vbm_end;
+}
+
 /*
  * hfs_mdb_get()
  *
@@ -185,6 +203,12 @@ int hfs_mdb_get(struct super_block *sb)
 		sb->s_flags |= SB_RDONLY;
 	}
 
+	if (hfs_vbm_overlaps_mdb(sb)) {
+		pr_warn("volume bitmap overlaps MDB, running fsck.hfs is recommended. Mounting read-only.\n");
+		set_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags);
+		sb->s_flags |= SB_RDONLY;
+	}
+
 	/* TRY to get the alternate (backup) MDB. */
 	sect = part_start + part_size - 2;
 	bh = sb_bread512(sb, sect, mdb2);
@@ -296,6 +320,13 @@ void hfs_mdb_commit(struct super_block *sb)
 	mutex_lock(&sbi->mdb_lock);
 	if (sb_rdonly(sb))
 		goto out;
+	if (test_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags) ||
+	    hfs_vbm_overlaps_mdb(sb)) {
+		pr_err("volume bitmap overlaps MDB, forcing read-only\n");
+		set_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags);
+		sb->s_flags |= SB_RDONLY;
+		goto out;
+	}
 
 	mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
 	alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi->flags) &&
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 60a93c31b3ff..81de9607ab24 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -133,6 +133,10 @@ static int hfs_reconfigure(struct fs_context *fc)
 			pr_warn("filesystem is marked locked, leaving read-only.\n");
 			sb->s_flags |= SB_RDONLY;
 			fc->sb_flags |= SB_RDONLY;
+		} else if (test_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags)) {
+			pr_warn("volume bitmap overlaps MDB, running fsck.hfs is recommended.  leaving read-only.\n");
+			sb->s_flags |= SB_RDONLY;
+			fc->sb_flags |= SB_RDONLY;
 		}
 	}
 	return 0;

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Viacheslav Dubeyko 4 days, 18 hours ago

On Wed, 2026-06-03 at 23:23 +0800, Yue Sun wrote:
> On Tue, Jun 2, 2026 at 2:46 AM <slava@dubeyko.com> wrote:
> > 
> > I don't quite follow what you mean here. If you take a look into
> > the
> > hfs_inode_write_fork():
> > 
> > void hfs_inode_write_fork(struct inode *inode, struct hfs_extent
> > *ext,
> >                           __be32 *log_size, __be32 *phys_size)
> > {
> >         memcpy(ext, HFS_I(inode)->first_extents,
> > sizeof(hfs_extent_rec));
> > 
> >         if (log_size)
> >                 *log_size = cpu_to_be32(inode->i_size);
> >         if (phys_size)
> >                 *phys_size = cpu_to_be32(HFS_I(inode)->alloc_blocks
> > *
> >                                          HFS_SB(inode->i_sb)-
> > > alloc_blksz);
> > }
> > 
> > You can see that it is simply extracting values into buffers and
> > nothing more. No mdb_bh->b_data operations are involved.
> 
> Thanks, I explained this part poorly.
> 
> I agree that hfs_inode_write_fork() itself does not know anything
> about
> mdb_bh. My concern is about the actual buffers passed by
> hfs_mdb_commit().
> 
> In hfs_mdb_get(), the primary MDB is set up by sb_bread512():
> 
>         bh = sb_bread512(sb, part_start + HFS_MDB_BLK, mdb);
>         ...
>         HFS_SB(sb)->mdb_bh = bh;
>         HFS_SB(sb)->mdb = mdb;
> 
> and sb_bread512() sets the data pointer like this:
> 
>         data = (void *)(__bh->b_data + __offset);
> 
> So after mount:
> 
>         sbi->mdb == sbi->mdb_bh->b_data + __offset
> 
> Then in the HFS_FLG_ALT_MDB_DIRTY path we pass MDB fields to
> hfs_inode_write_fork():
> 
>         hfs_inode_write_fork(sbi->ext_tree->inode, mdb->drXTExtRec,
>                              &mdb->drXTFlSize, NULL);
>         hfs_inode_write_fork(sbi->cat_tree->inode, mdb->drCTExtRec,
>                              &mdb->drCTFlSize, NULL);
> 
> In this call site, the ext and log_size arguments point into the
> primary
> MDB buffer. The memcpy() writes drXTExtRec/drCTExtRec, and the
> *log_size assignment writes drXTFlSize/drCTFlSize. So the function is
> only writing to the buffers passed to it, but these buffers are
> fields
> inside sbi->mdb_bh->b_data.
> 
> The race I am worried about is with generic buffer writeback, not
> with
> another hfs_mdb_commit() caller. The new mdb_lock serializes HFS MDB
> commits, but generic writeback does not take this filesystem-private
> mutex. It synchronizes with buffer users through the buffer_head
> lock.
> 
> One possible interleaving with the shortened mdb_bh lock is:
> 
> CPU0: hfs_mdb_commit()                 CPU1: generic buffer writeback
> --------------------------------       ------------------------------
> 
> mutex_lock(&sbi->mdb_lock)
> 
> if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, ...)) {
>         lock_buffer(sbi->mdb_bh)
>         update primary MDB fields
>         unlock_buffer(sbi->mdb_bh)
> 
>         mark_buffer_dirty(sbi->mdb_bh)
> }
> 
> /* about to handle HFS_FLG_ALT_MDB_DIRTY */
> <preempted here>
>                                        write_dirty_buffer(sbi-
> >mdb_bh)
> 
>                                          lock_buffer(sbi->mdb_bh)
>                                          test_clear_buffer_dirty(sbi-
> >mdb_bh)
> 
>                                          submit_bh()
> 
>                                            submit_bh_wbc()
> 
>                                              bio_add_folio_nofail(
>                                                bio, bh->b_folio,
>                                                bh->b_size,
> bh_offset(bh))
> 
>                                             
> blk_crypto_submit_bio(bio)
> 
>                                        /*
>                                         * BH_Lock stays held until
> I/O
>                                         * completion unlocks the
> buffer.
>                                         */
> 
> <resumed>
> if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, ...) && sbi->alt_mdb) {
>         hfs_inode_write_fork(..., mdb->drXTExtRec,
>                              &mdb->drXTFlSize, NULL)
> 
>           memcpy(ext, ...)
>                  ^ ext points inside sbi->mdb_bh->b_data
> 
>           *log_size = ...
>                  ^ log_size points inside sbi->mdb_bh->b_data
> 
>         hfs_inode_write_fork(..., mdb->drCTExtRec,
>                              &mdb->drCTFlSize, NULL)
> 
>           memcpy(ext, ...)
>                  ^ ext points inside sbi->mdb_bh->b_data
> 
>           *log_size = ...
>                  ^ log_size points inside sbi->mdb_bh->b_data
> }
> 
> At this point CPU1 holds the buffer lock for I/O, but CPU0 does not
> try
> to take that lock before modifying the same buffer contents. The HFS
> mutex does not help here because CPU1 does not take it. The result
> could
> be that the submitted write observes a partially updated primary MDB,
> or
> that the dirty bit is cleared before the ALT_MDB_DIRTY updates are
> made.
> 
> This is why I think the mdb_bh lock still needs to cover both the
> HFS_FLG_MDB_DIRTY field updates and the HFS_FLG_ALT_MDB_DIRTY
> hfs_inode_write_fork() calls. The lock does not need to cover the
> whole
> commit anymore; it only needs to cover direct modifications of the
> primary MDB buffer and mark_buffer_dirty(mdb_bh).
> 
> > 
> > I don't know how to review attachments. If you would like to share
> > some
> > code for review, then, please, include the diff into the body of
> > email.
> 
> Sure. Sorry about the attachment. The draft diffs are included inline
> below. They are still for discussion, not a formal v3 submission yet.
> 
> The first diff reworks the locking. The second diff handles the
> corrupted volume-bitmap layout separately by forcing read-only
> instead
> of rejecting the mount with -EIO, and by checking before consuming
> dirty
> bits in hfs_mdb_commit().
> 
> Thanks,
> Yue
> 
> --- draft diff 1: hfs: serialize MDB commits and narrow mdb_bh
> locking ---
> diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> index ac0e83f77a0f..e4ed9071c498 100644
> --- a/fs/hfs/hfs_fs.h
> +++ b/fs/hfs/hfs_fs.h
> @@ -124,6 +124,7 @@ struct hfs_sb_info {
>  
>  	int session, part;
>  	struct nls_table *nls_io, *nls_disk;
> +	struct mutex mdb_lock;			/* serializes MDB
> updates */


Could you please take a deeper look into my diff that I've shared with
you before? I don't see the point to review your suggestion if you
completely ignored my diff. From my point of view you suggested
completely the same approach but in more complicated manner.

Thanks,
Slava.

>  	struct mutex bitmap_lock;
>  	unsigned long flags;
>  	u16 blockoffset;
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index a97cea35ca2e..991013ae0c33 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -286,60 +286,74 @@ int hfs_mdb_get(struct super_block *sb)
>   */
>  void hfs_mdb_commit(struct super_block *sb)
>  {
> -	struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
> +	struct hfs_sb_info *sbi = HFS_SB(sb);
> +	struct hfs_mdb *mdb = sbi->mdb;
> +	bool mdb_dirty, alt_dirty, bitmap_dirty;
>  
>  	if (sb_rdonly(sb))
>  		return;
>  
> -	lock_buffer(HFS_SB(sb)->mdb_bh);
> -	if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)-
> >flags)) {
> +	mutex_lock(&sbi->mdb_lock);
> +	if (sb_rdonly(sb))
> +		goto out;
> +
> +	mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi-
> >flags);
> +	alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi-
> >flags) &&
> +		    sbi->alt_mdb;
> +	bitmap_dirty = test_and_clear_bit(HFS_FLG_BITMAP_DIRTY,
> &sbi->flags);
> +
> +	if (mdb_dirty || alt_dirty)
> +		lock_buffer(sbi->mdb_bh);
> +	if (mdb_dirty) {
>  		/* These parameters may have been modified, so write
> them back */
>  		mdb->drLsMod = hfs_mtime();
> -		mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> >free_ablocks);
> +		mdb->drFreeBks = cpu_to_be16(sbi->free_ablocks);
>  		mdb->drNxtCNID =
> -			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >next_id));
> -		mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
> -		mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)-
> >root_dirs);
> +			cpu_to_be32((u32)atomic64_read(&sbi-
> >next_id));
> +		mdb->drNmFls = cpu_to_be16(sbi->root_files);
> +		mdb->drNmRtDirs = cpu_to_be16(sbi->root_dirs);
>  		mdb->drFilCnt =
> -			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >file_count));
> +			cpu_to_be32((u32)atomic64_read(&sbi-
> >file_count));
>  		mdb->drDirCnt =
> -			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >folder_count));
> -
> -		/* write MDB to disk */
> -		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> +			cpu_to_be32((u32)atomic64_read(&sbi-
> >folder_count));
>  	}
>  
>  	/* write the backup MDB, not returning until it is written.
>  	 * we only do this when either the catalog or extents
> overflow
>  	 * files grow. */
> -	if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> >flags) &&
> -	    HFS_SB(sb)->alt_mdb) {
> -		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> mdb->drXTExtRec,
> +	if (alt_dirty) {
> +		hfs_inode_write_fork(sbi->ext_tree->inode, mdb-
> >drXTExtRec,
>  				     &mdb->drXTFlSize, NULL);
> -		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> mdb->drCTExtRec,
> +		hfs_inode_write_fork(sbi->cat_tree->inode, mdb-
> >drCTExtRec,
>  				     &mdb->drCTFlSize, NULL);
> +	}
> +	if (mdb_dirty || alt_dirty) {
> +		mark_buffer_dirty(sbi->mdb_bh);
> +		unlock_buffer(sbi->mdb_bh);
> +	}
>  
> -		lock_buffer(HFS_SB(sb)->alt_mdb_bh);
> -		memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> HFS_SECTOR_SIZE);
> -		HFS_SB(sb)->alt_mdb->drAtrb |=
> cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> -		HFS_SB(sb)->alt_mdb->drAtrb &=
> cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
> -		unlock_buffer(HFS_SB(sb)->alt_mdb_bh);
> +	if (alt_dirty) {
> +		lock_buffer(sbi->alt_mdb_bh);
> +		memcpy(sbi->alt_mdb, sbi->mdb, HFS_SECTOR_SIZE);
> +		sbi->alt_mdb->drAtrb |=
> cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> +		sbi->alt_mdb->drAtrb &=
> cpu_to_be16(~HFS_SB_ATTRIB_INCNSTNT);
> +		unlock_buffer(sbi->alt_mdb_bh);
>  
> -		mark_buffer_dirty(HFS_SB(sb)->alt_mdb_bh);
> -		sync_dirty_buffer(HFS_SB(sb)->alt_mdb_bh);
> +		mark_buffer_dirty(sbi->alt_mdb_bh);
> +		sync_dirty_buffer(sbi->alt_mdb_bh);
>  	}
>  
> -	if (test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &HFS_SB(sb)-
> >flags)) {
> +	if (bitmap_dirty) {
>  		struct buffer_head *bh;
>  		sector_t block;
>  		char *ptr;
>  		int off, size, len;
>  
> -		block = be16_to_cpu(HFS_SB(sb)->mdb->drVBMSt) +
> HFS_SB(sb)->part_start;
> +		block = be16_to_cpu(sbi->mdb->drVBMSt) + sbi-
> >part_start;
>  		off = (block << HFS_SECTOR_SIZE_BITS) & (sb-
> >s_blocksize - 1);
>  		block >>= sb->s_blocksize_bits -
> HFS_SECTOR_SIZE_BITS;
> -		size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
> -		ptr = (u8 *)HFS_SB(sb)->bitmap;
> +		size = (sbi->fs_ablocks + 7) / 8;
> +		ptr = (u8 *)sbi->bitmap;
>  		while (size) {
>  			bh = sb_bread(sb, block);
>  			if (!bh) {
> @@ -360,7 +374,9 @@ void hfs_mdb_commit(struct super_block *sb)
>  			size -= len;
>  		}
>  	}
> -	unlock_buffer(HFS_SB(sb)->mdb_bh);
> +
> +out:
> +	mutex_unlock(&sbi->mdb_lock);
>  }
>  
>  void hfs_mdb_close(struct super_block *sb)
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index a4f2a2bfa6d3..60a93c31b3ff 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -339,6 +339,7 @@ static int hfs_fill_super(struct super_block *sb,
> struct fs_context *fc)
>  	sb->s_op = &hfs_super_operations;
>  	sb->s_xattr = hfs_xattr_handlers;
>  	sb->s_flags |= SB_NODIRATIME;
> +	mutex_init(&sbi->mdb_lock);
>  	mutex_init(&sbi->bitmap_lock);
>  
>  	res = hfs_mdb_get(sb);
> 
> --- draft diff 2: hfs: detect volume bitmap overlap with MDB ---
> diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> index e4ed9071c498..df6ed1298151 100644
> --- a/fs/hfs/hfs_fs.h
> +++ b/fs/hfs/hfs_fs.h
> @@ -138,6 +138,7 @@ struct hfs_sb_info {
>  #define HFS_FLG_BITMAP_DIRTY	0
>  #define HFS_FLG_MDB_DIRTY	1
>  #define HFS_FLG_ALT_MDB_DIRTY	2
> +#define HFS_FLG_VBM_OVERLAP	3
>  
>  /* bitmap.c */
>  extern u32 hfs_vbm_search_free(struct super_block *sb, u32 goal, u32
> *num_bits);
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index 991013ae0c33..4b6f1df5e4a8 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -85,6 +85,24 @@ bool is_hfs_cnid_counts_valid(struct super_block
> *sb)
>  	return !corrupted;
>  }
>  
> +static bool hfs_vbm_overlaps_mdb(struct super_block *sb)
> +{
> +	struct hfs_sb_info *sbi = HFS_SB(sb);
> +	sector_t mdb_start, vbm_start, vbm_end;
> +	u32 vbm_bytes, vbm_sectors;
> +
> +	vbm_bytes = (sbi->fs_ablocks + 7) / 8;
> +	if (!vbm_bytes)
> +		return false;
> +
> +	mdb_start = sbi->part_start + HFS_MDB_BLK;
> +	vbm_start = sbi->part_start + be16_to_cpu(sbi->mdb-
> >drVBMSt);
> +	vbm_sectors = DIV_ROUND_UP(vbm_bytes, HFS_SECTOR_SIZE);
> +	vbm_end = vbm_start + vbm_sectors;
> +
> +	return vbm_start < mdb_start + 1 && mdb_start < vbm_end;
> +}
> +
>  /*
>   * hfs_mdb_get()
>   *
> @@ -185,6 +203,12 @@ int hfs_mdb_get(struct super_block *sb)
>  		sb->s_flags |= SB_RDONLY;
>  	}
>  
> +	if (hfs_vbm_overlaps_mdb(sb)) {
> +		pr_warn("volume bitmap overlaps MDB, running
> fsck.hfs is recommended. Mounting read-only.\n");
> +		set_bit(HFS_FLG_VBM_OVERLAP, &HFS_SB(sb)->flags);
> +		sb->s_flags |= SB_RDONLY;
> +	}
> +
>  	/* TRY to get the alternate (backup) MDB. */
>  	sect = part_start + part_size - 2;
>  	bh = sb_bread512(sb, sect, mdb2);
> @@ -296,6 +320,13 @@ void hfs_mdb_commit(struct super_block *sb)
>  	mutex_lock(&sbi->mdb_lock);
>  	if (sb_rdonly(sb))
>  		goto out;
> +	if (test_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags) ||
> +	    hfs_vbm_overlaps_mdb(sb)) {
> +		pr_err("volume bitmap overlaps MDB, forcing read-
> only\n");
> +		set_bit(HFS_FLG_VBM_OVERLAP, &sbi->flags);
> +		sb->s_flags |= SB_RDONLY;
> +		goto out;
> +	}
>  
>  	mdb_dirty = test_and_clear_bit(HFS_FLG_MDB_DIRTY, &sbi-
> >flags);
>  	alt_dirty = test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &sbi-
> >flags) &&
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index 60a93c31b3ff..81de9607ab24 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -133,6 +133,10 @@ static int hfs_reconfigure(struct fs_context
> *fc)
>  			pr_warn("filesystem is marked locked,
> leaving read-only.\n");
>  			sb->s_flags |= SB_RDONLY;
>  			fc->sb_flags |= SB_RDONLY;
> +		} else if (test_bit(HFS_FLG_VBM_OVERLAP,
> &HFS_SB(sb)->flags)) {
> +			pr_warn("volume bitmap overlaps MDB, running
> fsck.hfs is recommended.  leaving read-only.\n");
> +			sb->s_flags |= SB_RDONLY;
> +			fc->sb_flags |= SB_RDONLY;
>  		}
>  	}
>  	return 0;

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Yue Sun 3 days, 16 hours ago

On Thu, Jun 4, 2026 at 12:25 AM <slava@dubeyko.com> wrote:
>
> Could you please take a deeper look into my diff that I've shared with
> you before? I don't see the point to review your suggestion if you
> completely ignored my diff. From my point of view you suggested
> completely the same approach but in more complicated manner.

You are right. I am sorry for the confusion.

I should have looked more carefully at your diff and based my reply on
it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh across the
whole hfs_mdb_commit() path is the right direction. My previous reply
mixed that with extra cleanups and a separate corrupted-layout check,
which made it look like I was proposing a different approach. That was
my mistake.

I am not very familiar with HFS internals, so please treat the following
only as a small suggestion. After re-reading your diff, the only detail
I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In that path,
hfs_inode_write_fork() gets MDB fields as output buffers:

        hfs_inode_write_fork(..., mdb->drXTExtRec,
                             &mdb->drXTFlSize, NULL);
        hfs_inode_write_fork(..., mdb->drCTExtRec,
                             &mdb->drCTFlSize, NULL);

Since mdb points into mdb_bh->b_data, these calls seem to update fields
in the primary MDB buffer. So I was thinking that we could keep your
locking scheme, but move unlock_buffer(mdb_bh) slightly later, after
these primary MDB updates and mark_buffer_dirty(mdb_bh). The alternate
MDB copy/sync and the bitmap writeback would still happen after mdb_bh
is unlocked, so the main deadlock fix remains the same.

If this concern is not valid, or if the current HFS update rules already
make your shorter window sufficient, please ignore this. I defer to your
judgment here.

Below is your diff with only that small adjustment.

Thanks,
Yue

diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..00651f42ba38 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -66,6 +66,7 @@ struct hfs_inode_info {
  * The HFS-specific part of a Linux (struct super_block)
  */
 struct hfs_sb_info {
+	struct mutex mdb_lock;			/* MDB operations lock */
 	struct buffer_head *mdb_bh;		/* The hfs_buffer
 						   holding the real
 						   superblock (aka VIB
diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
index a97cea35ca2e..d81e3545a045 100644
--- a/fs/hfs/mdb.c
+++ b/fs/hfs/mdb.c
@@ -287,6 +287,7 @@ int hfs_mdb_get(struct super_block *sb)
 void hfs_mdb_commit(struct super_block *sb)
 {
 	struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
+	bool write_backup = false;
 
 	if (sb_rdonly(sb))
 		return;
@@ -314,11 +315,17 @@ void hfs_mdb_commit(struct super_block *sb)
 	 * files grow. */
 	if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)->flags) &&
 	    HFS_SB(sb)->alt_mdb) {
+		write_backup = true;
 		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode, mdb->drXTExtRec,
 				     &mdb->drXTFlSize, NULL);
 		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode, mdb->drCTExtRec,
 				     &mdb->drCTFlSize, NULL);
 
+		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
+	}
+	unlock_buffer(HFS_SB(sb)->mdb_bh);
+
+	if (write_backup) {
 		lock_buffer(HFS_SB(sb)->alt_mdb_bh);
 		memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb, HFS_SECTOR_SIZE);
 		HFS_SB(sb)->alt_mdb->drAtrb |= cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
@@ -360,7 +367,6 @@ void hfs_mdb_commit(struct super_block *sb)
 			size -= len;
 		}
 	}
-	unlock_buffer(HFS_SB(sb)->mdb_bh);
 }
 
 void hfs_mdb_close(struct super_block *sb)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index a4f2a2bfa6d3..06ab904b3a42 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
 static int hfs_sync_fs(struct super_block *sb, int wait)
 {
 	is_hfs_cnid_counts_valid(sb);
+
+	mutex_lock(&HFS_SB(sb)->mdb_lock);
 	hfs_mdb_commit(sb);
+	mutex_unlock(&HFS_SB(sb)->mdb_lock);
+
 	return 0;
 }
 
@@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
 
 	is_hfs_cnid_counts_valid(sb);
 
+	mutex_lock(&sbi->mdb_lock);
 	hfs_mdb_commit(sb);
+	mutex_unlock(&sbi->mdb_lock);
 }
 
 void hfs_mark_mdb_dirty(struct super_block *sb)
@@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_op = &hfs_super_operations;
 	sb->s_xattr = hfs_xattr_handlers;
 	sb->s_flags |= SB_NODIRATIME;
+	mutex_init(&sbi->mdb_lock);
 	mutex_init(&sbi->bitmap_lock);
 
+	mutex_lock(&sbi->mdb_lock);
 	res = hfs_mdb_get(sb);
+	mutex_unlock(&sbi->mdb_lock);
 	if (res) {
 		if (!silent)
 			pr_warn("can't find a HFS filesystem on dev %s\n",

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Viacheslav Dubeyko 3 days, 13 hours ago

On Fri, 2026-06-05 at 02:02 +0800, Yue Sun wrote:
> On Thu, Jun 4, 2026 at 12:25 AM <slava@dubeyko.com> wrote:
> > 
> > Could you please take a deeper look into my diff that I've shared
> > with
> > you before? I don't see the point to review your suggestion if you
> > completely ignored my diff. From my point of view you suggested
> > completely the same approach but in more complicated manner.
> 
> You are right. I am sorry for the confusion.
> 
> I should have looked more carefully at your diff and based my reply
> on
> it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh across
> the
> whole hfs_mdb_commit() path is the right direction. My previous reply
> mixed that with extra cleanups and a separate corrupted-layout check,
> which made it look like I was proposing a different approach. That
> was
> my mistake.
> 
> I am not very familiar with HFS internals, so please treat the
> following
> only as a small suggestion. After re-reading your diff, the only
> detail
> I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In that
> path,
> hfs_inode_write_fork() gets MDB fields as output buffers:
> 
>         hfs_inode_write_fork(..., mdb->drXTExtRec,
>                              &mdb->drXTFlSize, NULL);
>         hfs_inode_write_fork(..., mdb->drCTExtRec,
>                              &mdb->drCTFlSize, NULL);


I've started to think that, maybe, the location of these
hfs_inode_write_fork() calls is not correct. We are trying to save the
current size of Catalog File and Extents Overflow File. OK, we need to
update this if size is changed. But we can save the sizes during every
call of hfs_mdb_commit(). It's logically incorrect to call these
methods in the alternative MDB section because it is the content of
primary MDB. I suggest to move these two hfs_inode_write_fork() to
primary MDB writing section:

lock_buffer(HFS_SB(sb)->mdb_bh);
		/* These parameters may have been modified, so write
them back */
		mdb->drLsMod = hfs_mtime();
		mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
>free_ablocks);
		mdb->drNxtCNID =
			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
>next_id));
		mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
		mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)->root_dirs);
		mdb->drFilCnt =
			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
>file_count));
		mdb->drDirCnt =
			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
>folder_count));

		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode, mdb-
>drXTExtRec,
				     &mdb->drXTFlSize, NULL);
		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode, mdb-
>drCTExtRec,
				     &mdb->drCTFlSize, NULL);

		/* write MDB to disk */
		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
unlock_buffer(HFS_SB(sb)->mdb_bh);

In this case, we don;'t need to overlap the locks. What do you think?
Sounds like reasonable solution?

Thanks,
Slava.


> Since mdb points into mdb_bh->b_data, these calls seem to update
> fields
> in the primary MDB buffer. So I was thinking that we could keep your
> locking scheme, but move unlock_buffer(mdb_bh) slightly later, after
> these primary MDB updates and mark_buffer_dirty(mdb_bh). The
> alternate
> MDB copy/sync and the bitmap writeback would still happen after
> mdb_bh
> is unlocked, so the main deadlock fix remains the same.
> 
> If this concern is not valid, or if the current HFS update rules
> already
> make your shorter window sufficient, please ignore this. I defer to
> your
> judgment here.
> 
> Below is your diff with only that small adjustment.
> 
> Thanks,
> Yue
> 
> diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> index ac0e83f77a0f..00651f42ba38 100644
> --- a/fs/hfs/hfs_fs.h
> +++ b/fs/hfs/hfs_fs.h
> @@ -66,6 +66,7 @@ struct hfs_inode_info {
>   * The HFS-specific part of a Linux (struct super_block)
>   */
>  struct hfs_sb_info {
> +	struct mutex mdb_lock;			/* MDB operations
> lock */
>  	struct buffer_head *mdb_bh;		/* The hfs_buffer
>  						   holding the real
>  						   superblock (aka
> VIB
> diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> index a97cea35ca2e..d81e3545a045 100644
> --- a/fs/hfs/mdb.c
> +++ b/fs/hfs/mdb.c
> @@ -287,6 +287,7 @@ int hfs_mdb_get(struct super_block *sb)
>  void hfs_mdb_commit(struct super_block *sb)
>  {
>  	struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
> +	bool write_backup = false;
>  
>  	if (sb_rdonly(sb))
>  		return;
> @@ -314,11 +315,17 @@ void hfs_mdb_commit(struct super_block *sb)
>  	 * files grow. */
>  	if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> >flags) &&
>  	    HFS_SB(sb)->alt_mdb) {
> +		write_backup = true;
>  		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> mdb->drXTExtRec,
>  				     &mdb->drXTFlSize, NULL);
>  		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> mdb->drCTExtRec,
>  				     &mdb->drCTFlSize, NULL);
>  
> +		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> +	}
> +	unlock_buffer(HFS_SB(sb)->mdb_bh);
> +
> +	if (write_backup) {
>  		lock_buffer(HFS_SB(sb)->alt_mdb_bh);
>  		memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> HFS_SECTOR_SIZE);
>  		HFS_SB(sb)->alt_mdb->drAtrb |=
> cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> @@ -360,7 +367,6 @@ void hfs_mdb_commit(struct super_block *sb)
>  			size -= len;
>  		}
>  	}
> -	unlock_buffer(HFS_SB(sb)->mdb_bh);
>  }
>  
>  void hfs_mdb_close(struct super_block *sb)
> diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> index a4f2a2bfa6d3..06ab904b3a42 100644
> --- a/fs/hfs/super.c
> +++ b/fs/hfs/super.c
> @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
>  static int hfs_sync_fs(struct super_block *sb, int wait)
>  {
>  	is_hfs_cnid_counts_valid(sb);
> +
> +	mutex_lock(&HFS_SB(sb)->mdb_lock);
>  	hfs_mdb_commit(sb);
> +	mutex_unlock(&HFS_SB(sb)->mdb_lock);
> +
>  	return 0;
>  }
>  
> @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
>  
>  	is_hfs_cnid_counts_valid(sb);
>  
> +	mutex_lock(&sbi->mdb_lock);
>  	hfs_mdb_commit(sb);
> +	mutex_unlock(&sbi->mdb_lock);
>  }
>  
>  void hfs_mark_mdb_dirty(struct super_block *sb)
> @@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block
> *sb, struct fs_context *fc)
>  	sb->s_op = &hfs_super_operations;
>  	sb->s_xattr = hfs_xattr_handlers;
>  	sb->s_flags |= SB_NODIRATIME;
> +	mutex_init(&sbi->mdb_lock);
>  	mutex_init(&sbi->bitmap_lock);
>  
> +	mutex_lock(&sbi->mdb_lock);
>  	res = hfs_mdb_get(sb);
> +	mutex_unlock(&sbi->mdb_lock);
>  	if (res) {
>  		if (!silent)
>  			pr_warn("can't find a HFS filesystem on dev
> %s\n",

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Sam Sun 3 days, 7 hours ago

On Fri, Jun 5, 2026 at 5:25 AM Viacheslav Dubeyko <slava@dubeyko.com> wrote:
>
> On Fri, 2026-06-05 at 02:02 +0800, Yue Sun wrote:
> > On Thu, Jun 4, 2026 at 12:25 AM <slava@dubeyko.com> wrote:
> > >
> > > Could you please take a deeper look into my diff that I've shared
> > > with
> > > you before? I don't see the point to review your suggestion if you
> > > completely ignored my diff. From my point of view you suggested
> > > completely the same approach but in more complicated manner.
> >
> > You are right. I am sorry for the confusion.
> >
> > I should have looked more carefully at your diff and based my reply
> > on
> > it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh across
> > the
> > whole hfs_mdb_commit() path is the right direction. My previous reply
> > mixed that with extra cleanups and a separate corrupted-layout check,
> > which made it look like I was proposing a different approach. That
> > was
> > my mistake.
> >
> > I am not very familiar with HFS internals, so please treat the
> > following
> > only as a small suggestion. After re-reading your diff, the only
> > detail
> > I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In that
> > path,
> > hfs_inode_write_fork() gets MDB fields as output buffers:
> >
> >         hfs_inode_write_fork(..., mdb->drXTExtRec,
> >                              &mdb->drXTFlSize, NULL);
> >         hfs_inode_write_fork(..., mdb->drCTExtRec,
> >                              &mdb->drCTFlSize, NULL);
>
>
> I've started to think that, maybe, the location of these
> hfs_inode_write_fork() calls is not correct. We are trying to save the
> current size of Catalog File and Extents Overflow File. OK, we need to
> update this if size is changed. But we can save the sizes during every
> call of hfs_mdb_commit(). It's logically incorrect to call these
> methods in the alternative MDB section because it is the content of
> primary MDB. I suggest to move these two hfs_inode_write_fork() to
> primary MDB writing section:
>
> lock_buffer(HFS_SB(sb)->mdb_bh);
>                 /* These parameters may have been modified, so write
> them back */
>                 mdb->drLsMod = hfs_mtime();
>                 mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> >free_ablocks);
>                 mdb->drNxtCNID =
>                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >next_id));
>                 mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
>                 mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)->root_dirs);
>                 mdb->drFilCnt =
>                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >file_count));
>                 mdb->drDirCnt =
>                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> >folder_count));
>
>                 hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode, mdb-
> >drXTExtRec,
>                                      &mdb->drXTFlSize, NULL);
>                 hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode, mdb-
> >drCTExtRec,
>                                      &mdb->drCTFlSize, NULL);
>
>                 /* write MDB to disk */
>                 mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> unlock_buffer(HFS_SB(sb)->mdb_bh);
>
> In this case, we don;'t need to overlap the locks. What do you think?
> Sounds like reasonable solution?
>
> Thanks,
> Slava.
>

Yes, this makes sense to me. Moving the two hfs_inode_write_fork()
calls into the primary MDB update section looks like a cleaner
solution. Then all direct updates to the primary MDB buffer are done
while mdb_bh is locked, and the alternative MDB path can just copy the
already prepared primary MDB contents after that. I agree that your
suggested direction is better.

Thanks,
Yue

>
> > Since mdb points into mdb_bh->b_data, these calls seem to update
> > fields
> > in the primary MDB buffer. So I was thinking that we could keep your
> > locking scheme, but move unlock_buffer(mdb_bh) slightly later, after
> > these primary MDB updates and mark_buffer_dirty(mdb_bh). The
> > alternate
> > MDB copy/sync and the bitmap writeback would still happen after
> > mdb_bh
> > is unlocked, so the main deadlock fix remains the same.
> >
> > If this concern is not valid, or if the current HFS update rules
> > already
> > make your shorter window sufficient, please ignore this. I defer to
> > your
> > judgment here.
> >
> > Below is your diff with only that small adjustment.
> >
> > Thanks,
> > Yue
> >
> > diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> > index ac0e83f77a0f..00651f42ba38 100644
> > --- a/fs/hfs/hfs_fs.h
> > +++ b/fs/hfs/hfs_fs.h
> > @@ -66,6 +66,7 @@ struct hfs_inode_info {
> >   * The HFS-specific part of a Linux (struct super_block)
> >   */
> >  struct hfs_sb_info {
> > +     struct mutex mdb_lock;                  /* MDB operations
> > lock */
> >       struct buffer_head *mdb_bh;             /* The hfs_buffer
> >                                                  holding the real
> >                                                  superblock (aka
> > VIB
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index a97cea35ca2e..d81e3545a045 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -287,6 +287,7 @@ int hfs_mdb_get(struct super_block *sb)
> >  void hfs_mdb_commit(struct super_block *sb)
> >  {
> >       struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
> > +     bool write_backup = false;
> >
> >       if (sb_rdonly(sb))
> >               return;
> > @@ -314,11 +315,17 @@ void hfs_mdb_commit(struct super_block *sb)
> >        * files grow. */
> >       if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> > >flags) &&
> >           HFS_SB(sb)->alt_mdb) {
> > +             write_backup = true;
> >               hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> > mdb->drXTExtRec,
> >                                    &mdb->drXTFlSize, NULL);
> >               hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> > mdb->drCTExtRec,
> >                                    &mdb->drCTFlSize, NULL);
> >
> > +             mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > +     }
> > +     unlock_buffer(HFS_SB(sb)->mdb_bh);
> > +
> > +     if (write_backup) {
> >               lock_buffer(HFS_SB(sb)->alt_mdb_bh);
> >               memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> > HFS_SECTOR_SIZE);
> >               HFS_SB(sb)->alt_mdb->drAtrb |=
> > cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> > @@ -360,7 +367,6 @@ void hfs_mdb_commit(struct super_block *sb)
> >                       size -= len;
> >               }
> >       }
> > -     unlock_buffer(HFS_SB(sb)->mdb_bh);
> >  }
> >
> >  void hfs_mdb_close(struct super_block *sb)
> > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > index a4f2a2bfa6d3..06ab904b3a42 100644
> > --- a/fs/hfs/super.c
> > +++ b/fs/hfs/super.c
> > @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> >  static int hfs_sync_fs(struct super_block *sb, int wait)
> >  {
> >       is_hfs_cnid_counts_valid(sb);
> > +
> > +     mutex_lock(&HFS_SB(sb)->mdb_lock);
> >       hfs_mdb_commit(sb);
> > +     mutex_unlock(&HFS_SB(sb)->mdb_lock);
> > +
> >       return 0;
> >  }
> >
> > @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
> >
> >       is_hfs_cnid_counts_valid(sb);
> >
> > +     mutex_lock(&sbi->mdb_lock);
> >       hfs_mdb_commit(sb);
> > +     mutex_unlock(&sbi->mdb_lock);
> >  }
> >
> >  void hfs_mark_mdb_dirty(struct super_block *sb)
> > @@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block
> > *sb, struct fs_context *fc)
> >       sb->s_op = &hfs_super_operations;
> >       sb->s_xattr = hfs_xattr_handlers;
> >       sb->s_flags |= SB_NODIRATIME;
> > +     mutex_init(&sbi->mdb_lock);
> >       mutex_init(&sbi->bitmap_lock);
> >
> > +     mutex_lock(&sbi->mdb_lock);
> >       res = hfs_mdb_get(sb);
> > +     mutex_unlock(&sbi->mdb_lock);
> >       if (res) {
> >               if (!silent)
> >                       pr_warn("can't find a HFS filesystem on dev
> > %s\n",

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Viacheslav Dubeyko 3 days, 11 hours ago

On Thu, 2026-06-04 at 14:25 -0700, Viacheslav Dubeyko wrote:
> On Fri, 2026-06-05 at 02:02 +0800, Yue Sun wrote:
> > On Thu, Jun 4, 2026 at 12:25 AM <slava@dubeyko.com> wrote:
> > > 
> > > Could you please take a deeper look into my diff that I've shared
> > > with
> > > you before? I don't see the point to review your suggestion if
> > > you
> > > completely ignored my diff. From my point of view you suggested
> > > completely the same approach but in more complicated manner.
> > 
> > You are right. I am sorry for the confusion.
> > 
> > I should have looked more carefully at your diff and based my reply
> > on
> > it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh
> > across
> > the
> > whole hfs_mdb_commit() path is the right direction. My previous
> > reply
> > mixed that with extra cleanups and a separate corrupted-layout
> > check,
> > which made it look like I was proposing a different approach. That
> > was
> > my mistake.
> > 
> > I am not very familiar with HFS internals, so please treat the
> > following
> > only as a small suggestion. After re-reading your diff, the only
> > detail
> > I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In that
> > path,
> > hfs_inode_write_fork() gets MDB fields as output buffers:
> > 
> >         hfs_inode_write_fork(..., mdb->drXTExtRec,
> >                              &mdb->drXTFlSize, NULL);
> >         hfs_inode_write_fork(..., mdb->drCTExtRec,
> >                              &mdb->drCTFlSize, NULL);
> 
> 
> I've started to think that, maybe, the location of these
> hfs_inode_write_fork() calls is not correct. We are trying to save
> the
> current size of Catalog File and Extents Overflow File. OK, we need
> to
> update this if size is changed. But we can save the sizes during
> every
> call of hfs_mdb_commit(). It's logically incorrect to call these
> methods in the alternative MDB section because it is the content of
> primary MDB. I suggest to move these two hfs_inode_write_fork() to
> primary MDB writing section:
> 
> lock_buffer(HFS_SB(sb)->mdb_bh);
> 		/* These parameters may have been modified, so write
> them back */
> 		mdb->drLsMod = hfs_mtime();
> 		mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> > free_ablocks);
> 		mdb->drNxtCNID =
> 			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > next_id));
> 		mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
> 		mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)-
> >root_dirs);
> 		mdb->drFilCnt =
> 			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > file_count));
> 		mdb->drDirCnt =
> 			cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > folder_count));
> 
> 		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> mdb-
> > drXTExtRec,
> 				     &mdb->drXTFlSize, NULL);
> 		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> mdb-
> > drCTExtRec,
> 				     &mdb->drCTFlSize, NULL);
> 
> 		/* write MDB to disk */
> 		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> unlock_buffer(HFS_SB(sb)->mdb_bh);
> 
> In this case, we don;'t need to overlap the locks. What do you think?
> Sounds like reasonable solution?
> 
> 

Frankly speaking, I don't like this approach of assigning the pointer
on bh->b_data to HFS_SB(sb)->mdb: 

data = (void *)(__bh->b_data + __offset);

We manipulate by various fields of HFS_SB(sb)->mdb throughout of the
hfs_mdb_commit(). I think that we need to allocate buffer for struct
hfs_mdb *mdb during hfs_mdb_get() (and destroy in hfs_mdb_put()) and
synchronize its content with struct buffer_head *mdb_bh. Otherwise, we
never can resolve the problem of proper locking in hfs_mdb_commit().
What do you think?

Thanks,
Slava.

> 
> > Since mdb points into mdb_bh->b_data, these calls seem to update
> > fields
> > in the primary MDB buffer. So I was thinking that we could keep
> > your
> > locking scheme, but move unlock_buffer(mdb_bh) slightly later,
> > after
> > these primary MDB updates and mark_buffer_dirty(mdb_bh). The
> > alternate
> > MDB copy/sync and the bitmap writeback would still happen after
> > mdb_bh
> > is unlocked, so the main deadlock fix remains the same.
> > 
> > If this concern is not valid, or if the current HFS update rules
> > already
> > make your shorter window sufficient, please ignore this. I defer to
> > your
> > judgment here.
> > 
> > Below is your diff with only that small adjustment.
> > 
> > Thanks,
> > Yue
> > 
> > diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> > index ac0e83f77a0f..00651f42ba38 100644
> > --- a/fs/hfs/hfs_fs.h
> > +++ b/fs/hfs/hfs_fs.h
> > @@ -66,6 +66,7 @@ struct hfs_inode_info {
> >   * The HFS-specific part of a Linux (struct super_block)
> >   */
> >  struct hfs_sb_info {
> > +	struct mutex mdb_lock;			/* MDB operations
> > lock */
> >  	struct buffer_head *mdb_bh;		/* The hfs_buffer
> >  						   holding the
> > real
> >  						   superblock (aka
> > VIB
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index a97cea35ca2e..d81e3545a045 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -287,6 +287,7 @@ int hfs_mdb_get(struct super_block *sb)
> >  void hfs_mdb_commit(struct super_block *sb)
> >  {
> >  	struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
> > +	bool write_backup = false;
> >  
> >  	if (sb_rdonly(sb))
> >  		return;
> > @@ -314,11 +315,17 @@ void hfs_mdb_commit(struct super_block *sb)
> >  	 * files grow. */
> >  	if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> > > flags) &&
> >  	    HFS_SB(sb)->alt_mdb) {
> > +		write_backup = true;
> >  		hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> > mdb->drXTExtRec,
> >  				     &mdb->drXTFlSize, NULL);
> >  		hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> > mdb->drCTExtRec,
> >  				     &mdb->drCTFlSize, NULL);
> >  
> > +		mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > +	}
> > +	unlock_buffer(HFS_SB(sb)->mdb_bh);
> > +
> > +	if (write_backup) {
> >  		lock_buffer(HFS_SB(sb)->alt_mdb_bh);
> >  		memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> > HFS_SECTOR_SIZE);
> >  		HFS_SB(sb)->alt_mdb->drAtrb |=
> > cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> > @@ -360,7 +367,6 @@ void hfs_mdb_commit(struct super_block *sb)
> >  			size -= len;
> >  		}
> >  	}
> > -	unlock_buffer(HFS_SB(sb)->mdb_bh);
> >  }
> >  
> >  void hfs_mdb_close(struct super_block *sb)
> > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > index a4f2a2bfa6d3..06ab904b3a42 100644
> > --- a/fs/hfs/super.c
> > +++ b/fs/hfs/super.c
> > @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> >  static int hfs_sync_fs(struct super_block *sb, int wait)
> >  {
> >  	is_hfs_cnid_counts_valid(sb);
> > +
> > +	mutex_lock(&HFS_SB(sb)->mdb_lock);
> >  	hfs_mdb_commit(sb);
> > +	mutex_unlock(&HFS_SB(sb)->mdb_lock);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
> >  
> >  	is_hfs_cnid_counts_valid(sb);
> >  
> > +	mutex_lock(&sbi->mdb_lock);
> >  	hfs_mdb_commit(sb);
> > +	mutex_unlock(&sbi->mdb_lock);
> >  }
> >  
> >  void hfs_mark_mdb_dirty(struct super_block *sb)
> > @@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block
> > *sb, struct fs_context *fc)
> >  	sb->s_op = &hfs_super_operations;
> >  	sb->s_xattr = hfs_xattr_handlers;
> >  	sb->s_flags |= SB_NODIRATIME;
> > +	mutex_init(&sbi->mdb_lock);
> >  	mutex_init(&sbi->bitmap_lock);
> >  
> > +	mutex_lock(&sbi->mdb_lock);
> >  	res = hfs_mdb_get(sb);
> > +	mutex_unlock(&sbi->mdb_lock);
> >  	if (res) {
> >  		if (!silent)
> >  			pr_warn("can't find a HFS filesystem on
> > dev
> > %s\n",

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Sam Sun 3 days, 7 hours ago

On Fri, Jun 5, 2026 at 7:52 AM Viacheslav Dubeyko <slava@dubeyko.com> wrote:
>
> On Thu, 2026-06-04 at 14:25 -0700, Viacheslav Dubeyko wrote:
> > On Fri, 2026-06-05 at 02:02 +0800, Yue Sun wrote:
> > > On Thu, Jun 4, 2026 at 12:25 AM <slava@dubeyko.com> wrote:
> > > >
> > > > Could you please take a deeper look into my diff that I've shared
> > > > with
> > > > you before? I don't see the point to review your suggestion if
> > > > you
> > > > completely ignored my diff. From my point of view you suggested
> > > > completely the same approach but in more complicated manner.
> > >
> > > You are right. I am sorry for the confusion.
> > >
> > > I should have looked more carefully at your diff and based my reply
> > > on
> > > it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh
> > > across
> > > the
> > > whole hfs_mdb_commit() path is the right direction. My previous
> > > reply
> > > mixed that with extra cleanups and a separate corrupted-layout
> > > check,
> > > which made it look like I was proposing a different approach. That
> > > was
> > > my mistake.
> > >
> > > I am not very familiar with HFS internals, so please treat the
> > > following
> > > only as a small suggestion. After re-reading your diff, the only
> > > detail
> > > I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In that
> > > path,
> > > hfs_inode_write_fork() gets MDB fields as output buffers:
> > >
> > >         hfs_inode_write_fork(..., mdb->drXTExtRec,
> > >                              &mdb->drXTFlSize, NULL);
> > >         hfs_inode_write_fork(..., mdb->drCTExtRec,
> > >                              &mdb->drCTFlSize, NULL);
> >
> >
> > I've started to think that, maybe, the location of these
> > hfs_inode_write_fork() calls is not correct. We are trying to save
> > the
> > current size of Catalog File and Extents Overflow File. OK, we need
> > to
> > update this if size is changed. But we can save the sizes during
> > every
> > call of hfs_mdb_commit(). It's logically incorrect to call these
> > methods in the alternative MDB section because it is the content of
> > primary MDB. I suggest to move these two hfs_inode_write_fork() to
> > primary MDB writing section:
> >
> > lock_buffer(HFS_SB(sb)->mdb_bh);
> >               /* These parameters may have been modified, so write
> > them back */
> >               mdb->drLsMod = hfs_mtime();
> >               mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> > > free_ablocks);
> >               mdb->drNxtCNID =
> >                       cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > next_id));
> >               mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
> >               mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)-
> > >root_dirs);
> >               mdb->drFilCnt =
> >                       cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > file_count));
> >               mdb->drDirCnt =
> >                       cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > folder_count));
> >
> >               hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> > mdb-
> > > drXTExtRec,
> >                                    &mdb->drXTFlSize, NULL);
> >               hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> > mdb-
> > > drCTExtRec,
> >                                    &mdb->drCTFlSize, NULL);
> >
> >               /* write MDB to disk */
> >               mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > unlock_buffer(HFS_SB(sb)->mdb_bh);
> >
> > In this case, we don;'t need to overlap the locks. What do you think?
> > Sounds like reasonable solution?
> >
> >
>
> Frankly speaking, I don't like this approach of assigning the pointer
> on bh->b_data to HFS_SB(sb)->mdb:
>
> data = (void *)(__bh->b_data + __offset);
>
> We manipulate by various fields of HFS_SB(sb)->mdb throughout of the
> hfs_mdb_commit(). I think that we need to allocate buffer for struct
> hfs_mdb *mdb during hfs_mdb_get() (and destroy in hfs_mdb_put()) and
> synchronize its content with struct buffer_head *mdb_bh. Otherwise, we
> never can resolve the problem of proper locking in hfs_mdb_commit().
> What do you think?
>
> Thanks,
> Slava.
>

Yes, I agree. Keeping HFS_SB(sb)->mdb as a separate in-memory copy sounds
cleaner than pointing it directly into mdb_bh->b_data.

For the immediate deadlock fix, I think moving hfs_inode_write_fork() into the
primary MDB update section as you suggested sounds like a good minimal step. The
separate in-memory MDB copy sounds like a larger cleanup that could follow if
you think that is the right direction.

Thanks,
Yue

> >
> > > Since mdb points into mdb_bh->b_data, these calls seem to update
> > > fields
> > > in the primary MDB buffer. So I was thinking that we could keep
> > > your
> > > locking scheme, but move unlock_buffer(mdb_bh) slightly later,
> > > after
> > > these primary MDB updates and mark_buffer_dirty(mdb_bh). The
> > > alternate
> > > MDB copy/sync and the bitmap writeback would still happen after
> > > mdb_bh
> > > is unlocked, so the main deadlock fix remains the same.
> > >
> > > If this concern is not valid, or if the current HFS update rules
> > > already
> > > make your shorter window sufficient, please ignore this. I defer to
> > > your
> > > judgment here.
> > >
> > > Below is your diff with only that small adjustment.
> > >
> > > Thanks,
> > > Yue
> > >
> > > diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> > > index ac0e83f77a0f..00651f42ba38 100644
> > > --- a/fs/hfs/hfs_fs.h
> > > +++ b/fs/hfs/hfs_fs.h
> > > @@ -66,6 +66,7 @@ struct hfs_inode_info {
> > >   * The HFS-specific part of a Linux (struct super_block)
> > >   */
> > >  struct hfs_sb_info {
> > > +   struct mutex mdb_lock;                  /* MDB operations
> > > lock */
> > >     struct buffer_head *mdb_bh;             /* The hfs_buffer
> > >                                                holding the
> > > real
> > >                                                superblock (aka
> > > VIB
> > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > index a97cea35ca2e..d81e3545a045 100644
> > > --- a/fs/hfs/mdb.c
> > > +++ b/fs/hfs/mdb.c
> > > @@ -287,6 +287,7 @@ int hfs_mdb_get(struct super_block *sb)
> > >  void hfs_mdb_commit(struct super_block *sb)
> > >  {
> > >     struct hfs_mdb *mdb = HFS_SB(sb)->mdb;
> > > +   bool write_backup = false;
> > >
> > >     if (sb_rdonly(sb))
> > >             return;
> > > @@ -314,11 +315,17 @@ void hfs_mdb_commit(struct super_block *sb)
> > >      * files grow. */
> > >     if (test_and_clear_bit(HFS_FLG_ALT_MDB_DIRTY, &HFS_SB(sb)-
> > > > flags) &&
> > >         HFS_SB(sb)->alt_mdb) {
> > > +           write_backup = true;
> > >             hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> > > mdb->drXTExtRec,
> > >                                  &mdb->drXTFlSize, NULL);
> > >             hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> > > mdb->drCTExtRec,
> > >                                  &mdb->drCTFlSize, NULL);
> > >
> > > +           mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > > +   }
> > > +   unlock_buffer(HFS_SB(sb)->mdb_bh);
> > > +
> > > +   if (write_backup) {
> > >             lock_buffer(HFS_SB(sb)->alt_mdb_bh);
> > >             memcpy(HFS_SB(sb)->alt_mdb, HFS_SB(sb)->mdb,
> > > HFS_SECTOR_SIZE);
> > >             HFS_SB(sb)->alt_mdb->drAtrb |=
> > > cpu_to_be16(HFS_SB_ATTRIB_UNMNT);
> > > @@ -360,7 +367,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > >                     size -= len;
> > >             }
> > >     }
> > > -   unlock_buffer(HFS_SB(sb)->mdb_bh);
> > >  }
> > >
> > >  void hfs_mdb_close(struct super_block *sb)
> > > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > > index a4f2a2bfa6d3..06ab904b3a42 100644
> > > --- a/fs/hfs/super.c
> > > +++ b/fs/hfs/super.c
> > > @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> > >  static int hfs_sync_fs(struct super_block *sb, int wait)
> > >  {
> > >     is_hfs_cnid_counts_valid(sb);
> > > +
> > > +   mutex_lock(&HFS_SB(sb)->mdb_lock);
> > >     hfs_mdb_commit(sb);
> > > +   mutex_unlock(&HFS_SB(sb)->mdb_lock);
> > > +
> > >     return 0;
> > >  }
> > >
> > > @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
> > >
> > >     is_hfs_cnid_counts_valid(sb);
> > >
> > > +   mutex_lock(&sbi->mdb_lock);
> > >     hfs_mdb_commit(sb);
> > > +   mutex_unlock(&sbi->mdb_lock);
> > >  }
> > >
> > >  void hfs_mark_mdb_dirty(struct super_block *sb)
> > > @@ -339,9 +345,12 @@ static int hfs_fill_super(struct super_block
> > > *sb, struct fs_context *fc)
> > >     sb->s_op = &hfs_super_operations;
> > >     sb->s_xattr = hfs_xattr_handlers;
> > >     sb->s_flags |= SB_NODIRATIME;
> > > +   mutex_init(&sbi->mdb_lock);
> > >     mutex_init(&sbi->bitmap_lock);
> > >
> > > +   mutex_lock(&sbi->mdb_lock);
> > >     res = hfs_mdb_get(sb);
> > > +   mutex_unlock(&sbi->mdb_lock);
> > >     if (res) {
> > >             if (!silent)
> > >                     pr_warn("can't find a HFS filesystem on
> > > dev
> > > %s\n",

Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

Posted by Viacheslav Dubeyko 2 days, 16 hours ago

On Fri, 2026-06-05 at 12:00 +0800, Sam Sun wrote:
> On Fri, Jun 5, 2026 at 7:52 AM Viacheslav Dubeyko <slava@dubeyko.com>
> wrote:
> > 
> > On Thu, 2026-06-04 at 14:25 -0700, Viacheslav Dubeyko wrote:
> > > On Fri, 2026-06-05 at 02:02 +0800, Yue Sun wrote:
> > > > On Thu, Jun 4, 2026 at 12:25 AM <slava@dubeyko.com> wrote:
> > > > > 
> > > > > Could you please take a deeper look into my diff that I've
> > > > > shared
> > > > > with
> > > > > you before? I don't see the point to review your suggestion
> > > > > if
> > > > > you
> > > > > completely ignored my diff. From my point of view you
> > > > > suggested
> > > > > completely the same approach but in more complicated manner.
> > > > 
> > > > You are right. I am sorry for the confusion.
> > > > 
> > > > I should have looked more carefully at your diff and based my
> > > > reply
> > > > on
> > > > it. Adding an HFS-level MDB mutex and avoiding holding mdb_bh
> > > > across
> > > > the
> > > > whole hfs_mdb_commit() path is the right direction. My previous
> > > > reply
> > > > mixed that with extra cleanups and a separate corrupted-layout
> > > > check,
> > > > which made it look like I was proposing a different approach.
> > > > That
> > > > was
> > > > my mistake.
> > > > 
> > > > I am not very familiar with HFS internals, so please treat the
> > > > following
> > > > only as a small suggestion. After re-reading your diff, the
> > > > only
> > > > detail
> > > > I am still unsure about is the HFS_FLG_ALT_MDB_DIRTY path. In
> > > > that
> > > > path,
> > > > hfs_inode_write_fork() gets MDB fields as output buffers:
> > > > 
> > > >         hfs_inode_write_fork(..., mdb->drXTExtRec,
> > > >                              &mdb->drXTFlSize, NULL);
> > > >         hfs_inode_write_fork(..., mdb->drCTExtRec,
> > > >                              &mdb->drCTFlSize, NULL);
> > > 
> > > 
> > > I've started to think that, maybe, the location of these
> > > hfs_inode_write_fork() calls is not correct. We are trying to
> > > save
> > > the
> > > current size of Catalog File and Extents Overflow File. OK, we
> > > need
> > > to
> > > update this if size is changed. But we can save the sizes during
> > > every
> > > call of hfs_mdb_commit(). It's logically incorrect to call these
> > > methods in the alternative MDB section because it is the content
> > > of
> > > primary MDB. I suggest to move these two hfs_inode_write_fork()
> > > to
> > > primary MDB writing section:
> > > 
> > > lock_buffer(HFS_SB(sb)->mdb_bh);
> > >               /* These parameters may have been modified, so
> > > write
> > > them back */
> > >               mdb->drLsMod = hfs_mtime();
> > >               mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> > > > free_ablocks);
> > >               mdb->drNxtCNID =
> > >                       cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > > next_id));
> > >               mdb->drNmFls = cpu_to_be16(HFS_SB(sb)->root_files);
> > >               mdb->drNmRtDirs = cpu_to_be16(HFS_SB(sb)-
> > > > root_dirs);
> > >               mdb->drFilCnt =
> > >                       cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > > file_count));
> > >               mdb->drDirCnt =
> > >                       cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > > folder_count));
> > > 
> > >               hfs_inode_write_fork(HFS_SB(sb)->ext_tree->inode,
> > > mdb-
> > > > drXTExtRec,
> > >                                    &mdb->drXTFlSize, NULL);
> > >               hfs_inode_write_fork(HFS_SB(sb)->cat_tree->inode,
> > > mdb-
> > > > drCTExtRec,
> > >                                    &mdb->drCTFlSize, NULL);
> > > 
> > >               /* write MDB to disk */
> > >               mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > > unlock_buffer(HFS_SB(sb)->mdb_bh);
> > > 
> > > In this case, we don;'t need to overlap the locks. What do you
> > > think?
> > > Sounds like reasonable solution?
> > > 
> > > 
> > 
> > Frankly speaking, I don't like this approach of assigning the
> > pointer
> > on bh->b_data to HFS_SB(sb)->mdb:
> > 
> > data = (void *)(__bh->b_data + __offset);
> > 
> > We manipulate by various fields of HFS_SB(sb)->mdb throughout of
> > the
> > hfs_mdb_commit(). I think that we need to allocate buffer for
> > struct
> > hfs_mdb *mdb during hfs_mdb_get() (and destroy in hfs_mdb_put())
> > and
> > synchronize its content with struct buffer_head *mdb_bh. Otherwise,
> > we
> > never can resolve the problem of proper locking in
> > hfs_mdb_commit().
> > What do you think?
> > 
> > Thanks,
> > Slava.
> > 
> 
> Yes, I agree. Keeping HFS_SB(sb)->mdb as a separate in-memory copy
> sounds
> cleaner than pointing it directly into mdb_bh->b_data.
> 
> For the immediate deadlock fix, I think moving hfs_inode_write_fork()
> into the
> primary MDB update section as you suggested sounds like a good
> minimal step. The
> separate in-memory MDB copy sounds like a larger cleanup that could
> follow if
> you think that is the right direction.
> 

I don't expect that in-memory MDB copy will require significant
cleanup. We already have mdb and alt_mdb pointer that are used in the
HFS code. We simply need to change the initialization phase, freeing
phase and hfs_mdb_commit() logic. It doesn't sound like a big cleanup.
But it will be great to get rid of using the buffer heads for primary
and alternative MDB records. This is the more complicated change. But
it should be not the part of this patch. 

Thanks,
Slava.