fs/ocfs2/ocfs2.h | 2 -- fs/ocfs2/super.c | 2 -- fs/ocfs2/sysfile.c | 9 +++------ 3 files changed, 3 insertions(+), 10 deletions(-)
Since calling _ocfs2_get_system_file_inode() twice with the same
arguments returns the same address, there is no need to serialize
_ocfs2_get_system_file_inode() using osb->system_file_mutex lock.
Kill osb->system_file_mutex lock in order to avoid AB-BA deadlock.
cmpxchg() will be sufficient for avoiding the inode refcount leak
problem which commit 43b10a20372d ("ocfs2: avoid system inode ref
confusion by adding mutex lock") tried to address.
Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Closes: https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
Fixes: 43b10a20372d ("ocfs2: avoid system inode ref confusion by adding mutex lock")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: jiangyiwen <jiangyiwen@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/ocfs2.h | 2 --
fs/ocfs2/super.c | 2 --
fs/ocfs2/sysfile.c | 9 +++------
3 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6aaa94c554c1..8bdeea60742a 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -494,8 +494,6 @@ struct ocfs2_super
struct rb_root osb_rf_lock_tree;
struct ocfs2_refcount_tree *osb_ref_tree_lru;
- struct mutex system_file_mutex;
-
/*
* OCFS2 needs to schedule several different types of work which
* require cluster locking, disk I/O, recovery waits, etc. Since these
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 3d2533950bae..4461daf909cf 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1997,8 +1997,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
spin_lock_init(&osb->osb_xattr_lock);
ocfs2_init_steal_slots(osb);
- mutex_init(&osb->system_file_mutex);
-
atomic_set(&osb->alloc_stats.moves, 0);
atomic_set(&osb->alloc_stats.local_data, 0);
atomic_set(&osb->alloc_stats.bitmap_data, 0);
diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c
index 53a945da873b..b63af8d64904 100644
--- a/fs/ocfs2/sysfile.c
+++ b/fs/ocfs2/sysfile.c
@@ -98,11 +98,9 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
} else
arr = get_local_system_inode(osb, type, slot);
- mutex_lock(&osb->system_file_mutex);
if (arr && ((inode = *arr) != NULL)) {
/* get a ref in addition to the array ref */
inode = igrab(inode);
- mutex_unlock(&osb->system_file_mutex);
BUG_ON(!inode);
return inode;
@@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
inode = _ocfs2_get_system_file_inode(osb, type, slot);
/* add one more if putting into array for first time */
- if (arr && inode) {
- *arr = igrab(inode);
- BUG_ON(!*arr);
+ if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
+ inode = igrab(inode);
+ BUG_ON(!inode);
}
- mutex_unlock(&osb->system_file_mutex);
return inode;
}
--
2.47.1
Hello, Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d did it properly. However, I don't see how your patch fixes the syzbot report [1]. Could you elaborate on the root cause analysis? My review comments are inline below. [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b On 6/21/25 23:56, Tetsuo Handa wrote: > Since calling _ocfs2_get_system_file_inode() twice with the same > arguments returns the same address, there is no need to serialize > _ocfs2_get_system_file_inode() using osb->system_file_mutex lock. > > Kill osb->system_file_mutex lock in order to avoid AB-BA deadlock. > cmpxchg() will be sufficient for avoiding the inode refcount leak > problem which commit 43b10a20372d ("ocfs2: avoid system inode ref > confusion by adding mutex lock") tried to address. > > Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com> 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b > Closes: https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com I don't think we need 'Closes'. > Fixes: 43b10a20372d ("ocfs2: avoid system inode ref confusion by adding mutex lock") > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: jiangyiwen <jiangyiwen@huawei.com> > Cc: Joseph Qi <joseph.qi@huawei.com> > Cc: Joel Becker <jlbec@evilplan.org> > Cc: Mark Fasheh <mfasheh@suse.com> The 'CC's are also useless. > --- > fs/ocfs2/ocfs2.h | 2 -- > fs/ocfs2/super.c | 2 -- > fs/ocfs2/sysfile.c | 9 +++------ > 3 files changed, 3 insertions(+), 10 deletions(-) > > diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h > index 6aaa94c554c1..8bdeea60742a 100644 > --- a/fs/ocfs2/ocfs2.h > +++ b/fs/ocfs2/ocfs2.h > @@ -494,8 +494,6 @@ struct ocfs2_super > struct rb_root osb_rf_lock_tree; > struct ocfs2_refcount_tree *osb_ref_tree_lru; > > - struct mutex system_file_mutex; > - > /* > * OCFS2 needs to schedule several different types of work which > * require cluster locking, disk I/O, recovery waits, etc. Since these > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c > index 3d2533950bae..4461daf909cf 100644 > --- a/fs/ocfs2/super.c > +++ b/fs/ocfs2/super.c > @@ -1997,8 +1997,6 @@ static int ocfs2_initialize_super(struct super_block *sb, > spin_lock_init(&osb->osb_xattr_lock); > ocfs2_init_steal_slots(osb); > > - mutex_init(&osb->system_file_mutex); > - > atomic_set(&osb->alloc_stats.moves, 0); > atomic_set(&osb->alloc_stats.local_data, 0); > atomic_set(&osb->alloc_stats.bitmap_data, 0); > diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c > index 53a945da873b..b63af8d64904 100644 > --- a/fs/ocfs2/sysfile.c > +++ b/fs/ocfs2/sysfile.c > @@ -98,11 +98,9 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb, > } else > arr = get_local_system_inode(osb, type, slot); > > - mutex_lock(&osb->system_file_mutex); > if (arr && ((inode = *arr) != NULL)) { > /* get a ref in addition to the array ref */ > inode = igrab(inode); > - mutex_unlock(&osb->system_file_mutex); > BUG_ON(!inode); > I agree the above mutex_lock and mutex_unlock is useless. we can remove it without any problem. > return inode; > @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb, > inode = _ocfs2_get_system_file_inode(osb, type, slot); In my view, the key of commit 43b10a20372d is to avoid calling _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to do refcnt-1. > > /* add one more if putting into array for first time */ > - if (arr && inode) { > - *arr = igrab(inode); > - BUG_ON(!*arr); > + if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) { Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1 before returning to the caller. > + inode = igrab(inode); > + BUG_ON(!inode); > } > - mutex_unlock(&osb->system_file_mutex); > return inode; > } >
On 2025/06/24 10:33, Heming Zhao wrote: >> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb, >> inode = _ocfs2_get_system_file_inode(osb, type, slot); > > In my view, the key of commit 43b10a20372d is to avoid calling > _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to > do refcnt-1. My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling iput(). The problem commit 43b10a20372d fixed is that there was no mechanism to avoid concurrently calling *arr = igrab(inode); which will result in failing to call iput() for raced references when ocfs2_release_system_inodes() is called. > >> /* add one more if putting into array for first time */ >> - if (arr && inode) { >> - *arr = igrab(inode); >> - BUG_ON(!*arr); >> + if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) { > > Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1 > before returning to the caller. > >> + inode = igrab(inode); We do refcnt+1 immediately after cmpxchg() succeeds, for ocfs2_release_system_inodes() which clears *arr is the place for doing refcnt-1. >> + BUG_ON(!inode); >> } >> - mutex_unlock(&osb->system_file_mutex); >> return inode; >> } >> >
On 6/24/25 10:17, Tetsuo Handa wrote: > On 2025/06/24 10:33, Heming Zhao wrote: >>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb, >>> inode = _ocfs2_get_system_file_inode(osb, type, slot); >> >> In my view, the key of commit 43b10a20372d is to avoid calling >> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to >> do refcnt-1. > > My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself > is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling > iput(). We have different perspectives on calling _ocfs2_get_system_file_inode(). In the current code logic, _ocfs2_get_system_file_inode() is expected to be called only once. Subsequent local system inodes will be retrieved from the cache (via get_local_system_inode()). > > The problem commit 43b10a20372d fixed is that there was no mechanism to avoid > concurrently calling > > *arr = igrab(inode); > > which will result in failing to call iput() for raced references when > ocfs2_release_system_inodes() is called. > >> >>> /* add one more if putting into array for first time */ >>> - if (arr && inode) { >>> - *arr = igrab(inode); >>> - BUG_ON(!*arr); >>> + if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) { >> >> Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1 >> before returning to the caller. >> >>> + inode = igrab(inode); > > We do refcnt+1 immediately after cmpxchg() succeeds, for > ocfs2_release_system_inodes() which clears *arr is the place for > doing refcnt-1. > >>> + BUG_ON(!inode); >>> } >>> - mutex_unlock(&osb->system_file_mutex); >>> return inode; >>> } >>> >> > In my view, your patch has logical errors - at least from my perspective, I have to vote NAK. In my view, for this syzbot bug, the better solution is to block/deny write operations during the ocfs2 mounting phase. There are many syzbot bugs related to writing data during the mounting phase. I don't believe there is any reason a user would want to write data before the filesystem is mounted.
On 2025/06/24 11:40, Heming Zhao wrote: > On 6/24/25 10:17, Tetsuo Handa wrote: >> On 2025/06/24 10:33, Heming Zhao wrote: >>>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb, >>>> inode = _ocfs2_get_system_file_inode(osb, type, slot); >>> >>> In my view, the key of commit 43b10a20372d is to avoid calling >>> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to >>> do refcnt-1. >> >> My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself >> is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling >> iput(). > > We have different perspectives on calling _ocfs2_get_system_file_inode(). > In the current code logic, _ocfs2_get_system_file_inode() is expected to > be called only once. Subsequent local system inodes will be retrieved from > the cache (via get_local_system_inode()). That expectation is wrong. Since get_local_system_inode() involves memory allocation, get_local_system_inode() might return NULL. Therefore, ocfs2_get_system_file_inode(), which is the caller of get_local_system_inode(), has to be logically prepared for calling _ocfs2_get_system_file_inode() for multiple times. This cache is only for speeding lookups up. This cache does not provide "lookup only once" requirement. > > In my view, your patch has logical errors - at least from my perspective, > I have to vote NAK. If you NAK, you have to make sure that get_local_system_inode() never fails.
On 2025/06/24 10:33, Heming Zhao wrote: > Hello, > > Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d > did it properly. > However, I don't see how your patch fixes the syzbot report [1]. Could you > elaborate on the root cause analysis? > > My review comments are inline below. > > [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b My patch does not fix [1]. My patch fixes a bug which syzbot reported at https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com when testing with Diogo's patch at https://syzkaller.appspot.com/x/patch.diff?x=178f93d5980000 for [1]. >> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com> > 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b Since there is not yet a bug link for my patch, I don't choose syzbot as reporter. Diogo will post a formal patch for fixing [1] after returning from vacation.
Hello Joseph, Do you agree that we need to add a rule for the ocfs2 read/write (IO) path? - When an ocfs2 volume is in the process of mounting, all write operations must fail immediately. - Heming On 6/24/25 09:55, Tetsuo Handa wrote: > On 2025/06/24 10:33, Heming Zhao wrote: >> Hello, >> >> Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d >> did it properly. >> However, I don't see how your patch fixes the syzbot report [1]. Could you >> elaborate on the root cause analysis? >> >> My review comments are inline below. >> >> [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b > > My patch does not fix [1]. My patch fixes a bug which syzbot reported at > https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com > when testing with Diogo's patch at > https://syzkaller.appspot.com/x/patch.diff?x=178f93d5980000 for [1]. > >>> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com> >> 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b > > Since there is not yet a bug link for my patch, I don't choose syzbot as reporter. > Diogo will post a formal patch for fixing [1] after returning from vacation. > >
© 2016 - 2025 Red Hat, Inc.