fs/ocfs2/ocfs2.h | 2 -- fs/ocfs2/super.c | 2 -- fs/ocfs2/sysfile.c | 9 +++------ 3 files changed, 3 insertions(+), 10 deletions(-)
Since calling _ocfs2_get_system_file_inode() twice with the same
arguments returns the same address, there is no need to serialize
_ocfs2_get_system_file_inode() using osb->system_file_mutex lock.
Kill osb->system_file_mutex lock in order to avoid AB-BA deadlock.
cmpxchg() will be sufficient for avoiding the inode refcount leak
problem which commit 43b10a20372d ("ocfs2: avoid system inode ref
confusion by adding mutex lock") tried to address.
Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Closes: https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
Fixes: 43b10a20372d ("ocfs2: avoid system inode ref confusion by adding mutex lock")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: jiangyiwen <jiangyiwen@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
---
fs/ocfs2/ocfs2.h | 2 --
fs/ocfs2/super.c | 2 --
fs/ocfs2/sysfile.c | 9 +++------
3 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6aaa94c554c1..8bdeea60742a 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -494,8 +494,6 @@ struct ocfs2_super
struct rb_root osb_rf_lock_tree;
struct ocfs2_refcount_tree *osb_ref_tree_lru;
- struct mutex system_file_mutex;
-
/*
* OCFS2 needs to schedule several different types of work which
* require cluster locking, disk I/O, recovery waits, etc. Since these
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 3d2533950bae..4461daf909cf 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1997,8 +1997,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
spin_lock_init(&osb->osb_xattr_lock);
ocfs2_init_steal_slots(osb);
- mutex_init(&osb->system_file_mutex);
-
atomic_set(&osb->alloc_stats.moves, 0);
atomic_set(&osb->alloc_stats.local_data, 0);
atomic_set(&osb->alloc_stats.bitmap_data, 0);
diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c
index 53a945da873b..b63af8d64904 100644
--- a/fs/ocfs2/sysfile.c
+++ b/fs/ocfs2/sysfile.c
@@ -98,11 +98,9 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
} else
arr = get_local_system_inode(osb, type, slot);
- mutex_lock(&osb->system_file_mutex);
if (arr && ((inode = *arr) != NULL)) {
/* get a ref in addition to the array ref */
inode = igrab(inode);
- mutex_unlock(&osb->system_file_mutex);
BUG_ON(!inode);
return inode;
@@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
inode = _ocfs2_get_system_file_inode(osb, type, slot);
/* add one more if putting into array for first time */
- if (arr && inode) {
- *arr = igrab(inode);
- BUG_ON(!*arr);
+ if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
+ inode = igrab(inode);
+ BUG_ON(!inode);
}
- mutex_unlock(&osb->system_file_mutex);
return inode;
}
--
2.47.1
Hello,
Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d
did it properly.
However, I don't see how your patch fixes the syzbot report [1]. Could you
elaborate on the root cause analysis?
My review comments are inline below.
[1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b
On 6/21/25 23:56, Tetsuo Handa wrote:
> Since calling _ocfs2_get_system_file_inode() twice with the same
> arguments returns the same address, there is no need to serialize
> _ocfs2_get_system_file_inode() using osb->system_file_mutex lock.
>
> Kill osb->system_file_mutex lock in order to avoid AB-BA deadlock.
> cmpxchg() will be sufficient for avoiding the inode refcount leak
> problem which commit 43b10a20372d ("ocfs2: avoid system inode ref
> confusion by adding mutex lock") tried to address.
>
> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b
> Closes: https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
I don't think we need 'Closes'.
> Fixes: 43b10a20372d ("ocfs2: avoid system inode ref confusion by adding mutex lock")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: jiangyiwen <jiangyiwen@huawei.com>
> Cc: Joseph Qi <joseph.qi@huawei.com>
> Cc: Joel Becker <jlbec@evilplan.org>
> Cc: Mark Fasheh <mfasheh@suse.com>
The 'CC's are also useless.
> ---
> fs/ocfs2/ocfs2.h | 2 --
> fs/ocfs2/super.c | 2 --
> fs/ocfs2/sysfile.c | 9 +++------
> 3 files changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
> index 6aaa94c554c1..8bdeea60742a 100644
> --- a/fs/ocfs2/ocfs2.h
> +++ b/fs/ocfs2/ocfs2.h
> @@ -494,8 +494,6 @@ struct ocfs2_super
> struct rb_root osb_rf_lock_tree;
> struct ocfs2_refcount_tree *osb_ref_tree_lru;
>
> - struct mutex system_file_mutex;
> -
> /*
> * OCFS2 needs to schedule several different types of work which
> * require cluster locking, disk I/O, recovery waits, etc. Since these
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 3d2533950bae..4461daf909cf 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1997,8 +1997,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
> spin_lock_init(&osb->osb_xattr_lock);
> ocfs2_init_steal_slots(osb);
>
> - mutex_init(&osb->system_file_mutex);
> -
> atomic_set(&osb->alloc_stats.moves, 0);
> atomic_set(&osb->alloc_stats.local_data, 0);
> atomic_set(&osb->alloc_stats.bitmap_data, 0);
> diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c
> index 53a945da873b..b63af8d64904 100644
> --- a/fs/ocfs2/sysfile.c
> +++ b/fs/ocfs2/sysfile.c
> @@ -98,11 +98,9 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
> } else
> arr = get_local_system_inode(osb, type, slot);
>
> - mutex_lock(&osb->system_file_mutex);
> if (arr && ((inode = *arr) != NULL)) {
> /* get a ref in addition to the array ref */
> inode = igrab(inode);
> - mutex_unlock(&osb->system_file_mutex);
> BUG_ON(!inode);
>
I agree the above mutex_lock and mutex_unlock is useless. we can remove it
without any problem.
> return inode;
> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
> inode = _ocfs2_get_system_file_inode(osb, type, slot);
In my view, the key of commit 43b10a20372d is to avoid calling
_ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
do refcnt-1.
>
> /* add one more if putting into array for first time */
> - if (arr && inode) {
> - *arr = igrab(inode);
> - BUG_ON(!*arr);
> + if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1
before returning to the caller.
> + inode = igrab(inode);
> + BUG_ON(!inode);
> }
> - mutex_unlock(&osb->system_file_mutex);
> return inode;
> }
>
On 2025/06/24 10:33, Heming Zhao wrote:
>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>> inode = _ocfs2_get_system_file_inode(osb, type, slot);
>
> In my view, the key of commit 43b10a20372d is to avoid calling
> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
> do refcnt-1.
My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself
is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling
iput().
The problem commit 43b10a20372d fixed is that there was no mechanism to avoid
concurrently calling
*arr = igrab(inode);
which will result in failing to call iput() for raced references when
ocfs2_release_system_inodes() is called.
>
>> /* add one more if putting into array for first time */
>> - if (arr && inode) {
>> - *arr = igrab(inode);
>> - BUG_ON(!*arr);
>> + if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
>
> Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1
> before returning to the caller.
>
>> + inode = igrab(inode);
We do refcnt+1 immediately after cmpxchg() succeeds, for
ocfs2_release_system_inodes() which clears *arr is the place for
doing refcnt-1.
>> + BUG_ON(!inode);
>> }
>> - mutex_unlock(&osb->system_file_mutex);
>> return inode;
>> }
>>
>
On 6/24/25 10:17, Tetsuo Handa wrote:
> On 2025/06/24 10:33, Heming Zhao wrote:
>>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>>> inode = _ocfs2_get_system_file_inode(osb, type, slot);
>>
>> In my view, the key of commit 43b10a20372d is to avoid calling
>> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
>> do refcnt-1.
>
> My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself
> is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling
> iput().
We have different perspectives on calling _ocfs2_get_system_file_inode().
In the current code logic, _ocfs2_get_system_file_inode() is expected to
be called only once. Subsequent local system inodes will be retrieved from
the cache (via get_local_system_inode()).
>
> The problem commit 43b10a20372d fixed is that there was no mechanism to avoid
> concurrently calling
>
> *arr = igrab(inode);
>
> which will result in failing to call iput() for raced references when
> ocfs2_release_system_inodes() is called.
>
>>
>>> /* add one more if putting into array for first time */
>>> - if (arr && inode) {
>>> - *arr = igrab(inode);
>>> - BUG_ON(!*arr);
>>> + if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
>>
>> Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1
>> before returning to the caller.
>>
>>> + inode = igrab(inode);
>
> We do refcnt+1 immediately after cmpxchg() succeeds, for
> ocfs2_release_system_inodes() which clears *arr is the place for
> doing refcnt-1.
>
>>> + BUG_ON(!inode);
>>> }
>>> - mutex_unlock(&osb->system_file_mutex);
>>> return inode;
>>> }
>>>
>>
>
In my view, your patch has logical errors - at least from my perspective,
I have to vote NAK.
In my view, for this syzbot bug, the better solution is to block/deny write
operations during the ocfs2 mounting phase.
There are many syzbot bugs related to writing data during the mounting phase.
I don't believe there is any reason a user would want to write data before the
filesystem is mounted.
On 2025/06/24 11:40, Heming Zhao wrote: > On 6/24/25 10:17, Tetsuo Handa wrote: >> On 2025/06/24 10:33, Heming Zhao wrote: >>>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb, >>>> inode = _ocfs2_get_system_file_inode(osb, type, slot); >>> >>> In my view, the key of commit 43b10a20372d is to avoid calling >>> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to >>> do refcnt-1. >> >> My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself >> is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling >> iput(). > > We have different perspectives on calling _ocfs2_get_system_file_inode(). > In the current code logic, _ocfs2_get_system_file_inode() is expected to > be called only once. Subsequent local system inodes will be retrieved from > the cache (via get_local_system_inode()). That expectation is wrong. Since get_local_system_inode() involves memory allocation, get_local_system_inode() might return NULL. Therefore, ocfs2_get_system_file_inode(), which is the caller of get_local_system_inode(), has to be logically prepared for calling _ocfs2_get_system_file_inode() for multiple times. This cache is only for speeding lookups up. This cache does not provide "lookup only once" requirement. > > In my view, your patch has logical errors - at least from my perspective, > I have to vote NAK. If you NAK, you have to make sure that get_local_system_inode() never fails.
On 2025/06/24 10:33, Heming Zhao wrote: > Hello, > > Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d > did it properly. > However, I don't see how your patch fixes the syzbot report [1]. Could you > elaborate on the root cause analysis? > > My review comments are inline below. > > [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b My patch does not fix [1]. My patch fixes a bug which syzbot reported at https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com when testing with Diogo's patch at https://syzkaller.appspot.com/x/patch.diff?x=178f93d5980000 for [1]. >> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com> > 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b Since there is not yet a bug link for my patch, I don't choose syzbot as reporter. Diogo will post a formal patch for fixing [1] after returning from vacation.
Hello Joseph, Do you agree that we need to add a rule for the ocfs2 read/write (IO) path? - When an ocfs2 volume is in the process of mounting, all write operations must fail immediately. - Heming On 6/24/25 09:55, Tetsuo Handa wrote: > On 2025/06/24 10:33, Heming Zhao wrote: >> Hello, >> >> Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d >> did it properly. >> However, I don't see how your patch fixes the syzbot report [1]. Could you >> elaborate on the root cause analysis? >> >> My review comments are inline below. >> >> [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b > > My patch does not fix [1]. My patch fixes a bug which syzbot reported at > https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com > when testing with Diogo's patch at > https://syzkaller.appspot.com/x/patch.diff?x=178f93d5980000 for [1]. > >>> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com> >> 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b > > Since there is not yet a bug link for my patch, I don't choose syzbot as reporter. > Diogo will post a formal patch for fixing [1] after returning from vacation. > >
© 2016 - 2026 Red Hat, Inc.