[PATCH] ocfs2: kill osb->system_file_mutex lock

Tetsuo Handa posted 1 patch 3 months, 2 weeks ago
fs/ocfs2/ocfs2.h   | 2 --
fs/ocfs2/super.c   | 2 --
fs/ocfs2/sysfile.c | 9 +++------
3 files changed, 3 insertions(+), 10 deletions(-)
[PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Tetsuo Handa 3 months, 2 weeks ago
Since calling _ocfs2_get_system_file_inode() twice with the same
arguments returns the same address, there is no need to serialize
_ocfs2_get_system_file_inode() using osb->system_file_mutex lock.

Kill osb->system_file_mutex lock in order to avoid AB-BA deadlock.
cmpxchg() will be sufficient for avoiding the inode refcount leak
problem which commit 43b10a20372d ("ocfs2: avoid system inode ref
confusion by adding mutex lock") tried to address.

Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Closes: https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
Fixes: 43b10a20372d ("ocfs2: avoid system inode ref confusion by adding mutex lock")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: jiangyiwen <jiangyiwen@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
---
 fs/ocfs2/ocfs2.h   | 2 --
 fs/ocfs2/super.c   | 2 --
 fs/ocfs2/sysfile.c | 9 +++------
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 6aaa94c554c1..8bdeea60742a 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -494,8 +494,6 @@ struct ocfs2_super
 	struct rb_root	osb_rf_lock_tree;
 	struct ocfs2_refcount_tree *osb_ref_tree_lru;
 
-	struct mutex system_file_mutex;
-
 	/*
 	 * OCFS2 needs to schedule several different types of work which
 	 * require cluster locking, disk I/O, recovery waits, etc. Since these
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 3d2533950bae..4461daf909cf 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1997,8 +1997,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 	spin_lock_init(&osb->osb_xattr_lock);
 	ocfs2_init_steal_slots(osb);
 
-	mutex_init(&osb->system_file_mutex);
-
 	atomic_set(&osb->alloc_stats.moves, 0);
 	atomic_set(&osb->alloc_stats.local_data, 0);
 	atomic_set(&osb->alloc_stats.bitmap_data, 0);
diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c
index 53a945da873b..b63af8d64904 100644
--- a/fs/ocfs2/sysfile.c
+++ b/fs/ocfs2/sysfile.c
@@ -98,11 +98,9 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
 	} else
 		arr = get_local_system_inode(osb, type, slot);
 
-	mutex_lock(&osb->system_file_mutex);
 	if (arr && ((inode = *arr) != NULL)) {
 		/* get a ref in addition to the array ref */
 		inode = igrab(inode);
-		mutex_unlock(&osb->system_file_mutex);
 		BUG_ON(!inode);
 
 		return inode;
@@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
 	inode = _ocfs2_get_system_file_inode(osb, type, slot);
 
 	/* add one more if putting into array for first time */
-	if (arr && inode) {
-		*arr = igrab(inode);
-		BUG_ON(!*arr);
+	if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
+		inode = igrab(inode);
+		BUG_ON(!inode);
 	}
-	mutex_unlock(&osb->system_file_mutex);
 	return inode;
 }
 
-- 
2.47.1
Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Heming Zhao 3 months, 2 weeks ago
Hello,

Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d
did it properly.
However, I don't see how your patch fixes the syzbot report [1]. Could you
elaborate on the root cause analysis?

My review comments are inline below.

[1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b

On 6/21/25 23:56, Tetsuo Handa wrote:
> Since calling _ocfs2_get_system_file_inode() twice with the same
> arguments returns the same address, there is no need to serialize
> _ocfs2_get_system_file_inode() using osb->system_file_mutex lock.
> 
> Kill osb->system_file_mutex lock in order to avoid AB-BA deadlock.
> cmpxchg() will be sufficient for avoiding the inode refcount leak
> problem which commit 43b10a20372d ("ocfs2: avoid system inode ref
> confusion by adding mutex lock") tried to address.
> 
> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b

> Closes: https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
I don't think we need 'Closes'.

> Fixes: 43b10a20372d ("ocfs2: avoid system inode ref confusion by adding mutex lock")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: jiangyiwen <jiangyiwen@huawei.com>
> Cc: Joseph Qi <joseph.qi@huawei.com>
> Cc: Joel Becker <jlbec@evilplan.org>
> Cc: Mark Fasheh <mfasheh@suse.com>
The 'CC's are also useless.

> ---
>   fs/ocfs2/ocfs2.h   | 2 --
>   fs/ocfs2/super.c   | 2 --
>   fs/ocfs2/sysfile.c | 9 +++------
>   3 files changed, 3 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
> index 6aaa94c554c1..8bdeea60742a 100644
> --- a/fs/ocfs2/ocfs2.h
> +++ b/fs/ocfs2/ocfs2.h
> @@ -494,8 +494,6 @@ struct ocfs2_super
>   	struct rb_root	osb_rf_lock_tree;
>   	struct ocfs2_refcount_tree *osb_ref_tree_lru;
>   
> -	struct mutex system_file_mutex;
> -
>   	/*
>   	 * OCFS2 needs to schedule several different types of work which
>   	 * require cluster locking, disk I/O, recovery waits, etc. Since these
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 3d2533950bae..4461daf909cf 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1997,8 +1997,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
>   	spin_lock_init(&osb->osb_xattr_lock);
>   	ocfs2_init_steal_slots(osb);
>   
> -	mutex_init(&osb->system_file_mutex);
> -
>   	atomic_set(&osb->alloc_stats.moves, 0);
>   	atomic_set(&osb->alloc_stats.local_data, 0);
>   	atomic_set(&osb->alloc_stats.bitmap_data, 0);
> diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c
> index 53a945da873b..b63af8d64904 100644
> --- a/fs/ocfs2/sysfile.c
> +++ b/fs/ocfs2/sysfile.c
> @@ -98,11 +98,9 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>   	} else
>   		arr = get_local_system_inode(osb, type, slot);
>   
> -	mutex_lock(&osb->system_file_mutex);
>   	if (arr && ((inode = *arr) != NULL)) {
>   		/* get a ref in addition to the array ref */
>   		inode = igrab(inode);
> -		mutex_unlock(&osb->system_file_mutex);
>   		BUG_ON(!inode);
>   

I agree the above mutex_lock and mutex_unlock is useless. we can remove it
without any problem.

>   		return inode;
> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>   	inode = _ocfs2_get_system_file_inode(osb, type, slot);

In my view, the key of commit 43b10a20372d is to avoid calling
_ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
do refcnt-1.

>   
>   	/* add one more if putting into array for first time */
> -	if (arr && inode) {
> -		*arr = igrab(inode);
> -		BUG_ON(!*arr);
> +	if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {

Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1
before returning to the caller.

> +		inode = igrab(inode);
> +		BUG_ON(!inode);
>   	}
> -	mutex_unlock(&osb->system_file_mutex);
>   	return inode;
>   }
>
Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Tetsuo Handa 3 months, 2 weeks ago
On 2025/06/24 10:33, Heming Zhao wrote:
>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>>       inode = _ocfs2_get_system_file_inode(osb, type, slot);
> 
> In my view, the key of commit 43b10a20372d is to avoid calling
> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
> do refcnt-1.

My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself
is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling
iput().

The problem commit 43b10a20372d fixed is that there was no mechanism to avoid
concurrently calling

  *arr = igrab(inode);

which will result in failing to call iput() for raced references when
ocfs2_release_system_inodes() is called.

> 
>>         /* add one more if putting into array for first time */
>> -    if (arr && inode) {
>> -        *arr = igrab(inode);
>> -        BUG_ON(!*arr);
>> +    if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
> 
> Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1
> before returning to the caller.
> 
>> +        inode = igrab(inode);

We do refcnt+1 immediately after cmpxchg() succeeds, for
ocfs2_release_system_inodes() which clears *arr is the place for
doing refcnt-1.

>> +        BUG_ON(!inode);
>>       }
>> -    mutex_unlock(&osb->system_file_mutex);
>>       return inode;
>>   }
>>   
>
Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Heming Zhao 3 months, 2 weeks ago
On 6/24/25 10:17, Tetsuo Handa wrote:
> On 2025/06/24 10:33, Heming Zhao wrote:
>>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>>>        inode = _ocfs2_get_system_file_inode(osb, type, slot);
>>
>> In my view, the key of commit 43b10a20372d is to avoid calling
>> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
>> do refcnt-1.
> 
> My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself
> is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling
> iput().

We have different perspectives on calling _ocfs2_get_system_file_inode().
In the current code logic, _ocfs2_get_system_file_inode() is expected to
be called only once. Subsequent local system inodes will be retrieved from
the cache (via get_local_system_inode()).

> 
> The problem commit 43b10a20372d fixed is that there was no mechanism to avoid
> concurrently calling
> 
>    *arr = igrab(inode);
> 
> which will result in failing to call iput() for raced references when
> ocfs2_release_system_inodes() is called.
> 
>>
>>>          /* add one more if putting into array for first time */
>>> -    if (arr && inode) {
>>> -        *arr = igrab(inode);
>>> -        BUG_ON(!*arr);
>>> +    if (inode && arr && !*arr && !cmpxchg(&(*arr), NULL, inode)) {
>>
>> Bypassing the refcnt+1 here is not a good idea. We should do refcnt+1
>> before returning to the caller.
>>
>>> +        inode = igrab(inode);
> 
> We do refcnt+1 immediately after cmpxchg() succeeds, for
> ocfs2_release_system_inodes() which clears *arr is the place for
> doing refcnt-1.
> 
>>> +        BUG_ON(!inode);
>>>        }
>>> -    mutex_unlock(&osb->system_file_mutex);
>>>        return inode;
>>>    }
>>>    
>>
> 

In my view, your patch has logical errors - at least from my perspective,
I have to vote NAK.

In my view, for this syzbot bug, the better solution is to block/deny write
operations during the ocfs2 mounting phase.
There are many syzbot bugs related to writing data during the mounting phase.
I don't believe there is any reason a user would want to write data before the
filesystem is mounted.
Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Tetsuo Handa 3 months, 2 weeks ago
On 2025/06/24 11:40, Heming Zhao wrote:
> On 6/24/25 10:17, Tetsuo Handa wrote:
>> On 2025/06/24 10:33, Heming Zhao wrote:
>>>> @@ -112,11 +110,10 @@ struct inode *ocfs2_get_system_file_inode(struct ocfs2_super *osb,
>>>>        inode = _ocfs2_get_system_file_inode(osb, type, slot);
>>>
>>> In my view, the key of commit 43b10a20372d is to avoid calling
>>> _ocfs2_get_system_file_inode() twice, which lead refcnt+1 but no place to
>>> do refcnt-1.
>>
>> My understanding is that concurrently calling _ocfs2_get_system_file_inode() itself
>> is OK, for the caller of ocfs2_get_system_file_inode() is responsible for calling
>> iput().
> 
> We have different perspectives on calling _ocfs2_get_system_file_inode().
> In the current code logic, _ocfs2_get_system_file_inode() is expected to
> be called only once. Subsequent local system inodes will be retrieved from
> the cache (via get_local_system_inode()).

That expectation is wrong. Since get_local_system_inode() involves memory allocation,
get_local_system_inode() might return NULL. Therefore, ocfs2_get_system_file_inode(),
which is the caller of get_local_system_inode(), has to be logically prepared for
calling _ocfs2_get_system_file_inode() for multiple times.

This cache is only for speeding lookups up.
This cache does not provide "lookup only once" requirement.

> 
> In my view, your patch has logical errors - at least from my perspective,
> I have to vote NAK.

If you NAK, you have to make sure that get_local_system_inode() never fails.
Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Tetsuo Handa 3 months, 2 weeks ago
On 2025/06/24 10:33, Heming Zhao wrote:
> Hello,
> 
> Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d
> did it properly.
> However, I don't see how your patch fixes the syzbot report [1]. Could you
> elaborate on the root cause analysis?
> 
> My review comments are inline below.
> 
> [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b

My patch does not fix [1]. My patch fixes a bug which syzbot reported at
https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
when testing with Diogo's patch at
https://syzkaller.appspot.com/x/patch.diff?x=178f93d5980000 for [1].

>> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
> 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b

Since there is not yet a bug link for my patch, I don't choose syzbot as reporter.
Diogo will post a formal patch for fixing [1] after returning from vacation.
Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
Posted by Heming Zhao 3 months, 2 weeks ago
Hello Joseph,

Do you agree that we need to add a rule for the ocfs2 read/write (IO) path?
- When an ocfs2 volume is in the process of mounting, all write
   operations must fail immediately.

- Heming

On 6/24/25 09:55, Tetsuo Handa wrote:
> On 2025/06/24 10:33, Heming Zhao wrote:
>> Hello,
>>
>> Protecting refcnt with a mutex is the right approach, and commit 43b10a20372d
>> did it properly.
>> However, I don't see how your patch fixes the syzbot report [1]. Could you
>> elaborate on the root cause analysis?
>>
>> My review comments are inline below.
>>
>> [1]: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b
> 
> My patch does not fix [1]. My patch fixes a bug which syzbot reported at
> https://lkml.kernel.org/r/000000000000ff2d7a0620381afe@google.com
> when testing with Diogo's patch at
> https://syzkaller.appspot.com/x/patch.diff?x=178f93d5980000 for [1].
> 
>>> Reported-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
>> 'Reported-by' should be: https://syzkaller.appspot.com/bug?extid=1fed2de07d8e11a3ec1b
> 
> Since there is not yet a bug link for my patch, I don't choose syzbot as reporter.
> Diogo will post a formal patch for fixing [1] after returning from vacation.
> 
>