[PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()

Yu Kuai posted 1 patch 3 months, 1 week ago
drivers/block/brd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
[PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Yu Kuai 3 months, 1 week ago
From: Yu Kuai <yukuai3@huawei.com>

__xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
memory if necessary.

Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
it still should be held before xa_unlock(), prevent returned page to be
freed by concurrent discard.

Fixes: bbcacab2e8ee ("brd: avoid extra xarray lookups on first write")
Reported-by: syzbot+ea4c8fd177a47338881a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/685ec4c9.a00a0220.129264.000c.GAE@google.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
Changes in v2:
 - fix typo in the subject
 - add review tag.

 drivers/block/brd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index b1be6c510372..0c2eabe14af3 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -64,13 +64,15 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector,
 
 	rcu_read_unlock();
 	page = alloc_page(gfp | __GFP_ZERO | __GFP_HIGHMEM);
-	rcu_read_lock();
-	if (!page)
+	if (!page) {
+		rcu_read_lock();
 		return ERR_PTR(-ENOMEM);
+	}
 
 	xa_lock(&brd->brd_pages);
 	ret = __xa_cmpxchg(&brd->brd_pages, sector >> PAGE_SECTORS_SHIFT, NULL,
 			page, gfp);
+	rcu_read_lock();
 	if (ret) {
 		xa_unlock(&brd->brd_pages);
 		__free_page(page);
-- 
2.39.2
Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Jens Axboe 3 months, 1 week ago
On Mon, 30 Jun 2025 19:28:28 +0800, Yu Kuai wrote:
> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
> memory if necessary.
> 
> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
> it still should be held before xa_unlock(), prevent returned page to be
> freed by concurrent discard.
> 
> [...]

Applied, thanks!

[1/1] brd: fix sleeping function called from invalid context in brd_insert_page()
      commit: 0d519bb0de3bf0ac9e6f401d4910fc119062d7be

Best regards,
-- 
Jens Axboe
Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Jens Axboe 3 months, 1 week ago
On 6/30/25 5:28 AM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
> memory if necessary.
> 
> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
> it still should be held before xa_unlock(), prevent returned page to be
> freed by concurrent discard.

The rcu locking in there is a bit of a mess, imho. What _exactly_ is the
rcu read side locking protecting? Is it only needed around the lookup
and insert? We even hold it over the kmap and copy, which seems very
heavy handed.

-- 
Jens Axboe
Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Jens Axboe 3 months, 1 week ago
On 6/30/25 9:24 AM, Jens Axboe wrote:
> On 6/30/25 5:28 AM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
>> memory if necessary.
>>
>> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
>> it still should be held before xa_unlock(), prevent returned page to be
>> freed by concurrent discard.
> 
> The rcu locking in there is a bit of a mess, imho. What _exactly_ is the
> rcu read side locking protecting? Is it only needed around the lookup
> and insert? We even hold it over the kmap and copy, which seems very
> heavy handed.

Gah it's holding the page alive too. Can't we just grab a ref to the
page when inserting it, and drop that at free time? It would be a lot
better to have only the lookup be RCU protected, having the full
copies under it seems kind of crazy.

IOW, I think there's room for some good cleanups here.

-- 
Jens Axboe
Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Yu Kuai 3 months, 1 week ago
Hi,

在 2025/06/30 23:28, Jens Axboe 写道:
> On 6/30/25 9:24 AM, Jens Axboe wrote:
>> On 6/30/25 5:28 AM, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
>>> memory if necessary.
>>>
>>> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
>>> it still should be held before xa_unlock(), prevent returned page to be
>>> freed by concurrent discard.
>>
>> The rcu locking in there is a bit of a mess, imho. What _exactly_ is the
>> rcu read side locking protecting? Is it only needed around the lookup
>> and insert? We even hold it over the kmap and copy, which seems very
>> heavy handed.
> 
> Gah it's holding the page alive too. Can't we just grab a ref to the
> page when inserting it, and drop that at free time? It would be a lot
> better to have only the lookup be RCU protected, having the full
> copies under it seems kind of crazy.

In this case, we must grab a ref to the page for each read/write as
well, I choose RCU because I think it has less performance overhead than
page ref, which is atomic. BTW, I thought copy at most one page is
lightweight, if this is not true, I agree page ref is better.

Thanks,
Kuai

> 
> IOW, I think there's room for some good cleanups here.
> 

Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Jens Axboe 3 months, 1 week ago
On 6/30/25 7:28 PM, Yu Kuai wrote:
> Hi,
> 
> ? 2025/06/30 23:28, Jens Axboe ??:
>> On 6/30/25 9:24 AM, Jens Axboe wrote:
>>> On 6/30/25 5:28 AM, Yu Kuai wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
>>>> memory if necessary.
>>>>
>>>> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
>>>> it still should be held before xa_unlock(), prevent returned page to be
>>>> freed by concurrent discard.
>>>
>>> The rcu locking in there is a bit of a mess, imho. What _exactly_ is the
>>> rcu read side locking protecting? Is it only needed around the lookup
>>> and insert? We even hold it over the kmap and copy, which seems very
>>> heavy handed.
>>
>> Gah it's holding the page alive too. Can't we just grab a ref to the
>> page when inserting it, and drop that at free time? It would be a lot
>> better to have only the lookup be RCU protected, having the full
>> copies under it seems kind of crazy.
> 
> In this case, we must grab a ref to the page for each read/write as
> well, I choose RCU because I think it has less performance overhead than
> page ref, which is atomic. BTW, I thought copy at most one page is
> lightweight, if this is not true, I agree page ref is better.

Right, you'd need to grab a ref. I do think that is (by far) the better
solution. Yes if you microbenchmark I'm sure the current approach will
look fine, but it's a heavy section inside an rcu read lock and will
hold off the grace period.

So yeah, I do think it'd be a lot better to do proper page references on
lookup+free, and have just the lookup be behind rcu.

-- 
Jens Axboe
Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Yu Kuai 3 months, 1 week ago
Hi,

在 2025/07/01 11:00, Jens Axboe 写道:
> On 6/30/25 7:28 PM, Yu Kuai wrote:
>> Hi,
>>
>> ? 2025/06/30 23:28, Jens Axboe ??:
>>> On 6/30/25 9:24 AM, Jens Axboe wrote:
>>>> On 6/30/25 5:28 AM, Yu Kuai wrote:
>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>
>>>>> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
>>>>> memory if necessary.
>>>>>
>>>>> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
>>>>> it still should be held before xa_unlock(), prevent returned page to be
>>>>> freed by concurrent discard.
>>>>
>>>> The rcu locking in there is a bit of a mess, imho. What _exactly_ is the
>>>> rcu read side locking protecting? Is it only needed around the lookup
>>>> and insert? We even hold it over the kmap and copy, which seems very
>>>> heavy handed.
>>>
>>> Gah it's holding the page alive too. Can't we just grab a ref to the
>>> page when inserting it, and drop that at free time? It would be a lot
>>> better to have only the lookup be RCU protected, having the full
>>> copies under it seems kind of crazy.
>>
>> In this case, we must grab a ref to the page for each read/write as
>> well, I choose RCU because I think it has less performance overhead than
>> page ref, which is atomic. BTW, I thought copy at most one page is
>> lightweight, if this is not true, I agree page ref is better.
> 
> Right, you'd need to grab a ref. I do think that is (by far) the better
> solution. Yes if you microbenchmark I'm sure the current approach will
> look fine, but it's a heavy section inside an rcu read lock and will
> hold off the grace period.
> 
> So yeah, I do think it'd be a lot better to do proper page references on
> lookup+free, and have just the lookup be behind rcu.
> 

Ok, and just to be sure, since the rcu is introduced before the fixed
tag, do you think it's better to do cleanups after this patch, I prefer
this way, or fix this problem directly by page ref?

Thanks,
Kuai

Re: [PATCH v2] brd: fix sleeping function called from invalid context in brd_insert_page()
Posted by Jens Axboe 3 months, 1 week ago
On 7/1/25 1:38 AM, Yu Kuai wrote:
> Hi,
> 
> 在 2025/07/01 11:00, Jens Axboe 写道:
>> On 6/30/25 7:28 PM, Yu Kuai wrote:
>>> Hi,
>>>
>>> ? 2025/06/30 23:28, Jens Axboe ??:
>>>> On 6/30/25 9:24 AM, Jens Axboe wrote:
>>>>> On 6/30/25 5:28 AM, Yu Kuai wrote:
>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>
>>>>>> __xa_cmpxchg() is called with rcu_read_lock(), and it will allocate
>>>>>> memory if necessary.
>>>>>>
>>>>>> Fix the problem by moving rcu_read_lock() after __xa_cmpxchg(), meanwhile,
>>>>>> it still should be held before xa_unlock(), prevent returned page to be
>>>>>> freed by concurrent discard.
>>>>>
>>>>> The rcu locking in there is a bit of a mess, imho. What _exactly_ is the
>>>>> rcu read side locking protecting? Is it only needed around the lookup
>>>>> and insert? We even hold it over the kmap and copy, which seems very
>>>>> heavy handed.
>>>>
>>>> Gah it's holding the page alive too. Can't we just grab a ref to the
>>>> page when inserting it, and drop that at free time? It would be a lot
>>>> better to have only the lookup be RCU protected, having the full
>>>> copies under it seems kind of crazy.
>>>
>>> In this case, we must grab a ref to the page for each read/write as
>>> well, I choose RCU because I think it has less performance overhead than
>>> page ref, which is atomic. BTW, I thought copy at most one page is
>>> lightweight, if this is not true, I agree page ref is better.
>>
>> Right, you'd need to grab a ref. I do think that is (by far) the better
>> solution. Yes if you microbenchmark I'm sure the current approach will
>> look fine, but it's a heavy section inside an rcu read lock and will
>> hold off the grace period.
>>
>> So yeah, I do think it'd be a lot better to do proper page references on
>> lookup+free, and have just the lookup be behind rcu.
>>
> 
> Ok, and just to be sure, since the rcu is introduced before the fixed
> tag, do you think it's better to do cleanups after this patch, I prefer
> this way, or fix this problem directly by page ref?

Yeah probably best to do the simple fix, and then base the further work
on that.

-- 
Jens Axboe