[PATCH] brd: fix sleeping memory allocation in brd_insert_page()

Tetsuo Handa posted 1 patch 3 months, 1 week ago
drivers/block/brd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] brd: fix sleeping memory allocation in brd_insert_page()
Posted by Tetsuo Handa 3 months, 1 week ago
syzbot is reporting that brd_insert_page() is calling
__xa_cmpxchg(__GFP_DIRECT_RECLAIM) with spinlock and RCU lock held.
Change __xa_cmpxchg() to use GFP_NOWAIT | __GFP_NOWARN, for it is likely
that __xa_cmpxchg() succeeds because of preceding alloc_page().

Fixes: bbcacab2e8ee ("brd: avoid extra xarray lookups on first write")
Reported-by: syzbot+ea4c8fd177a47338881a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ea4c8fd177a47338881a
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/brd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index b1be6c510372..ed3eb931750c 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -70,7 +70,7 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector,
 
 	xa_lock(&brd->brd_pages);
 	ret = __xa_cmpxchg(&brd->brd_pages, sector >> PAGE_SECTORS_SHIFT, NULL,
-			page, gfp);
+			page, GFP_NOWAIT | __GFP_NOWARN);
 	if (ret) {
 		xa_unlock(&brd->brd_pages);
 		__free_page(page);
-- 
2.50.0
Re: [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
Posted by Christoph Hellwig 3 months, 1 week ago
I think the correct fix is "brd: fix leeping function called from invalid
context in brd_insert_page()" from Yu Kuai.  Please take a look at that
and double check it, though.
Re: [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
Posted by Tetsuo Handa 3 months, 1 week ago
On 2025/06/28 17:36, Tetsuo Handa wrote:
> syzbot is reporting that brd_insert_page() is calling
> __xa_cmpxchg(__GFP_DIRECT_RECLAIM) with spinlock and RCU lock held.

Hmm. Holding spinlock itself is OK because xa_lock() is a requirement.

> Change __xa_cmpxchg() to use GFP_NOWAIT | __GFP_NOWARN, for it is likely
> that __xa_cmpxchg() succeeds because of preceding alloc_page().

Since this gfp flag is for allocating index array, it should use
__GFP_DIRECT_RECLAIM if possible. Then, deferring RCU lock if possible
makes sense. Then, I wonder what this RCU lock is protecting...
Re: [PATCH] brd: fix sleeping memory allocation in brd_insert_page()
Posted by Tetsuo Handa 3 months, 1 week ago
On 2025/06/28 18:39, Tetsuo Handa wrote:
> On 2025/06/28 17:36, Tetsuo Handa wrote:
>> syzbot is reporting that brd_insert_page() is calling
>> __xa_cmpxchg(__GFP_DIRECT_RECLAIM) with spinlock and RCU lock held.
> 
> Hmm. Holding spinlock itself is OK because xa_lock() is a requirement.
> 
>> Change __xa_cmpxchg() to use GFP_NOWAIT | __GFP_NOWARN, for it is likely
>> that __xa_cmpxchg() succeeds because of preceding alloc_page().
> 
> Since this gfp flag is for allocating index array, it should use
> __GFP_DIRECT_RECLAIM if possible. Then, deferring RCU lock if possible
> makes sense. Then, I wonder what this RCU lock is protecting...
> 

OK. I assume that the "concurrent discard" in
https://lkml.kernel.org/20250628011459.832760-1-yukuai1@huaweicloud.com means
brd_do_discard().

Calling rcu_read_lock() from brd_insert_page() before xa_unlock() is called prevents
__free_page() from brd_free_one_page() from call_rcu() from brd_do_discard(), even if
the page allocated by alloc_page() and stored into brd->brd_pages by __xa_cmpxchg() is
removed by __xa_erase() before brd_rw_bvec() calls memcpy_{to,from}_page()/memset();
allowing brd_rw_bvec() to continue using the page returned by brd_insert_page().

I came to worry one possibility about the above expectation, for I don't know
details of xarray.

__xa_cmpxchg() calls __xa_cmpxchg_raw() with xa_lock already held.
__xa_cmpxchg_raw() always calls __xas_nomem() with xa_lock already held.
__xas_nomem() might temporarily release xa_lock for allocating memory if
__GFP_DIRECT_RECLAIM is specified.
__xa_cmpxchg_raw() might store "entry" at xas_store() before calling __xas_nomem().

Then, is there a possibility that __xas_nomem() temporarily releases xa_lock for
allocating memory after__xa_cmpxchg_raw() already called xas_store() ?
Unless there is a guarantee that __xas_nomem() never releases xa_lock if
__xa_cmpxchg_raw() called xa_store(), there will be a race window that
the page allocated by alloc_page() and stored into brd->brd_pages by __xa_cmpxchg() is
removed by __xa_erase() from brd_do_discard() and __free_page() from brd_free_one_page()
 from call_rcu() from brd_do_discard() is fired before brd_insert_page() calls rcu_lock()
immediately after returning from __xa_cmpxchg().

Also, what serializes concurrent brd_insert_page(), for when __xas_nomem() temporarily
released xa_lock for allocating memory, two threads might concurrently call
kmem_cache_alloc_lru() from __xas_nomem() ?