From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
scheduling while atomic was reported as below where the schedule_timeout
comes from too_many_isolated when doing direct_reclaim. Fix this by
masking GFP_DIRECT_RECLAIM from gfp.
[ 175.610416][ T618] BUG: scheduling while atomic: kworker/u16:6/618/0x00000000
[ 175.643480][ T618] CPU: 2 PID: 618 Comm: kworker/u16:6 Tainted: G
[ 175.645791][ T618] Workqueue: loop20 loop_workfn
[ 175.646394][ T618] Call trace:
[ 175.646785][ T618] dump_backtrace+0xf4/0x140
[ 175.647345][ T618] show_stack+0x20/0x2c
[ 175.647846][ T618] dump_stack_lvl+0x60/0x84
[ 175.648394][ T618] dump_stack+0x18/0x24
[ 175.648895][ T618] __schedule_bug+0x64/0x90
[ 175.649445][ T618] __schedule+0x680/0x9b8
[ 175.649970][ T618] schedule+0x130/0x1b0
[ 175.650470][ T618] schedule_timeout+0xac/0x1d0
[ 175.651050][ T618] schedule_timeout_uninterruptible+0x24/0x34
[ 175.651789][ T618] __alloc_pages_slowpath+0x8dc/0x121c
[ 175.652455][ T618] __alloc_pages+0x294/0x2fc
[ 175.653011][ T618] erofs_allocpage+0x48/0x58
[ 175.653572][ T618] z_erofs_runqueue+0x314/0x8a4
[ 175.654161][ T618] z_erofs_readahead+0x258/0x318
[ 175.654761][ T618] read_pages+0x88/0x394
[ 175.655275][ T618] page_cache_ra_unbounded+0x1cc/0x23c
[ 175.655939][ T618] page_cache_ra_order+0x27c/0x33c
[ 175.656559][ T618] ondemand_readahead+0x224/0x334
[ 175.657169][ T618] page_cache_async_ra+0x60/0x9c
[ 175.657767][ T618] filemap_get_pages+0x19c/0x7cc
[ 175.658367][ T618] filemap_read+0xf0/0x484
[ 175.658901][ T618] generic_file_read_iter+0x4c/0x15c
[ 175.659543][ T618] do_iter_read+0x224/0x348
[ 175.660100][ T618] vfs_iter_read+0x24/0x38
[ 175.660635][ T618] loop_process_work+0x408/0xa68
[ 175.661236][ T618] loop_workfn+0x28/0x34
[ 175.661751][ T618] process_scheduled_works+0x254/0x4e8
[ 175.662417][ T618] worker_thread+0x24c/0x33c
[ 175.662974][ T618] kthread+0x110/0x1b8
[ 175.663465][ T618] ret_from_fork+0x10/0x20
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
fs/erofs/zdata.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index d6fe002a4a71..0213c66d141b 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1486,7 +1486,7 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
folio_unlock(zbv.folio);
folio_put(zbv.folio);
out_allocfolio:
- page = erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL);
+ page = erofs_allocpage(&f->pagepool, (gfp | __GFP_NOFAIL) & ~__GFP_DIRECT_RECLAIM);
spin_lock(&pcl->obj.lockref.lock);
if (pcl->compressed_bvecs[nr].folio) {
erofs_pagepool_add(&f->pagepool, page);
--
2.25.1
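To make the effect of the changed mask expression concrete, here is a minimal userspace sketch. It is not kernel code: the `MODEL_*` bit values and the `model_*` helpers are hypothetical stand-ins invented for illustration, and the real flag values in the kernel headers differ. It only shows the bit arithmetic of `(gfp | __GFP_NOFAIL) & ~__GFP_DIRECT_RECLAIM`. The result keeps the "must not fail" bit while dropping the bit that lets the allocator block and retry. As far as I understand the page allocator's documented contract, `__GFP_NOFAIL` expects a blockable allocation, so the combination looks questionable independently of the atomic-context question discussed in the replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only (assumption); not the kernel's real values. */
typedef uint32_t gfp_model_t;
#define MODEL_GFP_DIRECT_RECLAIM ((gfp_model_t)1u << 0)
#define MODEL_GFP_KSWAPD_RECLAIM ((gfp_model_t)1u << 1)
#define MODEL_GFP_NOFAIL         ((gfp_model_t)1u << 2)

/* The mask expression from the proposed hunk, applied to the model flags. */
static inline gfp_model_t model_patched_mask(gfp_model_t gfp)
{
	return (gfp | MODEL_GFP_NOFAIL) & ~MODEL_GFP_DIRECT_RECLAIM;
}

/* True when the request allows the caller to sleep in direct reclaim. */
static inline bool model_may_block(gfp_model_t gfp)
{
	return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* A "must not fail" request is only coherent if the allocator may block. */
static inline bool model_nofail_is_consistent(gfp_model_t gfp)
{
	return !(gfp & MODEL_GFP_NOFAIL) || model_may_block(gfp);
}

/* Walks through the expression once; call from main() to run the checks. */
static inline void model_mask_demo(void)
{
	gfp_model_t in = MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM;
	gfp_model_t out = model_patched_mask(in);

	assert(!model_may_block(out));            /* direct reclaim was cleared */
	assert(out & MODEL_GFP_NOFAIL);           /* but NOFAIL is still set */
	assert(!model_nofail_is_consistent(out)); /* contradictory request */
}
```

In this model the original call site's `gfp | __GFP_NOFAIL` stays consistent as long as the incoming mask allows direct reclaim, which matches a sleepable .readahead path.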
On 2024/7/16 13:44, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> scheduling while atomic was reported as below where the schedule_timeout
> comes from too_many_isolated when doing direct_reclaim. Fix this by
> masking GFP_DIRECT_RECLAIM from gfp.
>
> [ 175.610416][ T618] BUG: scheduling while atomic: kworker/u16:6/618/0x00000000

...

> [ 175.663465][ T618] ret_from_fork+0x10/0x20
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

I don't see why it's an atomic context,
so this patch is incorrect.

Thanks,
Gao Xiang
On Tue, Jul 16, 2024 at 1:50 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2024/7/16 13:44, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >

...

> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> I don't see why it's an atomic context,
> so this patch is incorrect.

Sorry, I should provide more details. page_cache_ra_unbounded() will
call filemap_invalidate_lock_shared(mapping) to ensure the integrity
of page cache during readahead, which will disable preempt.

>
> Thanks,
> Gao Xiang
On 2024/7/16 14:14, Zhaoyang Huang wrote:
> On Tue, Jul 16, 2024 at 1:50 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> On 2024/7/16 13:44, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>

...

>>>
>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>
>> I don't see why it's an atomic context,
>> so this patch is incorrect.
> Sorry, I should provide more details. page_cache_ra_unbounded() will
> call filemap_invalidate_lock_shared(mapping) to ensure the integrity
> of page cache during readahead, which will disable preempt.

Why a rwsem sleepable lock disable preemption?

.readahead context should be always non-atomic context, which is applied
to all kernel filesystems.

Thanks,
Gao Xiang

>>
>> Thanks,
>> Gao Xiang
On Tue, Jul 16, 2024 at 2:20 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2024/7/16 14:14, Zhaoyang Huang wrote:
> > On Tue, Jul 16, 2024 at 1:50 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>
> >> On 2024/7/16 13:44, zhaoyang.huang wrote:
> >>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>>

...

> >>>
> >>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>
> >> I don't see why it's an atomic context,
> >> so this patch is incorrect.
> > Sorry, I should provide more details. page_cache_ra_unbounded() will
> > call filemap_invalidate_lock_shared(mapping) to ensure the integrity
> > of page cache during readahead, which will disable preempt.
>
> Why a rwsem sleepable lock disable preemption?

emm, that's the original design of down_read()

> .readahead context should be always non-atomic context, which is applied
> to all kernel filesystems.

AFAICT, filemap_fault/read have added the folios of readahead to page
cache which means the aops->readahead basically just need to map the
block to this folios and then launch the bio. The erofs is a little
bit different to others as it has to alloc_pages for decompression
when doing this.

>
> Thanks,
> Gao Xiang
>
> >>
> >> Thanks,
> >> Gao Xiang
On 2024/7/16 14:43, Zhaoyang Huang wrote:
> On Tue, Jul 16, 2024 at 2:20 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>

...

>>>>
>>>> I don't see why it's an atomic context,
>>>> so this patch is incorrect.
>>> Sorry, I should provide more details. page_cache_ra_unbounded() will
>>> call filemap_invalidate_lock_shared(mapping) to ensure the integrity
>>> of page cache during readahead, which will disable preempt.
>>
>> Why a rwsem sleepable lock disable preemption?
> emm, that's the original design of down_read()

No.

>
>> .readahead context should be always non-atomic context, which is applied
>> to all kernel filesystems.
> AFAICT, filemap_fault/read have added the folios of readahead to page
> cache which means the aops->readahead basically just need to map the
> block to this folios and then launch the bio. The erofs is a little
> bit different to others as it has to alloc_pages for decompression
> when doing this.

Interesting. The whole .readahead is sleepable, including
submit block I/Os to storage.

Nacked-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang
On 2024/7/16 14:46, Gao Xiang wrote:
>
> On 2024/7/16 14:43, Zhaoyang Huang wrote:
>> On Tue, Jul 16, 2024 at 2:20 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>
>
> ...
>
>>>>>
>>>>> I don't see why it's an atomic context,
>>>>> so this patch is incorrect.
>>>> Sorry, I should provide more details. page_cache_ra_unbounded() will
>>>> call filemap_invalidate_lock_shared(mapping) to ensure the integrity
>>>> of page cache during readahead, which will disable preempt.
>>>
>>> Why a rwsem sleepable lock disable preemption?
>> emm, that's the original design of down_read()
>
> No.
>
>>
>>> .readahead context should be always non-atomic context, which is applied
>>> to all kernel filesystems.
>> AFAICT, filemap_fault/read have added the folios of readahead to page
>> cache which means the aops->readahead basically just need to map the
>> block to this folios and then launch the bio. The erofs is a little
>> bit different to others as it has to alloc_pages for decompression
>> when doing this.
>
> Interesting. The whole .readahead is sleepable, including
> submit block I/Os to storage.

Also, please don't imagine your stack trace if it's a non-upstream
kernel.

>
> Nacked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
>
> Thanks,
> Gao Xiang
On Tue, Jul 16, 2024 at 2:50 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> On 2024/7/16 14:46, Gao Xiang wrote:
> >
> > On 2024/7/16 14:43, Zhaoyang Huang wrote:
> >> On Tue, Jul 16, 2024 at 2:20 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >>>
> >
> > ...
> >
> >>>>>
> >>>>> I don't see why it's an atomic context,
> >>>>> so this patch is incorrect.
> >>>> Sorry, I should provide more details. page_cache_ra_unbounded() will
> >>>> call filemap_invalidate_lock_shared(mapping) to ensure the integrity
> >>>> of page cache during readahead, which will disable preempt.
> >>>
> >>> Why a rwsem sleepable lock disable preemption?
> >> emm, that's the original design of down_read()
> >
> > No.
> >
> >>
> >>> .readahead context should be always non-atomic context, which is applied
> >>> to all kernel filesystems.
> >> AFAICT, filemap_fault/read have added the folios of readahead to page
> >> cache which means the aops->readahead basically just need to map the
> >> block to this folios and then launch the bio. The erofs is a little
> >> bit different to others as it has to alloc_pages for decompression
> >> when doing this.
> >
> > Interesting. The whole .readahead is sleepable, including
> > submit block I/Os to storage.
>
> Also, please don't imagine your stack trace if it's a non-upstream
> kernel.

ok, it should be caused by a vendor hook function of the android
system. sorry for interrupting by my stupid.

> >
> > Nacked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> >
> > Thanks,
> > Gao Xiang
On 2024/7/16 15:41, Zhaoyang Huang wrote:
> On Tue, Jul 16, 2024 at 2:50 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> On 2024/7/16 14:46, Gao Xiang wrote:
>>>
>>> On 2024/7/16 14:43, Zhaoyang Huang wrote:
>>>> On Tue, Jul 16, 2024 at 2:20 PM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>>>
>>>
>>> ...
>>>
>>>>>>>
>>>>>>> I don't see why it's an atomic context,
>>>>>>> so this patch is incorrect.
>>>>>> Sorry, I should provide more details. page_cache_ra_unbounded() will
>>>>>> call filemap_invalidate_lock_shared(mapping) to ensure the integrity
>>>>>> of page cache during readahead, which will disable preempt.
>>>>>
>>>>> Why a rwsem sleepable lock disable preemption?
>>>> emm, that's the original design of down_read()
>>>
>>> No.
>>>
>>>>
>>>>> .readahead context should be always non-atomic context, which is applied
>>>>> to all kernel filesystems.
>>>> AFAICT, filemap_fault/read have added the folios of readahead to page
>>>> cache which means the aops->readahead basically just need to map the
>>>> block to this folios and then launch the bio. The erofs is a little
>>>> bit different to others as it has to alloc_pages for decompression
>>>> when doing this.
>>>
>>> Interesting. The whole .readahead is sleepable, including
>>> submit block I/Os to storage.
>>
>> Also, please don't imagine your stack trace if it's a non-upstream
>> kernel.
> ok, it should be caused by a vendor hook function of the android
> system. sorry for interrupting by my stupid.

okay, thanks for confirmation.

Also more words may be useful here: Note that .readahead doesn't just map
the block to this folios. Even an uncompressed fs could allocate/read
(submit+wait) meta folio/blocks to get the block mapping from these meta
blocks and sleep in this context.

Thanks,
Gao Xiang
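The distinction the thread turns on (a sleepable rwsem versus a context with preemption disabled) can be sketched with a small userspace toy model. Everything below is hypothetical and written only for illustration: `struct model_task` and the `model_*` helpers are invented names, not kernel APIs, and the model assumes a non-RT kernel where taking a spinlock raises the preempt count. Taking the rwsem for read leaves the preempt count untouched, so sleeping under it is legal. Only a section that has disabled preemption (a spinlock here, or, as the thread concluded, something outside upstream such as a vendor hook) would make the same sleep a "scheduling while atomic" report.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-task state (assumption); the kernel tracks this via preempt_count(). */
struct model_task {
	int preempt_count; /* non-zero means atomic context in this model */
	int rwsems_held;   /* sleepable locks do not touch preempt_count */
	int bugs;          /* count of "scheduling while atomic" reports */
};

static inline void model_spin_lock(struct model_task *t)   { t->preempt_count++; }
static inline void model_spin_unlock(struct model_task *t) { t->preempt_count--; }
static inline void model_down_read(struct model_task *t)   { t->rwsems_held++; }
static inline void model_up_read(struct model_task *t)     { t->rwsems_held--; }

/* Sleeping is legal unless preemption is disabled; returns true if legal. */
static inline bool model_schedule(struct model_task *t)
{
	if (t->preempt_count != 0) {
		t->bugs++; /* models the BUG report, then carries on */
		return false;
	}
	return true;
}

/*
 * Models a .readahead call made under the shared invalidate lock:
 * it may sleep, e.g. in direct reclaim or while waiting for metadata I/O.
 */
static inline bool model_readahead(struct model_task *t)
{
	bool ok;

	model_down_read(t);
	ok = model_schedule(t);
	model_up_read(t);
	return ok;
}
```

Calling `model_readahead()` on a zero-initialised task succeeds with no report, while calling it between `model_spin_lock()` and `model_spin_unlock()` records one, which is the situation the stack trace would require.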