Writeback loop can run for quite a while (depending on
wb device performance, compression algorithm and the
number of entries we writeback), so we need to do
cond_resched() there, similarly to what we do in
recompress loop.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5de7f30d0aa6..172546aa1fce 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
next:
zram_slot_unlock(zram, index);
release_pp_slot(zram, pps);
+
+ cond_resched();
}
if (blk_idx)
--
2.47.1.613.gc27f4b7a9f-goog
On Tue, 10 Dec 2024 19:53:55 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > Writeback loop can run for quite a while (depending on > wb device performance, compression algorithm and the > number of entries we writeback), so we need to do > cond_resched() there, similarly to what we do in > recompress loop. > > ... > > --- a/drivers/block/zram/zram_drv.c > +++ b/drivers/block/zram/zram_drv.c > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev, > next: > zram_slot_unlock(zram, index); > release_pp_slot(zram, pps); > + > + cond_resched(); > } > > if (blk_idx) Should this be treated as a hotfix? With a -stable backport? If so, we'd need to explain our reasoning in the changelog. "Fixes a watchdog lockup splat when running <workload>". And a Fixes: would be nice if appropriate.
On (24/12/10 16:54), Andrew Morton wrote: > On Tue, 10 Dec 2024 19:53:55 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > > Writeback loop can run for quite a while (depending on > > wb device performance, compression algorithm and the > > number of entries we writeback), so we need to do > > cond_resched() there, similarly to what we do in > > recompress loop. > > > > ... > > > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev, > > next: > > zram_slot_unlock(zram, index); > > release_pp_slot(zram, pps); > > + > > + cond_resched(); > > } > > > > if (blk_idx) > > Should this be treated as a hotfix? With a -stable backport? Actually... can I please ask you to drop this [1] particular patch for now? The stall should not happen, because submit_bio_wait() is a rescheduling point (in blk_wait_io()). So I'm not sure why I'm seeing unhappy watchdogs. [1] https://lore.kernel.org/mm-commits/20241211005510.842DFC4CED6@smtp.kernel.org
On (24/12/11 13:11), Sergey Senozhatsky wrote:
> On (24/12/10 16:54), Andrew Morton wrote:
> > On Tue, 10 Dec 2024 19:53:55 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> >
> > > Writeback loop can run for quite a while (depending on
> > > wb device performance, compression algorithm and the
> > > number of entries we writeback), so we need to do
> > > cond_resched() there, similarly to what we do in
> > > recompress loop.
> > >
> > > ...
> > >
> > > --- a/drivers/block/zram/zram_drv.c
> > > +++ b/drivers/block/zram/zram_drv.c
> > > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
> > > next:
> > > zram_slot_unlock(zram, index);
> > > release_pp_slot(zram, pps);
> > > +
> > > + cond_resched();
> > > }
> > >
> > > if (blk_idx)
> >
> > Should this be treated as a hotfix? With a -stable backport?
>
> Actually... can I please ask you to drop this [1] particular patch for
> now? The stall should not happen, because submit_bio_wait() is a
> rescheduling point (in blk_wait_io()). So I'm not sure why I'm seeing
> unhappy watchdogs.
OK, so. submit_bio_wait() is not necessarily a rescheduling point.
By the time it calls blk_wait_io() the I/O can already be completed
so it won't schedule(). Why would I/O be completed is another story.
For instance, the backing device may have BD_HAS_SUBMIT_BIO bit set
so __submit_bio() would call disk->fops->submit_bio(bio) on the backing
device directly. So on such setups we end up in a loop
for_each (target slot) {
decompress slot
submit bio
disk->fops->submit_bio
}
without rescheduling.
On (24/12/10 16:54), Andrew Morton wrote: > > Writeback loop can run for quite a while (depending on > > wb device performance, compression algorithm and the > > number of entries we writeback), so we need to do > > cond_resched() there, similarly to what we do in > > recompress loop. > > > > ... > > > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev, > > next: > > zram_slot_unlock(zram, index); > > release_pp_slot(zram, pps); > > + > > + cond_resched(); > > } > > > > if (blk_idx) > > Should this be treated as a hotfix? With a -stable backport? > > If so, we'd need to explain our reasoning in the changelog. "Fixes a > watchdog lockup splat when running <workload>". And a Fixes: would be > nice if appropriate. Good point. This fixes commit from 2018, I guess no one runs writebacks on preempt-none systems, but I don't see why this should be in -stable. I'll send updated patch in a bit.
On (24/12/11 12:43), Sergey Senozhatsky wrote:
> > Should this be treated as a hotfix? With a -stable backport?
> >
> > If so, we'd need to explain our reasoning in the changelog. "Fixes a
> > watchdog lockup splat when running <workload>". And a Fixes: would be
> > nice if appropriate.
>
> Good point. This fixes commit from 2018, I guess no one runs writebacks
> on preempt-none systems, but I don't see why this should be in -stable.
^^ should not be in -stable
> I'll send updated patch in a bit.
© 2016 - 2025 Red Hat, Inc.