From: Yu Kuai <yukuai3@huawei.com>
While monitoring the IO throttle time reported by iocost, it was found
that the time is always zero after the io_schedule() in
ioc_rqos_throttle(). For example, with the following debug patch:
+	printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
 	while (true) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (wait.committed)
 			break;
 		io_schedule();
 	}
+	printk("%s-%d: %s exit %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
It can be observed that blk_time_get_ns() always returns the same time:
[ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.272587] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.472690] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.672656] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.872655] fio-1268: ioc_rqos_throttle exit 1067901962288
I think the root cause is that 'PF_BLOCK_TS' is always cleared
by blk_flush_plug() before schedule(), hence blk_plug_invalidate_ts()
will never be called:
blk_time_get_ns
 plug->cur_ktime = ktime_get_ns();
 current->flags |= PF_BLOCK_TS;

io_schedule:
 io_schedule_prepare
  blk_flush_plug
   __blk_flush_plug
    /* the flag is cleared, while time is not */
    current->flags &= ~PF_BLOCK_TS;
 schedule
 sched_update_worker
  /* the flag is not set, hence plug->cur_ktime is not cleared */
  if (tsk->flags & PF_BLOCK_TS)
   blk_plug_invalidate_ts()

blk_time_get_ns
 /* got the time stashed before schedule */
 return plug->cur_ktime;
Fix the problem by clearing cached time in __blk_flush_plug().
Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e317d7bc0696 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
 	if (unlikely(!rq_list_empty(plug->cached_rq)))
 		blk_mq_free_plug_rqs(plug);
 
+	plug->cur_ktime = 0;
 	current->flags &= ~PF_BLOCK_TS;
 }
--
2.39.2
On 4/10/24 9:23 PM, Yu Kuai wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..e317d7bc0696 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
>  	if (unlikely(!rq_list_empty(plug->cached_rq)))
>  		blk_mq_free_plug_rqs(plug);
>
> +	plug->cur_ktime = 0;
>  	current->flags &= ~PF_BLOCK_TS;
>  }

We can just use blk_plug_invalidate_ts() here, but not really important.
I think this one should go into 6.9, and patch 2 should go into 6.10,
however.

-- 
Jens Axboe
Hi,

On 2024/04/12 0:44, Jens Axboe wrote:
> On 4/10/24 9:23 PM, Yu Kuai wrote:
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index a16b5abdbbf5..e317d7bc0696 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
>>  	if (unlikely(!rq_list_empty(plug->cached_rq)))
>>  		blk_mq_free_plug_rqs(plug);
>>
>> +	plug->cur_ktime = 0;
>>  	current->flags &= ~PF_BLOCK_TS;
>>  }
>
> We can just use blk_plug_invalidate_ts() here, but not really important.
> I think this one should go into 6.9, and patch 2 should go into 6.10,
> however.

This sounds great! Do you want me to update and send them separately?

Thanks,
Kuai
On 4/11/24 7:24 PM, Yu Kuai wrote:
> Hi,
>
> On 2024/04/12 0:44, Jens Axboe wrote:
>> On 4/10/24 9:23 PM, Yu Kuai wrote:
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index a16b5abdbbf5..e317d7bc0696 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1195,6 +1195,7 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
>>>  	if (unlikely(!rq_list_empty(plug->cached_rq)))
>>>  		blk_mq_free_plug_rqs(plug);
>>> +	plug->cur_ktime = 0;
>>>  	current->flags &= ~PF_BLOCK_TS;
>>>  }
>>
>> We can just use blk_plug_invalidate_ts() here, but not really important.
>> I think this one should go into 6.9, and patch 2 should go into 6.10,
>> however.
>
> This sounds great! Do you want me to update and send them separately?

I've applied 1/2 separately, so just resend 2/2 when -rc4 has been
tagged and I'll get that one queued for 6.10.

-- 
Jens Axboe