block/genhd.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
From: Yu Kuai <yukuai3@huawei.com>
While bdev_count_inflight is interating all cpus, if some IOs are issued
from traversed cpu and then completed from the cpu that is not traversed
yet:
cpu0
cpu1
bdev_count_inflight
//for_each_possible_cpu
// cpu0 is 0
infliht += 0
// issue a io
blk_account_io_start
// cpu0 inflight ++
cpu2
// the io is done
blk_account_io_done
// cpu2 inflight --
// cpu 1 is 0
inflight += 0
// cpu2 is -1
inflight += -1
...
In this case, the total inflight will be -1, causing lots of false
warning. Fix the problem by removing the warning.
Noted there is still a valid warning for nvme-mpath(From Yi) that is not
fixed yet.
Fixes: f5482ee5edb9 ("block: WARN if bdev inflight counter is negative")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#mae89155a5006463d0a21a4a2c35ae0034b26a339
Reported-and-tested-by: Calvin Owens <calvin@wbinvd.org>
Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#m1d935a00070bf95055d0ac84e6075158b08acaef
Reported-by: Dave Chinner <david@fromorbit.com>
Closes: https://lore.kernel.org/linux-block/aFuypjqCXo9-5_En@dread.disaster.area/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/genhd.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index 8171a6bc3210..680fa717082f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -141,9 +141,14 @@ static void bdev_count_inflight_rw(struct block_device *part,
}
}
- if (WARN_ON_ONCE((int)inflight[READ] < 0))
+ /*
+ * While iterating all cpus, some IOs might issued from traversed cpu
+ * and then completed from the cpu that is not traversed yet, causing
+ * the inflight number to be negative.
+ */
+ if ((int)inflight[READ] < 0)
inflight[READ] = 0;
- if (WARN_ON_ONCE((int)inflight[WRITE] < 0))
+ if ((int)inflight[WRITE] < 0)
inflight[WRITE] = 0;
}
--
2.39.2
I think you want to make the inflight array a signed type instead, so that if the earlier CPUs have negative counts due to migration that gets even out later on. Which should also make the counter not trigger normally.
On 26/06/2025 09:39, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > While bdev_count_inflight is interating all cpus, if some IOs are issued > from traversed cpu and then completed from the cpu that is not traversed > yet: > > cpu0 > cpu1 > bdev_count_inflight > //for_each_possible_cpu > // cpu0 is 0 > infliht += 0 > // issue a io > blk_account_io_start > // cpu0 inflight ++ > > cpu2 > // the io is done > blk_account_io_done > // cpu2 inflight -- > // cpu 1 is 0 > inflight += 0 > // cpu2 is -1 > inflight += -1 > ... > > In this case, the total inflight will be -1, causing lots of false > warning. Fix the problem by removing the warning. Is it even safe to even use this function when not used for just informative purposes? I mean, for example, it is used by md code to check for idle state - could that check return an invalid result (and cause harm)? > > Noted there is still a valid warning for nvme-mpath(From Yi) that is not > fixed yet. > > Fixes: f5482ee5edb9 ("block: WARN if bdev inflight counter is negative") > Reported-by: Yi Zhang <yi.zhang@redhat.com> > Closes: https://urldefense.com/v3/__https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/*mae89155a5006463d0a21a4a2c35ae0034b26a339__;Iw!!ACWV5N9M2RV99hQ!LLzonI0PgLV8uruViz5LkA_QGoFQSsfBMNDhb45qsRoJqxuTMcO_2BxJXhMOADfnwncgrR3o99lVDCnq75I7_UI$ > Reported-and-tested-by: Calvin Owens <calvin@wbinvd.org> > Closes: https://urldefense.com/v3/__https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/*m1d935a00070bf95055d0ac84e6075158b08acaef__;Iw!!ACWV5N9M2RV99hQ!LLzonI0PgLV8uruViz5LkA_QGoFQSsfBMNDhb45qsRoJqxuTMcO_2BxJXhMOADfnwncgrR3o99lVDCnqYruhFG0$ > Reported-by: Dave Chinner <david@fromorbit.com> > Closes: https://urldefense.com/v3/__https://lore.kernel.org/linux-block/aFuypjqCXo9-5_En@dread.disaster.area/__;!!ACWV5N9M2RV99hQ!LLzonI0PgLV8uruViz5LkA_QGoFQSsfBMNDhb45qsRoJqxuTMcO_2BxJXhMOADfnwncgrR3o99lVDCnqj32KGls$ > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > block/genhd.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 8171a6bc3210..680fa717082f 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -141,9 +141,14 @@ static void bdev_count_inflight_rw(struct block_device *part, > } > } > > - if (WARN_ON_ONCE((int)inflight[READ] < 0)) > + /* > + * While iterating all cpus, some IOs might issued from traversed cpu > + * and then completed from the cpu that is not traversed yet, causing > + * the inflight number to be negative. > + */ > + if ((int)inflight[READ] < 0) > inflight[READ] = 0; > - if (WARN_ON_ONCE((int)inflight[WRITE] < 0)) > + if ((int)inflight[WRITE] < 0) > inflight[WRITE] = 0; > } >
Hi, 在 2025/06/26 17:34, John Garry 写道: > Is it even safe to even use this function when not used for just > informative purposes? I mean, for example, it is used by md code to > check for idle state - could that check return an invalid result (and > cause harm)? For md code, I think it's ok, md still use completed IO for idle state. More importantly, the io_ticks can be wrong due to inflight to be zero, however, it's inaccurate for a long time and there is no much we can do :( Thanks, Kuai
On 6/26/25 17:39, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > While bdev_count_inflight is interating all cpus, if some IOs are issued > from traversed cpu and then completed from the cpu that is not traversed > yet: > > cpu0 > cpu1 > bdev_count_inflight > //for_each_possible_cpu > // cpu0 is 0 > infliht += 0 > // issue a io > blk_account_io_start > // cpu0 inflight ++ > > cpu2 > // the io is done > blk_account_io_done > // cpu2 inflight -- > // cpu 1 is 0 > inflight += 0 > // cpu2 is -1 > inflight += -1 > ... > > In this case, the total inflight will be -1, causing lots of false > warning. Fix the problem by removing the warning. > > Noted there is still a valid warning for nvme-mpath(From Yi) that is not > fixed yet. > > Fixes: f5482ee5edb9 ("block: WARN if bdev inflight counter is negative") > Reported-by: Yi Zhang <yi.zhang@redhat.com> > Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#mae89155a5006463d0a21a4a2c35ae0034b26a339 > Reported-and-tested-by: Calvin Owens <calvin@wbinvd.org> > Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#m1d935a00070bf95055d0ac84e6075158b08acaef > Reported-by: Dave Chinner <david@fromorbit.com> > Closes: https://lore.kernel.org/linux-block/aFuypjqCXo9-5_En@dread.disaster.area/ > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > block/genhd.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 8171a6bc3210..680fa717082f 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -141,9 +141,14 @@ static void bdev_count_inflight_rw(struct block_device *part, > } > } > > - if (WARN_ON_ONCE((int)inflight[READ] < 0)) > + /* > + * While iterating all cpus, some IOs might issued from traversed cpu > + * and then completed from the cpu that is not traversed yet, causing > + * the inflight number to be negative. Nit (grammar): * While iterating all CPUs, some IOs may be issued from a CPU already * traversed and complete on a CPU that has not yet been traversed, * causing the inflight number to be negative. > + */ > + if ((int)inflight[READ] < 0) > inflight[READ] = 0; > - if (WARN_ON_ONCE((int)inflight[WRITE] < 0)) > + if ((int)inflight[WRITE] < 0) > inflight[WRITE] = 0; > } > -- Damien Le Moal Western Digital Research
Hi, 在 2025/06/26 16:54, Damien Le Moal 写道: > Nit (grammar): > > * While iterating all CPUs, some IOs may be issued from a CPU already > * traversed and complete on a CPU that has not yet been traversed, > * causing the inflight number to be negative. Thanks, will change to this in v2. Kuai
© 2016 - 2025 Red Hat, Inc.