From: Yu Kuai <yukuai3@huawei.com>
We have a workload doing random 4k-128k reads on a HDD. With readahead
enabled, iostat shows an average request size of 256k+ and a bandwidth of
100MB/s+, because readahead wastes a lot of disk bandwidth. Hence we
disabled readahead; user-visible performance is indeed much better (2x+),
however, iostat now shows a request size of just 4k and a bandwidth of only
around 40MB/s.
A simple dd test then showed that with readahead disabled,
page_cache_sync_ra() forces reads to be issued one page at a time. This
really doesn't make sense, because we could simply issue a request of the
user-requested size to the disk.
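
To make the cause concrete, here is a rough sketch of the pre-patch control
flow in page_cache_sync_ra(). The branch itself is taken from the hunk
further down; the force_page_cache_ra() tail is reconstructed from context
and may not match the exact tree:

	if (!ra->ra_pages || blk_cgroup_congested()) {
		if (!ractl->file)
			return;
		/* readahead disabled (or cgroup congested): clamp to one page */
		req_count = 1;
		do_forced_ra = true;
	}

	/* be dumb */
	if (do_forced_ra) {
		/* with req_count == 1, every sync readahead reads a single page */
		force_page_cache_ra(ractl, req_count);
		return;
	}

So with ra_pages == 0, every synchronous readahead for an uncached read is
cut down to one page, regardless of how large the user's read was.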
Fix this by removing the one-page-at-a-time limit from
page_cache_sync_ra(), so that random read workloads get better performance
with readahead disabled.
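
Note that the req_count passed into page_cache_sync_ra() already reflects
the size of the caller's read. For reference, a sketch of the wrapper in
include/linux/pagemap.h (an approximation for illustration; details may
differ between kernel versions):

	static inline void page_cache_sync_readahead(struct address_space *mapping,
			struct file_ra_state *ra, struct file *file, pgoff_t index,
			unsigned long req_count)
	{
		DEFINE_READAHEAD(ractl, file, ra, mapping, index);
		page_cache_sync_ra(&ractl, req_count);
	}

With the one-page clamp removed, the !ra->ra_pages case keeps this
req_count, so the forced readahead covers the whole requested range in one
go.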
PS: I'm not sure whether I've missed anything, so this version is an RFC.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
mm/readahead.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 20d36d6b055e..1df85ccba575 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -561,13 +561,21 @@ void page_cache_sync_ra(struct readahead_control *ractl,
* Even if readahead is disabled, issue this request as readahead
* as we'll need it to satisfy the requested range. The forced
* readahead will do the right thing and limit the read to just the
- * requested range, which we'll set to 1 page for this case.
+ * requested range.
*/
- if (!ra->ra_pages || blk_cgroup_congested()) {
+ if (blk_cgroup_congested()) {
if (!ractl->file)
return;
+ /*
+ * If the cgroup is congested, ensure to do at least 1 page of
+ * readahead to make progress on the read.
+ */
req_count = 1;
do_forced_ra = true;
+ } else if (!ra->ra_pages) {
+ if (!ractl->file)
+ return;
+ do_forced_ra = true;
}
/* be dumb */
--
2.39.2
Hi,

On 2025/07/01 19:08, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> [...]

Friendly ping ...
Hi,

On 2025/07/14 9:42, Yu Kuai wrote:
> Hi,
>
> On 2025/07/01 19:08, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>> [...]
>
> Friendly ping ...

Friendly ping again ...