From: Chi Zhiling <chizhiling@kylinos.cn>

This series improves shmem read performance by implementing folio
batching in the read path and reducing unnecessary xarray lookups.

Changes since RFC
=================

The RFC version used xas_for_each() in shmem_get_read_batch(), which
introduced about a 1% regression for 4K read workloads. This v1 addresses
the regression in patch 2 by switching to filemap_get_folios_contig() and
optimizing it to avoid the extra xarray traversal overhead.

Performance Results
===================

Testing was performed with fio sequential read workloads:

    fio --ioengine=sync --rw=read --size=1G --runtime=180

### THP Disabled - Normal Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 11.4GiB/s | 12.7GiB/s | +11.4%      |
| 64k        | 11.2GiB/s | 12.2GiB/s | +8.9%       |
| 4k         | 3809MiB/s | 3838MiB/s | +0.8%       |

### THP Disabled - Fallocated Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 23.7GiB/s | 28.7GiB/s | +21.1%      |
| 64k        | 22.6GiB/s | 27.0GiB/s | +19.5%      |
| 4k         | 4668MiB/s | 4678MiB/s | +0.2%       |

### THP Enabled - Normal Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 13.9GiB/s | 13.9GiB/s | 0%          |
| 64k        | 13.4GiB/s | 13.4GiB/s | 0%          |
| 4k         | 3818MiB/s | 3836MiB/s | +0.5%       |

### THP Enabled - Fallocated Files ###

| Block Size | Baseline  | v1        | Improvement |
| ---------- | --------- | --------- | ----------- |
| 1M         | 24.1GiB/s | 34.9GiB/s | +44.8%      |
| 64k        | 22.9GiB/s | 31.3GiB/s | +36.7%      |
| 4k         | 4721MiB/s | 4708MiB/s | -0.3%       |

rfc: https://lore.kernel.org/linux-fsdevel/20260515094702.1092355-1-chizhiling@163.com/

Chi Zhiling (5):
  mm/filemap: reduce unnecessary xarray lookups when read cached pages
  mm/filemap: reduce xarray lookups in filemap_get_folios_contig()
  mm/shmem: make SGP_NOALLOC succeed on hole like SGP_READ
  mm/shmem: introduce copy_zero_to_iter() for large zeroing
  mm/shmem: optimize file read with folio batching

 include/linux/shmem_fs.h |  2 +-
 mm/filemap.c             | 34 ++++++++++++------
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 75 +++++++++++++++++++++++++++++----------
 4 files changed, 79 insertions(+), 34 deletions(-)

--
2.43.0
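
For readers who want to cross-check the Improvement column in the tables above, here is a minimal sketch that recomputes it from the reported Baseline and v1 throughput. The function and variable names are hypothetical and not part of the series; the only assumption is that each percentage is the relative change over baseline, rounded to one decimal place.

```python
def improvement_pct(baseline: float, patched: float) -> float:
    """Relative throughput change of patched over baseline, in percent.

    Both arguments must use the same unit (e.g. both GiB/s or both MiB/s).
    The result is rounded to one decimal place, as in the tables.
    """
    return round((patched - baseline) / baseline * 100, 1)


# (block size, baseline, v1) copied from the "THP Enabled - Fallocated Files"
# table; the two values in each row share a unit.
THP_ENABLED_FALLOCATED = [
    ("1M", 24.1, 34.9),   # GiB/s
    ("64k", 22.9, 31.3),  # GiB/s
    ("4k", 4721, 4708),   # MiB/s
]


def improvements(rows):
    """Map each block size to its recomputed improvement percentage."""
    return {bs: improvement_pct(base, v1) for bs, base, v1 in rows}


if __name__ == "__main__":
    for bs, pct in improvements(THP_ENABLED_FALLOCATED).items():
        print(f"{bs:>4}: {pct:+.1f}%")
```

Run against the other three tables, the same calculation reproduces the reported percentages, so the columns are internally consistent. It says nothing about run-to-run variance, which the cover letter does not report.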
On Wed, 20 May 2026 18:15:33 +0800 Chi Zhiling <chizhiling@163.com> wrote:

> From: Chi Zhiling <chizhiling@kylinos.cn>
>
> This series improves shmem read performance by implementing folio
> batching in the read path and reducing unnecessary xarray lookups.
>

Thanks.

> Performance Results
> ===================
>
> Testing was performed with fio sequential read workloads:
>
>     fio --ioengine=sync --rw=read --size=1G --runtime=180
>
>
> ### THP Disabled - Normal Files ###
>
> | Block Size | Baseline  | v1        | Improvement |
> | ---------- | --------- | --------- | ----------- |
> | 1M         | 11.4GiB/s | 12.7GiB/s | +11.4%      |
> | 64k        | 11.2GiB/s | 12.2GiB/s | +8.9%       |
> | 4k         | 3809MiB/s | 3838MiB/s | +0.8%       |
>
> ### THP Disabled - Fallocated Files ###
>
> | Block Size | Baseline  | v1        | Improvement |
> | ---------- | --------- | --------- | ----------- |
> | 1M         | 23.7GiB/s | 28.7GiB/s | +21.1%      |
> | 64k        | 22.6GiB/s | 27.0GiB/s | +19.5%      |
> | 4k         | 4668MiB/s | 4678MiB/s | +0.2%       |
>
> ### THP Enabled - Normal Files ###
>
> | Block Size | Baseline  | v1        | Improvement |
> | ---------- | --------- | --------- | ----------- |
> | 1M         | 13.9GiB/s | 13.9GiB/s | 0%          |
> | 64k        | 13.4GiB/s | 13.4GiB/s | 0%          |
> | 4k         | 3818MiB/s | 3836MiB/s | +0.5%       |
>
> ### THP Enabled - Fallocated Files ###
>
> | Block Size | Baseline  | v1        | Improvement |
> | ---------- | --------- | --------- | ----------- |
> | 1M         | 24.1GiB/s | 34.9GiB/s | +44.8%      |
> | 64k        | 22.9GiB/s | 31.3GiB/s | +36.7%      |
> | 4k         | 4721MiB/s | 4708MiB/s | -0.3%       |

That looks nice.

AI review might have found a few things:
	https://sashiko.dev/#/patchset/20260520101538.58745-1-chizhiling@163.com

I'll skip the patchset for now (unreviewed v1!).
On 5/22/26 08:14, Andrew Morton wrote:
> On Wed, 20 May 2026 18:15:33 +0800 Chi Zhiling <chizhiling@163.com> wrote:
>
>> From: Chi Zhiling <chizhiling@kylinos.cn>
>>
>> This series improves shmem read performance by implementing folio
>> batching in the read path and reducing unnecessary xarray lookups.
>>
>
> Thanks.
>
>> Performance Results
>> ===================
>>
>> Testing was performed with fio sequential read workloads:
>>
>>     fio --ioengine=sync --rw=read --size=1G --runtime=180
>>
>>
>> ### THP Disabled - Normal Files ###
>>
>> | Block Size | Baseline  | v1        | Improvement |
>> | ---------- | --------- | --------- | ----------- |
>> | 1M         | 11.4GiB/s | 12.7GiB/s | +11.4%      |
>> | 64k        | 11.2GiB/s | 12.2GiB/s | +8.9%       |
>> | 4k         | 3809MiB/s | 3838MiB/s | +0.8%       |
>>
>> ### THP Disabled - Fallocated Files ###
>>
>> | Block Size | Baseline  | v1        | Improvement |
>> | ---------- | --------- | --------- | ----------- |
>> | 1M         | 23.7GiB/s | 28.7GiB/s | +21.1%      |
>> | 64k        | 22.6GiB/s | 27.0GiB/s | +19.5%      |
>> | 4k         | 4668MiB/s | 4678MiB/s | +0.2%       |
>>
>> ### THP Enabled - Normal Files ###
>>
>> | Block Size | Baseline  | v1        | Improvement |
>> | ---------- | --------- | --------- | ----------- |
>> | 1M         | 13.9GiB/s | 13.9GiB/s | 0%          |
>> | 64k        | 13.4GiB/s | 13.4GiB/s | 0%          |
>> | 4k         | 3818MiB/s | 3836MiB/s | +0.5%       |
>>
>> ### THP Enabled - Fallocated Files ###
>>
>> | Block Size | Baseline  | v1        | Improvement |
>> | ---------- | --------- | --------- | ----------- |
>> | 1M         | 24.1GiB/s | 34.9GiB/s | +44.8%      |
>> | 64k        | 22.9GiB/s | 31.3GiB/s | +36.7%      |
>> | 4k         | 4721MiB/s | 4708MiB/s | -0.3%       |
>
> That looks nice.
>
> AI review might have found a few things:
> 	https://sashiko.dev/#/patchset/20260520101538.58745-1-chizhiling@163.com
>
> I'll skip the patchset for now (unreviewed v1!).

Okay, I will fix those issues in v2.

Thanks!