[RFC 3/3] block: use mm_huge_zero_folio in __blkdev_issue_zero_pages()

Pankaj Raghav posted 3 patches 6 months, 3 weeks ago
There is a newer version of this series
[RFC 3/3] block: use mm_huge_zero_folio in __blkdev_issue_zero_pages()
Posted by Pankaj Raghav 6 months, 3 weeks ago
Use mm_huge_zero_folio in __blkdev_issue_zero_pages(). Fallback to
ZERO_PAGE if mm_huge_zero_folio is not available.

On systems that allocates mm_huge_zero_folio, we will end up sending larger
bvecs instead of multiple small ones.

Noticed a 4% increase in performance on a commercial NVMe SSD which does
not support OP_WRITE_ZEROES. The device's MDTS was 128K. The performance
gains might be bigger if the device supports bigger MDTS.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 block/blk-lib.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 4c9f20a689f7..0fd55e028170 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -4,6 +4,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/mm.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
 #include <linux/scatterlist.h>
@@ -196,6 +197,12 @@ static void __blkdev_issue_zero_pages(struct block_device *bdev,
 		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
 		struct bio **biop, unsigned int flags)
 {
+	struct folio *zero_folio;
+
+	zero_folio = mm_get_huge_zero_folio(NULL);
+	if (!zero_folio)
+		zero_folio = page_folio(ZERO_PAGE(0));
+
 	while (nr_sects) {
 		unsigned int nr_vecs = __blkdev_sectors_to_bio_pages(nr_sects);
 		struct bio *bio;
@@ -208,11 +215,12 @@ static void __blkdev_issue_zero_pages(struct block_device *bdev,
 			break;
 
 		do {
-			unsigned int len, added;
+			unsigned int len, added = 0;
 
-			len = min_t(sector_t,
-				PAGE_SIZE, nr_sects << SECTOR_SHIFT);
-			added = bio_add_page(bio, ZERO_PAGE(0), len, 0);
+			len = min_t(sector_t, folio_size(zero_folio),
+				    nr_sects << SECTOR_SHIFT);
+			if (bio_add_folio(bio, zero_folio, len, 0))
+				added = len;
 			if (added < len)
 				break;
 			nr_sects -= added >> SECTOR_SHIFT;
-- 
2.47.2
Re: [RFC 3/3] block: use mm_huge_zero_folio in __blkdev_issue_zero_pages()
Posted by Christoph Hellwig 6 months, 2 weeks ago
On Tue, May 27, 2025 at 07:04:52AM +0200, Pankaj Raghav wrote:
> Noticed a 4% increase in performance on a commercial NVMe SSD which does
> not support OP_WRITE_ZEROES. The device's MDTS was 128K. The performance
> gains might be bigger if the device supports bigger MDTS.

Impressive gain on the one hand - on the other hand what is the macro
workload that does a lot of zeroing on an SSD, because avoiding that
should yield even better result while reducing wear..

> +			unsigned int len, added = 0;
>  
> +			len = min_t(sector_t, folio_size(zero_folio),
> +				    nr_sects << SECTOR_SHIFT);
> +			if (bio_add_folio(bio, zero_folio, len, 0))
> +				added = len;
>  			if (added < len)
>  				break;
>  			nr_sects -= added >> SECTOR_SHIFT;

Unless I'm missing something the added variable can go away now, and
the code using it can simply use len.
Re: [RFC 3/3] block: use mm_huge_zero_folio in __blkdev_issue_zero_pages()
Posted by Pankaj Raghav 6 months, 2 weeks ago
On 6/2/25 07:05, Christoph Hellwig wrote:
> On Tue, May 27, 2025 at 07:04:52AM +0200, Pankaj Raghav wrote:
>> Noticed a 4% increase in performance on a commercial NVMe SSD which does
>> not support OP_WRITE_ZEROES. The device's MDTS was 128K. The performance
>> gains might be bigger if the device supports bigger MDTS.
> 
> Impressive gain on the one hand - on the other hand what is the macro
> workload that does a lot of zeroing on an SSD, because avoiding that
> should yield even better result while reducing wear..
> 

Absolutely. I think it is better to use either WRITE_ZEROES or DISCARD. But I wanted
to have some measurable workload to show the benefits of using a huge page to zero out.

Interestingly, I have seen many client SSDs not implementing WRITE_ZEROES.

>> +			unsigned int len, added = 0;
>>  
>> +			len = min_t(sector_t, folio_size(zero_folio),
>> +				    nr_sects << SECTOR_SHIFT);
>> +			if (bio_add_folio(bio, zero_folio, len, 0))
>> +				added = len;
>>  			if (added < len)
>>  				break;
>>  			nr_sects -= added >> SECTOR_SHIFT;
> 
> Unless I'm missing something the added variable can go away now, and
> the code using it can simply use len.
> 

Yes. This should do it.

if (!bio_add_folio(bio, zero_folio, len, 0))
    break;

nr_sects -= len >> SECTOR_SHIFT;

--
Pankaj