[PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents

Eric Blake posted 11 patches 5 months, 1 week ago
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, Ronnie Sahlberg <ronniesahlberg@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@dlhnet.de>, Eric Blake <eblake@redhat.com>, "Denis V. Lunev" <den@openvz.org>, Alberto Garcia <berto@igalia.com>, Ilya Dryomov <idryomov@gmail.com>, Stefan Weil <sw@weilnetz.de>, Markus Armbruster <armbru@redhat.com>
[PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents
Posted by Eric Blake 5 months, 1 week ago
Some BDS drivers have a cap on how much block status they can supply
in one query (for example, NBD talking to an older server cannot
inspect more than 4G per query; and qcow2 tends to cap its answers
rather than cross a cluster boundary of an L1 table).  Although the
existing callers of bdrv_co_is_zero_fast are not passing in that large
of a 'bytes' parameter, an upcoming caller wants to query the entire
image at once, and will thus benefit from being able to treat adjacent
zero regions in a coalesced manner, rather than claiming the region is
non-zero merely because pnum was truncated and didn't match the
incoming bytes.

While refactoring this into a loop, note that there is no need to
assign pnum prior to calling bdrv_co_common_block_status_above() (it
is guaranteed to be assigned deeper in the callstack).

Signed-off-by: Eric Blake <eblake@redhat.com>

---

v3: also tweak function comment
---
 block/io.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 056af8198bc..d3bd1211acf 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2751,28 +2751,31 @@ int coroutine_fn bdrv_co_block_status(BlockDriverState *bs, int64_t offset,
  * by @offset and @bytes is known to read as zeroes.
  * Return 1 if that is the case, 0 otherwise and -errno on error.
  * This test is meant to be fast rather than accurate so returning 0
- * does not guarantee non-zero data.
+ * does not guarantee non-zero data; but a return of 1 is reliable.
  */
 int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
                                       int64_t bytes)
 {
     int ret;
-    int64_t pnum = bytes;
+    int64_t pnum;
     IO_CODE();

-    if (!bytes) {
-        return 1;
+    while (bytes) {
+        ret = bdrv_co_common_block_status_above(bs, NULL, false,
+                                                BDRV_WANT_ZERO, offset, bytes,
+                                                &pnum, NULL, NULL, NULL);
+
+        if (ret < 0) {
+            return ret;
+        }
+        if (!(ret & BDRV_BLOCK_ZERO)) {
+            return 0;
+        }
+        offset += pnum;
+        bytes -= pnum;
     }

-    ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_WANT_ZERO,
-                                            offset, bytes, &pnum, NULL, NULL,
-                                            NULL);
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    return (pnum == bytes) && (ret & BDRV_BLOCK_ZERO);
+    return 1;
 }

 int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
-- 
2.49.0
Re: [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents
Posted by Stefan Hajnoczi 5 months ago
On Thu, Apr 24, 2025 at 07:52:03PM -0500, Eric Blake wrote:
> Some BDS drivers have a cap on how much block status they can supply
> in one query (for example, NBD talking to an older server cannot
> inspect more than 4G per query; and qcow2 tends to cap its answers
> rather than cross a cluster boundary of an L1 table).  Although the
> existing callers of bdrv_co_is_zero_fast are not passing in that large
> of a 'bytes' parameter, an upcoming caller wants to query the entire
> image at once, and will thus benefit from being able to treat adjacent
> zero regions in a coalesced manner, rather than claiming the region is
> non-zero merely because pnum was truncated and didn't match the
> incoming bytes.
> 
> While refactoring this into a loop, note that there is no need to
> assign pnum prior to calling bdrv_co_common_block_status_above() (it
> is guaranteed to be assigned deeper in the callstack).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> v3: also tweak function comment
> ---
>  block/io.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>