From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Date: Thu, 12 Jul 2018 18:31:52 +0200
Message-Id: <20180712163152.12521-8-kwolf@redhat.com>
In-Reply-To: <20180712163152.12521-1-kwolf@redhat.com>
References: <20180712163152.12521-1-kwolf@redhat.com>
Subject: [Qemu-devel] [PULL 7/7] qemu-img: align result of
 is_allocated_sectors
Cc: kwolf@redhat.com, peter.maydell@linaro.org, qemu-devel@nongnu.org

From: Peter Lieven <pl@kamp.de>

We currently don't enforce that the sparse segments we detect during convert
are aligned. This leads to unnecessary and costly read-modify-write (RMW)
cycles, either internally in QEMU or in the background on the storage device,
because nearly all modern filesystems and storage hardware use a 4k alignment
internally.
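
To make the cost concrete: a write whose start or end is not a multiple of
the internal block size covers the first or last block only partially, and,
assuming the device (or QEMU) has to emulate the smaller write, each partially
covered block costs one extra read. The helper below, partial_blocks(), is
hypothetical and not part of QEMU; it is only a sketch that counts those
blocks for a byte range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU code): count the blocks that a write of
 * [offset, offset + len) covers only partially. Assuming each such block
 * needs a read-modify-write cycle, this is the number of extra reads the
 * write triggers. 'block' is the internal block size in bytes, e.g. 4096. */
static int partial_blocks(uint64_t offset, uint64_t len, uint64_t block)
{
    uint64_t end = offset + len;
    int count = 0;

    assert(block > 0);
    if (len == 0) {
        return 0;
    }
    if (offset % block) {
        count++;                     /* head block partially covered */
    }
    if (end % block) {
        /* Tail block partially covered; don't count it twice when the
         * whole write lies inside the head block counted above. */
        if ((end - 1) / block != offset / block || count == 0) {
            count++;
        }
    }
    return count;
}
```

For example, with 4096-byte blocks a 1024-byte write at offset 0 needs one
extra read, a 4096-byte write at offset 2048 needs two, and a 4096-byte write
at offset 0 needs none.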

This patch modifies is_allocated_sectors so that its *pnum result ends at an
alignment boundary (unless it is capped by the end of the buffer). This way,
all requests end at an alignment boundary, and the start of each request is
also aligned as long as the results of get_block_status do not lead to an
unaligned offset.
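
The alignment rule can be modelled outside QEMU as follows. This is a
simplified sketch, not the patched code itself: a per-sector flag array
data[] (true = sector contains non-zero bytes) stands in for the buffer and
buffer_is_zero(), and the names model_is_allocated() and
model_is_allocated_min() are hypothetical. 'alignment' is in sectors and is
assumed to be a power of two; the second function models the -S (min_sparse)
merging that is_allocated_sectors_min performs on top of the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Model of the patched is_allocated_sectors(). */
static int model_is_allocated(const bool *data, int n, int *pnum,
                              int64_t sector_num, int alignment)
{
    bool is_zero;
    int i, tail;

    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    if (n <= 0) {
        *pnum = 0;
        return 0;
    }

    /* Find where the zero/non-zero state of the run first changes. */
    is_zero = !data[0];
    for (i = 1; i < n; i++) {
        if (is_zero != !data[i]) {
            break;
        }
    }

    /* Distance of the run's end past the previous alignment boundary. */
    tail = (int)((sector_num + i) & (alignment - 1));
    if (tail) {
        if (is_zero && i <= tail) {
            is_zero = false;         /* zero run too short to skip */
        }
        if (!is_zero) {
            i += alignment - tail;   /* extend data run up to the boundary */
            i = MIN(i, n);           /* but never past the buffer end */
        } else {
            i -= tail;               /* shrink zero run down to the boundary */
        }
    }
    *pnum = i;
    return !is_zero;
}

/* Model of is_allocated_sectors_min(): keep merging data runs until a
 * zero run of at least 'min' sectors is found. */
static int model_is_allocated_min(const bool *data, int n, int *pnum,
                                  int min, int64_t sector_num, int alignment)
{
    int ret, num_checked, num_used;

    if (n < min) {
        min = n;
    }
    ret = model_is_allocated(data, n, pnum, sector_num, alignment);
    if (!ret) {
        return ret;                  /* leading zero run: report as is */
    }
    num_used = num_checked = *pnum;
    data += *pnum;
    n -= *pnum;
    sector_num += *pnum;

    while (n > 0) {
        ret = model_is_allocated(data, n, pnum, sector_num, alignment);
        data += *pnum;
        n -= *pnum;
        sector_num += *pnum;
        num_checked += *pnum;
        if (ret) {
            num_used = num_checked;  /* data again: absorb the zero gap */
        } else if (*pnum >= min) {
            break;                   /* zero run long enough to skip */
        }
    }
    *pnum = num_used;
    return 1;
}
```

With the layout used by qemu-iotests 122 (2 sectors of data at sectors 0, 16
and 34), model_is_allocated_min(d, 128, &p, 8, 0, 8) returns 1 with p == 8,
i.e. a 4096-byte data extent, and with min = alignment = 16 it returns 1 with
p == 48 (24576 bytes). These match the first data extents in the updated
122.out expectations for -S 4k and -S 8k.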

When converting an example image [1] to a raw device with a 4k sector size,
about 4600 additional 4k read requests are needed for RMW cycles to perform a
total of about 15000 write requests. With this patch, these 4600 additional
read requests are eliminated while the total number of write requests stays
the same.

[1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c                 | 44 ++++++++++++++++++++++++++++++++++++++------
 tests/qemu-iotests/122.out | 18 ++++++++----------
 2 files changed, 46 insertions(+), 16 deletions(-)
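
The destination alignment that the qemu-img.c hunks below compute in
img_convert() can be sketched in isolation. In this sketch,
local_pow2floor() and local_is_power_of_2() are local stand-ins for QEMU's
pow2floor() and is_power_of_2() (assumed to behave the same for the values
used here), the open-coded division mirrors DIV_ROUND_UP, and
convert_alignment() is a hypothetical wrapper. min_sparse is in 512-byte
sectors (so -S 4k gives 8) and request_alignment is in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512              /* stands in for BDRV_SECTOR_SIZE */

/* Largest power of two <= v, or 0 for v == 0 (stand-in for pow2floor). */
static uint64_t local_pow2floor(uint64_t v)
{
    uint64_t p = 1;

    if (v == 0) {
        return 0;
    }
    while (p <= v / 2) {
        p <<= 1;
    }
    return p;
}

static bool local_is_power_of_2(uint64_t v)
{
    return v && !(v & (v - 1));
}

/* Alignment in sectors: the larger of the rounded-down -S value and the
 * target's request_alignment rounded up to whole sectors. */
static int convert_alignment(int min_sparse, uint32_t request_alignment)
{
    uint64_t from_sparse = local_pow2floor((uint64_t)min_sparse);
    uint64_t from_target =
        ((uint64_t)request_alignment + SECTOR_SIZE - 1) / SECTOR_SIZE;
    uint64_t a = from_sparse > from_target ? from_sparse : from_target;

    assert(local_is_power_of_2(a));
    return (int)a;
}
```

For instance, -S 4k (min_sparse = 8) on a target with a 512-byte
request_alignment yields 8 sectors (4 KiB), while a target reporting a 64 KiB
request_alignment would raise it to 128 sectors. A non-power-of-two -S value
such as 12k (24 sectors) is rounded down to 16.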

diff --git a/qemu-img.c b/qemu-img.c
index f4074ebf75..4a7ce43dc9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1105,11 +1105,15 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t n)
  *
  * 'pnum' is set to the number of sectors (including and immediately following
  * the first one) that are known to be in the same allocated/unallocated state.
+ * The function will try to align the end offset to alignment boundaries so
+ * that the request will at least end aligned and consecutive requests will
+ * also start at an aligned offset.
  */
-static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
+static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
+                                int64_t sector_num, int alignment)
 {
     bool is_zero;
-    int i;
+    int i, tail;
 
     if (n <= 0) {
         *pnum = 0;
@@ -1122,6 +1126,23 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
             break;
         }
     }
+
+    tail = (sector_num + i) & (alignment - 1);
+    if (tail) {
+        if (is_zero && i <= tail) {
+            /* treat unallocated areas which only consist
+             * of a small tail as allocated. */
+            is_zero = false;
+        }
+        if (!is_zero) {
+            /* align up end offset of allocated areas. */
+            i += alignment - tail;
+            i = MIN(i, n);
+        } else {
+            /* align down end offset of zero areas. */
+            i -= tail;
+        }
+    }
     *pnum = i;
     return !is_zero;
 }
@@ -1132,7 +1153,7 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
  * breaking up write requests for only small sparse areas.
  */
 static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
-    int min)
+    int min, int64_t sector_num, int alignment)
 {
     int ret;
     int num_checked, num_used;
@@ -1141,7 +1162,7 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
         min = n;
     }
 
-    ret = is_allocated_sectors(buf, n, pnum);
+    ret = is_allocated_sectors(buf, n, pnum, sector_num, alignment);
     if (!ret) {
         return ret;
     }
@@ -1149,13 +1170,15 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
     num_used = *pnum;
     buf += BDRV_SECTOR_SIZE * *pnum;
     n -= *pnum;
+    sector_num += *pnum;
     num_checked = num_used;
 
     while (n > 0) {
-        ret = is_allocated_sectors(buf, n, pnum);
+        ret = is_allocated_sectors(buf, n, pnum, sector_num, alignment);
 
         buf += BDRV_SECTOR_SIZE * *pnum;
         n -= *pnum;
+        sector_num += *pnum;
         num_checked += *pnum;
         if (ret) {
             num_used = num_checked;
@@ -1560,6 +1583,7 @@ typedef struct ImgConvertState {
     bool wr_in_order;
     bool copy_range;
     int min_sparse;
+    int alignment;
     size_t cluster_sectors;
     size_t buf_sectors;
     long num_coroutines;
@@ -1724,7 +1748,8 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
              * zeroed. */
             if (!s->min_sparse ||
                 (!s->compressed &&
-                 is_allocated_sectors_min(buf, n, &n, s->min_sparse)) ||
+                 is_allocated_sectors_min(buf, n, &n, s->min_sparse,
+                                          sector_num, s->alignment)) ||
                 (s->compressed &&
                  !buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)))
             {
@@ -2368,6 +2393,13 @@ static int img_convert(int argc, char **argv)
                                 out_bs->bl.pdiscard_alignment >>
                                 BDRV_SECTOR_BITS)));
 
+    /* try to align the write requests to the destination to avoid unnecessary
+     * RMW cycles. */
+    s.alignment = MAX(pow2floor(s.min_sparse),
+                      DIV_ROUND_UP(out_bs->bl.request_alignment,
+                                   BDRV_SECTOR_SIZE));
+    assert(is_power_of_2(s.alignment));
+
     if (skip_create) {
         int64_t output_sectors =3D blk_nb_sectors(s.target);
         if (output_sectors < 0) {
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 6c7ee1da6c..c576705284 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -194,12 +194,12 @@ wrote 1024/1024 bytes at offset 17408
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 convert -S 4k
-[{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 1024, "length": 7168, "depth": 0, "zero": true, "data": false},
-{ "start": 8192, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false},
-{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}]
+[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
+{ "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
+{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 4k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
@@ -210,10 +210,8 @@ convert -c -S 4k
 { "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}]
 
 convert -S 8k
-[{ "start": 0, "length": 9216, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false},
-{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}]
+[{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 8k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
-- 
2.13.6