1
The following changes since commit c1eb2ddf0f8075faddc5f7c3d39feae3e8e9d6b4:
1
The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:
2
2
3
Update version for v8.0.0 release (2023-04-19 17:27:13 +0100)
3
Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)
4
4
5
are available in the Git repository at:
5
are available in the Git repository at:
6
6
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
7
https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24
8
8
9
for you to fetch changes up to 36e5e9b22abe56aa00ca067851555ad8127a7966:
9
for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:
10
10
11
tracing: install trace events file only if necessary (2023-04-20 07:39:43 -0400)
11
iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)
12
12
13
----------------------------------------------------------------
13
----------------------------------------------------------------
14
Pull request
14
Block patches:
15
- The SSH block driver now uses libssh instead of libssh2
16
- The VMDK block driver gets read-only support for the seSparse
17
subformat
18
- Various fixes
15
19
16
Sam Li's zoned storage work and fixes I collected during the 8.0 freeze.
20
---
21
22
v2:
23
- Squashed Pino's fix for pre-0.8 libssh into the libssh patch
17
24
18
----------------------------------------------------------------
25
----------------------------------------------------------------
26
Anton Nefedov (1):
27
iotest 134: test cluster-misaligned encrypted write
19
28
20
Carlos Santos (1):
29
Klaus Birkelund Jensen (1):
21
tracing: install trace events file only if necessary
30
nvme: do not advertise support for unsupported arbitration mechanism
22
31
23
Philippe Mathieu-Daudé (1):
32
Max Reitz (1):
24
block/dmg: Declare a type definition for DMG uncompress function
33
iotests: Fix 205 for concurrent runs
25
34
26
Sam Li (17):
35
Pino Toscano (1):
27
block/block-common: add zoned device structs
36
ssh: switch from libssh2 to libssh
28
block/file-posix: introduce helper functions for sysfs attributes
29
block/block-backend: add block layer APIs resembling Linux
30
ZonedBlockDevice ioctls
31
block/raw-format: add zone operations to pass through requests
32
block: add zoned BlockDriver check to block layer
33
iotests: test new zone operations
34
block: add some trace events for new block layer APIs
35
docs/zoned-storage: add zoned device documentation
36
file-posix: add tracking of the zone write pointers
37
block: introduce zone append write for zoned devices
38
qemu-iotests: test zone append operation
39
block: add some trace events for zone append
40
include: update virtio_blk headers to v6.3-rc1
41
virtio-blk: add zoned storage emulation for zoned devices
42
block: add accounting for zone append operation
43
virtio-blk: add some trace events for zoned emulation
44
docs/zoned-storage:add zoned emulation use case
45
37
46
Thomas De Schampheleire (1):
38
Sam Eiderman (3):
47
tracetool: use relative paths for '#line' preprocessor directives
39
vmdk: Fix comment regarding max l1_size coverage
40
vmdk: Reduce the max bound for L1 table size
41
vmdk: Add read-only support for seSparse snapshots
48
42
49
docs/devel/index-api.rst | 1 +
43
Vladimir Sementsov-Ogievskiy (1):
50
docs/devel/zoned-storage.rst | 62 ++
44
blockdev: enable non-root nodes for transaction drive-backup source
51
qapi/block-core.json | 68 +-
45
52
qapi/block.json | 4 +
46
configure | 65 +-
53
meson.build | 4 +
47
block/Makefile.objs | 6 +-
54
block/dmg.h | 8 +-
48
block/ssh.c | 652 ++++++++++--------
55
include/block/accounting.h | 1 +
49
block/vmdk.c | 372 +++++++++-
56
include/block/block-common.h | 57 ++
50
blockdev.c | 2 +-
57
include/block/block-io.h | 13 +
51
hw/block/nvme.c | 1 -
58
include/block/block_int-common.h | 37 +
52
.travis.yml | 4 +-
59
include/block/raw-aio.h | 8 +-
53
block/trace-events | 14 +-
60
include/standard-headers/drm/drm_fourcc.h | 12 +
54
docs/qemu-block-drivers.texi | 2 +-
61
include/standard-headers/linux/ethtool.h | 48 +-
55
.../dockerfiles/debian-win32-cross.docker | 1 -
62
include/standard-headers/linux/fuse.h | 45 +-
56
.../dockerfiles/debian-win64-cross.docker | 1 -
63
include/standard-headers/linux/pci_regs.h | 1 +
57
tests/docker/dockerfiles/fedora.docker | 4 +-
64
include/standard-headers/linux/vhost_types.h | 2 +
58
tests/docker/dockerfiles/ubuntu.docker | 2 +-
65
include/standard-headers/linux/virtio_blk.h | 105 +++
59
tests/docker/dockerfiles/ubuntu1804.docker | 2 +-
66
include/sysemu/block-backend-io.h | 27 +
60
tests/qemu-iotests/059.out | 2 +-
67
linux-headers/asm-arm64/kvm.h | 1 +
61
tests/qemu-iotests/134 | 9 +
68
linux-headers/asm-x86/kvm.h | 34 +-
62
tests/qemu-iotests/134.out | 10 +
69
linux-headers/linux/kvm.h | 9 +
63
tests/qemu-iotests/205 | 2 +-
70
linux-headers/linux/vfio.h | 15 +-
64
tests/qemu-iotests/207 | 54 +-
71
linux-headers/linux/vhost.h | 8 +
65
tests/qemu-iotests/207.out | 2 +-
72
block.c | 19 +
66
20 files changed, 823 insertions(+), 384 deletions(-)
73
block/block-backend.c | 193 ++++++
74
block/dmg.c | 7 +-
75
block/file-posix.c | 677 +++++++++++++++++--
76
block/io.c | 68 ++
77
block/io_uring.c | 4 +
78
block/linux-aio.c | 3 +
79
block/qapi-sysemu.c | 11 +
80
block/qapi.c | 18 +
81
block/raw-format.c | 26 +
82
hw/block/virtio-blk-common.c | 2 +
83
hw/block/virtio-blk.c | 405 +++++++++++
84
hw/virtio/virtio-qmp.c | 2 +
85
qemu-io-cmds.c | 224 ++++++
86
block/trace-events | 4 +
87
docs/system/qemu-block-drivers.rst.inc | 6 +
88
hw/block/trace-events | 7 +
89
scripts/tracetool/backend/ftrace.py | 4 +-
90
scripts/tracetool/backend/log.py | 4 +-
91
scripts/tracetool/backend/syslog.py | 4 +-
92
tests/qemu-iotests/tests/zoned | 105 +++
93
tests/qemu-iotests/tests/zoned.out | 69 ++
94
trace/meson.build | 2 +-
95
46 files changed, 2353 insertions(+), 81 deletions(-)
96
create mode 100644 docs/devel/zoned-storage.rst
97
create mode 100755 tests/qemu-iotests/tests/zoned
98
create mode 100644 tests/qemu-iotests/tests/zoned.out
99
67
100
--
68
--
101
2.39.2
69
2.21.0
102
70
103
71
diff view generated by jsdifflib
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
1
From: Klaus Birkelund Jensen <klaus@birkelund.eu>
2
2
3
Introduce the BdrvDmgUncompressFunc type definition. To emphasize
3
The device mistakenly reports that the Weighted Round Robin with Urgent
4
that dmg_uncompress_bz2 and dmg_uncompress_lzfse are pointers to functions,
4
Priority Class arbitration mechanism is supported.
5
declare them using this new typedef.
6
5
7
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
It is not.
8
Message-id: 20230320152610.32052-1-philmd@linaro.org
7
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
9
Message-id: 20190606092530.14206-1-klaus@birkelund.eu
10
Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
11
Signed-off-by: Max Reitz <mreitz@redhat.com>
10
---
12
---
11
block/dmg.h | 8 ++++----
13
hw/block/nvme.c | 1 -
12
block/dmg.c | 7 ++-----
14
1 file changed, 1 deletion(-)
13
2 files changed, 6 insertions(+), 9 deletions(-)
14
15
15
diff --git a/block/dmg.h b/block/dmg.h
16
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
16
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
17
--- a/block/dmg.h
18
--- a/hw/block/nvme.c
18
+++ b/block/dmg.h
19
+++ b/hw/block/nvme.c
19
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVDMGState {
20
@@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
20
z_stream zstream;
21
n->bar.cap = 0;
21
} BDRVDMGState;
22
NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
22
23
NVME_CAP_SET_CQR(n->bar.cap, 1);
23
-extern int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
24
- NVME_CAP_SET_AMS(n->bar.cap, 1);
24
- char *next_out, unsigned int avail_out);
25
NVME_CAP_SET_TO(n->bar.cap, 0xf);
25
+typedef int BdrvDmgUncompressFunc(char *next_in, unsigned int avail_in,
26
NVME_CAP_SET_CSS(n->bar.cap, 1);
26
+ char *next_out, unsigned int avail_out);
27
NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
27
28
-extern int (*dmg_uncompress_lzfse)(char *next_in, unsigned int avail_in,
29
- char *next_out, unsigned int avail_out);
30
+extern BdrvDmgUncompressFunc *dmg_uncompress_bz2;
31
+extern BdrvDmgUncompressFunc *dmg_uncompress_lzfse;
32
33
#endif
34
diff --git a/block/dmg.c b/block/dmg.c
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/dmg.c
37
+++ b/block/dmg.c
38
@@ -XXX,XX +XXX,XX @@
39
#include "qemu/memalign.h"
40
#include "dmg.h"
41
42
-int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
43
- char *next_out, unsigned int avail_out);
44
-
45
-int (*dmg_uncompress_lzfse)(char *next_in, unsigned int avail_in,
46
- char *next_out, unsigned int avail_out);
47
+BdrvDmgUncompressFunc *dmg_uncompress_bz2;
48
+BdrvDmgUncompressFunc *dmg_uncompress_lzfse;
49
50
enum {
51
/* Limit chunk sizes to prevent unreasonable amounts of memory being used
52
--
28
--
53
2.39.2
29
2.21.0
54
30
55
31
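The typedef in the dmg patch names a function type rather than a pointer type, so each global is then declared as a pointer to that type. A minimal standalone sketch of the same pattern follows. UncompressFunc, copy_uncompress, uncompress_hook and run_uncompress are hypothetical names chosen for illustration; they are not part of QEMU.

```c
#include <string.h>

/* A function type (not a pointer type), shaped like the one in the patch. */
typedef int UncompressFunc(char *next_in, unsigned int avail_in,
                           char *next_out, unsigned int avail_out);

/*
 * Hypothetical stand-in for a real decompressor: copies the input
 * unchanged and returns the number of bytes written, or -1 if the
 * output buffer is too small.
 */
static int copy_uncompress(char *next_in, unsigned int avail_in,
                           char *next_out, unsigned int avail_out)
{
    if (avail_in > avail_out) {
        return -1;
    }
    memcpy(next_out, next_in, avail_in);
    return (int)avail_in;
}

/* The '*' is visible at the declaration, so this is clearly a pointer. */
UncompressFunc *uncompress_hook;

/* Call through the hook; fail cleanly while nothing is registered. */
int run_uncompress(char *in, unsigned int in_len,
                   char *out, unsigned int out_len)
{
    if (!uncompress_hook) {
        return -1;
    }
    return uncompress_hook(in, in_len, out, out_len);
}
```

Keeping the '*' out of the typedef means the pointer nature of each global is visible where it is declared, which appears to be the point the commit message makes.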
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
3
We forgot to enable non-root source nodes for transaction .prepare, although they are already
4
allowed.
4
enabled in do_drive_backup since commit a2d665c1bc362
5
"blockdev: loosen restrictions on drive-backup source node"
5
6
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
9
Reviewed-by: John Snow <jsnow@redhat.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
13
<philmd@linaro.org> and clarify that the check is about zoned
14
BlockDrivers.
15
--Stefan]
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
11
---
18
include/block/block_int-common.h | 5 +++++
12
blockdev.c | 2 +-
19
block.c | 19 +++++++++++++++++++
13
1 file changed, 1 insertion(+), 1 deletion(-)
20
block/file-posix.c | 12 ++++++++++++
21
block/raw-format.c | 1 +
22
4 files changed, 37 insertions(+)
23
14
24
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
15
diff --git a/blockdev.c b/blockdev.c
25
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
26
--- a/include/block/block_int-common.h
17
--- a/blockdev.c
27
+++ b/include/block/block_int-common.h
18
+++ b/blockdev.c
28
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
19
@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
29
*/
20
assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
30
bool is_format;
21
backup = common->action->u.drive_backup.data;
31
22
32
+ /*
23
- bs = qmp_get_root_bs(backup->device, errp);
33
+ * Set to true if the BlockDriver supports zoned children.
24
+ bs = bdrv_lookup_bs(backup->device, backup->device, errp);
34
+ */
25
if (!bs) {
35
+ bool supports_zoned_children;
36
+
37
/*
38
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
39
* this field set to true, except ones that are defined only by their
40
diff --git a/block.c b/block.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/block.c
43
+++ b/block.c
44
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
45
return;
26
return;
46
}
27
}
47
48
+ /*
49
+ * Non-zoned block drivers do not follow zoned storage constraints
50
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
51
+ * drivers in a graph.
52
+ */
53
+ if (!parent_bs->drv->supports_zoned_children &&
54
+ child_bs->bl.zoned == BLK_Z_HM) {
55
+ /*
56
+ * The host-aware model allows zoned storage constraints and random
57
+ * write. Allow mixing host-aware and non-zoned drivers. Using
58
+ * host-aware device as a regular device.
59
+ */
60
+ error_setg(errp, "Cannot add a %s child to a %s parent",
61
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
62
+ parent_bs->drv->supports_zoned_children ?
63
+ "support zoned children" : "not support zoned children");
64
+ return;
65
+ }
66
+
67
if (!QLIST_EMPTY(&child_bs->parents)) {
68
error_setg(errp, "The node %s already has a parent",
69
child_bs->node_name);
70
diff --git a/block/file-posix.c b/block/file-posix.c
71
index XXXXXXX..XXXXXXX 100644
72
--- a/block/file-posix.c
73
+++ b/block/file-posix.c
74
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
75
goto fail;
76
}
77
}
78
+#ifdef CONFIG_BLKZONED
79
+ /*
80
+ * The kernel page cache does not reliably work for writes to SWR zones
81
+ * of zoned block device because it can not guarantee the order of writes.
82
+ */
83
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
84
+ (!(s->open_flags & O_DIRECT))) {
85
+ error_setg(errp, "The driver supports zoned devices, and it requires "
86
+ "cache.direct=on, which was not specified.");
87
+ return -EINVAL; /* No host kernel page cache */
88
+ }
89
+#endif
90
91
if (S_ISBLK(st.st_mode)) {
92
#ifdef __linux__
93
diff --git a/block/raw-format.c b/block/raw-format.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/block/raw-format.c
96
+++ b/block/raw-format.c
97
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
98
BlockDriver bdrv_raw = {
99
.format_name = "raw",
100
.instance_size = sizeof(BDRVRawState),
101
+ .supports_zoned_children = true,
102
.bdrv_probe = &raw_probe,
103
.bdrv_reopen_prepare = &raw_reopen_prepare,
104
.bdrv_reopen_commit = &raw_reopen_commit,
105
--
28
--
106
2.39.2
29
2.21.0
107
30
108
31
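The two checks added by the zoned BlockDriver patch can be restated as plain predicates. This is a simplified sketch, assuming only the conditions visible in the hunks above; ZoneModel, zoned_child_allowed and zoned_open_allowed are hypothetical names, and the real code works on BlockDriverState and reports failures through error_setg().

```c
#include <stdbool.h>

/* Mirrors the three zone models named in the series. */
typedef enum ZoneModel {
    ZONE_NONE = 0x0, /* regular block device */
    ZONE_HM   = 0x1, /* host-managed */
    ZONE_HA   = 0x2, /* host-aware */
} ZoneModel;

/*
 * block.c check: only a host-managed child is refused under a parent
 * that does not support zoned children. Host-aware devices accept
 * random writes, so they may be used as regular devices.
 */
bool zoned_child_allowed(bool parent_supports_zoned_children,
                         ZoneModel child_model)
{
    return parent_supports_zoned_children || child_model != ZONE_HM;
}

/*
 * file-posix.c check: any zoned device must be opened with O_DIRECT
 * (cache.direct=on), because the page cache cannot guarantee the
 * order of writes.
 */
bool zoned_open_allowed(ZoneModel model, bool o_direct)
{
    return model == ZONE_NONE || o_direct;
}
```

With these predicates, a host-managed child under a parent that lacks supports_zoned_children is the only rejected graph combination, and a zoned device opened without O_DIRECT is the only rejected open.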
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Anton Nefedov <anton.nefedov@virtuozzo.com>
2
2
3
The new block layer APIs of zoned block devices can be tested by:
3
COW (even empty/zero) areas require encryption too
4
$ tests/qemu-iotests/check zoned
5
Run each zone operation on a newly created null_blk device
6
and see whether it outputs the same zone information.
7
4
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
5
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Eric Blake <eblake@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Reviewed-by: Max Reitz <mreitz@redhat.com>
11
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
9
Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
13
<philmd@linaro.org>.
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
11
---
17
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
12
tests/qemu-iotests/134 | 9 +++++++++
18
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
13
tests/qemu-iotests/134.out | 10 ++++++++++
19
2 files changed, 142 insertions(+)
14
2 files changed, 19 insertions(+)
20
create mode 100755 tests/qemu-iotests/tests/zoned
21
create mode 100644 tests/qemu-iotests/tests/zoned.out
22
15
23
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
16
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
24
new file mode 100755
17
index XXXXXXX..XXXXXXX 100755
25
index XXXXXXX..XXXXXXX
18
--- a/tests/qemu-iotests/134
26
--- /dev/null
19
+++ b/tests/qemu-iotests/134
27
+++ b/tests/qemu-iotests/tests/zoned
20
@@ -XXX,XX +XXX,XX @@ echo
28
@@ -XXX,XX +XXX,XX @@
21
echo "== reading whole image =="
29
+#!/usr/bin/env bash
22
$QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
30
+#
23
31
+# Test zone management operations.
24
+echo
32
+#
25
+echo "== rewriting cluster part =="
26
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
33
+
27
+
34
+seq="$(basename $0)"
28
+echo
35
+echo "QA output created by $seq"
29
+echo "== verify pattern =="
36
+status=1 # failure is the default!
30
+$QEMU_IO --object $SECRET -c "read -P 0 0 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
31
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
37
+
32
+
38
+_cleanup()
33
echo
39
+{
34
echo "== rewriting whole image =="
40
+ _cleanup_test_img
35
$QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
41
+ sudo -n rmmod null_blk
36
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
42
+}
37
index XXXXXXX..XXXXXXX 100644
43
+trap "_cleanup; exit \$status" 0 1 2 3 15
38
--- a/tests/qemu-iotests/134.out
39
+++ b/tests/qemu-iotests/134.out
40
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
41
read 134217728/134217728 bytes at offset 0
42
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
43
44
+== rewriting cluster part ==
45
+wrote 512/512 bytes at offset 512
46
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
44
+
47
+
45
+# get standard environment, filters and checks
48
+== verify pattern ==
46
+. ../common.rc
49
+read 512/512 bytes at offset 0
47
+. ../common.filter
50
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
48
+. ../common.qemu
51
+read 512/512 bytes at offset 512
52
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
49
+
53
+
50
+# This test only runs on Linux hosts with raw image files.
54
== rewriting whole image ==
51
+_supported_fmt raw
55
wrote 134217728/134217728 bytes at offset 0
52
+_supported_proto file
56
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
53
+_supported_os Linux
54
+
55
+sudo -n true || \
56
+ _notrun 'Password-less sudo required'
57
+
58
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
59
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
60
+
61
+echo "Testing a null_blk device:"
62
+echo "case 1: if the operations work"
63
+sudo -n modprobe null_blk nr_devices=1 zoned=1
64
+sudo -n chmod 0666 /dev/nullb0
65
+
66
+echo "(1) report the first zone:"
67
+$QEMU_IO $IMG -c "zrp 0 1"
68
+echo
69
+echo "report the first 10 zones"
70
+$QEMU_IO $IMG -c "zrp 0 10"
71
+echo
72
+echo "report the last zone:"
73
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
74
+echo
75
+echo
76
+echo "(2) opening the first zone"
77
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
78
+echo "report after:"
79
+$QEMU_IO $IMG -c "zrp 0 1"
80
+echo
81
+echo "opening the second zone"
82
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
83
+echo "report after:"
84
+$QEMU_IO $IMG -c "zrp 268435456 1"
85
+echo
86
+echo "opening the last zone"
87
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
88
+echo "report after:"
89
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
90
+echo
91
+echo
92
+echo "(3) closing the first zone"
93
+$QEMU_IO $IMG -c "zc 0 268435456"
94
+echo "report after:"
95
+$QEMU_IO $IMG -c "zrp 0 1"
96
+echo
97
+echo "closing the last zone"
98
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
99
+echo "report after:"
100
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
101
+echo
102
+echo
103
+echo "(4) finishing the second zone"
104
+$QEMU_IO $IMG -c "zf 268435456 268435456"
105
+echo "After finishing a zone:"
106
+$QEMU_IO $IMG -c "zrp 268435456 1"
107
+echo
108
+echo
109
+echo "(5) resetting the second zone"
110
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
111
+echo "After resetting a zone:"
112
+$QEMU_IO $IMG -c "zrp 268435456 1"
113
+
114
+# success, all done
115
+echo "*** done"
116
+rm -f $seq.full
117
+status=0
118
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
119
new file mode 100644
120
index XXXXXXX..XXXXXXX
121
--- /dev/null
122
+++ b/tests/qemu-iotests/tests/zoned.out
123
@@ -XXX,XX +XXX,XX @@
124
+QA output created by zoned
125
+Testing a null_blk device:
126
+case 1: if the operations work
127
+(1) report the first zone:
128
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
129
+
130
+report the first 10 zones
131
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
132
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
133
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
134
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
135
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
136
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
137
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
138
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
139
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
140
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
141
+
142
+report the last zone:
143
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
144
+
145
+
146
+(2) opening the first zone
147
+report after:
148
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
149
+
150
+opening the second zone
151
+report after:
152
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
153
+
154
+opening the last zone
155
+report after:
156
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
157
+
158
+
159
+(3) closing the first zone
160
+report after:
161
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
162
+
163
+closing the last zone
164
+report after:
165
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
166
+
167
+
168
+(4) finishing the second zone
169
+After finishing a zone:
170
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
171
+
172
+
173
+(5) resetting the second zone
174
+After resetting a zone:
175
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
176
+*** done
177
--
57
--
178
2.39.2
58
2.21.0
179
59
180
60
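The zoned test passes byte offsets to qemu-io, while the expected output prints 512-byte sector values, as the inline comments in the script note (0x3e70000000 / 512 = 0x1f380000). The conversion can be checked with a small sketch. The helper names are hypothetical, and the 256 MiB zone size is taken from the 268435456 length used in the test commands.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL
#define ZONE_BYTES  268435456ULL /* 256 MiB, the zone length used in the test */

/* Convert a byte offset or length to 512-byte sectors. */
uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes / SECTOR_SIZE;
}

/* Byte offset at which the zone with the given index starts. */
uint64_t zone_start_bytes(uint64_t zone_index)
{
    return zone_index * ZONE_BYTES;
}
```

For example, zone_start_bytes(999) is 0x3e70000000, the offset the test uses for the last zone, and bytes_to_sectors() of that value is 0x1f380000, the "start" printed for it in zoned.out. One zone is 0x80000 sectors, matching the "len" field.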
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
2
3
raw-format driver usually sits on top of file-posix driver. It needs to
3
Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
4
pass through requests of zone commands.
4
extended the l1_size check from VMDK4 to VMDK3 but did not update the
5
default coverage in the moved comment.
5
6
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
The previous vmdk4 calculation:
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
(512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
10
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
11
The added vmdk3 calculation:
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
12
12
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
13
(512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB
13
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
14
14
<philmd@linaro.org>.
15
Adding the calculation of vmdk3 to the comment.
15
--Stefan]
16
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
In any case, VMware does not offer virtual disks larger than 2TB for
18
vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is
19
not implemented yet in qemu.
20
21
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
22
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
23
Reviewed-by: Liran Alon <liran.alon@oracle.com>
24
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
25
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
26
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
27
Reviewed-by: yuchenlin <yuchenlin@synology.com>
28
Reviewed-by: Max Reitz <mreitz@redhat.com>
29
Signed-off-by: Max Reitz <mreitz@redhat.com>
17
---
30
---
18
block/raw-format.c | 17 +++++++++++++++++
31
block/vmdk.c | 11 ++++++++---
19
1 file changed, 17 insertions(+)
32
1 file changed, 8 insertions(+), 3 deletions(-)
20
33
21
diff --git a/block/raw-format.c b/block/raw-format.c
34
diff --git a/block/vmdk.c b/block/vmdk.c
22
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
23
--- a/block/raw-format.c
36
--- a/block/vmdk.c
24
+++ b/block/raw-format.c
37
+++ b/block/vmdk.c
25
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
38
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
26
return bdrv_co_pdiscard(bs->file, offset, bytes);
39
return -EFBIG;
27
}
40
}
28
41
if (l1_size > 512 * 1024 * 1024) {
29
+static int coroutine_fn GRAPH_RDLOCK
42
- /* Although with big capacity and small l1_entry_sectors, we can get a
30
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
43
+ /*
31
+ unsigned int *nr_zones,
44
+ * Although with big capacity and small l1_entry_sectors, we can get a
32
+ BlockZoneDescriptor *zones)
45
* big l1_size, we don't want unbounded value to allocate the table.
33
+{
46
- * Limit it to 512M, which is 16PB for default cluster and L2 table
34
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
47
- * size */
35
+}
48
+ * Limit it to 512M, which is:
36
+
49
+ * 16PB - for default "Hosted Sparse Extent" (VMDK4)
37
+static int coroutine_fn GRAPH_RDLOCK
50
+ * cluster size: 64KB, L2 table size: 512 entries
38
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
51
+ * 1PB - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
39
+ int64_t offset, int64_t len)
52
+ * cluster size: 512B, L2 table size: 4096 entries
40
+{
53
+ */
41
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
54
error_setg(errp, "L1 size too big");
42
+}
55
return -EFBIG;
43
+
56
}
44
static int64_t coroutine_fn GRAPH_RDLOCK
45
raw_co_getlength(BlockDriverState *bs)
46
{
47
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
48
.bdrv_co_pwritev = &raw_co_pwritev,
49
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
50
.bdrv_co_pdiscard = &raw_co_pdiscard,
51
+ .bdrv_co_zone_report = &raw_co_zone_report,
52
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
53
.bdrv_co_block_status = &raw_co_block_status,
54
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
55
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
56
--
57
--
57
2.39.2
58
2.21.0
58
59
59
60
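The two coverage figures in the vmdk commit message follow from multiplying the L1 entry bound by the L2 table size and the grain size. A small sketch of that arithmetic, assuming the parameters quoted in the message; l1_coverage_bytes is a hypothetical helper, not a QEMU function.

```c
#include <stdint.h>

/*
 * Maximum number of bytes addressable through an L1 table:
 * every L1 entry points to one L2 table, and every L2 entry
 * maps one grain.
 */
uint64_t l1_coverage_bytes(uint64_t l1_entries, uint64_t l2_entries,
                           uint64_t grain_bytes)
{
    return l1_entries * l2_entries * grain_bytes;
}
```

With the 512M bound, VMDK4 (512 L2 entries, 64KB grains) gives 2^29 * 2^9 * 2^16 = 2^54 bytes, which the commit message writes as 16PB, and VMDK3 (4096 L2 entries, 512B grains) gives 2^29 * 2^12 * 2^9 = 2^50 bytes, written as 1PB.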
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
3
512M of L1 entries is a very loose bound, only 32M are required to store
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4
the maximal supported VMDK file size of 2TB.
5
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
5
6
Reviewed-by: Hannes Reinecke <hare@suse.de>
6
Fixed qemu-iotest 059 - the failure now occurs earlier, on an impossible L1
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
7
table size.
8
Acked-by: Kevin Wolf <kwolf@redhat.com>
8
9
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
9
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
10
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
10
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
11
<philmd@linaro.org>.
11
Reviewed-by: Liran Alon <liran.alon@oracle.com>
12
--Stefan]
12
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
14
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
15
Reviewed-by: Max Reitz <mreitz@redhat.com>
16
Signed-off-by: Max Reitz <mreitz@redhat.com>
14
---
17
---
15
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
18
block/vmdk.c | 13 +++++++------
16
1 file changed, 43 insertions(+)
19
tests/qemu-iotests/059.out | 2 +-
20
2 files changed, 8 insertions(+), 7 deletions(-)
17
21
18
diff --git a/include/block/block-common.h b/include/block/block-common.h
22
diff --git a/block/vmdk.c b/block/vmdk.c
19
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
20
--- a/include/block/block-common.h
24
--- a/block/vmdk.c
21
+++ b/include/block/block-common.h
25
+++ b/block/vmdk.c
22
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
26
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
23
typedef struct BdrvChild BdrvChild;
27
error_setg(errp, "Invalid granularity, image may be corrupt");
24
typedef struct BdrvChildClass BdrvChildClass;
28
return -EFBIG;
25
29
}
26
+typedef enum BlockZoneOp {
30
- if (l1_size > 512 * 1024 * 1024) {
27
+ BLK_ZO_OPEN,
31
+ if (l1_size > 32 * 1024 * 1024) {
28
+ BLK_ZO_CLOSE,
32
/*
29
+ BLK_ZO_FINISH,
33
* Although with big capacity and small l1_entry_sectors, we can get a
30
+ BLK_ZO_RESET,
34
* big l1_size, we don't want unbounded value to allocate the table.
31
+} BlockZoneOp;
35
- * Limit it to 512M, which is:
32
+
36
- * 16PB - for default "Hosted Sparse Extent" (VMDK4)
33
+typedef enum BlockZoneModel {
37
- * cluster size: 64KB, L2 table size: 512 entries
34
+ BLK_Z_NONE = 0x0, /* Regular block device */
38
- * 1PB - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
35
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
39
- * cluster size: 512B, L2 table size: 4096 entries
36
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
40
+ * Limit it to 32M, which is enough to store:
37
+} BlockZoneModel;
41
+ * 8TB - for both VMDK3 & VMDK4 with
38
+
42
+ * minimal cluster size: 512B
39
+typedef enum BlockZoneState {
43
+ * minimal L2 table size: 512 entries
40
+ BLK_ZS_NOT_WP = 0x0,
44
+ * 8 TB is still more than the maximal value supported for
41
+ BLK_ZS_EMPTY = 0x1,
45
+ * VMDK3 & VMDK4 which is 2TB.
42
+ BLK_ZS_IOPEN = 0x2,
46
*/
43
+ BLK_ZS_EOPEN = 0x3,
47
error_setg(errp, "L1 size too big");
44
+ BLK_ZS_CLOSED = 0x4,
48
return -EFBIG;
45
+ BLK_ZS_RDONLY = 0xD,
49
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
46
+ BLK_ZS_FULL = 0xE,
50
index XXXXXXX..XXXXXXX 100644
47
+ BLK_ZS_OFFLINE = 0xF,
51
--- a/tests/qemu-iotests/059.out
48
+} BlockZoneState;
52
+++ b/tests/qemu-iotests/059.out
49
+
53
@@ -XXX,XX +XXX,XX @@ Offset Length Mapped to File
50
+typedef enum BlockZoneType {
54
0x140000000 0x10000 0x50000 TEST_DIR/t-s003.vmdk
51
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
55
52
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
56
=== Testing afl image with a very large capacity ===
53
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
57
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
54
+} BlockZoneType;
58
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
55
+
59
*** done
56
+/*
57
+ * Zone descriptor data structure.
58
+ * Provides information on a zone with all position and size values in bytes.
59
+ */
60
+typedef struct BlockZoneDescriptor {
61
+ uint64_t start;
62
+ uint64_t length;
63
+ uint64_t cap;
64
+ uint64_t wp;
65
+ BlockZoneType type;
66
+ BlockZoneState state;
67
+} BlockZoneDescriptor;
68
+
69
typedef struct BlockDriverInfo {
70
/* in bytes, 0 if irrelevant */
71
int cluster_size;
72
--
60
--
73
2.39.2
61
2.21.0
74
62
75
63
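The zcond numbers printed by the zoned iotest correspond to the BlockZoneState values introduced in the zoned device structs patch. A minimal lookup sketch is shown here. zone_state_name and the returned strings are hypothetical and for illustration only; the numeric values are the ones in the enum above.

```c
#include <stddef.h>

/*
 * Map a zone condition value to a short label. The numeric values
 * follow the BlockZoneState enum from the patch; the labels are
 * illustrative. Returns NULL for values not listed in the enum.
 */
const char *zone_state_name(unsigned int zcond)
{
    switch (zcond) {
    case 0x0: return "not-wp";        /* BLK_ZS_NOT_WP */
    case 0x1: return "empty";         /* BLK_ZS_EMPTY */
    case 0x2: return "implicit-open"; /* BLK_ZS_IOPEN */
    case 0x3: return "explicit-open"; /* BLK_ZS_EOPEN */
    case 0x4: return "closed";        /* BLK_ZS_CLOSED */
    case 0xD: return "read-only";     /* BLK_ZS_RDONLY */
    case 0xE: return "full";          /* BLK_ZS_FULL */
    case 0xF: return "offline";       /* BLK_ZS_OFFLINE */
    default:  return NULL;
    }
}
```

Read this way, the expected test output shows a zone going from empty (zcond:1) to explicitly open (zcond:3) after a zone open, and to full (zcond:14, that is 0xE) after a zone finish.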
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
2
3
Use get_sysfs_str_val() to get the string value of device
3
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
4
zoned model. Then get_sysfs_zoned_model() can convert it to
4
QEMU).
5
QEMU's BlockZoneModel type.
5
6
6
This format had the following shortcomings:
7
Use get_sysfs_long_val() to get the long value of zoned device
7
8
information.
8
* Grain directory (L1) and grain table (L2) entries were 32-bit,
9
9
allowing access to only 2TB (slightly less) of data.
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
10
* The grain size (default) was 512 bytes - leading to data
11
Reviewed-by: Hannes Reinecke <hare@suse.de>
11
fragmentation and many grain tables.
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
* For space reclamation purposes, it was necessary to find all the
13
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
13
grains which are not pointed to by any grain table - so a reverse
14
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
14
mapping of "offset of grain in vmdk" to "grain table" must be
15
Acked-by: Kevin Wolf <kwolf@redhat.com>
15
constructed - which takes large amounts of CPU/RAM.
16
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
16
17
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
17
The format specification can be found in VMware's documentation:
18
<philmd@linaro.org>.
18
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
19
--Stefan]
19
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
20
In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
21
introduced: SESparse (Space Efficient).
22
23
This format fixes the above issues:
24
25
* All entries are now 64-bit.
26
* The grain size (default) is 4KB.
27
* Grain directory and grain tables are now located at the beginning
28
of the file.
29
+ seSparse format reserves space for all grain tables.
30
+ Grain tables can be addressed using an index.
31
+ Grains are located at the end of the file and can also be
32
addressed with an index.
33
- seSparse vmdks of large disks (64TB) have huge preallocated
34
headers - mainly due to L2 tables, even for empty snapshots.
35
* The header contains a reverse mapping ("backmap") of "offset of
36
grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
37
specifies for each grain - whether it is allocated or not.
38
Using these data structures we can implement space reclamation
39
efficiently.
40
* Due to the fact that the header now maintains two mappings:
41
* The regular one (grain directory & grain tables)
42
* A reverse one (backmap and free bitmap)
43
These data structures can lose consistency upon crash and result
44
in a corrupted VMDK.
45
Therefore, a journal is also added to the VMDK and is replayed
46
when VMware reopens the file after a crash.
47
48
Since ESXi 6.7 - SESparse is the only snapshot format available.
49
50
Unfortunately, VMware does not provide documentation regarding the new
51
seSparse format.
52
53
This commit is based on black-box research of the seSparse format.
54
Various in-guest block operations and their effect on the snapshot file
55
were tested.
56
57
The only VMware provided source of information (regarding the underlying
58
implementation) was a log file on the ESXi:
59
60
/var/log/hostd.log
61
62
Whenever an seSparse snapshot is created - the log is populated
63
with seSparse records.
64
65
Relevant log records are of the form:
66
67
[...] Const Header:
68
[...] constMagic = 0xcafebabe
69
[...] version = 2.1
70
[...] capacity = 204800
71
[...] grainSize = 8
72
[...] grainTableSize = 64
73
[...] flags = 0
74
[...] Extents:
75
[...] Header : <1 : 1>
76
[...] JournalHdr : <2 : 2>
77
[...] Journal : <2048 : 2048>
78
[...] GrainDirectory : <4096 : 2048>
79
[...] GrainTables : <6144 : 2048>
80
[...] FreeBitmap : <8192 : 2048>
81
[...] BackMap : <10240 : 2048>
82
[...] Grain : <12288 : 204800>
83
[...] Volatile Header:
84
[...] volatileMagic = 0xcafecafe
85
[...] FreeGTNumber = 0
86
[...] nextTxnSeqNumber = 0
87
[...] replayJournal = 0
88
89
The sizes that are seen in the log file are in sectors.
90
Extents are of the following format: <offset : size>
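
Because every size in the log is expressed in 512-byte sectors, a little
arithmetic recovers the byte values quoted elsewhere in this message. The
following is a minimal sketch, not code from block/vmdk.c; the helper names
are hypothetical, and it assumes the 64-bit entry size stated above.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Convert a size or offset given in sectors to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors * SECTOR_SIZE;
}

/* Number of 64-bit entries held by one grain table of the given size. */
static uint64_t grain_table_entries(uint64_t grain_table_size)
{
    return sectors_to_bytes(grain_table_size) / sizeof(uint64_t);
}

/* Guest bytes mapped by a single grain table. */
static uint64_t grain_table_coverage(uint64_t grain_table_size,
                                     uint64_t grain_size)
{
    return grain_table_entries(grain_table_size) *
           sectors_to_bytes(grain_size);
}
```

With the values from the sample log, grainSize = 8 is 4096 bytes, capacity =
204800 is 100 MiB, and a grain table of 64 sectors holds 4096 entries and
therefore covers 16 MiB of guest data.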
91
92
This commit is a strict implementation which enforces:
93
* magics
94
* version number 2.1
95
* grain size of 8 sectors (4KB)
96
* grain table size of 64 sectors
97
* zero flags
98
* extent locations
99
100
Additionally, this commit provides only a subset of the functionality
101
offered by seSparse's format:
102
* Read-only
103
* No journal replay
104
* No space reclamation
105
* No unmap support
106
107
Hence, journal header, journal, free bitmap and backmap extents are
108
unused, only the "classic" (L1 -> L2 -> data) grain access is
109
implemented.
110
111
However there are several differences in the grain access itself.
112
Grain directory (L1):
113
* Grain directory entries are indexes (not offsets) to grain
114
tables.
115
* Valid grain directory entries have their highest nibble set to
116
0x1.
117
* Since grain tables are always located in the beginning of the
118
file - the index can fit into 32 bits - so we can use its low
119
part if it's valid.
120
Grain table (L2):
121
* Grain table entries are indexes (not offsets) to grains.
122
* If the highest nibble of the entry is:
123
0x0:
124
The grain is not allocated.
125
The rest of the bytes are 0.
126
0x1:
127
The grain is unmapped - guest sees a zero grain.
128
The rest of the bits point to the previously mapped grain,
129
see 0x3 case.
130
0x2:
131
The grain is zero.
132
0x3:
133
The grain is allocated - to get the index calculate:
134
((entry & 0x0fff000000000000) >> 48) |
135
((entry & 0x0000ffffffffffff) << 12)
136
* The difference between 0x1 and 0x2 is that 0x1 is an unallocated
137
grain which results from the guest using sg_unmap to unmap the
138
grain - but the grain itself still exists in the grain extent - a
139
space reclamation procedure should delete it.
140
Unmapping a zero grain has no effect (0x2 will not change to 0x1)
141
but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
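
The entry rules above can be sketched as a pair of standalone decoders. This
is an illustration under stated assumptions, not the code added by this
patch: the EntryKind and Decoded types and all function names are
hypothetical, and entries are assumed to be already converted to host byte
order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for illustration; block/vmdk.c does not define them. */
typedef enum {
    ENTRY_UNALLOCATED,
    ENTRY_ZERO,       /* 0x1 (unmapped) and 0x2 (zero) both read as zeroes */
    ENTRY_ALLOCATED,
    ENTRY_INVALID,
} EntryKind;

typedef struct {
    EntryKind kind;
    uint64_t index;   /* meaningful only when kind == ENTRY_ALLOCATED */
} Decoded;

/* Grain directory (L1) entry -> grain table index. */
static Decoded decode_l1(uint64_t entry)
{
    Decoded d = { ENTRY_INVALID, 0 };

    if (entry == 0) {
        d.kind = ENTRY_UNALLOCATED;
    } else if ((entry & UINT64_C(0xffffffff00000000)) ==
               UINT64_C(0x1000000000000000)) {
        /* Strict: highest nibble is 0x1, rest of the upper 32 bits is 0. */
        d.kind = ENTRY_ALLOCATED;
        d.index = entry & UINT64_C(0x00000000ffffffff);
    }
    return d;
}

/* Grain table (L2) entry -> grain index. */
static Decoded decode_l2(uint64_t entry)
{
    Decoded d = { ENTRY_INVALID, 0 };

    switch (entry >> 60) {
    case 0x0:
        d.kind = entry == 0 ? ENTRY_UNALLOCATED : ENTRY_INVALID;
        break;
    case 0x1:
    case 0x2:
        d.kind = ENTRY_ZERO;
        break;
    case 0x3:
        d.kind = ENTRY_ALLOCATED;
        d.index = ((entry & UINT64_C(0x0fff000000000000)) >> 48) |
                  ((entry & UINT64_C(0x0000ffffffffffff)) << 12);
        break;
    default:
        break;
    }
    return d;
}

/* Sector of an allocated grain; offsets and sizes are in sectors. */
static uint64_t grain_sector(uint64_t grains_offset, uint64_t grain_index,
                             uint64_t grain_size)
{
    assert(grain_size > 0);
    return grains_offset + grain_index * grain_size;
}
```

For instance, with the "Grain" extent at sector 12288 and a grain size of 8
sectors (the values in the sample log), grain index 3 starts at sector 12312.
Returning a small struct keeps the "kind" and "index" results together, so a
caller cannot use an index without first looking at the entry kind.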
142
143
In order to implement seSparse some fields had to be changed to support
144
both 32-bit and 64-bit entry sizes.
145
146
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
147
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
148
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
149
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
150
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
151
Signed-off-by: Max Reitz <mreitz@redhat.com>
21
---
152
---
22
include/block/block_int-common.h | 3 +
153
block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
23
block/file-posix.c | 130 ++++++++++++++++++++++---------
154
1 file changed, 342 insertions(+), 16 deletions(-)
24
2 files changed, 95 insertions(+), 38 deletions(-)
155
25
156
diff --git a/block/vmdk.c b/block/vmdk.c
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
27
index XXXXXXX..XXXXXXX 100644
157
index XXXXXXX..XXXXXXX 100644
28
--- a/include/block/block_int-common.h
158
--- a/block/vmdk.c
29
+++ b/include/block/block_int-common.h
159
+++ b/block/vmdk.c
30
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
160
@@ -XXX,XX +XXX,XX @@ typedef struct {
31
* an explicit monitor command to load the disk inside the guest).
161
uint16_t compressAlgorithm;
32
*/
162
} QEMU_PACKED VMDK4Header;
33
bool has_variable_length;
163
34
+
164
+typedef struct VMDKSESparseConstHeader {
35
+ /* device zone model */
165
+ uint64_t magic;
36
+ BlockZoneModel zoned;
166
+ uint64_t version;
37
} BlockLimits;
167
+ uint64_t capacity;
38
168
+ uint64_t grain_size;
39
typedef struct BdrvOpBlocker BdrvOpBlocker;
169
+ uint64_t grain_table_size;
40
diff --git a/block/file-posix.c b/block/file-posix.c
170
+ uint64_t flags;
41
index XXXXXXX..XXXXXXX 100644
171
+ uint64_t reserved1;
42
--- a/block/file-posix.c
172
+ uint64_t reserved2;
43
+++ b/block/file-posix.c
173
+ uint64_t reserved3;
44
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
174
+ uint64_t reserved4;
45
#endif
175
+ uint64_t volatile_header_offset;
176
+ uint64_t volatile_header_size;
177
+ uint64_t journal_header_offset;
178
+ uint64_t journal_header_size;
179
+ uint64_t journal_offset;
180
+ uint64_t journal_size;
181
+ uint64_t grain_dir_offset;
182
+ uint64_t grain_dir_size;
183
+ uint64_t grain_tables_offset;
184
+ uint64_t grain_tables_size;
185
+ uint64_t free_bitmap_offset;
186
+ uint64_t free_bitmap_size;
187
+ uint64_t backmap_offset;
188
+ uint64_t backmap_size;
189
+ uint64_t grains_offset;
190
+ uint64_t grains_size;
191
+ uint8_t pad[304];
192
+} QEMU_PACKED VMDKSESparseConstHeader;
193
+
194
+typedef struct VMDKSESparseVolatileHeader {
195
+ uint64_t magic;
196
+ uint64_t free_gt_number;
197
+ uint64_t next_txn_seq_number;
198
+ uint64_t replay_journal;
199
+ uint8_t pad[480];
200
+} QEMU_PACKED VMDKSESparseVolatileHeader;
201
+
202
#define L2_CACHE_SIZE 16
203
204
typedef struct VmdkExtent {
205
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
206
bool compressed;
207
bool has_marker;
208
bool has_zero_grain;
209
+ bool sesparse;
210
+ uint64_t sesparse_l2_tables_offset;
211
+ uint64_t sesparse_clusters_offset;
212
+ int32_t entry_size;
213
int version;
214
int64_t sectors;
215
int64_t end_sector;
216
int64_t flat_start_offset;
217
int64_t l1_table_offset;
218
int64_t l1_backup_table_offset;
219
- uint32_t *l1_table;
220
+ void *l1_table;
221
uint32_t *l1_backup_table;
222
unsigned int l1_size;
223
uint32_t l1_entry_sectors;
224
225
unsigned int l2_size;
226
- uint32_t *l2_cache;
227
+ void *l2_cache;
228
uint32_t l2_cache_offsets[L2_CACHE_SIZE];
229
uint32_t l2_cache_counts[L2_CACHE_SIZE];
230
231
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
232
* minimal L2 table size: 512 entries
233
* 8 TB is still more than the maximal value supported for
234
* VMDK3 & VMDK4 which is 2TB.
235
+ * 64TB - for "ESXi seSparse Extent"
236
+ * minimal cluster size: 512B (default is 4KB)
237
+ * L2 table size: 4096 entries (const).
238
+ * 64TB is more than the maximal value supported for
239
+ * seSparse VMDKs (which is slightly less than 64TB)
240
*/
241
error_setg(errp, "L1 size too big");
242
return -EFBIG;
243
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
244
extent->l2_size = l2_size;
245
extent->cluster_sectors = flat ? sectors : cluster_sectors;
246
extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
247
+ extent->entry_size = sizeof(uint32_t);
248
249
if (s->num_extents > 1) {
250
extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
251
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
252
int i;
253
254
/* read the L1 table */
255
- l1_size = extent->l1_size * sizeof(uint32_t);
256
+ l1_size = extent->l1_size * extent->entry_size;
257
extent->l1_table = g_try_malloc(l1_size);
258
if (l1_size && extent->l1_table == NULL) {
259
return -ENOMEM;
260
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
261
goto fail_l1;
262
}
263
for (i = 0; i < extent->l1_size; i++) {
264
- le32_to_cpus(&extent->l1_table[i]);
265
+ if (extent->entry_size == sizeof(uint64_t)) {
266
+ le64_to_cpus((uint64_t *)extent->l1_table + i);
267
+ } else {
268
+ assert(extent->entry_size == sizeof(uint32_t));
269
+ le32_to_cpus((uint32_t *)extent->l1_table + i);
270
+ }
271
}
272
273
if (extent->l1_backup_table_offset) {
274
+ assert(!extent->sesparse);
275
extent->l1_backup_table = g_try_malloc(l1_size);
276
if (l1_size && extent->l1_backup_table == NULL) {
277
ret = -ENOMEM;
278
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
279
}
280
281
extent->l2_cache =
282
- g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
283
+ g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
284
return 0;
285
fail_l1b:
286
g_free(extent->l1_backup_table);
287
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
288
return ret;
46
}
289
}
47
290
48
-static int hdev_get_max_segments(int fd, struct stat *st)
291
+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
49
+/*
292
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
50
+ * Get a sysfs attribute value as character string.
293
+
51
+ */
294
+/* Strict checks - format not officially documented */
52
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
295
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
53
+ char **val) {
296
+ Error **errp)
54
+#ifdef CONFIG_LINUX
297
+{
55
+ g_autofree char *sysfspath = NULL;
298
+ header->magic = le64_to_cpu(header->magic);
299
+ header->version = le64_to_cpu(header->version);
300
+ header->grain_size = le64_to_cpu(header->grain_size);
301
+ header->grain_table_size = le64_to_cpu(header->grain_table_size);
302
+ header->flags = le64_to_cpu(header->flags);
303
+ header->reserved1 = le64_to_cpu(header->reserved1);
304
+ header->reserved2 = le64_to_cpu(header->reserved2);
305
+ header->reserved3 = le64_to_cpu(header->reserved3);
306
+ header->reserved4 = le64_to_cpu(header->reserved4);
307
+
308
+ header->volatile_header_offset =
309
+ le64_to_cpu(header->volatile_header_offset);
310
+ header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
311
+
312
+ header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
313
+ header->journal_header_size = le64_to_cpu(header->journal_header_size);
314
+
315
+ header->journal_offset = le64_to_cpu(header->journal_offset);
316
+ header->journal_size = le64_to_cpu(header->journal_size);
317
+
318
+ header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
319
+ header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
320
+
321
+ header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
322
+ header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
323
+
324
+ header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
325
+ header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
326
+
327
+ header->backmap_offset = le64_to_cpu(header->backmap_offset);
328
+ header->backmap_size = le64_to_cpu(header->backmap_size);
329
+
330
+ header->grains_offset = le64_to_cpu(header->grains_offset);
331
+ header->grains_size = le64_to_cpu(header->grains_size);
332
+
333
+ if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
334
+ error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
335
+ header->magic);
336
+ return -EINVAL;
337
+ }
338
+
339
+ if (header->version != 0x0000000200000001) {
340
+ error_setg(errp, "Unsupported version: 0x%016" PRIx64,
341
+ header->version);
342
+ return -ENOTSUP;
343
+ }
344
+
345
+ if (header->grain_size != 8) {
346
+ error_setg(errp, "Unsupported grain size: %" PRIu64,
347
+ header->grain_size);
348
+ return -ENOTSUP;
349
+ }
350
+
351
+ if (header->grain_table_size != 64) {
352
+ error_setg(errp, "Unsupported grain table size: %" PRIu64,
353
+ header->grain_table_size);
354
+ return -ENOTSUP;
355
+ }
356
+
357
+ if (header->flags != 0) {
358
+ error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
359
+ header->flags);
360
+ return -ENOTSUP;
361
+ }
362
+
363
+ if (header->reserved1 != 0 || header->reserved2 != 0 ||
364
+ header->reserved3 != 0 || header->reserved4 != 0) {
365
+ error_setg(errp, "Unsupported reserved bits:"
366
+ " 0x%016" PRIx64 " 0x%016" PRIx64
367
+ " 0x%016" PRIx64 " 0x%016" PRIx64,
368
+ header->reserved1, header->reserved2,
369
+ header->reserved3, header->reserved4);
370
+ return -ENOTSUP;
371
+ }
372
+
373
+ /* check that padding is 0 */
374
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
375
+ error_setg(errp, "Unsupported non-zero const header padding");
376
+ return -ENOTSUP;
377
+ }
378
+
379
+ return 0;
380
+}
381
+
382
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
383
+ Error **errp)
384
+{
385
+ header->magic = le64_to_cpu(header->magic);
386
+ header->free_gt_number = le64_to_cpu(header->free_gt_number);
387
+ header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
388
+ header->replay_journal = le64_to_cpu(header->replay_journal);
389
+
390
+ if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
391
+ error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
392
+ header->magic);
393
+ return -EINVAL;
394
+ }
395
+
396
+ if (header->replay_journal) {
397
+ error_setg(errp, "Image is dirty, Replaying journal not supported");
398
+ return -ENOTSUP;
399
+ }
400
+
401
+ /* check that padding is 0 */
402
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
403
+ error_setg(errp, "Unsupported non-zero volatile header padding");
404
+ return -ENOTSUP;
405
+ }
406
+
407
+ return 0;
408
+}
409
+
410
+static int vmdk_open_se_sparse(BlockDriverState *bs,
411
+ BdrvChild *file,
412
+ int flags, Error **errp)
413
+{
56
+ int ret;
414
+ int ret;
57
+ size_t len;
415
+ VMDKSESparseConstHeader const_header;
58
+
416
+ VMDKSESparseVolatileHeader volatile_header;
59
+ if (!S_ISBLK(st->st_mode)) {
417
+ VmdkExtent *extent;
60
+ return -ENOTSUP;
418
+
61
+ }
419
+ ret = bdrv_apply_auto_read_only(bs,
62
+
420
+ "No write support for seSparse images available", errp);
63
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
64
+ major(st->st_rdev), minor(st->st_rdev),
65
+ attribute);
66
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
67
+ if (!ret) { /* g_file_get_contents() returns FALSE on failure */
68
+ return -ENOENT;
69
+ }
70
+
71
+ /* The file ends with '\n'; replace it with a terminating NUL */
72
+ char *p;
73
+ p = *val;
74
+ if (*(p + len - 1) == '\n') {
75
+ *(p + len - 1) = '\0';
76
+ }
77
+ return ret;
78
+#else
79
+ return -ENOTSUP;
80
+#endif
81
+}
82
+
83
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
84
+{
85
+ g_autofree char *val = NULL;
86
+ int ret;
87
+
88
+ ret = get_sysfs_str_val(st, "zoned", &val);
89
+ if (ret < 0) {
421
+ if (ret < 0) {
90
+ return ret;
422
+ return ret;
91
+ }
423
+ }
92
+
424
+
93
+ if (strcmp(val, "host-managed") == 0) {
425
+ assert(sizeof(const_header) == SECTOR_SIZE);
94
+ *zoned = BLK_Z_HM;
426
+
95
+ } else if (strcmp(val, "host-aware") == 0) {
427
+ ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
96
+ *zoned = BLK_Z_HA;
428
+ if (ret < 0) {
97
+ } else if (strcmp(val, "none") == 0) {
429
+ bdrv_refresh_filename(file->bs);
98
+ *zoned = BLK_Z_NONE;
430
+ error_setg_errno(errp, -ret,
99
+ } else {
431
+ "Could not read const header from file '%s'",
100
+ return -ENOTSUP;
432
+ file->bs->filename);
101
+ }
433
+ return ret;
102
+ return 0;
434
+ }
103
+}
435
+
104
+
436
+ /* check const header */
105
+/*
437
+ ret = check_se_sparse_const_header(&const_header, errp);
106
+ * Get a sysfs attribute value as a long integer.
107
+ */
108
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
109
{
110
#ifdef CONFIG_LINUX
111
- char buf[32];
112
+ g_autofree char *str = NULL;
113
const char *end;
114
- char *sysfspath = NULL;
115
+ long val;
116
+ int ret;
117
+
118
+ ret = get_sysfs_str_val(st, attribute, &str);
119
+ if (ret < 0) {
438
+ if (ret < 0) {
120
+ return ret;
439
+ return ret;
121
+ }
440
+ }
122
+
441
+
123
+ /* get_sysfs_str_val() strips a trailing '\n', so 'end' must point at '\0'. */
442
+ assert(sizeof(volatile_header) == SECTOR_SIZE);
124
+ ret = qemu_strtol(str, &end, 10, &val);
443
+
125
+ if (ret == 0 && end && *end == '\0') {
444
+ ret = bdrv_pread(file,
126
+ ret = val;
445
+ const_header.volatile_header_offset * SECTOR_SIZE,
127
+ }
446
+ &volatile_header, sizeof(volatile_header));
447
+ if (ret < 0) {
448
+ bdrv_refresh_filename(file->bs);
449
+ error_setg_errno(errp, -ret,
450
+ "Could not read volatile header from file '%s'",
451
+ file->bs->filename);
452
+ return ret;
453
+ }
454
+
455
+ /* check volatile header */
456
+ ret = check_se_sparse_volatile_header(&volatile_header, errp);
457
+ if (ret < 0) {
458
+ return ret;
459
+ }
460
+
461
+ ret = vmdk_add_extent(bs, file, false,
462
+ const_header.capacity,
463
+ const_header.grain_dir_offset * SECTOR_SIZE,
464
+ 0,
465
+ const_header.grain_dir_size *
466
+ SECTOR_SIZE / sizeof(uint64_t),
467
+ const_header.grain_table_size *
468
+ SECTOR_SIZE / sizeof(uint64_t),
469
+ const_header.grain_size,
470
+ &extent,
471
+ errp);
472
+ if (ret < 0) {
473
+ return ret;
474
+ }
475
+
476
+ extent->sesparse = true;
477
+ extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
478
+ extent->sesparse_clusters_offset = const_header.grains_offset;
479
+ extent->entry_size = sizeof(uint64_t);
480
+
481
+ ret = vmdk_init_tables(bs, extent, errp);
482
+ if (ret) {
483
+ /* free extent allocated by vmdk_add_extent */
484
+ vmdk_free_last_extent(bs);
485
+ }
486
+
128
+ return ret;
487
+ return ret;
129
+#else
130
+ return -ENOTSUP;
131
+#endif
132
+}
488
+}
133
+
489
+
134
+static int hdev_get_max_segments(int fd, struct stat *st)
490
static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
135
+{
491
QDict *options, Error **errp);
136
+#ifdef CONFIG_LINUX
492
137
int ret;
493
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
138
- int sysfd = -1;
494
* RW [size in sectors] SPARSE "file-name.vmdk"
139
- long max_segments;
495
* RW [size in sectors] VMFS "file-name.vmdk"
140
496
* RW [size in sectors] VMFSSPARSE "file-name.vmdk"
141
if (S_ISCHR(st->st_mode)) {
497
+ * RW [size in sectors] SESPARSE "file-name.vmdk"
142
if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
498
*/
143
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
499
flat_offset = -1;
500
matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
501
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
502
503
if (sectors <= 0 ||
504
(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
505
- strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
506
+ strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
507
+ strcmp(type, "SESPARSE")) ||
508
(strcmp(access, "RW"))) {
509
continue;
144
}
510
}
145
return -ENOTSUP;
511
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
146
}
512
return ret;
147
-
513
}
148
- if (!S_ISBLK(st->st_mode)) {
514
extent = &s->extents[s->num_extents - 1];
149
- return -ENOTSUP;
515
+ } else if (!strcmp(type, "SESPARSE")) {
150
- }
516
+ ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
151
-
517
+ if (ret) {
152
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
518
+ bdrv_unref_child(bs, extent_file);
153
- major(st->st_rdev), minor(st->st_rdev));
519
+ return ret;
154
- sysfd = open(sysfspath, O_RDONLY);
520
+ }
155
- if (sysfd == -1) {
521
+ extent = &s->extents[s->num_extents - 1];
156
- ret = -errno;
522
} else {
157
- goto out;
523
error_setg(errp, "Unsupported extent type '%s'", type);
158
- }
524
bdrv_unref_child(bs, extent_file);
159
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
525
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
160
- if (ret < 0) {
526
if (strcmp(ct, "monolithicFlat") &&
161
- ret = -errno;
527
strcmp(ct, "vmfs") &&
162
- goto out;
528
strcmp(ct, "vmfsSparse") &&
163
- } else if (ret == 0) {
529
+ strcmp(ct, "seSparse") &&
164
- ret = -EIO;
530
strcmp(ct, "twoGbMaxExtentSparse") &&
165
- goto out;
531
strcmp(ct, "twoGbMaxExtentFlat")) {
166
- }
532
error_setg(errp, "Unsupported image type '%s'", ct);
167
- buf[ret] = 0;
533
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
168
- /* The file is ended with '\n', pass 'end' to accept that. */
169
- ret = qemu_strtol(buf, &end, 10, &max_segments);
170
- if (ret == 0 && end && *end == '\n') {
171
- ret = max_segments;
172
- }
173
-
174
-out:
175
- if (sysfd != -1) {
176
- close(sysfd);
177
- }
178
- g_free(sysfspath);
179
- return ret;
180
+ return get_sysfs_long_val(st, "max_segments");
181
#else
182
return -ENOTSUP;
183
#endif
184
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
185
{
534
{
186
BDRVRawState *s = bs->opaque;
535
unsigned int l1_index, l2_offset, l2_index;
187
struct stat st;
536
int min_index, i, j;
188
+ int ret;
537
- uint32_t min_count, *l2_table;
189
+ BlockZoneModel zoned;
538
+ uint32_t min_count;
190
539
+ void *l2_table;
191
s->needs_alignment = raw_needs_alignment(bs);
540
bool zeroed = false;
192
raw_probe_alignment(bs, s->fd, errp);
541
int64_t ret;
193
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
542
int64_t cluster_sector;
194
bs->bl.max_hw_iov = ret;
543
+ unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
544
545
if (m_data) {
546
m_data->valid = 0;
547
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
548
if (l1_index >= extent->l1_size) {
549
return VMDK_ERROR;
550
}
551
- l2_offset = extent->l1_table[l1_index];
552
+ if (extent->sesparse) {
553
+ uint64_t l2_offset_u64;
554
+
555
+ assert(extent->entry_size == sizeof(uint64_t));
556
+
557
+ l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
558
+ if (l2_offset_u64 == 0) {
559
+ l2_offset = 0;
560
+ } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
561
+ /*
562
+ * Top most nibble is 0x1 if grain table is allocated.
563
+ * strict check - top most 4 bytes must be 0x10000000 since max
564
+ * supported size is 64TB for disk - so no more than 64TB / 16MB
565
+ * grain directories which is smaller than uint32,
566
+ * where 16MB is the only supported default grain table coverage.
567
+ */
568
+ return VMDK_ERROR;
569
+ } else {
570
+ l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
571
+ l2_offset_u64 = extent->sesparse_l2_tables_offset +
572
+ l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
573
+ if (l2_offset_u64 > 0x00000000ffffffff) {
574
+ return VMDK_ERROR;
575
+ }
576
+ l2_offset = (unsigned int)(l2_offset_u64);
577
+ }
578
+ } else {
579
+ assert(extent->entry_size == sizeof(uint32_t));
580
+ l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
581
+ }
582
if (!l2_offset) {
583
return VMDK_UNALLOC;
584
}
585
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
586
extent->l2_cache_counts[j] >>= 1;
587
}
588
}
589
- l2_table = extent->l2_cache + (i * extent->l2_size);
590
+ l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
591
goto found;
195
}
592
}
196
}
593
}
197
+
594
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
198
+ ret = get_sysfs_zoned_model(&st, &zoned);
595
min_index = i;
199
+ if (ret < 0) {
596
}
200
+ zoned = BLK_Z_NONE;
597
}
201
+ }
598
- l2_table = extent->l2_cache + (min_index * extent->l2_size);
202
+ bs->bl.zoned = zoned;
599
+ l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
203
}
600
BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
204
601
if (bdrv_pread(extent->file,
205
static int check_for_dasd(int fd)
602
(int64_t)l2_offset * 512,
603
l2_table,
604
- extent->l2_size * sizeof(uint32_t)
605
- ) != extent->l2_size * sizeof(uint32_t)) {
606
+ l2_size_bytes
607
+ ) != l2_size_bytes) {
608
return VMDK_ERROR;
609
}
610
611
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
612
extent->l2_cache_counts[min_index] = 1;
613
found:
614
l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
615
- cluster_sector = le32_to_cpu(l2_table[l2_index]);
616
617
- if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
618
- zeroed = true;
619
+ if (extent->sesparse) {
620
+ cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
621
+ switch (cluster_sector & 0xf000000000000000) {
622
+ case 0x0000000000000000:
623
+ /* unallocated grain */
624
+ if (cluster_sector != 0) {
625
+ return VMDK_ERROR;
626
+ }
627
+ break;
628
+ case 0x1000000000000000:
629
+ /* scsi-unmapped grain - fallthrough */
630
+ case 0x2000000000000000:
631
+ /* zero grain */
632
+ zeroed = true;
633
+ break;
634
+ case 0x3000000000000000:
635
+ /* allocated grain */
636
+ cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
637
+ ((cluster_sector & 0x0000ffffffffffff) << 12));
638
+ cluster_sector = extent->sesparse_clusters_offset +
639
+ cluster_sector * extent->cluster_sectors;
640
+ break;
641
+ default:
642
+ return VMDK_ERROR;
643
+ }
644
+ } else {
645
+ cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
646
+
647
+ if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
648
+ zeroed = true;
649
+ }
650
}
651
652
if (!cluster_sector || zeroed) {
653
if (!allocate) {
654
return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
655
}
656
+ assert(!extent->sesparse);
657
658
if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
659
return VMDK_ERROR;
660
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
661
m_data->l1_index = l1_index;
662
m_data->l2_index = l2_index;
663
m_data->l2_offset = l2_offset;
664
- m_data->l2_cache_entry = &l2_table[l2_index];
665
+ m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
666
}
667
}
668
*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
669
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
670
if (!extent) {
671
return -EIO;
672
}
673
+ if (extent->sesparse) {
674
+ return -ENOTSUP;
675
+ }
676
offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
677
n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
678
- offset_in_cluster);
206
--
679
--
207
2.39.2
680
2.21.0
208
681
209
682
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Pino Toscano <ptoscano@redhat.com>
2
2
3
Add zoned device option to host_device BlockDriver. It will be presented only
3
Rewrite the implementation of the ssh block driver to use libssh instead
4
for zoned host block devices. By adding zone management operations to the
4
of libssh2. The libssh library has various advantages over libssh2:
5
host_block_device BlockDriver, users can use the new block layer APIs
5
- easier API for authentication (for example for using ssh-agent)
6
including Report Zone and four zone management operations
6
- easier API for known_hosts handling
7
(open, close, finish, reset, reset_all).
7
- supports newer types of keys in known_hosts
8
8
9
Qemu-io uses the new APIs to perform zoned storage commands of the device:
9
Use APIs/features available in libssh 0.8 conditionally, to support
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
10
older versions (which are not recommended though).
11
zone_finish(zf).
12
11
13
For example, to test zone_report, use the following command:
12
Adjust the iotest 207 according to the different error message, and to
14
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
13
find the default key type for localhost (to properly compare the
15
-c "zrp offset nr_zones"
14
fingerprint with).
15
Contributed-by: Max Reitz <mreitz@redhat.com>
16
16
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
Adjust the various Docker/Travis scripts to use libssh when available
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
18
instead of libssh2. The mingw/mxe testing is dropped for now, as there
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
19
are no packages for it.
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
20
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
21
Signed-off-by: Pino Toscano <ptoscano@redhat.com>
22
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
22
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
23
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
23
Acked-by: Alex Bennée <alex.bennee@linaro.org>
24
<philmd@linaro.org> and remove spurious ret = -errno in
24
Message-id: 20190620200840.17655-1-ptoscano@redhat.com
25
raw_co_zone_mgmt().
25
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
26
--Stefan]
26
Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
27
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
27
Signed-off-by: Max Reitz <mreitz@redhat.com>
28
---
28
---
29
meson.build | 4 +
29
configure | 65 +-
30
include/block/block-io.h | 9 +
30
block/Makefile.objs | 6 +-
31
include/block/block_int-common.h | 21 ++
31
block/ssh.c | 652 ++++++++++--------
32
include/block/raw-aio.h | 6 +-
32
.travis.yml | 4 +-
33
include/sysemu/block-backend-io.h | 18 ++
33
block/trace-events | 14 +-
34
block/block-backend.c | 133 +++++++++++++
34
docs/qemu-block-drivers.texi | 2 +-
35
block/file-posix.c | 306 +++++++++++++++++++++++++++++-
35
.../dockerfiles/debian-win32-cross.docker | 1 -
36
block/io.c | 41 ++++
36
.../dockerfiles/debian-win64-cross.docker | 1 -
37
qemu-io-cmds.c | 149 +++++++++++++++
37
tests/docker/dockerfiles/fedora.docker | 4 +-
38
9 files changed, 684 insertions(+), 3 deletions(-)
38
tests/docker/dockerfiles/ubuntu.docker | 2 +-
39
tests/docker/dockerfiles/ubuntu1804.docker | 2 +-
40
tests/qemu-iotests/207 | 54 +-
41
tests/qemu-iotests/207.out | 2 +-
42
13 files changed, 449 insertions(+), 360 deletions(-)
39
43
40
diff --git a/meson.build b/meson.build
44
diff --git a/configure b/configure
45
index XXXXXXX..XXXXXXX 100755
46
--- a/configure
47
+++ b/configure
48
@@ -XXX,XX +XXX,XX @@ auth_pam=""
49
vte=""
50
virglrenderer=""
51
tpm=""
52
-libssh2=""
53
+libssh=""
54
live_block_migration="yes"
55
numa=""
56
tcmalloc="no"
57
@@ -XXX,XX +XXX,XX @@ for opt do
58
;;
59
--enable-tpm) tpm="yes"
60
;;
61
- --disable-libssh2) libssh2="no"
62
+ --disable-libssh) libssh="no"
63
;;
64
- --enable-libssh2) libssh2="yes"
65
+ --enable-libssh) libssh="yes"
66
;;
67
--disable-live-block-migration) live_block_migration="no"
68
;;
69
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
70
coroutine-pool coroutine freelist (better performance)
71
glusterfs GlusterFS backend
72
tpm TPM support
73
- libssh2 ssh block device support
74
+ libssh ssh block device support
75
numa libnuma support
76
libxml2 for Parallels image format
77
tcmalloc tcmalloc support
78
@@ -XXX,XX +XXX,XX @@ EOF
79
fi
80
81
##########################################
82
-# libssh2 probe
83
-min_libssh2_version=1.2.8
84
-if test "$libssh2" != "no" ; then
85
- if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
86
- libssh2_cflags=$($pkg_config libssh2 --cflags)
87
- libssh2_libs=$($pkg_config libssh2 --libs)
88
- libssh2=yes
89
+# libssh probe
90
+if test "$libssh" != "no" ; then
91
+ if $pkg_config --exists libssh; then
92
+ libssh_cflags=$($pkg_config libssh --cflags)
93
+ libssh_libs=$($pkg_config libssh --libs)
94
+ libssh=yes
95
else
96
- if test "$libssh2" = "yes" ; then
97
- error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2"
98
+ if test "$libssh" = "yes" ; then
99
+ error_exit "libssh required for --enable-libssh"
100
fi
101
- libssh2=no
102
+ libssh=no
103
fi
104
fi
105
106
##########################################
107
-# libssh2_sftp_fsync probe
108
+# Check for libssh 0.8
109
+# This is done like this instead of using the LIBSSH_VERSION_* and
110
+# SSH_VERSION_* macros because some distributions in the past shipped
111
+# snapshots of the future 0.8 from Git, and those snapshots did not
112
+# have updated version numbers (still referring to 0.7.0).
113
114
-if test "$libssh2" = "yes"; then
115
+if test "$libssh" = "yes"; then
116
cat > $TMPC <<EOF
117
-#include <stdio.h>
118
-#include <libssh2.h>
119
-#include <libssh2_sftp.h>
120
-int main(void) {
121
- LIBSSH2_SESSION *session;
122
- LIBSSH2_SFTP *sftp;
123
- LIBSSH2_SFTP_HANDLE *sftp_handle;
124
- session = libssh2_session_init ();
125
- sftp = libssh2_sftp_init (session);
126
- sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0);
127
- libssh2_sftp_fsync (sftp_handle);
128
- return 0;
129
-}
130
+#include <libssh/libssh.h>
131
+int main(void) { return ssh_get_server_publickey(NULL, NULL); }
132
EOF
133
- # libssh2_cflags/libssh2_libs defined in previous test.
134
- if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then
135
- QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS"
136
+ if compile_prog "$libssh_cflags" "$libssh_libs"; then
137
+ libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags"
138
fi
139
fi
140
141
@@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs"
142
echo "gcov $gcov_tool"
143
echo "gcov enabled $gcov"
144
echo "TPM support $tpm"
145
-echo "libssh2 support $libssh2"
146
+echo "libssh support $libssh"
147
echo "QOM debugging $qom_cast_debug"
148
echo "Live block migration $live_block_migration"
149
echo "lzo support $lzo"
150
@@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then
151
echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak
152
fi
153
154
-if test "$libssh2" = "yes" ; then
155
- echo "CONFIG_LIBSSH2=m" >> $config_host_mak
156
- echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
157
- echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
158
+if test "$libssh" = "yes" ; then
159
+ echo "CONFIG_LIBSSH=m" >> $config_host_mak
160
+ echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
161
+ echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
162
fi
163
164
if test "$live_block_migration" = "yes" ; then
165
diff --git a/block/Makefile.objs b/block/Makefile.objs
41
index XXXXXXX..XXXXXXX 100644
166
index XXXXXXX..XXXXXXX 100644
42
--- a/meson.build
167
--- a/block/Makefile.objs
43
+++ b/meson.build
168
+++ b/block/Makefile.objs
44
@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
169
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o
45
# has_header
170
block-obj-$(CONFIG_RBD) += rbd.o
46
config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
171
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
47
config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
172
block-obj-$(CONFIG_VXHS) += vxhs.o
48
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
173
-block-obj-$(CONFIG_LIBSSH2) += ssh.o
49
config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
174
+block-obj-$(CONFIG_LIBSSH) += ssh.o
50
config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
175
block-obj-y += accounting.o dirty-bitmap.o
51
config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
176
block-obj-y += write-threshold.o
52
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
177
block-obj-y += backup.o
53
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
178
@@ -XXX,XX +XXX,XX @@ rbd.o-libs := $(RBD_LIBS)
54
cc.has_member('struct stat', 'st_atim',
179
gluster.o-cflags := $(GLUSTERFS_CFLAGS)
55
prefix: '#include <sys/stat.h>'))
180
gluster.o-libs := $(GLUSTERFS_LIBS)
56
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
181
vxhs.o-libs := $(VXHS_LIBS)
57
+ cc.has_member('struct blk_zone', 'capacity',
182
-ssh.o-cflags := $(LIBSSH2_CFLAGS)
58
+ prefix: '#include <linux/blkzoned.h>'))
183
-ssh.o-libs := $(LIBSSH2_LIBS)
59
184
+ssh.o-cflags := $(LIBSSH_CFLAGS)
60
# has_type
185
+ssh.o-libs := $(LIBSSH_LIBS)
61
config_host_data.set('CONFIG_IOVEC',
186
block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o
62
diff --git a/include/block/block-io.h b/include/block/block-io.h
187
block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y)
188
dmg-bz2.o-libs := $(BZIP2_LIBS)
189
diff --git a/block/ssh.c b/block/ssh.c
63
index XXXXXXX..XXXXXXX 100644
190
index XXXXXXX..XXXXXXX 100644
64
--- a/include/block/block-io.h
191
--- a/block/ssh.c
65
+++ b/include/block/block-io.h
192
+++ b/block/ssh.c
66
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
67
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
68
int64_t bytes);
69
70
+/* Report zone information of zone block device. */
71
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
72
+ int64_t offset,
73
+ unsigned int *nr_zones,
74
+ BlockZoneDescriptor *zones);
75
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
76
+ BlockZoneOp op,
77
+ int64_t offset, int64_t len);
78
+
79
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
80
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
81
int64_t bytes, int64_t *pnum, int64_t *map,
82
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
83
index XXXXXXX..XXXXXXX 100644
84
--- a/include/block/block_int-common.h
85
+++ b/include/block/block_int-common.h
86
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
87
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
88
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
89
90
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
91
+ int64_t offset, unsigned int *nr_zones,
92
+ BlockZoneDescriptor *zones);
93
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
94
+ int64_t offset, int64_t len);
95
+
96
/* removable device specific */
97
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
98
BlockDriverState *bs);
99
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
100
101
/* device zone model */
102
BlockZoneModel zoned;
103
+
104
+ /* zone size expressed in bytes */
105
+ uint32_t zone_size;
106
+
107
+ /* total number of zones */
108
+ uint32_t nr_zones;
109
+
110
+ /* maximum sectors of a zone append write operation */
111
+ int64_t max_append_sectors;
112
+
113
+ /* maximum number of open zones */
114
+ int64_t max_open_zones;
115
+
116
+ /* maximum number of active zones */
117
+ int64_t max_active_zones;
118
} BlockLimits;
119
120
typedef struct BdrvOpBlocker BdrvOpBlocker;
121
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
122
index XXXXXXX..XXXXXXX 100644
123
--- a/include/block/raw-aio.h
124
+++ b/include/block/raw-aio.h
125
@@ -XXX,XX +XXX,XX @@
193
@@ -XXX,XX +XXX,XX @@
126
#define QEMU_AIO_WRITE_ZEROES 0x0020
194
127
#define QEMU_AIO_COPY_RANGE 0x0040
195
#include "qemu/osdep.h"
128
#define QEMU_AIO_TRUNCATE 0x0080
196
129
+#define QEMU_AIO_ZONE_REPORT 0x0100
197
-#include <libssh2.h>
130
+#define QEMU_AIO_ZONE_MGMT 0x0200
198
-#include <libssh2_sftp.h>
131
#define QEMU_AIO_TYPE_MASK \
199
+#include <libssh/libssh.h>
132
(QEMU_AIO_READ | \
200
+#include <libssh/sftp.h>
133
QEMU_AIO_WRITE | \
201
202
#include "block/block_int.h"
203
#include "block/qdict.h"
134
@@ -XXX,XX +XXX,XX @@
204
@@ -XXX,XX +XXX,XX @@
135
QEMU_AIO_DISCARD | \
205
#include "trace.h"
136
QEMU_AIO_WRITE_ZEROES | \
206
137
QEMU_AIO_COPY_RANGE | \
207
/*
138
- QEMU_AIO_TRUNCATE)
208
- * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself. Note
139
+ QEMU_AIO_TRUNCATE | \
209
- * that this requires that libssh2 was specially compiled with the
140
+ QEMU_AIO_ZONE_REPORT | \
210
- * `./configure --enable-debug' option, so most likely you will have
141
+ QEMU_AIO_ZONE_MGMT)
211
- * to compile it yourself. The meaning of <bitmask> is described
142
212
- * here: http://www.libssh2.org/libssh2_trace.html
143
/* AIO flags */
213
+ * TRACE_LIBSSH=<level> enables tracing in libssh itself.
144
#define QEMU_AIO_MISALIGNED 0x1000
214
+ * The meaning of <level> is described here:
145
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
215
+ * http://api.libssh.org/master/group__libssh__log.html
146
index XXXXXXX..XXXXXXX 100644
216
*/
147
--- a/include/sysemu/block-backend-io.h
217
-#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */
148
+++ b/include/sysemu/block-backend-io.h
218
+#define TRACE_LIBSSH 0 /* see: SSH_LOG_* */
149
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
219
150
BlockCompletionFunc *cb, void *opaque);
220
typedef struct BDRVSSHState {
151
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
221
/* Coroutine. */
152
BlockCompletionFunc *cb, void *opaque);
222
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState {
153
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
223
154
+ unsigned int *nr_zones,
224
/* SSH connection. */
155
+ BlockZoneDescriptor *zones,
225
int sock; /* socket */
156
+ BlockCompletionFunc *cb, void *opaque);
226
- LIBSSH2_SESSION *session; /* ssh session */
157
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
227
- LIBSSH2_SFTP *sftp; /* sftp session */
158
+ int64_t offset, int64_t len,
228
- LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */
159
+ BlockCompletionFunc *cb, void *opaque);
229
+ ssh_session session; /* ssh session */
160
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
230
+ sftp_session sftp; /* sftp session */
161
BlockCompletionFunc *cb, void *opaque);
231
+ sftp_file sftp_handle; /* sftp remote file handle */
162
void blk_aio_cancel_async(BlockAIOCB *acb);
232
163
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
233
- /* See ssh_seek() function below. */
164
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
234
- int64_t offset;
165
int64_t bytes, BdrvRequestFlags flags);
235
- bool offset_op_read;
166
236
-
167
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
237
- /* File attributes at open. We try to keep the .filesize field
168
+ unsigned int *nr_zones,
238
+ /*
169
+ BlockZoneDescriptor *zones);
239
+ * File attributes at open. We try to keep the .size field
170
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
240
* updated if it changes (eg by writing at the end of the file).
171
+ unsigned int *nr_zones,
241
*/
172
+ BlockZoneDescriptor *zones);
242
- LIBSSH2_SFTP_ATTRIBUTES attrs;
173
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
243
+ sftp_attributes attrs;
174
+ int64_t offset, int64_t len);
244
175
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
245
InetSocketAddress *inet;
176
+ int64_t offset, int64_t len);
246
177
+
247
@@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s)
178
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
248
{
179
int64_t bytes);
249
memset(s, 0, sizeof *s);
180
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
250
s->sock = -1;
181
diff --git a/block/block-backend.c b/block/block-backend.c
251
- s->offset = -1;
182
index XXXXXXX..XXXXXXX 100644
252
qemu_co_mutex_init(&s->lock);
183
--- a/block/block-backend.c
253
}
184
+++ b/block/block-backend.c
254
185
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
255
@@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s)
256
{
257
g_free(s->user);
258
259
+ if (s->attrs) {
260
+ sftp_attributes_free(s->attrs);
261
+ }
262
if (s->sftp_handle) {
263
- libssh2_sftp_close(s->sftp_handle);
264
+ sftp_close(s->sftp_handle);
265
}
266
if (s->sftp) {
267
- libssh2_sftp_shutdown(s->sftp);
268
+ sftp_free(s->sftp);
269
}
270
if (s->session) {
271
- libssh2_session_disconnect(s->session,
272
- "from qemu ssh client: "
273
- "user closed the connection");
274
- libssh2_session_free(s->session);
275
- }
276
- if (s->sock >= 0) {
277
- close(s->sock);
278
+ ssh_disconnect(s->session);
279
+ ssh_free(s->session); /* This frees s->sock */
280
}
281
}
282
283
@@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
284
va_end(args);
285
286
if (s->session) {
287
- char *ssh_err;
288
+ const char *ssh_err;
289
int ssh_err_code;
290
291
- /* This is not an errno. See <libssh2.h>. */
292
- ssh_err_code = libssh2_session_last_error(s->session,
293
- &ssh_err, NULL, 0);
294
- error_setg(errp, "%s: %s (libssh2 error code: %d)",
295
+ /* This is not an errno. See <libssh/libssh.h>. */
296
+ ssh_err = ssh_get_error(s->session);
297
+ ssh_err_code = ssh_get_error_code(s->session);
298
+ error_setg(errp, "%s: %s (libssh error code: %d)",
299
msg, ssh_err, ssh_err_code);
300
} else {
301
error_setg(errp, "%s", msg);
302
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
303
va_end(args);
304
305
if (s->sftp) {
306
- char *ssh_err;
307
+ const char *ssh_err;
308
int ssh_err_code;
309
- unsigned long sftp_err_code;
310
+ int sftp_err_code;
311
312
- /* This is not an errno. See <libssh2.h>. */
313
- ssh_err_code = libssh2_session_last_error(s->session,
314
- &ssh_err, NULL, 0);
315
- /* See <libssh2_sftp.h>. */
316
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
317
+ /* This is not an errno. See <libssh/libssh.h>. */
318
+ ssh_err = ssh_get_error(s->session);
319
+ ssh_err_code = ssh_get_error_code(s->session);
320
+ /* See <libssh/sftp.h>. */
321
+ sftp_err_code = sftp_get_error(s->sftp);
322
323
error_setg(errp,
324
- "%s: %s (libssh2 error code: %d, sftp error code: %lu)",
325
+ "%s: %s (libssh error code: %d, sftp error code: %d)",
326
msg, ssh_err, ssh_err_code, sftp_err_code);
327
} else {
328
error_setg(errp, "%s", msg);
329
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
330
331
static void sftp_error_trace(BDRVSSHState *s, const char *op)
332
{
333
- char *ssh_err;
334
+ const char *ssh_err;
335
int ssh_err_code;
336
- unsigned long sftp_err_code;
337
+ int sftp_err_code;
338
339
- /* This is not an errno. See <libssh2.h>. */
340
- ssh_err_code = libssh2_session_last_error(s->session,
341
- &ssh_err, NULL, 0);
342
- /* See <libssh2_sftp.h>. */
343
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
344
+ /* This is not an errno. See <libssh/libssh.h>. */
345
+ ssh_err = ssh_get_error(s->session);
346
+ ssh_err_code = ssh_get_error_code(s->session);
347
+ /* See <libssh/sftp.h>. */
348
+ sftp_err_code = sftp_get_error(s->sftp);
349
350
trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code);
351
}
352
@@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options,
353
parse_uri(filename, options, errp);
354
}
355
356
-static int check_host_key_knownhosts(BDRVSSHState *s,
357
- const char *host, int port, Error **errp)
358
+static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp)
359
{
360
- const char *home;
361
- char *knh_file = NULL;
362
- LIBSSH2_KNOWNHOSTS *knh = NULL;
363
- struct libssh2_knownhost *found;
364
- int ret, r;
365
- const char *hostkey;
366
- size_t len;
367
- int type;
368
-
369
- hostkey = libssh2_session_hostkey(s->session, &len, &type);
370
- if (!hostkey) {
371
+ int ret;
372
+#ifdef HAVE_LIBSSH_0_8
373
+ enum ssh_known_hosts_e state;
374
+ int r;
375
+ ssh_key pubkey;
376
+ enum ssh_keytypes_e pubkey_type;
377
+ unsigned char *server_hash = NULL;
378
+ size_t server_hash_len;
379
+ char *fingerprint = NULL;
380
+
381
+ state = ssh_session_is_known_server(s->session);
382
+ trace_ssh_server_status(state);
383
+
384
+ switch (state) {
385
+ case SSH_KNOWN_HOSTS_OK:
386
+ /* OK */
387
+ trace_ssh_check_host_key_knownhosts();
388
+ break;
389
+ case SSH_KNOWN_HOSTS_CHANGED:
390
ret = -EINVAL;
391
- session_error_setg(errp, s, "failed to read remote host key");
392
+ r = ssh_get_server_publickey(s->session, &pubkey);
393
+ if (r == 0) {
394
+ r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256,
395
+ &server_hash, &server_hash_len);
396
+ pubkey_type = ssh_key_type(pubkey);
397
+ ssh_key_free(pubkey);
398
+ }
399
+ if (r == 0) {
400
+ fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256,
401
+ server_hash,
402
+ server_hash_len);
403
+ ssh_clean_pubkey_hash(&server_hash);
404
+ }
405
+ if (fingerprint) {
406
+ error_setg(errp,
407
+ "host key (%s key with fingerprint %s) does not match "
408
+ "the one in known_hosts; this may be a possible attack",
409
+ ssh_key_type_to_char(pubkey_type), fingerprint);
410
+ ssh_string_free_char(fingerprint);
411
+ } else {
412
+ error_setg(errp,
413
+ "host key does not match the one in known_hosts; this "
414
+ "may be a possible attack");
415
+ }
416
goto out;
417
- }
418
-
419
- knh = libssh2_knownhost_init(s->session);
420
- if (!knh) {
421
+ case SSH_KNOWN_HOSTS_OTHER:
422
ret = -EINVAL;
423
- session_error_setg(errp, s,
424
- "failed to initialize known hosts support");
425
+ error_setg(errp,
426
+ "host key for this server not found, another type exists");
427
+ goto out;
428
+ case SSH_KNOWN_HOSTS_UNKNOWN:
429
+ ret = -EINVAL;
430
+ error_setg(errp, "no host key was found in known_hosts");
431
+ goto out;
432
+ case SSH_KNOWN_HOSTS_NOT_FOUND:
433
+ ret = -ENOENT;
434
+ error_setg(errp, "known_hosts file not found");
435
+ goto out;
436
+ case SSH_KNOWN_HOSTS_ERROR:
437
+ ret = -EINVAL;
438
+ error_setg(errp, "error while checking the host");
439
+ goto out;
440
+ default:
441
+ ret = -EINVAL;
442
+ error_setg(errp, "error while checking for known server (%d)", state);
443
goto out;
444
}
445
+#else /* !HAVE_LIBSSH_0_8 */
446
+ int state;
447
448
- home = getenv("HOME");
449
- if (home) {
450
- knh_file = g_strdup_printf("%s/.ssh/known_hosts", home);
451
- } else {
452
- knh_file = g_strdup_printf("/root/.ssh/known_hosts");
453
- }
454
-
455
- /* Read all known hosts from OpenSSH-style known_hosts file. */
456
- libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH);
457
+ state = ssh_is_server_known(s->session);
458
+ trace_ssh_server_status(state);
459
460
- r = libssh2_knownhost_checkp(knh, host, port, hostkey, len,
461
- LIBSSH2_KNOWNHOST_TYPE_PLAIN|
462
- LIBSSH2_KNOWNHOST_KEYENC_RAW,
463
- &found);
464
- switch (r) {
465
- case LIBSSH2_KNOWNHOST_CHECK_MATCH:
466
+ switch (state) {
467
+ case SSH_SERVER_KNOWN_OK:
468
/* OK */
469
- trace_ssh_check_host_key_knownhosts(found->key);
470
+ trace_ssh_check_host_key_knownhosts();
471
break;
472
- case LIBSSH2_KNOWNHOST_CHECK_MISMATCH:
473
+ case SSH_SERVER_KNOWN_CHANGED:
474
ret = -EINVAL;
475
- session_error_setg(errp, s,
476
- "host key does not match the one in known_hosts"
477
- " (found key %s)", found->key);
478
+ error_setg(errp,
479
+ "host key does not match the one in known_hosts; this "
480
+ "may be a possible attack");
481
goto out;
482
- case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND:
483
+ case SSH_SERVER_FOUND_OTHER:
484
ret = -EINVAL;
485
- session_error_setg(errp, s, "no host key was found in known_hosts");
486
+ error_setg(errp,
487
+ "host key for this server not found, another type exists");
488
+ goto out;
489
+ case SSH_SERVER_FILE_NOT_FOUND:
490
+ ret = -ENOENT;
491
+ error_setg(errp, "known_hosts file not found");
492
goto out;
493
- case LIBSSH2_KNOWNHOST_CHECK_FAILURE:
494
+ case SSH_SERVER_NOT_KNOWN:
495
ret = -EINVAL;
496
- session_error_setg(errp, s,
497
- "failure matching the host key with known_hosts");
498
+ error_setg(errp, "no host key was found in known_hosts");
499
+ goto out;
500
+ case SSH_SERVER_ERROR:
501
+ ret = -EINVAL;
502
+ error_setg(errp, "server error");
503
goto out;
504
default:
505
ret = -EINVAL;
506
- session_error_setg(errp, s, "unknown error matching the host key"
507
- " with known_hosts (%d)", r);
508
+ error_setg(errp, "error while checking for known server (%d)", state);
509
goto out;
510
}
511
+#endif /* !HAVE_LIBSSH_0_8 */
512
513
/* known_hosts checking successful. */
514
ret = 0;
515
516
out:
517
- if (knh != NULL) {
518
- libssh2_knownhost_free(knh);
519
- }
520
- g_free(knh_file);
186
return ret;
521
return ret;
187
}
522
}
188
523
189
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
524
@@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len,
190
+{
525
191
+ BlkAioEmAIOCB *acb = opaque;
526
static int
192
+ BlkRwCo *rwco = &acb->rwco;
527
check_host_key_hash(BDRVSSHState *s, const char *hash,
193
+
528
- int hash_type, size_t fingerprint_len, Error **errp)
194
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
529
+ enum ssh_publickey_hash_type type, Error **errp)
195
+ (unsigned int *)acb->bytes, rwco->iobuf);
530
{
196
+ blk_aio_complete(acb);
531
- const char *fingerprint;
197
+}
532
-
198
+
533
- fingerprint = libssh2_hostkey_hash(s->session, hash_type);
199
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
534
- if (!fingerprint) {
200
+ unsigned int *nr_zones,
535
+ int r;
201
+ BlockZoneDescriptor *zones,
536
+ ssh_key pubkey;
202
+ BlockCompletionFunc *cb, void *opaque)
537
+ unsigned char *server_hash;
203
+{
538
+ size_t server_hash_len;
204
+ BlkAioEmAIOCB *acb;
539
+
205
+ Coroutine *co;
540
+#ifdef HAVE_LIBSSH_0_8
206
+ IO_CODE();
541
+ r = ssh_get_server_publickey(s->session, &pubkey);
207
+
542
+#else
208
+ blk_inc_in_flight(blk);
543
+ r = ssh_get_publickey(s->session, &pubkey);
209
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
544
+#endif
210
+ acb->rwco = (BlkRwCo) {
545
+ if (r != SSH_OK) {
211
+ .blk = blk,
546
session_error_setg(errp, s, "failed to read remote host key");
212
+ .offset = offset,
547
return -EINVAL;
213
+ .iobuf = zones,
548
}
214
+ .ret = NOT_DONE,
549
215
+ };
550
- if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len,
216
+ acb->bytes = (int64_t)nr_zones;
551
- hash) != 0) {
217
+ acb->has_returned = false;
552
+ r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
218
+
553
+ ssh_key_free(pubkey);
219
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
554
+ if (r != 0) {
220
+ aio_co_enter(blk_get_aio_context(blk), co);
555
+ session_error_setg(errp, s,
221
+
556
+ "failed reading the hash of the server SSH key");
222
+ acb->has_returned = true;
557
+ return -EINVAL;
223
+ if (acb->rwco.ret != NOT_DONE) {
224
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
225
+ blk_aio_complete_bh, acb);
226
+ }
558
+ }
227
+
559
+
228
+ return &acb->common;
560
+ r = compare_fingerprint(server_hash, server_hash_len, hash);
229
+}
561
+ ssh_clean_pubkey_hash(&server_hash);
230
+
562
+ if (r != 0) {
231
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
563
error_setg(errp, "remote host key does not match host_key_check '%s'",
232
+{
564
hash);
233
+ BlkAioEmAIOCB *acb = opaque;
565
return -EPERM;
234
+ BlkRwCo *rwco = &acb->rwco;
566
@@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
235
+
567
return 0;
236
+ rwco->ret = blk_co_zone_mgmt(rwco->blk, (BlockZoneOp)rwco->iobuf,
568
}
237
+ rwco->offset, acb->bytes);
569
238
+ blk_aio_complete(acb);
570
-static int check_host_key(BDRVSSHState *s, const char *host, int port,
239
+}
571
- SshHostKeyCheck *hkc, Error **errp)
240
+
572
+static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp)
241
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
573
{
242
+ int64_t offset, int64_t len,
574
SshHostKeyCheckMode mode;
243
+ BlockCompletionFunc *cb, void *opaque) {
575
244
+ BlkAioEmAIOCB *acb;
576
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
245
+ Coroutine *co;
577
case SSH_HOST_KEY_CHECK_MODE_HASH:
246
+ IO_CODE();
578
if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
247
+
579
return check_host_key_hash(s, hkc->u.hash.hash,
248
+ blk_inc_in_flight(blk);
580
- LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
249
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
581
+ SSH_PUBLICKEY_HASH_MD5, errp);
250
+ acb->rwco = (BlkRwCo) {
582
} else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
251
+ .blk = blk,
583
return check_host_key_hash(s, hkc->u.hash.hash,
252
+ .offset = offset,
584
- LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
253
+ .iobuf = (void *)op,
585
+ SSH_PUBLICKEY_HASH_SHA1, errp);
254
+ .ret = NOT_DONE,
586
}
255
+ };
587
g_assert_not_reached();
256
+ acb->bytes = len;
588
break;
257
+ acb->has_returned = false;
589
case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
258
+
590
- return check_host_key_knownhosts(s, host, port, errp);
259
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
591
+ return check_host_key_knownhosts(s, errp);
260
+ aio_co_enter(blk_get_aio_context(blk), co);
592
default:
261
+
593
g_assert_not_reached();
262
+ acb->has_returned = true;
594
}
263
+ if (acb->rwco.ret != NOT_DONE) {
595
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
264
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
596
return -EINVAL;
265
+ blk_aio_complete_bh, acb);
597
}
598
599
-static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
600
+static int authenticate(BDRVSSHState *s, Error **errp)
601
{
602
int r, ret;
603
- const char *userauthlist;
604
- LIBSSH2_AGENT *agent = NULL;
605
- struct libssh2_agent_publickey *identity;
606
- struct libssh2_agent_publickey *prev_identity = NULL;
607
+ int method;
608
609
- userauthlist = libssh2_userauth_list(s->session, user, strlen(user));
610
- if (strstr(userauthlist, "publickey") == NULL) {
611
+ /* Try to authenticate with the "none" method. */
612
+ r = ssh_userauth_none(s->session, NULL);
613
+ if (r == SSH_AUTH_ERROR) {
614
ret = -EPERM;
615
- error_setg(errp,
616
- "remote server does not support \"publickey\" authentication");
617
+ session_error_setg(errp, s, "failed to authenticate using none "
618
+ "authentication");
619
goto out;
620
- }
621
-
622
- /* Connect to ssh-agent and try each identity in turn. */
623
- agent = libssh2_agent_init(s->session);
624
- if (!agent) {
625
- ret = -EINVAL;
626
- session_error_setg(errp, s, "failed to initialize ssh-agent support");
627
- goto out;
628
- }
629
- if (libssh2_agent_connect(agent)) {
630
- ret = -ECONNREFUSED;
631
- session_error_setg(errp, s, "failed to connect to ssh-agent");
632
- goto out;
633
- }
634
- if (libssh2_agent_list_identities(agent)) {
635
- ret = -EINVAL;
636
- session_error_setg(errp, s,
637
- "failed requesting identities from ssh-agent");
638
+ } else if (r == SSH_AUTH_SUCCESS) {
639
+ /* Authenticated! */
640
+ ret = 0;
641
goto out;
642
}
643
644
- for(;;) {
645
- r = libssh2_agent_get_identity(agent, &identity, prev_identity);
646
- if (r == 1) { /* end of list */
647
- break;
648
- }
649
- if (r < 0) {
650
+ method = ssh_userauth_list(s->session, NULL);
651
+ trace_ssh_auth_methods(method);
652
+
653
+ /*
654
+ * Try to authenticate with publickey, using the ssh-agent
655
+ * if available.
656
+ */
657
+ if (method & SSH_AUTH_METHOD_PUBLICKEY) {
658
+ r = ssh_userauth_publickey_auto(s->session, NULL, NULL);
659
+ if (r == SSH_AUTH_ERROR) {
660
ret = -EINVAL;
661
- session_error_setg(errp, s,
662
- "failed to obtain identity from ssh-agent");
663
+ session_error_setg(errp, s, "failed to authenticate using "
664
+ "publickey authentication");
665
goto out;
666
- }
667
- r = libssh2_agent_userauth(agent, user, identity);
668
- if (r == 0) {
669
+ } else if (r == SSH_AUTH_SUCCESS) {
670
/* Authenticated! */
671
ret = 0;
672
goto out;
673
}
674
- /* Failed to authenticate with this identity, try the next one. */
675
- prev_identity = identity;
676
}
677
678
ret = -EPERM;
679
@@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
680
"and the identities held by your ssh-agent");
681
682
out:
683
- if (agent != NULL) {
684
- /* Note: libssh2 implementation implicitly calls
685
- * libssh2_agent_disconnect if necessary.
686
- */
687
- libssh2_agent_free(agent);
688
- }
689
-
690
return ret;
691
}
692
693
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
694
int ssh_flags, int creat_mode, Error **errp)
695
{
696
int r, ret;
697
- long port = 0;
698
+ unsigned int port = 0;
699
+ int new_sock = -1;
700
701
if (opts->has_user) {
702
s->user = g_strdup(opts->user);
703
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
704
s->inet = opts->server;
705
opts->server = NULL;
706
707
- if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
708
+ if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) {
709
error_setg(errp, "Use only numeric port value");
710
ret = -EINVAL;
711
goto err;
712
}
713
714
/* Open the socket and connect. */
715
- s->sock = inet_connect_saddr(s->inet, errp);
716
- if (s->sock < 0) {
717
+ new_sock = inet_connect_saddr(s->inet, errp);
718
+ if (new_sock < 0) {
719
ret = -EIO;
720
goto err;
721
}
722
723
+ /*
724
+ * Try to disable the Nagle algorithm on TCP sockets to reduce latency,
725
+ * but do not fail if it cannot be disabled.
726
+ */
727
+ r = socket_set_nodelay(new_sock);
728
+ if (r < 0) {
729
+ warn_report("can't set TCP_NODELAY for the ssh server %s: %s",
730
+ s->inet->host, strerror(errno));
+ }
+
/* Create SSH session. */
- s->session = libssh2_session_init();
+ s->session = ssh_new();
if (!s->session) {
ret = -EINVAL;
- session_error_setg(errp, s, "failed to initialize libssh2 session");
+ session_error_setg(errp, s, "failed to initialize libssh session");
goto err;
}

-#if TRACE_LIBSSH2 != 0
- libssh2_trace(s->session, TRACE_LIBSSH2);
-#endif
+ /*
+ * Make sure we are in blocking mode during the connection and
+ * authentication phases.
+ */
+ ssh_set_blocking(s->session, 1);

- r = libssh2_session_handshake(s->session, s->sock);
- if (r != 0) {
+ r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user);
+ if (r < 0) {
+ ret = -EINVAL;
+ session_error_setg(errp, s,
+ "failed to set the user in the libssh session");
+ goto err;
+ }
+
+ r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host);
+ if (r < 0) {
+ ret = -EINVAL;
+ session_error_setg(errp, s,
+ "failed to set the host in the libssh session");
+ goto err;
+ }
+
+ if (port > 0) {
+ r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port);
+ if (r < 0) {
+ ret = -EINVAL;
+ session_error_setg(errp, s,
+ "failed to set the port in the libssh session");
+ goto err;
+ }
+
+ return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+ unsigned int *nr_zones,
+ BlockZoneDescriptor *zones)
+{
+ int ret;
+ IO_CODE();
+
+ blk_inc_in_flight(blk); /* increase before waiting */
+ blk_wait_while_drained(blk);
+ if (!blk_is_available(blk)) {
+ blk_dec_in_flight(blk);
+ return -ENOMEDIUM;
+ }
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+ blk_dec_in_flight(blk);
+ return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation;
+ * offset is the byte offset from the start of the zoned device;
+ * len is the maximum number of bytes the command should operate on. It
+ * should be aligned with the device zone size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+ int ret;
+ IO_CODE();
+
+ blk_inc_in_flight(blk);
+ blk_wait_while_drained(blk);
+
+ ret = blk_check_byte_request(blk, offset, len);
+ if (ret < 0) {
+ blk_dec_in_flight(blk);
+ return ret;
+ }
+
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+ blk_dec_in_flight(blk);
+ return ret;
+}
+
void blk_drain(BlockBackend *blk)
{
BlockDriverState *bs = blk_bs(blk);
325
diff --git a/block/file-posix.c b/block/file-posix.c
326
index XXXXXXX..XXXXXXX 100644
327
--- a/block/file-posix.c
328
+++ b/block/file-posix.c
329
@@ -XXX,XX +XXX,XX @@
330
#include <sys/param.h>
331
#include <sys/syscall.h>
332
#include <sys/vfs.h>
333
+#if defined(CONFIG_BLKZONED)
334
+#include <linux/blkzoned.h>
335
+#endif
336
#include <linux/cdrom.h>
337
#include <linux/fd.h>
338
#include <linux/fs.h>
339
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
340
PreallocMode prealloc;
341
Error **errp;
342
} truncate;
343
+ struct {
344
+ unsigned int *nr_zones;
345
+ BlockZoneDescriptor *zones;
346
+ } zone_report;
347
+ struct {
348
+ unsigned long op;
349
+ } zone_mgmt;
350
};
351
} RawPosixAIOData;
352
353
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
354
zoned = BLK_Z_NONE;
355
}
356
bs->bl.zoned = zoned;
357
+ if (zoned != BLK_Z_NONE) {
358
+ /*
359
+ * The zoned device must at least have zone size and nr_zones fields.
360
+ */
361
+ ret = get_sysfs_long_val(&st, "chunk_sectors");
362
+ if (ret < 0) {
363
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
364
+ "sysfs attribute");
365
+ goto out;
366
+ } else if (!ret) {
367
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
368
+ goto out;
369
+ }
370
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
371
+
372
+ ret = get_sysfs_long_val(&st, "nr_zones");
373
+ if (ret < 0) {
374
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
375
+ "sysfs attribute");
376
+ goto out;
377
+ } else if (!ret) {
378
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
379
+ goto out;
380
+ }
381
+ bs->bl.nr_zones = ret;
382
+
383
+ ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
384
+ if (ret > 0) {
385
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
386
+ }
387
+
388
+ ret = get_sysfs_long_val(&st, "max_open_zones");
389
+ if (ret >= 0) {
390
+ bs->bl.max_open_zones = ret;
391
+ }
392
+
393
+ ret = get_sysfs_long_val(&st, "max_active_zones");
394
+ if (ret >= 0) {
395
+ bs->bl.max_active_zones = ret;
396
+ }
397
+ return;
398
+ }
399
+out:
400
+ bs->bl.zoned = BLK_Z_NONE;
401
}
402
403
static int check_for_dasd(int fd)
404
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
405
BDRVRawState *s = bs->opaque;
406
int ret;
407
408
- /* If DASD, get blocksizes */
409
+ /* If DASD or zoned devices, get blocksizes */
410
if (check_for_dasd(s->fd) < 0) {
411
- return -ENOTSUP;
412
+ /* zoned devices are not DASD */
413
+ if (bs->bl.zoned == BLK_Z_NONE) {
414
+ return -ENOTSUP;
415
+ }
416
}
417
ret = probe_logical_blocksize(s->fd, &bsz->log);
418
if (ret < 0) {
419
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
420
}
421
#endif
422
423
+/*
424
+ * parse_zone - Fill a zone descriptor
425
+ */
426
+#if defined(CONFIG_BLKZONED)
427
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
428
+ const struct blk_zone *blkz) {
429
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
430
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
431
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
432
+
433
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
434
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
435
+#else
436
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
437
+#endif
438
+
439
+ switch (blkz->type) {
440
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
441
+ zone->type = BLK_ZT_SWR;
442
+ break;
443
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
444
+ zone->type = BLK_ZT_SWP;
445
+ break;
446
+ case BLK_ZONE_TYPE_CONVENTIONAL:
447
+ zone->type = BLK_ZT_CONV;
448
+ break;
449
+ default:
450
+ error_report("Unsupported zone type: 0x%x", blkz->type);
451
+ return -ENOTSUP;
452
+ }
453
+
454
+ switch (blkz->cond) {
455
+ case BLK_ZONE_COND_NOT_WP:
456
+ zone->state = BLK_ZS_NOT_WP;
457
+ break;
458
+ case BLK_ZONE_COND_EMPTY:
459
+ zone->state = BLK_ZS_EMPTY;
460
+ break;
461
+ case BLK_ZONE_COND_IMP_OPEN:
462
+ zone->state = BLK_ZS_IOPEN;
463
+ break;
464
+ case BLK_ZONE_COND_EXP_OPEN:
465
+ zone->state = BLK_ZS_EOPEN;
466
+ break;
467
+ case BLK_ZONE_COND_CLOSED:
468
+ zone->state = BLK_ZS_CLOSED;
469
+ break;
470
+ case BLK_ZONE_COND_READONLY:
471
+ zone->state = BLK_ZS_RDONLY;
472
+ break;
473
+ case BLK_ZONE_COND_FULL:
474
+ zone->state = BLK_ZS_FULL;
475
+ break;
476
+ case BLK_ZONE_COND_OFFLINE:
477
+ zone->state = BLK_ZS_OFFLINE;
478
+ break;
479
+ default:
480
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
481
+ return -ENOTSUP;
482
+ }
483
+ return 0;
484
+}
485
+#endif
486
+
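A note on units in parse_zone above: the kernel reports zone start, length and write pointer in 512-byte sectors, and the patch shifts each field by BDRV_SECTOR_BITS to obtain bytes. The following is a minimal standalone sketch of that conversion only. The demo_* names and DEMO_SECTOR_BITS are hypothetical stand-ins for illustration, not the kernel's struct blk_zone or QEMU's BlockZoneDescriptor, and the type/state mapping is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_BITS 9 /* 512-byte sectors, mirroring BDRV_SECTOR_BITS */

/* Hypothetical stand-in for the kernel's struct blk_zone (units: sectors). */
struct demo_kernel_zone {
    uint64_t start;
    uint64_t len;
    uint64_t wp;
};

/* Hypothetical stand-in for BlockZoneDescriptor (units: bytes). */
struct demo_zone_desc {
    uint64_t start;
    uint64_t length;
    uint64_t wp;
};

/* Convert every sector-based field to bytes, as parse_zone does. */
static void demo_parse_zone(struct demo_zone_desc *zone,
                            const struct demo_kernel_zone *blkz)
{
    zone->start = blkz->start << DEMO_SECTOR_BITS;
    zone->length = blkz->len << DEMO_SECTOR_BITS;
    zone->wp = blkz->wp << DEMO_SECTOR_BITS;
}
```

For a zone that starts at sector 524288 and is 524288 sectors long, both start and length come out as 268435456 bytes (256 MiB).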
487
+#if defined(CONFIG_BLKZONED)
488
+static int handle_aiocb_zone_report(void *opaque)
489
+{
490
+ RawPosixAIOData *aiocb = opaque;
491
+ int fd = aiocb->aio_fildes;
492
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
493
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
494
+ /* zoned block devices use 512-byte sectors */
495
+ uint64_t sector = aiocb->aio_offset / 512;
496
+
497
+ struct blk_zone *blkz;
498
+ size_t rep_size;
499
+ unsigned int nrz;
500
+ int ret, n = 0, i = 0;
501
+
502
+ nrz = *nr_zones;
503
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
504
+ g_autofree struct blk_zone_report *rep = NULL;
505
+ rep = g_malloc(rep_size);
506
+
507
+ blkz = (struct blk_zone *)(rep + 1);
508
+ while (n < nrz) {
509
+ memset(rep, 0, rep_size);
510
+ rep->sector = sector;
511
+ rep->nr_zones = nrz - n;
512
+
513
+ do {
514
+ ret = ioctl(fd, BLKREPORTZONE, rep);
515
+ } while (ret != 0 && errno == EINTR);
516
+ if (ret != 0) {
517
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
518
+ fd, sector, errno);
519
+ return -errno;
520
+ }
521
+
522
+ if (!rep->nr_zones) {
523
+ break;
524
+ }
525
+
526
+ for (i = 0; i < rep->nr_zones; i++, n++) {
527
+ ret = parse_zone(&zones[n], &blkz[i]);
528
+ if (ret != 0) {
529
+ return ret;
530
+ }
531
+
532
+ /* The next report should start after the last zone reported */
533
+ sector = blkz[i].start + blkz[i].len;
534
+ }
+ }
+
+ *nr_zones = n;
+ return 0;
+}
+ }
+ }
+
+ r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none");
+ if (r < 0) {
782
+ ret = -EINVAL;
783
+ session_error_setg(errp, s,
784
+ "failed to disable the compression in the libssh "
785
+ "session");
786
+ goto err;
787
+ }
788
+
789
+ /* Read ~/.ssh/config. */
790
+ r = ssh_options_parse_config(s->session, NULL);
791
+ if (r < 0) {
792
+ ret = -EINVAL;
793
+ session_error_setg(errp, s, "failed to parse ~/.ssh/config");
794
+ goto err;
795
+ }
796
+
797
+ r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock);
798
+ if (r < 0) {
799
+ ret = -EINVAL;
800
+ session_error_setg(errp, s,
801
+ "failed to set the socket in the libssh session");
802
+ goto err;
803
+ }
804
+ /* libssh took ownership of the socket. */
805
+ s->sock = new_sock;
806
+ new_sock = -1;
807
+
808
+ /* Connect. */
809
+ r = ssh_connect(s->session);
810
+ if (r != SSH_OK) {
811
ret = -EINVAL;
812
session_error_setg(errp, s, "failed to establish SSH session");
813
goto err;
814
}
815
816
/* Check the remote host's key against known_hosts. */
817
- ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp);
818
+ ret = check_host_key(s, opts->host_key_check, errp);
819
if (ret < 0) {
820
goto err;
821
}
822
823
/* Authenticate. */
824
- ret = authenticate(s, s->user, errp);
825
+ ret = authenticate(s, errp);
826
if (ret < 0) {
827
goto err;
828
}
829
830
/* Start SFTP. */
831
- s->sftp = libssh2_sftp_init(s->session);
832
+ s->sftp = sftp_new(s->session);
833
if (!s->sftp) {
834
- session_error_setg(errp, s, "failed to initialize sftp handle");
835
+ session_error_setg(errp, s, "failed to create sftp handle");
836
+ ret = -EINVAL;
837
+ goto err;
838
+ }
839
+
840
+ r = sftp_init(s->sftp);
841
+ if (r < 0) {
842
+ sftp_error_setg(errp, s, "failed to initialize sftp handle");
843
ret = -EINVAL;
844
goto err;
845
}
846
847
/* Open the remote file. */
848
trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode);
849
- s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags,
850
- creat_mode);
851
+ s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode);
852
if (!s->sftp_handle) {
853
- session_error_setg(errp, s, "failed to open remote file '%s'",
854
- opts->path);
855
+ sftp_error_setg(errp, s, "failed to open remote file '%s'",
856
+ opts->path);
857
ret = -EINVAL;
858
goto err;
859
}
860
861
- r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
862
- if (r < 0) {
863
+ /* Make sure the SFTP file is handled in blocking mode. */
864
+ sftp_file_set_blocking(s->sftp_handle);
865
+
866
+ s->attrs = sftp_fstat(s->sftp_handle);
867
+ if (!s->attrs) {
868
sftp_error_setg(errp, s, "failed to read file attributes");
869
return -EINVAL;
870
}
871
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
872
return 0;
873
874
err:
875
+ if (s->attrs) {
876
+ sftp_attributes_free(s->attrs);
877
+ }
878
+ s->attrs = NULL;
879
if (s->sftp_handle) {
880
- libssh2_sftp_close(s->sftp_handle);
881
+ sftp_close(s->sftp_handle);
882
}
883
s->sftp_handle = NULL;
884
if (s->sftp) {
885
- libssh2_sftp_shutdown(s->sftp);
886
+ sftp_free(s->sftp);
887
}
888
s->sftp = NULL;
889
if (s->session) {
890
- libssh2_session_disconnect(s->session,
891
- "from qemu ssh client: "
892
- "error opening connection");
893
- libssh2_session_free(s->session);
894
+ ssh_disconnect(s->session);
895
+ ssh_free(s->session);
896
}
897
s->session = NULL;
898
+ s->sock = -1;
899
+ if (new_sock >= 0) {
900
+ close(new_sock);
901
+ }
902
903
return ret;
904
}
905
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
906
907
ssh_state_init(s);
908
909
- ssh_flags = LIBSSH2_FXF_READ;
910
+ ssh_flags = 0;
911
if (bdrv_flags & BDRV_O_RDWR) {
912
- ssh_flags |= LIBSSH2_FXF_WRITE;
913
+ ssh_flags |= O_RDWR;
914
+ } else {
915
+ ssh_flags |= O_RDONLY;
916
}
917
918
opts = ssh_parse_options(options, errp);
919
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
920
}
921
922
/* Go non-blocking. */
923
- libssh2_session_set_blocking(s->session, 0);
924
+ ssh_set_blocking(s->session, 0);
925
926
qapi_free_BlockdevOptionsSsh(opts);
927
928
return 0;
929
930
err:
931
- if (s->sock >= 0) {
932
- close(s->sock);
933
- }
934
- s->sock = -1;
935
-
936
qapi_free_BlockdevOptionsSsh(opts);
937
938
return ret;
939
@@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
940
{
941
ssize_t ret;
942
char c[1] = { '\0' };
943
- int was_blocking = libssh2_session_get_blocking(s->session);
944
+ int was_blocking = ssh_is_blocking(s->session);
945
946
/* offset must be strictly greater than the current size so we do
947
* not overwrite anything */
948
- assert(offset > 0 && offset > s->attrs.filesize);
949
+ assert(offset > 0 && offset > s->attrs->size);
950
951
- libssh2_session_set_blocking(s->session, 1);
952
+ ssh_set_blocking(s->session, 1);
953
954
- libssh2_sftp_seek64(s->sftp_handle, offset - 1);
955
- ret = libssh2_sftp_write(s->sftp_handle, c, 1);
956
+ sftp_seek64(s->sftp_handle, offset - 1);
957
+ ret = sftp_write(s->sftp_handle, c, 1);
958
959
- libssh2_session_set_blocking(s->session, was_blocking);
960
+ ssh_set_blocking(s->session, was_blocking);
961
962
if (ret < 0) {
963
sftp_error_setg(errp, s, "Failed to grow file");
964
return -EIO;
965
}
966
967
- s->attrs.filesize = offset;
968
+ s->attrs->size = offset;
969
return 0;
970
}
971
972
@@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
973
ssh_state_init(&s);
974
975
ret = connect_to_ssh(&s, opts->location,
976
- LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
977
- LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
978
+ O_RDWR | O_CREAT | O_TRUNC,
979
0644, errp);
980
if (ret < 0) {
981
goto fail;
982
@@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs)
983
/* Assume false, unless we can positively prove it's true. */
984
int has_zero_init = 0;
985
986
- if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
987
- if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
988
- has_zero_init = 1;
989
- }
990
+ if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
991
+ has_zero_init = 1;
992
}
993
994
return has_zero_init;
995
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
996
.co = qemu_coroutine_self()
997
};
998
999
- r = libssh2_session_block_directions(s->session);
1000
+ r = ssh_get_poll_flags(s->session);
1001
1002
- if (r & LIBSSH2_SESSION_BLOCK_INBOUND) {
1003
+ if (r & SSH_READ_PENDING) {
1004
rd_handler = restart_coroutine;
1005
}
1006
- if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) {
1007
+ if (r & SSH_WRITE_PENDING) {
1008
wr_handler = restart_coroutine;
1009
}
1010
1011
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
1012
trace_ssh_co_yield_back(s->sock);
1013
}
1014
1015
-/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
1016
- * in the remote file. Notice that it just updates a field in the
1017
- * sftp_handle structure, so there is no network traffic and it cannot
1018
- * fail.
1019
- *
1020
- * However, `libssh2_sftp_seek64' does have a catastrophic effect on
1021
- * performance since it causes the handle to throw away all in-flight
1022
- * reads and buffered readahead data. Therefore this function tries
1023
- * to be intelligent about when to call the underlying libssh2 function.
1024
- */
1025
-#define SSH_SEEK_WRITE 0
1026
-#define SSH_SEEK_READ 1
1027
-#define SSH_SEEK_FORCE 2
1028
-
1029
-static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
1030
-{
1031
- bool op_read = (flags & SSH_SEEK_READ) != 0;
1032
- bool force = (flags & SSH_SEEK_FORCE) != 0;
1033
-
1034
- if (force || op_read != s->offset_op_read || offset != s->offset) {
1035
- trace_ssh_seek(offset);
1036
- libssh2_sftp_seek64(s->sftp_handle, offset);
1037
- s->offset = offset;
1038
- s->offset_op_read = op_read;
1039
- }
1040
-}
1041
-
1042
static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1043
int64_t offset, size_t size,
1044
QEMUIOVector *qiov)
1045
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1046
1047
trace_ssh_read(offset, size);
1048
1049
- ssh_seek(s, offset, SSH_SEEK_READ);
1050
+ trace_ssh_seek(offset);
1051
+ sftp_seek64(s->sftp_handle, offset);
1052
1053
/* This keeps track of the current iovec element ('i'), where we
1054
* will write to next ('buf'), and the end of the current iovec
1055
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1056
buf = i->iov_base;
1057
end_of_vec = i->iov_base + i->iov_len;
1058
1059
- /* libssh2 has a hard-coded limit of 2000 bytes per request,
1060
- * although it will also do readahead behind our backs. Therefore
1061
- * we may have to do repeated reads here until we have read 'size'
1062
- * bytes.
1063
- */
1064
for (got = 0; got < size; ) {
1065
+ size_t request_read_size;
1066
again:
1067
- trace_ssh_read_buf(buf, end_of_vec - buf);
1068
- r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf);
1069
- trace_ssh_read_return(r);
1070
+ /*
1071
+ * The size of SFTP packets is limited to 32K bytes, so limit
1072
+ * the amount of data requested to 16K, as libssh currently
1073
+ * does not handle multiple requests on its own.
1074
+ */
1075
+ request_read_size = MIN(end_of_vec - buf, 16384);
1076
+ trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size);
1077
+ r = sftp_read(s->sftp_handle, buf, request_read_size);
1078
+ trace_ssh_read_return(r, sftp_get_error(s->sftp));
1079
1080
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1081
+ if (r == SSH_AGAIN) {
1082
co_yield(s, bs);
1083
goto again;
1084
}
1085
- if (r < 0) {
1086
- sftp_error_trace(s, "read");
1087
- s->offset = -1;
1088
- return -EIO;
1089
- }
1090
- if (r == 0) {
1091
+ if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) {
1092
/* EOF: Short read so pad the buffer with zeroes and return it. */
1093
qemu_iovec_memset(qiov, got, 0, size - got);
1094
return 0;
1095
}
1096
+ if (r <= 0) {
1097
+ sftp_error_trace(s, "read");
1098
+ return -EIO;
1099
+ }
1100
1101
got += r;
1102
buf += r;
1103
- s->offset += r;
1104
if (buf >= end_of_vec && got < size) {
1105
i++;
1106
buf = i->iov_base;
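The read hunk above caps each sftp_read request at 16384 bytes and keeps issuing requests until 'size' bytes have arrived, zero-padding the rest of the buffer on EOF. A rough sketch of that pattern without libssh is shown below. demo_read_fn, demo_read_all and demo_source are hypothetical names introduced only for this illustration; the callback stands in for sftp_read, and the coroutine yield on SSH_AGAIN is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_REQUEST 16384 /* per-request cap used in the hunk above */

/* Hypothetical transfer callback: returns bytes read, 0 on EOF, <0 on error. */
typedef long (*demo_read_fn)(void *opaque, char *buf, size_t len);

/* Read 'size' bytes in capped requests; zero-pad after EOF. Returns 0 or -1. */
static int demo_read_all(demo_read_fn fn, void *opaque, char *buf, size_t size)
{
    size_t got = 0;

    while (got < size) {
        size_t want = size - got;
        long r;

        if (want > DEMO_MAX_REQUEST) {
            want = DEMO_MAX_REQUEST;
        }
        r = fn(opaque, buf + got, want);
        if (r < 0) {
            return -1;
        }
        if (r == 0) {
            /* EOF: short read, pad the remainder with zeroes. */
            memset(buf + got, 0, size - got);
            return 0;
        }
        got += (size_t)r;
    }
    return 0;
}

/* In-memory source used to exercise the loop. */
struct demo_source {
    const char *data;
    size_t size;
    size_t pos;
    size_t max_seen; /* largest request length observed */
};

static long demo_source_read(void *opaque, char *buf, size_t len)
{
    struct demo_source *src = opaque;
    size_t left = src->size - src->pos;
    size_t n = len < left ? len : left;

    if (len > src->max_seen) {
        src->max_seen = len;
    }
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (long)n;
}
```

The cap trades a few extra round trips for requests that stay well inside the SFTP packet limit mentioned in the patch comment.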
1107
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1108
1109
trace_ssh_write(offset, size);
1110
1111
- ssh_seek(s, offset, SSH_SEEK_WRITE);
1112
+ trace_ssh_seek(offset);
1113
+ sftp_seek64(s->sftp_handle, offset);
1114
1115
/* This keeps track of the current iovec element ('i'), where we
1116
* will read from next ('buf'), and the end of the current iovec
1117
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1118
end_of_vec = i->iov_base + i->iov_len;
1119
1120
for (written = 0; written < size; ) {
1121
+ size_t request_write_size;
1122
again:
1123
- trace_ssh_write_buf(buf, end_of_vec - buf);
1124
- r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf);
1125
- trace_ssh_write_return(r);
1126
+ /*
1127
+ * Avoid too large data packets, as libssh currently does not
1128
+ * handle multiple requests on its own.
1129
+ */
1130
+ request_write_size = MIN(end_of_vec - buf, 131072);
1131
+ trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size);
1132
+ r = sftp_write(s->sftp_handle, buf, request_write_size);
1133
+ trace_ssh_write_return(r, sftp_get_error(s->sftp));
1134
1135
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1136
+ if (r == SSH_AGAIN) {
1137
co_yield(s, bs);
1138
goto again;
1139
}
1140
if (r < 0) {
1141
sftp_error_trace(s, "write");
1142
- s->offset = -1;
1143
return -EIO;
1144
}
1145
- /* The libssh2 API is very unclear about this. A comment in
1146
- * the code says "nothing was acked, and no EAGAIN was
1147
- * received!" which apparently means that no data got sent
1148
- * out, and the underlying channel didn't return any EAGAIN
1149
- * indication. I think this is a bug in either libssh2 or
1150
- * OpenSSH (server-side). In any case, forcing a seek (to
1151
- * discard libssh2 internal buffers), and then trying again
1152
- * works for me.
1153
- */
1154
- if (r == 0) {
1155
- ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
1156
- co_yield(s, bs);
1157
- goto again;
1158
- }
1159
1160
written += r;
1161
buf += r;
1162
- s->offset += r;
1163
if (buf >= end_of_vec && written < size) {
1164
i++;
1165
buf = i->iov_base;
1166
end_of_vec = i->iov_base + i->iov_len;
1167
}
1168
1169
- if (offset + written > s->attrs.filesize)
1170
- s->attrs.filesize = offset + written;
1171
+ if (offset + written > s->attrs->size) {
1172
+ s->attrs->size = offset + written;
1173
+ }
1174
}
1175
1176
return 0;
1177
@@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
1178
}
1179
}
1180
1181
-#ifdef HAS_LIBSSH2_SFTP_FSYNC
1182
+#ifdef HAVE_LIBSSH_0_8
1183
1184
static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
1185
{
1186
int r;
1187
1188
trace_ssh_flush();
1189
+
1190
+ if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) {
1191
+ unsafe_flush_warning(s, "OpenSSH >= 6.3");
1192
+ return 0;
1193
+ }
1194
again:
1195
- r = libssh2_sftp_fsync(s->sftp_handle);
1196
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1197
+ r = sftp_fsync(s->sftp_handle);
1198
+ if (r == SSH_AGAIN) {
1199
co_yield(s, bs);
1200
goto again;
1201
}
1202
- if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
1203
- libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) {
1204
- unsafe_flush_warning(s, "OpenSSH >= 6.3");
1205
- return 0;
1206
- }
1207
if (r < 0) {
1208
sftp_error_trace(s, "fsync");
1209
return -EIO;
1210
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
1211
return ret;
1212
}
1213
1214
-#else /* !HAS_LIBSSH2_SFTP_FSYNC */
1215
+#else /* !HAVE_LIBSSH_0_8 */
1216
1217
static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
1218
{
1219
BDRVSSHState *s = bs->opaque;
1220
1221
- unsafe_flush_warning(s, "libssh2 >= 1.4.4");
1222
+ unsafe_flush_warning(s, "libssh >= 0.8.0");
1223
return 0;
1224
}
1225
1226
-#endif /* !HAS_LIBSSH2_SFTP_FSYNC */
1227
+#endif /* !HAVE_LIBSSH_0_8 */
1228
1229
static int64_t ssh_getlength(BlockDriverState *bs)
1230
{
1231
BDRVSSHState *s = bs->opaque;
1232
int64_t length;
1233
1234
- /* Note we cannot make a libssh2 call here. */
1235
- length = (int64_t) s->attrs.filesize;
1236
+ /* Note we cannot make a libssh call here. */
1237
+ length = (int64_t) s->attrs->size;
1238
trace_ssh_getlength(length);
1239
1240
return length;
1241
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
1242
return -ENOTSUP;
1243
}
1244
1245
- if (offset < s->attrs.filesize) {
1246
+ if (offset < s->attrs->size) {
1247
error_setg(errp, "ssh driver does not support shrinking files");
1248
return -ENOTSUP;
1249
}
1250
1251
- if (offset == s->attrs.filesize) {
1252
+ if (offset == s->attrs->size) {
1253
return 0;
1254
}
1255
1256
@@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void)
1257
{
1258
int r;
1259
1260
- r = libssh2_init(0);
1261
+ r = ssh_init();
1262
if (r != 0) {
1263
- fprintf(stderr, "libssh2 initialization failed, %d\n", r);
1264
+ fprintf(stderr, "libssh initialization failed, %d\n", r);
1265
exit(EXIT_FAILURE);
1266
}
1267
1268
+#if TRACE_LIBSSH != 0
1269
+ ssh_set_log_level(TRACE_LIBSSH);
+#endif
+
bdrv_register(&bdrv_ssh);
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_mgmt(void *opaque)
544
+{
545
+ RawPosixAIOData *aiocb = opaque;
546
+ int fd = aiocb->aio_fildes;
547
+ uint64_t sector = aiocb->aio_offset / 512;
548
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
549
+ struct blk_zone_range range;
550
+ int ret;
551
+
552
+ /* Execute the operation */
553
+ range.sector = sector;
554
+ range.nr_sectors = nr_sectors;
555
+ do {
556
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
557
+ } while (ret != 0 && errno == EINTR);
558
+
559
+ return ret;
560
+}
561
+#endif
562
+
563
static int handle_aiocb_copy_range(void *opaque)
564
{
565
RawPosixAIOData *aiocb = opaque;
566
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
567
}
568
}
}

diff --git a/.travis.yml b/.travis.yml

+/*
571
+ * zone report - Get a zone block device's information in the form
572
+ * of an array of zone descriptors.
573
+ * zones is an array of zone descriptors to hold zone information on reply;
574
+ * offset can be any byte within the entire size of the device;
575
+ * nr_zones is the maximum number of zones the command should report.
576
+ */
577
+#if defined(CONFIG_BLKZONED)
578
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
579
+ unsigned int *nr_zones,
580
+ BlockZoneDescriptor *zones) {
581
+ BDRVRawState *s = bs->opaque;
582
+ RawPosixAIOData acb = (RawPosixAIOData) {
583
+ .bs = bs,
584
+ .aio_fildes = s->fd,
585
+ .aio_type = QEMU_AIO_ZONE_REPORT,
586
+ .aio_offset = offset,
587
+ .zone_report = {
588
+ .nr_zones = nr_zones,
589
+ .zones = zones,
590
+ },
591
+ };
592
+
593
+ return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
594
+}
595
+#endif
596
+
597
+/*
598
+ * zone management operations - Execute an operation on a zone
599
+ */
600
+#if defined(CONFIG_BLKZONED)
601
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
602
+ int64_t offset, int64_t len) {
603
+ BDRVRawState *s = bs->opaque;
604
+ RawPosixAIOData acb;
605
+ int64_t zone_size, zone_size_mask;
606
+ const char *op_name;
607
+ unsigned long zo;
608
+ int ret;
609
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
610
+
611
+ zone_size = bs->bl.zone_size;
612
+ zone_size_mask = zone_size - 1;
613
+ if (offset & zone_size_mask) {
614
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
615
+ "%" PRId64 "", offset / 512, zone_size / 512);
616
+ return -EINVAL;
617
+ }
618
+
619
+ if (((offset + len) < capacity && len & zone_size_mask) ||
620
+ offset + len > capacity) {
621
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
622
+ " %" PRId64 "", len / 512, zone_size / 512);
623
+ return -EINVAL;
624
+ }
625
+
626
+ switch (op) {
627
+ case BLK_ZO_OPEN:
628
+ op_name = "BLKOPENZONE";
629
+ zo = BLKOPENZONE;
630
+ break;
631
+ case BLK_ZO_CLOSE:
632
+ op_name = "BLKCLOSEZONE";
633
+ zo = BLKCLOSEZONE;
634
+ break;
635
+ case BLK_ZO_FINISH:
636
+ op_name = "BLKFINISHZONE";
637
+ zo = BLKFINISHZONE;
638
+ break;
639
+ case BLK_ZO_RESET:
640
+ op_name = "BLKRESETZONE";
641
+ zo = BLKRESETZONE;
642
+ break;
643
+ default:
644
+ error_report("Unsupported zone op: 0x%x", op);
645
+ return -ENOTSUP;
646
+ }
647
+
648
+ acb = (RawPosixAIOData) {
649
+ .bs = bs,
650
+ .aio_fildes = s->fd,
651
+ .aio_type = QEMU_AIO_ZONE_MGMT,
652
+ .aio_offset = offset,
653
+ .aio_nbytes = len,
654
+ .zone_mgmt = {
655
+ .op = zo,
656
+ },
657
+ };
658
+
659
+ ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
660
+ if (ret != 0) {
661
+ error_report("ioctl %s failed %d", op_name, ret);
662
+ }
663
+
664
+ return ret;
665
+}
666
+#endif
667
+
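The range validation at the top of raw_co_zone_mgmt above can be read as the standalone predicate below. This is only a sketch: it assumes the zone size is a power of two, since the mask test is meaningless otherwise, and demo_zone_range_ok is a hypothetical name that does not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when [offset, offset + len) is acceptable for a zone
 * management operation: the offset must be zone-aligned, the range must
 * not run past the device, and the length must be zone-aligned unless
 * the range ends exactly at the device capacity (a short last zone).
 */
static bool demo_zone_range_ok(int64_t offset, int64_t len,
                               int64_t zone_size, int64_t capacity)
{
    int64_t mask = zone_size - 1; /* valid for power-of-two zone sizes only */

    if (offset & mask) {
        return false;
    }
    if (offset + len > capacity) {
        return false;
    }
    if (offset + len < capacity && (len & mask)) {
        return false;
    }
    return true;
}
```

Note that a range ending exactly at the capacity is accepted even when its length is not a multiple of the zone size, which is what lets the operation cover a smaller final zone.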
668
static coroutine_fn int
669
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
670
bool blkdev)
671
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
672
#ifdef __linux__
673
.bdrv_co_ioctl = hdev_co_ioctl,
674
#endif
675
+
676
+ /* zoned device */
677
+#if defined(CONFIG_BLKZONED)
678
+ /* zone management operations */
679
+ .bdrv_co_zone_report = raw_co_zone_report,
680
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
681
+#endif
682
};
683
684
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
685
diff --git a/block/io.c b/block/io.c
686
index XXXXXXX..XXXXXXX 100644
1276
index XXXXXXX..XXXXXXX 100644
687
--- a/block/io.c
1277
--- a/.travis.yml
688
+++ b/block/io.c
1278
+++ b/.travis.yml
689
@@ -XXX,XX +XXX,XX @@ out:
1279
@@ -XXX,XX +XXX,XX @@ addons:
690
return co.ret;
1280
- libseccomp-dev
691
}
1281
- libspice-protocol-dev
692
1282
- libspice-server-dev
693
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
1283
- - libssh2-1-dev
694
+ unsigned int *nr_zones,
1284
+ - libssh-dev
695
+ BlockZoneDescriptor *zones)
1285
- liburcu-dev
696
+{
1286
- libusb-1.0-0-dev
697
+ BlockDriver *drv = bs->drv;
1287
- libvte-2.91-dev
698
+ CoroutineIOCompletion co = {
1288
@@ -XXX,XX +XXX,XX @@ matrix:
699
+ .coroutine = qemu_coroutine_self(),
1289
- libseccomp-dev
700
+ };
1290
- libspice-protocol-dev
701
+ IO_CODE();
1291
- libspice-server-dev
702
+
1292
- - libssh2-1-dev
703
+ bdrv_inc_in_flight(bs);
1293
+ - libssh-dev
704
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
1294
- liburcu-dev
705
+ co.ret = -ENOTSUP;
1295
- libusb-1.0-0-dev
706
+ goto out;
1296
- libvte-2.91-dev
707
+ }
1297
diff --git a/block/trace-events b/block/trace-events
708
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
709
+out:
710
+ bdrv_dec_in_flight(bs);
711
+ return co.ret;
712
+}
713
+
714
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
715
+ int64_t offset, int64_t len)
716
+{
717
+ BlockDriver *drv = bs->drv;
718
+ CoroutineIOCompletion co = {
719
+ .coroutine = qemu_coroutine_self(),
720
+ };
721
+ IO_CODE();
722
+
723
+ bdrv_inc_in_flight(bs);
724
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
725
+ co.ret = -ENOTSUP;
726
+ goto out;
727
+ }
728
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
729
+out:
730
+ bdrv_dec_in_flight(bs);
731
+ return co.ret;
732
+}
733
+
734
void *qemu_blockalign(BlockDriverState *bs, size_t size)
735
{
736
IO_CODE();
737
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
738
index XXXXXXX..XXXXXXX 100644
1298
index XXXXXXX..XXXXXXX 100644
739
--- a/qemu-io-cmds.c
1299
--- a/block/trace-events
740
+++ b/qemu-io-cmds.c
1300
+++ b/block/trace-events
741
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
1301
@@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'"
742
.oneline = "flush all in-core file state to disk",
1302
# ssh.c
743
};
1303
ssh_restart_coroutine(void *co) "co=%p"
744
1304
ssh_flush(void) "fsync"
745
+static inline int64_t tosector(int64_t bytes)
1305
-ssh_check_host_key_knownhosts(const char *key) "host key OK: %s"
746
+{
1306
+ssh_check_host_key_knownhosts(void) "host key OK"
747
+ return bytes >> BDRV_SECTOR_BITS;
1307
ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o"
748
+}
1308
ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p"
749
+
1309
ssh_co_yield_back(int sock) "s->sock=%d - back"
750
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
1310
ssh_getlength(int64_t length) "length=%" PRIi64
751
+{
1311
ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64
752
+ int ret;
1312
ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
753
+ int64_t offset;
1313
-ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu"
754
+ unsigned int nr_zones;
1314
-ssh_read_return(ssize_t ret) "sftp_read returned %zd"
755
+
1315
+ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)"
756
+ ++optind;
1316
+ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)"
757
+ offset = cvtnum(argv[optind]);
1317
ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
758
+ ++optind;
1318
-ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu"
759
+ nr_zones = cvtnum(argv[optind]);
1319
-ssh_write_return(ssize_t ret) "sftp_write returned %zd"
760
+
1320
+ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)"
761
+ g_autofree BlockZoneDescriptor *zones = NULL;
1321
+ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)"
762
+ zones = g_new(BlockZoneDescriptor, nr_zones);
1322
ssh_seek(int64_t offset) "seeking to offset=%" PRIi64
763
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
1323
+ssh_auth_methods(int methods) "auth methods=0x%x"
764
+ if (ret < 0) {
1324
+ssh_server_status(int status) "server status=%d"
765
+ printf("zone report failed: %s\n", strerror(-ret));
1325
766
+ } else {
1326
# curl.c
767
+ for (int i = 0; i < nr_zones; ++i) {
1327
curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld"
768
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
1328
@@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s"
769
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
1329
sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32
770
+ "zcond:%u, [type: %u]\n",
1330
771
+ tosector(zones[i].start), tosector(zones[i].length),
1331
# ssh.c
772
+ tosector(zones[i].cap), tosector(zones[i].wp),
1332
-sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)"
773
+ zones[i].state, zones[i].type);
1333
+sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
774
+ }
1334
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
775
+ }
1335
index XXXXXXX..XXXXXXX 100644
776
+ return ret;
1336
--- a/docs/qemu-block-drivers.texi
777
+}
1337
+++ b/docs/qemu-block-drivers.texi
778
+
1338
@@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported:
779
+static const cmdinfo_t zone_report_cmd = {
1339
780
+ .name = "zone_report",
1340
warning: ssh server @code{ssh.example.com:22} does not support fsync
781
+ .altname = "zrp",
1341
782
+ .cfunc = zone_report_f,
1342
-With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
783
+ .argmin = 2,
1343
+With sufficiently new versions of libssh and OpenSSH, @code{fsync} is
784
+ .argmax = 2,
1344
supported.
785
+ .args = "offset number",
1345
786
+ .oneline = "report zone information",
1346
@node disk_images_nvme
787
+};
1347
diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker
788
+
1348
index XXXXXXX..XXXXXXX 100644
789
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
1349
--- a/tests/docker/dockerfiles/debian-win32-cross.docker
790
+{
1350
+++ b/tests/docker/dockerfiles/debian-win32-cross.docker
791
+ int ret;
1351
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
792
+ int64_t offset, len;
1352
mxe-$TARGET-w64-mingw32.shared-curl \
793
+ ++optind;
1353
mxe-$TARGET-w64-mingw32.shared-glib \
794
+ offset = cvtnum(argv[optind]);
1354
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
795
+ ++optind;
1355
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
796
+ len = cvtnum(argv[optind]);
1356
mxe-$TARGET-w64-mingw32.shared-libusb1 \
797
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
1357
mxe-$TARGET-w64-mingw32.shared-lzo \
798
+ if (ret < 0) {
1358
mxe-$TARGET-w64-mingw32.shared-nettle \
799
+ printf("zone open failed: %s\n", strerror(-ret));
1359
diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker
800
+ }
1360
index XXXXXXX..XXXXXXX 100644
801
+ return ret;
1361
--- a/tests/docker/dockerfiles/debian-win64-cross.docker
802
+}
1362
+++ b/tests/docker/dockerfiles/debian-win64-cross.docker
803
+
1363
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
804
+static const cmdinfo_t zone_open_cmd = {
1364
mxe-$TARGET-w64-mingw32.shared-curl \
805
+ .name = "zone_open",
1365
mxe-$TARGET-w64-mingw32.shared-glib \
806
+ .altname = "zo",
1366
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
807
+ .cfunc = zone_open_f,
1367
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
808
+ .argmin = 2,
1368
mxe-$TARGET-w64-mingw32.shared-libusb1 \
809
+ .argmax = 2,
1369
mxe-$TARGET-w64-mingw32.shared-lzo \
810
+ .args = "offset len",
1370
mxe-$TARGET-w64-mingw32.shared-nettle \
811
+ .oneline = "explicitly open a range of zones in zone block device",
1371
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
812
+};
1372
index XXXXXXX..XXXXXXX 100644
813
+
1373
--- a/tests/docker/dockerfiles/fedora.docker
814
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
1374
+++ b/tests/docker/dockerfiles/fedora.docker
815
+{
1375
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
816
+ int ret;
1376
libpng-devel \
817
+ int64_t offset, len;
1377
librbd-devel \
818
+ ++optind;
1378
libseccomp-devel \
819
+ offset = cvtnum(argv[optind]);
1379
- libssh2-devel \
820
+ ++optind;
1380
+ libssh-devel \
821
+ len = cvtnum(argv[optind]);
1381
libubsan \
822
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
1382
libusbx-devel \
823
+ if (ret < 0) {
1383
libxml2-devel \
824
+ printf("zone close failed: %s\n", strerror(-ret));
1384
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
825
+ }
1385
mingw32-gtk3 \
826
+ return ret;
1386
mingw32-libjpeg-turbo \
827
+}
1387
mingw32-libpng \
828
+
1388
- mingw32-libssh2 \
829
+static const cmdinfo_t zone_close_cmd = {
1389
mingw32-libtasn1 \
830
+ .name = "zone_close",
1390
mingw32-nettle \
831
+ .altname = "zc",
1391
mingw32-pixman \
832
+ .cfunc = zone_close_f,
1392
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
833
+ .argmin = 2,
1393
mingw64-gtk3 \
834
+ .argmax = 2,
1394
mingw64-libjpeg-turbo \
835
+ .args = "offset len",
1395
mingw64-libpng \
836
+ .oneline = "close a range of zones in zone block device",
1396
- mingw64-libssh2 \
837
+};
1397
mingw64-libtasn1 \
838
+
1398
mingw64-nettle \
839
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
1399
mingw64-pixman \
840
+{
1400
diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker
841
+ int ret;
1401
index XXXXXXX..XXXXXXX 100644
842
+ int64_t offset, len;
1402
--- a/tests/docker/dockerfiles/ubuntu.docker
843
+ ++optind;
1403
+++ b/tests/docker/dockerfiles/ubuntu.docker
844
+ offset = cvtnum(argv[optind]);
1404
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
845
+ ++optind;
1405
libsnappy-dev \
846
+ len = cvtnum(argv[optind]);
1406
libspice-protocol-dev \
847
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
1407
libspice-server-dev \
848
+ if (ret < 0) {
1408
- libssh2-1-dev \
849
+ printf("zone finish failed: %s\n", strerror(-ret));
1409
+ libssh-dev \
850
+ }
1410
libusb-1.0-0-dev \
851
+ return ret;
1411
libusbredirhost-dev \
852
+}
1412
libvdeplug-dev \
853
+
1413
diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker
854
+static const cmdinfo_t zone_finish_cmd = {
1414
index XXXXXXX..XXXXXXX 100644
855
+ .name = "zone_finish",
1415
--- a/tests/docker/dockerfiles/ubuntu1804.docker
856
+ .altname = "zf",
1416
+++ b/tests/docker/dockerfiles/ubuntu1804.docker
857
+ .cfunc = zone_finish_f,
1417
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
858
+ .argmin = 2,
1418
libsnappy-dev \
859
+ .argmax = 2,
1419
libspice-protocol-dev \
860
+ .args = "offset len",
1420
libspice-server-dev \
861
+ .oneline = "finish a range of zones in zone block device",
1421
- libssh2-1-dev \
862
+};
1422
+ libssh-dev \
863
+
1423
libusb-1.0-0-dev \
864
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
1424
libusbredirhost-dev \
865
+{
1425
libvdeplug-dev \
866
+ int ret;
1426
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
867
+ int64_t offset, len;
1427
index XXXXXXX..XXXXXXX 100755
868
+ ++optind;
1428
--- a/tests/qemu-iotests/207
869
+ offset = cvtnum(argv[optind]);
1429
+++ b/tests/qemu-iotests/207
870
+ ++optind;
1430
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
871
+ len = cvtnum(argv[optind]);
1431
872
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
1432
iotests.img_info_log(remote_path)
873
+ if (ret < 0) {
1433
874
+ printf("zone reset failed: %s\n", strerror(-ret));
1434
- md5_key = subprocess.check_output(
875
+ }
1435
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
876
+ return ret;
1436
- 'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1',
877
+}
1437
- shell=True).rstrip().decode('ascii')
878
+
1438
+ keys = subprocess.check_output(
879
+static const cmdinfo_t zone_reset_cmd = {
1439
+ 'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
880
+ .name = "zone_reset",
1440
+ 'cut -d" " -f3',
881
+ .altname = "zrs",
1441
+ shell=True).rstrip().decode('ascii').split('\n')
882
+ .cfunc = zone_reset_f,
1442
+
883
+ .argmin = 2,
1443
+ # Mappings of base64 representations to digests
884
+ .argmax = 2,
1444
+ md5_keys = {}
885
+ .args = "offset len",
1445
+ sha1_keys = {}
886
+ .oneline = "reset a zone write pointer in zone block device",
1446
+
887
+};
1447
+ for key in keys:
888
+
1448
+ md5_keys[key] = subprocess.check_output(
889
static int truncate_f(BlockBackend *blk, int argc, char **argv);
1449
+ 'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key,
890
static const cmdinfo_t truncate_cmd = {
1450
+ shell=True).rstrip().decode('ascii')
891
.name = "truncate",
1451
+
892
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
1452
+ sha1_keys[key] = subprocess.check_output(
893
qemuio_add_command(&aio_write_cmd);
1453
+ 'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key,
894
qemuio_add_command(&aio_flush_cmd);
1454
+ shell=True).rstrip().decode('ascii')
895
qemuio_add_command(&flush_cmd);
1455
896
+ qemuio_add_command(&zone_report_cmd);
1456
vm.launch()
897
+ qemuio_add_command(&zone_open_cmd);
1457
+
898
+ qemuio_add_command(&zone_close_cmd);
1458
+ # Find correct key first
899
+ qemuio_add_command(&zone_finish_cmd);
1459
+ matching_key = None
900
+ qemuio_add_command(&zone_reset_cmd);
1460
+ for key in keys:
901
qemuio_add_command(&truncate_cmd);
1461
+ result = vm.qmp('blockdev-add',
902
qemuio_add_command(&length_cmd);
1462
+ driver='ssh', node_name='node0', path=disk_path,
903
qemuio_add_command(&info_cmd);
1463
+ server={
1464
+ 'host': '127.0.0.1',
1465
+ 'port': '22',
1466
+ }, host_key_check={
1467
+ 'mode': 'hash',
1468
+ 'type': 'md5',
1469
+ 'hash': md5_keys[key],
1470
+ })
1471
+
1472
+ if 'error' not in result:
1473
+ vm.qmp('blockdev-del', node_name='node0')
1474
+ matching_key = key
1475
+ break
1476
+
1477
+ if matching_key is None:
1478
+ vm.shutdown()
1479
+ iotests.notrun('Did not find a key that fits 127.0.0.1')
1480
+
1481
blockdev_create(vm, { 'driver': 'ssh',
1482
'location': {
1483
'path': disk_path,
1484
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1485
'host-key-check': {
1486
'mode': 'hash',
1487
'type': 'md5',
1488
- 'hash': md5_key,
1489
+ 'hash': md5_keys[matching_key],
1490
}
1491
},
1492
'size': 8388608 })
1493
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1494
1495
iotests.img_info_log(remote_path)
1496
1497
- sha1_key = subprocess.check_output(
1498
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1499
- 'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1',
1500
- shell=True).rstrip().decode('ascii')
1501
-
1502
vm.launch()
1503
blockdev_create(vm, { 'driver': 'ssh',
1504
'location': {
1505
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1506
'host-key-check': {
1507
'mode': 'hash',
1508
'type': 'sha1',
1509
- 'hash': sha1_key,
1510
+ 'hash': sha1_keys[matching_key],
1511
}
1512
},
1513
'size': 4194304 })
1514
diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
1515
index XXXXXXX..XXXXXXX 100644
1516
--- a/tests/qemu-iotests/207.out
1517
+++ b/tests/qemu-iotests/207.out
1518
@@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes)
1519
1520
{"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}}
1521
{"return": {}}
1522
-Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31)
1523
+Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2)
1524
{"execute": "job-dismiss", "arguments": {"id": "job0"}}
1525
{"return": {}}
1526
904
--
1527
--
905
2.39.2
1528
2.21.0
906
1529
907
1530
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
10
block/file-posix.c | 3 +++
11
block/trace-events | 2 ++
12
2 files changed, 5 insertions(+)
13
14
diff --git a/block/file-posix.c b/block/file-posix.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/block/file-posix.c
17
+++ b/block/file-posix.c
18
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
19
},
20
};
21
22
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
23
return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
24
}
25
#endif
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
27
},
28
};
29
30
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
31
+ len >> BDRV_SECTOR_BITS);
32
ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
33
if (ret != 0) {
34
error_report("ioctl %s failed %d", op_name, ret);
35
diff --git a/block/trace-events b/block/trace-events
36
index XXXXXXX..XXXXXXX 100644
37
--- a/block/trace-events
38
+++ b/block/trace-events
39
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
40
file_setup_cdrom(const char *partition) "Using %s as optical disc"
41
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
42
file_flush_fdatasync_failed(int err) "errno %d"
43
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
44
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
45
46
# ssh.c
47
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
48
--
49
2.39.2
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Add the documentation about the zoned device support to virtio-blk
4
emulation.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
12
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
13
any toctree" error.
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
17
docs/devel/index-api.rst | 1 +
18
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
19
docs/system/qemu-block-drivers.rst.inc | 6 ++++
20
3 files changed, 50 insertions(+)
21
create mode 100644 docs/devel/zoned-storage.rst
22
23
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
24
index XXXXXXX..XXXXXXX 100644
25
--- a/docs/devel/index-api.rst
26
+++ b/docs/devel/index-api.rst
27
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
28
memory
29
modules
30
ui
31
+ zoned-storage
32
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
33
new file mode 100644
34
index XXXXXXX..XXXXXXX
35
--- /dev/null
36
+++ b/docs/devel/zoned-storage.rst
37
@@ -XXX,XX +XXX,XX @@
38
+=============
39
+zoned-storage
40
+=============
41
+
42
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
43
+that are larger than the LBA size. They can only allow sequential writes, which
44
+can reduce write amplification in SSDs, and potentially lead to higher
45
+throughput and increased capacity. More details about ZBDs can be found at:
46
+
47
+https://zonedstorage.io/docs/introduction/zoned-storage
48
+
49
+1. Block layer APIs for zoned storage
50
+-------------------------------------
51
+The QEMU block layer supports three zoned storage models:
52
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
53
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
54
+manage the zones of a device.
55
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
56
+zones, making it backward compatible with regular block devices.
57
+- BLK_Z_NONE: The non-zoned model has no zone support. It includes both
58
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
59
+supported.
60
+
61
+The block device information resides inside BlockDriverState. QEMU uses
62
+the BlockLimits struct (BlockDriverState::bl), which is continuously accessed by the
63
+block layer while processing I/O requests. A BlockBackend has a root pointer to
64
+a BlockDriverState graph (for example, raw format on top of file-posix). The
65
+zoned storage information can be propagated from the leaf BlockDriverState all
66
+the way up to the BlockBackend. If the zoned storage model in file-posix is
67
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
68
+
69
+The block layer APIs support commands needed for zoned storage devices,
70
+including report zones, four zone operations, and zone append.
71
+
72
+2. Emulating zoned storage controllers
73
+--------------------------------------
74
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
75
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
76
+APIs for zoned storage emulation or testing.
77
+
78
+For example, to test zone_report on a null_blk device using qemu-io:
79
+$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
80
+-c "zrp offset nr_zones"
81
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
82
index XXXXXXX..XXXXXXX 100644
83
--- a/docs/system/qemu-block-drivers.rst.inc
84
+++ b/docs/system/qemu-block-drivers.rst.inc
85
@@ -XXX,XX +XXX,XX @@ Hard disks
86
you may corrupt your host data (use the ``-snapshot`` command
87
line option or modify the device permissions accordingly).
88
89
+Zoned block devices
90
+ Zoned block devices can be passed through to the guest if the emulated storage
91
+ controller supports zoned storage. Use ``--blockdev host_device,
92
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
93
+ ``/dev/nullb0`` as ``drive0``.
94
+
95
Windows
96
^^^^^^^
97
98
--
99
2.39.2
Deleted patch
1
From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
2
1
3
The event filename is an absolute path. Convert it to a relative path when
4
writing '#line' directives, to preserve reproducibility of the generated
5
output when different base paths are used.
6
7
Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Message-Id: <20230406080045.21696-1-thomas.de_schampheleire@nokia.com>
10
---
11
scripts/tracetool/backend/ftrace.py | 4 +++-
12
scripts/tracetool/backend/log.py | 4 +++-
13
scripts/tracetool/backend/syslog.py | 4 +++-
14
3 files changed, 9 insertions(+), 3 deletions(-)
15
16
diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py
17
index XXXXXXX..XXXXXXX 100644
18
--- a/scripts/tracetool/backend/ftrace.py
19
+++ b/scripts/tracetool/backend/ftrace.py
20
@@ -XXX,XX +XXX,XX @@
21
__email__ = "stefanha@redhat.com"
22
23
24
+import os.path
25
+
26
from tracetool import out
27
28
29
@@ -XXX,XX +XXX,XX @@ def generate_h(event, group):
30
args=event.args,
31
event_id="TRACE_" + event.name.upper(),
32
event_lineno=event.lineno,
33
- event_filename=event.filename,
34
+ event_filename=os.path.relpath(event.filename),
35
fmt=event.fmt.rstrip("\n"),
36
argnames=argnames)
37
38
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
39
index XXXXXXX..XXXXXXX 100644
40
--- a/scripts/tracetool/backend/log.py
41
+++ b/scripts/tracetool/backend/log.py
42
@@ -XXX,XX +XXX,XX @@
43
__email__ = "stefanha@redhat.com"
44
45
46
+import os.path
47
+
48
from tracetool import out
49
50
51
@@ -XXX,XX +XXX,XX @@ def generate_h(event, group):
52
' }',
53
cond=cond,
54
event_lineno=event.lineno,
55
- event_filename=event.filename,
56
+ event_filename=os.path.relpath(event.filename),
57
name=event.name,
58
fmt=event.fmt.rstrip("\n"),
59
argnames=argnames)
60
diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py
61
index XXXXXXX..XXXXXXX 100644
62
--- a/scripts/tracetool/backend/syslog.py
63
+++ b/scripts/tracetool/backend/syslog.py
64
@@ -XXX,XX +XXX,XX @@
65
__email__ = "stefanha@redhat.com"
66
67
68
+import os.path
69
+
70
from tracetool import out
71
72
73
@@ -XXX,XX +XXX,XX @@ def generate_h(event, group):
74
' }',
75
cond=cond,
76
event_lineno=event.lineno,
77
- event_filename=event.filename,
78
+ event_filename=os.path.relpath(event.filename),
79
name=event.name,
80
fmt=event.fmt.rstrip("\n"),
81
argnames=argnames)
82
--
83
2.39.2
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Since Linux doesn't have a user API to issue zone append operations to
4
zoned devices from user space, the file-posix driver is modified to add
5
zone append emulation using regular writes. To do this, the file-posix
6
driver tracks the wp location of all zones of the device. It uses an
7
array of uint64_t. The most significant bit of each wp location indicates
8
whether the zone is a conventional zone.
9
10
A zone's wp changes as a result of the following operations:
11
- zone reset: change the wp to the start offset of that zone
12
- zone finish: change the wp to the end location of that zone
13
- write to a zone
14
- zone append
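
The wp bookkeeping described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the file-posix code: the
SketchZoneWps type, the SKETCH_CONV_FLAG macro and all sketch_* helpers are
hypothetical names, the array has a fixed size, and no locking is shown. It
assumes every zone has the same size and that zone append is emulated by
writing at the current wp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the QEMU definitions. */
#define SKETCH_CONV_FLAG (1ULL << 63)   /* MSB set: conventional zone */

typedef struct SketchZoneWps {
    uint64_t zone_size;   /* bytes per zone (assumed uniform) */
    unsigned nr_zones;
    uint64_t wp[8];       /* write pointers in bytes, fixed size for the sketch */
} SketchZoneWps;

static bool sketch_is_conv(uint64_t wp)
{
    return (wp & SKETCH_CONV_FLAG) != 0;
}

/* zone reset: the wp goes back to the start offset of the zone */
static void sketch_reset(SketchZoneWps *z, unsigned idx)
{
    assert(idx < z->nr_zones);
    if (!sketch_is_conv(z->wp[idx])) {
        z->wp[idx] = (uint64_t)idx * z->zone_size;
    }
}

/* zone finish: the wp moves to the end location of the zone */
static void sketch_finish(SketchZoneWps *z, unsigned idx)
{
    assert(idx < z->nr_zones);
    if (!sketch_is_conv(z->wp[idx])) {
        z->wp[idx] = (uint64_t)(idx + 1) * z->zone_size;
    }
}

/* write: advance the wp only if the write ends past the current wp */
static void sketch_write(SketchZoneWps *z, uint64_t offset, uint64_t bytes)
{
    unsigned idx = (unsigned)(offset / z->zone_size);

    assert(idx < z->nr_zones);
    if (!sketch_is_conv(z->wp[idx]) && offset + bytes > z->wp[idx]) {
        z->wp[idx] = offset + bytes;
    }
}

/*
 * zone append (emulated): the data lands at the current wp, which is
 * returned to the caller as the offset that was actually written.
 */
static uint64_t sketch_append(SketchZoneWps *z, unsigned idx, uint64_t bytes)
{
    assert(idx < z->nr_zones);
    assert(!sketch_is_conv(z->wp[idx]));

    uint64_t at = z->wp[idx];
    sketch_write(z, at, bytes);
    return at;
}
```

Keeping the zone type in the top bit of the same uint64_t lets one array answer
both "is this zone conventional?" and "where is its wp?", at the cost of
having to test the flag before any arithmetic on the value.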
15
16
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
Message-id: 20230407081657.17947-2-faithilikerun@gmail.com
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
include/block/block-common.h | 14 +++
21
include/block/block_int-common.h | 5 +
22
block/file-posix.c | 173 ++++++++++++++++++++++++++++++-
23
3 files changed, 189 insertions(+), 3 deletions(-)
24
25
diff --git a/include/block/block-common.h b/include/block/block-common.h
26
index XXXXXXX..XXXXXXX 100644
27
--- a/include/block/block-common.h
28
+++ b/include/block/block-common.h
29
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
30
BlockZoneState state;
31
} BlockZoneDescriptor;
32
33
+/*
34
+ * Track write pointers of a zone in bytes.
35
+ */
36
+typedef struct BlockZoneWps {
37
+ CoMutex colock;
38
+ uint64_t wp[];
39
+} BlockZoneWps;
40
+
41
typedef struct BlockDriverInfo {
42
/* in bytes, 0 if irrelevant */
43
int cluster_size;
44
@@ -XXX,XX +XXX,XX @@ typedef enum {
45
#define BDRV_SECTOR_BITS 9
46
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
47
48
+/*
49
+ * Get the first most significant bit of wp. If it is zero, then
50
+ * the zone type is SWR.
51
+ */
52
+#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63))
53
+
54
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
55
INT_MAX >> BDRV_SECTOR_BITS)
56
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
57
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/block_int-common.h
60
+++ b/include/block/block_int-common.h
61
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
62
63
/* maximum number of active zones */
64
int64_t max_active_zones;
65
+
66
+ int64_t write_granularity;
67
} BlockLimits;
68
69
typedef struct BdrvOpBlocker BdrvOpBlocker;
70
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
71
CoMutex bsc_modify_lock;
72
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
73
BdrvBlockStatusCache *block_status_cache;
74
+
75
+ /* array of write pointers' location of each zone in the zoned device. */
76
+ BlockZoneWps *wps;
77
};
78
79
struct BlockBackendRootState {
80
diff --git a/block/file-posix.c b/block/file-posix.c
81
index XXXXXXX..XXXXXXX 100644
82
--- a/block/file-posix.c
83
+++ b/block/file-posix.c
84
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
85
#endif
86
}
87
88
+#if defined(CONFIG_BLKZONED)
89
+/*
90
+ * If the reset_all flag is true, then the wps of zone whose state is
91
+ * not readonly or offline should be all reset to the start sector.
92
+ * Else, take the real wp of the device.
93
+ */
94
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
95
+ unsigned int nrz, bool reset_all)
96
+{
97
+ struct blk_zone *blkz;
98
+ size_t rep_size;
99
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
100
+ BlockZoneWps *wps = bs->wps;
101
+ int j = offset / bs->bl.zone_size;
102
+ int ret, n = 0, i = 0;
103
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
104
+ g_autofree struct blk_zone_report *rep = NULL;
105
+
106
+ rep = g_malloc(rep_size);
107
+ blkz = (struct blk_zone *)(rep + 1);
108
+ while (n < nrz) {
109
+ memset(rep, 0, rep_size);
110
+ rep->sector = sector;
111
+ rep->nr_zones = nrz - n;
112
+
113
+ do {
114
+ ret = ioctl(fd, BLKREPORTZONE, rep);
115
+ } while (ret != 0 && errno == EINTR);
116
+ if (ret != 0) {
117
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
118
+ fd, offset, errno);
119
+ return -errno;
120
+ }
121
+
122
+ if (!rep->nr_zones) {
123
+ break;
124
+ }
125
+
126
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
127
+ /*
128
+ * The wp tracking cares only about sequential writes required and
129
+ * sequential write preferred zones so that the wp can advance to
130
+ * the right location.
131
+ * Use the most significant bit of the wp location to indicate the
132
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
133
+ */
134
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
135
+ wps->wp[j] |= 1ULL << 63;
136
+ } else {
137
+ switch(blkz[i].cond) {
138
+ case BLK_ZONE_COND_FULL:
139
+ case BLK_ZONE_COND_READONLY:
140
+ /* Zone not writable */
141
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
142
+ break;
143
+ case BLK_ZONE_COND_OFFLINE:
144
+ /* Zone not writable nor readable */
145
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
146
+ break;
147
+ default:
148
+ if (reset_all) {
149
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
150
+ } else {
151
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
152
+ }
153
+ break;
154
+ }
155
+ }
156
+ }
157
+ sector = blkz[i - 1].start + blkz[i - 1].len;
158
+ }
159
+
160
+ return 0;
161
+}
162
+
163
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
164
+ unsigned int nrz)
165
+{
166
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
167
+ error_report("update zone wp failed");
168
+ }
169
+}
170
+#endif
171
+
172
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
173
{
174
BDRVRawState *s = bs->opaque;
175
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
176
if (ret >= 0) {
177
bs->bl.max_active_zones = ret;
178
}
179
+
180
+ ret = get_sysfs_long_val(&st, "physical_block_size");
181
+ if (ret >= 0) {
182
+ bs->bl.write_granularity = ret;
183
+ }
184
+
185
+ /* The refresh_limits() function can be called multiple times. */
186
+ g_free(bs->wps);
187
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
188
+ sizeof(int64_t) * bs->bl.nr_zones);
189
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
190
+ if (ret < 0) {
191
+ error_setg_errno(errp, -ret, "report wps failed");
192
+ bs->wps = NULL;
193
+ return;
194
+ }
195
+ qemu_co_mutex_init(&bs->wps->colock);
196
return;
197
}
198
out:
199
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
200
{
201
BDRVRawState *s = bs->opaque;
202
RawPosixAIOData acb;
203
+ int ret;
204
205
if (fd_open(bs) < 0)
206
return -EIO;
207
+#if defined(CONFIG_BLKZONED)
208
+ if (type & QEMU_AIO_WRITE && bs->wps) {
209
+ qemu_co_mutex_lock(&bs->wps->colock);
210
+ }
211
+#endif
212
213
/*
214
* When using O_DIRECT, the request must be aligned to be able to use
215
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
216
} else if (s->use_linux_io_uring) {
217
LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
218
assert(qiov->size == bytes);
219
- return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
220
+ ret = luring_co_submit(bs, aio, s->fd, offset, qiov, type);
221
+ goto out;
222
#endif
223
#ifdef CONFIG_LINUX_AIO
224
} else if (s->use_linux_aio) {
225
LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
226
assert(qiov->size == bytes);
227
- return laio_co_submit(bs, aio, s->fd, offset, qiov, type,
228
+ ret = laio_co_submit(bs, aio, s->fd, offset, qiov, type,
229
s->aio_max_batch);
230
+ goto out;
231
#endif
232
}
233
234
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
235
};
236
237
assert(qiov->size == bytes);
238
- return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
239
+ ret = raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
240
+
241
+out:
242
+#if defined(CONFIG_BLKZONED)
243
+ BlockZoneWps *wps = bs->wps;
244
+ if (ret == 0) {
245
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
246
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
247
+ if (!BDRV_ZT_IS_CONV(*wp)) {
248
+ /* Advance the wp if needed */
249
+ if (offset + bytes > *wp) {
250
+ *wp = offset + bytes;
251
+ }
252
+ }
253
+ }
254
+ } else {
255
+ if (type & QEMU_AIO_WRITE) {
256
+ update_zones_wp(bs, s->fd, offset, 1);
257
+ }
258
+ }
259
+
260
+ if (type & QEMU_AIO_WRITE && wps) {
261
+ qemu_co_mutex_unlock(&wps->colock);
262
+ }
263
+#endif
264
+ return ret;
265
}
266
267
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
268
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
269
BDRVRawState *s = bs->opaque;
270
271
if (s->fd >= 0) {
272
+#if defined(CONFIG_BLKZONED)
273
+ g_free(bs->wps);
274
+#endif
275
qemu_close(s->fd);
276
s->fd = -1;
277
}
278
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
279
const char *op_name;
280
unsigned long zo;
281
int ret;
282
+ BlockZoneWps *wps = bs->wps;
283
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
284
285
zone_size = bs->bl.zone_size;
286
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
287
return -EINVAL;
288
}
289
290
+ QEMU_LOCK_GUARD(&wps->colock);
291
+ uint32_t i = offset / bs->bl.zone_size;
292
+ uint32_t nrz = len / bs->bl.zone_size;
293
+ uint64_t *wp = &wps->wp[i];
294
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
295
+ error_report("zone mgmt operations are not allowed for conventional zones");
296
+ return -EIO;
297
+ }
298
+
299
switch (op) {
300
case BLK_ZO_OPEN:
301
op_name = "BLKOPENZONE";
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
303
len >> BDRV_SECTOR_BITS);
304
ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
305
if (ret != 0) {
306
+ update_zones_wp(bs, s->fd, offset, i);
307
error_report("ioctl %s failed %d", op_name, ret);
308
+ return ret;
309
+ }
310
+
311
+ if (zo == BLKRESETZONE && len == capacity) {
312
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
313
+ if (ret < 0) {
314
+ error_report("reporting single wp failed");
315
+ return ret;
316
+ }
317
+ } else if (zo == BLKRESETZONE) {
318
+ for (int j = 0; j < nrz; ++j) {
319
+ wp[j] = offset + j * zone_size;
320
+ }
321
+ } else if (zo == BLKFINISHZONE) {
322
+ for (int j = 0; j < nrz; ++j) {
323
+ /* The zoned device allows the last zone to be smaller than the
+ * zone size. */
325
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
326
+ }
327
}
328
329
return ret;
330
--
331
2.39.2
From: Sam Li <faithilikerun@gmail.com>

A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230407081657.17947-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block-io.h | 4 +++
include/block/block_int-common.h | 3 ++
include/block/raw-aio.h | 4 ++-
include/sysemu/block-backend-io.h | 9 +++++
block/block-backend.c | 60 +++++++++++++++++++++++++++++++
block/file-posix.c | 58 ++++++++++++++++++++++++++----
block/io.c | 27 ++++++++++++++
block/io_uring.c | 4 +++
block/linux-aio.c | 3 ++
block/raw-format.c | 8 +++++
10 files changed, 172 insertions(+), 8 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block-io.h
31
+++ b/include/block/block-io.h
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
34
BlockZoneOp op,
35
int64_t offset, int64_t len);
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ if (!blk_is_available(blk)) {
172
+ blk_dec_in_flight(blk);
173
+ return -ENOMEDIUM;
174
+ }
175
+
176
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
177
+ blk_dec_in_flight(blk);
178
+ return ret;
179
+}
180
+
181
void blk_drain(BlockBackend *blk)
182
{
183
BlockDriverState *bs = blk_bs(blk);
184
diff --git a/block/file-posix.c b/block/file-posix.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/block/file-posix.c
187
+++ b/block/file-posix.c
188
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
189
bool has_write_zeroes:1;
190
bool use_linux_aio:1;
191
bool use_linux_io_uring:1;
192
+ int64_t *offset; /* offset of zone append operation */
193
int page_cache_inconsistent; /* errno from fdatasync failure */
194
bool has_fallocate;
195
bool needs_alignment;
196
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
197
ssize_t len;
198
199
len = RETRY_ON_EINTR(
200
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
201
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
202
qemu_pwritev(aiocb->aio_fildes,
203
aiocb->io.iov,
204
aiocb->io.niov,
205
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
206
ssize_t len;
207
208
while (offset < aiocb->aio_nbytes) {
209
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
210
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
211
len = pwrite(aiocb->aio_fildes,
212
(const char *)buf + offset,
213
aiocb->aio_nbytes - offset,
214
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
215
}
216
217
nbytes = handle_aiocb_rw_linear(aiocb, buf);
218
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
219
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
220
char *p = buf;
221
size_t count = aiocb->aio_nbytes, copy;
222
int i;
223
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
224
if (fd_open(bs) < 0)
225
return -EIO;
226
#if defined(CONFIG_BLKZONED)
227
- if (type & QEMU_AIO_WRITE && bs->wps) {
228
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
229
qemu_co_mutex_lock(&bs->wps->colock);
230
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
231
+ int index = offset / bs->bl.zone_size;
232
+ offset = bs->wps->wp[index];
233
+ }
234
}
235
#endif
236
237
@@ -XXX,XX +XXX,XX @@ out:
238
#if defined(CONFIG_BLKZONED)
239
BlockZoneWps *wps = bs->wps;
240
if (ret == 0) {
241
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
242
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
243
+ && wps && bs->bl.zone_size) {
244
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
245
if (!BDRV_ZT_IS_CONV(*wp)) {
246
+ if (type & QEMU_AIO_ZONE_APPEND) {
247
+ *s->offset = *wp;
248
+ }
249
/* Advance the wp if needed */
250
if (offset + bytes > *wp) {
251
*wp = offset + bytes;
252
@@ -XXX,XX +XXX,XX @@ out:
253
}
254
}
255
} else {
256
- if (type & QEMU_AIO_WRITE) {
257
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
258
update_zones_wp(bs, s->fd, 0, 1);
259
}
260
}
261
262
- if (type & QEMU_AIO_WRITE && wps) {
263
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
264
qemu_co_mutex_unlock(&wps->colock);
265
}
266
#endif
267
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
268
}
269
#endif
270
271
+#if defined(CONFIG_BLKZONED)
272
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
273
+ int64_t *offset,
274
+ QEMUIOVector *qiov,
275
+ BdrvRequestFlags flags) {
276
+ assert(flags == 0);
277
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
278
+ int64_t iov_len = 0;
279
+ int64_t len = 0;
280
+ BDRVRawState *s = bs->opaque;
281
+ s->offset = offset;
282
+
283
+ if (*offset & zone_size_mask) {
284
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
285
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
286
+ return -EINVAL;
287
+ }
288
+
289
+ int64_t wg = bs->bl.write_granularity;
290
+ int64_t wg_mask = wg - 1;
291
+ for (int i = 0; i < qiov->niov; i++) {
292
+ iov_len = qiov->iov[i].iov_len;
293
+ if (iov_len & wg_mask) {
294
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
295
+ "block size %" PRId64 "", i, iov_len, wg);
296
+ return -EINVAL;
297
+ }
298
+ len += iov_len;
299
+ }
300
+
301
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
302
+}
303
+#endif
304
+
305
static coroutine_fn int
306
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
307
bool blkdev)
308
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
309
/* zone management operations */
310
.bdrv_co_zone_report = raw_co_zone_report,
311
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
312
+ .bdrv_co_zone_append = raw_co_zone_append,
313
#endif
314
};
315
316
diff --git a/block/io.c b/block/io.c
317
index XXXXXXX..XXXXXXX 100644
318
--- a/block/io.c
319
+++ b/block/io.c
320
@@ -XXX,XX +XXX,XX @@ out:
321
return co.ret;
322
}
323
324
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
325
+ QEMUIOVector *qiov,
326
+ BdrvRequestFlags flags)
327
+{
328
+ int ret;
329
+ BlockDriver *drv = bs->drv;
330
+ CoroutineIOCompletion co = {
331
+ .coroutine = qemu_coroutine_self(),
332
+ };
333
+ IO_CODE();
334
+
335
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
336
+ if (ret < 0) {
337
+ return ret;
338
+ }
339
+
340
+ bdrv_inc_in_flight(bs);
341
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
342
+ co.ret = -ENOTSUP;
343
+ goto out;
344
+ }
345
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
346
+out:
347
+ bdrv_dec_in_flight(bs);
348
+ return co.ret;
349
+}
350
+
351
void *qemu_blockalign(BlockDriverState *bs, size_t size)
352
{
353
IO_CODE();
354
diff --git a/block/io_uring.c b/block/io_uring.c
355
index XXXXXXX..XXXXXXX 100644
356
--- a/block/io_uring.c
357
+++ b/block/io_uring.c
358
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
359
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
360
luringcb->qiov->niov, offset);
361
break;
362
+ case QEMU_AIO_ZONE_APPEND:
363
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
364
+ luringcb->qiov->niov, offset);
365
+ break;
366
case QEMU_AIO_READ:
367
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
368
luringcb->qiov->niov, offset);
369
diff --git a/block/linux-aio.c b/block/linux-aio.c
370
index XXXXXXX..XXXXXXX 100644
371
--- a/block/linux-aio.c
372
+++ b/block/linux-aio.c
373
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
374
case QEMU_AIO_WRITE:
375
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
376
break;
377
+ case QEMU_AIO_ZONE_APPEND:
378
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
379
+ break;
380
case QEMU_AIO_READ:
381
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
382
break;
383
diff --git a/block/raw-format.c b/block/raw-format.c
384
index XXXXXXX..XXXXXXX 100644
385
--- a/block/raw-format.c
386
+++ b/block/raw-format.c
387
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
388
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
389
}
390
391
+static int coroutine_fn GRAPH_RDLOCK
392
+raw_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
393
+ BdrvRequestFlags flags)
394
+{
395
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
396
+}
397
+
398
static int64_t coroutine_fn GRAPH_RDLOCK
399
raw_co_getlength(BlockDriverState *bs)
400
{
401
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
402
.bdrv_co_pdiscard = &raw_co_pdiscard,
403
.bdrv_co_zone_report = &raw_co_zone_report,
404
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
405
+ .bdrv_co_zone_append = &raw_co_zone_append,
406
.bdrv_co_block_status = &raw_co_block_status,
407
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
408
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
409
--
410
2.39.2
From: Sam Li <faithilikerun@gmail.com>

This patch tests zone append writes by reporting the zone write pointer
(wp) after the call completes. The "zap -p" option prints the sector
offset returned on completion, which should be the start sector of the
appended data.
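
The expected values in zoned.out follow from the request sizes: each
"zap -p <zone start> 0x1000 0x2000" call appends 0x3000 bytes, which is
0x18 sectors of 512 bytes. A minimal sketch of that arithmetic follows.
The helper names are hypothetical and not part of qemu-io. SECTOR_BITS
mirrors QEMU's BDRV_SECTOR_BITS (9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9 /* 512-byte sectors, mirroring BDRV_SECTOR_BITS */

/* Convert a byte offset to a 512-byte sector number. */
uint64_t to_sector(uint64_t bytes)
{
    return bytes >> SECTOR_BITS;
}

/* Sum the segment lengths passed to a single "zap" invocation. */
uint64_t total_len(const uint64_t *lens, size_t n)
{
    uint64_t sum = 0;

    assert(lens != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        sum += lens[i];
    }
    return sum;
}

/*
 * Expected write pointer, in sectors, of a zone that started empty at
 * byte offset zone_start and then received `appends` requests of
 * req_bytes each.
 */
uint64_t expected_wp_sector(uint64_t zone_start, uint64_t req_bytes,
                            unsigned appends)
{
    return to_sector(zone_start + req_bytes * appends);
}
```

With these helpers, two appends to the first zone give a write pointer
of sector 0x30, and one append to the second zone (byte offset
268435456) gives 0x80018, matching the wptr values in the expected
output.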

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230407081657.17947-4-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
tests/qemu-iotests/tests/zoned | 16 +++++++
tests/qemu-iotests/tests/zoned.out | 16 +++++++
3 files changed, 107 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
19
index XXXXXXX..XXXXXXX 100644
20
--- a/qemu-io-cmds.c
21
+++ b/qemu-io-cmds.c
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
23
.oneline = "reset a zone write pointer in zone block device",
24
};
25
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
27
+ int64_t *offset, int flags, int *total)
28
+{
29
+ int async_ret = NOT_DONE;
30
+
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
32
+ while (async_ret == NOT_DONE) {
33
+ main_loop_wait(false);
34
+ }
35
+
36
+ *total = qiov->size;
37
+ return async_ret < 0 ? async_ret : 1;
38
+}
39
+
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
41
+{
42
+ int ret;
43
+ bool pflag = false;
44
+ int flags = 0;
45
+ int total = 0;
46
+ int64_t offset;
47
+ char *buf;
48
+ int c, nr_iov;
49
+ int pattern = 0xcd;
50
+ QEMUIOVector qiov;
51
+
52
+ if (optind > argc - 3) {
53
+ return -EINVAL;
54
+ }
55
+
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
57
+ pflag = true;
58
+ }
59
+
60
+ offset = cvtnum(argv[optind]);
61
+ if (offset < 0) {
62
+ print_cvtnum_err(offset, argv[optind]);
63
+ return offset;
64
+ }
65
+ optind++;
66
+ nr_iov = argc - optind;
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
68
+ flags & BDRV_REQ_REGISTERED_BUF);
69
+ if (buf == NULL) {
70
+ return -EINVAL;
71
+ }
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
73
+ if (ret < 0) {
74
+ printf("zone append failed: %s\n", strerror(-ret));
75
+ goto out;
76
+ }
77
+
78
+ if (pflag) {
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
80
+ tosector(offset));
81
+ }
82
+
83
+out:
84
+ qemu_io_free(blk, buf, qiov.size,
85
+ flags & BDRV_REQ_REGISTERED_BUF);
86
+ qemu_iovec_destroy(&qiov);
87
+ return ret;
88
+}
89
+
90
+static const cmdinfo_t zone_append_cmd = {
91
+ .name = "zone_append",
92
+ .altname = "zap",
93
+ .cfunc = zone_append_f,
94
+ .argmin = 3,
95
+ .argmax = 4,
96
+ .args = "offset len [len..]",
97
+ .oneline = "append write a number of bytes at a specified offset",
98
+};
99
+
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
101
static const cmdinfo_t truncate_cmd = {
102
.name = "truncate",
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
104
qemuio_add_command(&zone_close_cmd);
105
qemuio_add_command(&zone_finish_cmd);
106
qemuio_add_command(&zone_reset_cmd);
107
+ qemuio_add_command(&zone_append_cmd);
108
qemuio_add_command(&truncate_cmd);
109
qemuio_add_command(&length_cmd);
110
qemuio_add_command(&info_cmd);
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
112
index XXXXXXX..XXXXXXX 100755
113
--- a/tests/qemu-iotests/tests/zoned
114
+++ b/tests/qemu-iotests/tests/zoned
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
146
+
147
+
148
+(6) append write
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
150
+After zap done, the append sector is 0x0
151
+After appending the first zone firstly:
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
153
+After zap done, the append sector is 0x18
154
+After appending the first zone secondly:
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
156
+After zap done, the append sector is 0x80000
157
+After appending the second zone firstly:
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
159
+After zap done, the append sector is 0x80018
160
+After appending the second zone secondly:
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
162
*** done
163
--
164
2.39.2
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230407081657.17947-5-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/file-posix.c | 3 +++
block/trace-events | 2 ++
2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
16
+++ b/block/file-posix.c
17
@@ -XXX,XX +XXX,XX @@ out:
18
if (!BDRV_ZT_IS_CONV(*wp)) {
19
if (type & QEMU_AIO_ZONE_APPEND) {
20
*s->offset = *wp;
21
+ trace_zbd_zone_append_complete(bs, *s->offset
22
+ >> BDRV_SECTOR_BITS);
23
}
24
/* Advance the wp if needed */
25
if (offset + bytes > *wp) {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
27
len += iov_len;
28
}
29
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
32
}
33
#endif
34
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
37
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
39
file_flush_fdatasync_failed(int err) "errno %d"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
44
45
# ssh.c
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
47
--
48
2.39.2
From: Sam Li <faithilikerun@gmail.com>

Use scripts/update-linux-headers.sh to update headers to 6.3-rc1.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-id: 20230407082528.18841-2-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/standard-headers/drm/drm_fourcc.h | 12 +++
include/standard-headers/linux/ethtool.h | 48 ++++++++-
include/standard-headers/linux/fuse.h | 45 +++++++-
include/standard-headers/linux/pci_regs.h | 1 +
include/standard-headers/linux/vhost_types.h | 2 +
include/standard-headers/linux/virtio_blk.h | 105 +++++++++++++++++++
linux-headers/asm-arm64/kvm.h | 1 +
linux-headers/asm-x86/kvm.h | 34 +++++-
linux-headers/linux/kvm.h | 9 ++
linux-headers/linux/vfio.h | 15 +--
linux-headers/linux/vhost.h | 8 ++
11 files changed, 270 insertions(+), 10 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
25
index XXXXXXX..XXXXXXX 100644
26
--- a/include/standard-headers/drm/drm_fourcc.h
27
+++ b/include/standard-headers/drm/drm_fourcc.h
28
@@ -XXX,XX +XXX,XX @@ extern "C" {
29
*
30
* The authoritative list of format modifier codes is found in
31
* `include/uapi/drm/drm_fourcc.h`
32
+ *
33
+ * Open Source User Waiver
34
+ * -----------------------
35
+ *
36
+ * Because this is the authoritative source for pixel formats and modifiers
37
+ * referenced by GL, Vulkan extensions and other standards and hence used both
38
+ * by open source and closed source driver stacks, the usual requirement for an
39
+ * upstream in-kernel or open source userspace user does not apply.
40
+ *
41
+ * To ensure, as much as feasible, compatibility across stacks and avoid
42
+ * confusion with incompatible enumerations stakeholders for all relevant driver
43
+ * stacks should approve additions.
44
*/
45
46
#define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
47
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
48
index XXXXXXX..XXXXXXX 100644
49
--- a/include/standard-headers/linux/ethtool.h
50
+++ b/include/standard-headers/linux/ethtool.h
51
@@ -XXX,XX +XXX,XX @@ enum ethtool_stringset {
52
    ETH_SS_COUNT
53
};
54
55
+/**
56
+ * enum ethtool_mac_stats_src - source of ethtool MAC statistics
57
+ * @ETHTOOL_MAC_STATS_SRC_AGGREGATE:
58
+ *    if device supports a MAC merge layer, this retrieves the aggregate
59
+ *    statistics of the eMAC and pMAC. Otherwise, it retrieves just the
60
+ *    statistics of the single (express) MAC.
61
+ * @ETHTOOL_MAC_STATS_SRC_EMAC:
62
+ *    if device supports a MM layer, this retrieves the eMAC statistics.
63
+ *    Otherwise, it retrieves the statistics of the single (express) MAC.
64
+ * @ETHTOOL_MAC_STATS_SRC_PMAC:
65
+ *    if device supports a MM layer, this retrieves the pMAC statistics.
66
+ */
67
+enum ethtool_mac_stats_src {
68
+    ETHTOOL_MAC_STATS_SRC_AGGREGATE,
69
+    ETHTOOL_MAC_STATS_SRC_EMAC,
70
+    ETHTOOL_MAC_STATS_SRC_PMAC,
71
+};
72
+
73
/**
74
* enum ethtool_module_power_mode_policy - plug-in module power mode policy
75
* @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
76
@@ -XXX,XX +XXX,XX @@ enum ethtool_podl_pse_pw_d_status {
77
    ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR,
78
};
79
80
+/**
81
+ * enum ethtool_mm_verify_status - status of MAC Merge Verify function
82
+ * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN:
83
+ *    verification status is unknown
84
+ * @ETHTOOL_MM_VERIFY_STATUS_INITIAL:
85
+ *    the 802.3 Verify State diagram is in the state INIT_VERIFICATION
86
+ * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING:
87
+ *    the Verify State diagram is in the state VERIFICATION_IDLE,
88
+ *    SEND_VERIFY or WAIT_FOR_RESPONSE
89
+ * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED:
90
+ *    indicates that the Verify State diagram is in the state VERIFIED
91
+ * @ETHTOOL_MM_VERIFY_STATUS_FAILED:
92
+ *    the Verify State diagram is in the state VERIFY_FAIL
93
+ * @ETHTOOL_MM_VERIFY_STATUS_DISABLED:
94
+ *    verification of preemption operation is disabled
95
+ */
96
+enum ethtool_mm_verify_status {
97
+    ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
98
+    ETHTOOL_MM_VERIFY_STATUS_INITIAL,
99
+    ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
100
+    ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
101
+    ETHTOOL_MM_VERIFY_STATUS_FAILED,
102
+    ETHTOOL_MM_VERIFY_STATUS_DISABLED,
103
+};
104
+
105
/**
106
* struct ethtool_gstrings - string set for data tagging
107
* @cmd: Command number = %ETHTOOL_GSTRINGS
108
@@ -XXX,XX +XXX,XX @@ struct ethtool_rxnfc {
109
        uint32_t            rule_cnt;
110
        uint32_t            rss_context;
111
    };
112
-    uint32_t                rule_locs[0];
113
+    uint32_t                rule_locs[];
114
};
115
116
117
@@ -XXX,XX +XXX,XX @@ enum ethtool_link_mode_bit_indices {
118
    ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT     = 96,
119
    ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT     = 97,
120
    ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT     = 98,
121
+    ETHTOOL_LINK_MODE_10baseT1S_Full_BIT         = 99,
122
+    ETHTOOL_LINK_MODE_10baseT1S_Half_BIT         = 100,
123
+    ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT     = 101,
124
125
    /* must be last entry */
126
    __ETHTOOL_LINK_MODE_MASK_NBITS
127
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
128
index XXXXXXX..XXXXXXX 100644
129
--- a/include/standard-headers/linux/fuse.h
130
+++ b/include/standard-headers/linux/fuse.h
131
@@ -XXX,XX +XXX,XX @@
132
* 7.38
133
* - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry
134
* - add FOPEN_PARALLEL_DIRECT_WRITES
135
+ * - add total_extlen to fuse_in_header
136
+ * - add FUSE_MAX_NR_SECCTX
137
+ * - add extension header
138
+ * - add FUSE_EXT_GROUPS
139
+ * - add FUSE_CREATE_SUPP_GROUP
140
*/
141
142
#ifndef _LINUX_FUSE_H
143
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
144
* FUSE_SECURITY_CTX:    add security context to create, mkdir, symlink, and
145
*            mknod
146
* FUSE_HAS_INODE_DAX: use per inode DAX
147
+ * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
148
+ *            symlink and mknod (single group that matches parent)
149
*/
150
#define FUSE_ASYNC_READ        (1 << 0)
151
#define FUSE_POSIX_LOCKS    (1 << 1)
152
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
153
/* bits 32..63 get shifted down 32 bits into the flags2 field */
154
#define FUSE_SECURITY_CTX    (1ULL << 32)
155
#define FUSE_HAS_INODE_DAX    (1ULL << 33)
156
+#define FUSE_CREATE_SUPP_GROUP    (1ULL << 34)
157
158
/**
159
* CUSE INIT request/reply flags
160
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
161
*/
162
#define FUSE_EXPIRE_ONLY        (1 << 0)
163
164
+/**
165
+ * extension type
166
+ * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx
167
+ * FUSE_EXT_GROUPS: &fuse_supp_groups extension
168
+ */
169
+enum fuse_ext_type {
170
+    /* Types 0..31 are reserved for fuse_secctx_header */
171
+    FUSE_MAX_NR_SECCTX    = 31,
172
+    FUSE_EXT_GROUPS        = 32,
173
+};
174
+
175
enum fuse_opcode {
176
    FUSE_LOOKUP        = 1,
177
    FUSE_FORGET        = 2, /* no reply */
178
@@ -XXX,XX +XXX,XX @@ struct fuse_in_header {
179
    uint32_t    uid;
180
    uint32_t    gid;
181
    uint32_t    pid;
182
-    uint32_t    padding;
183
+    uint16_t    total_extlen; /* length of extensions in 8byte units */
184
+    uint16_t    padding;
185
};
186
187
struct fuse_out_header {
188
@@ -XXX,XX +XXX,XX @@ struct fuse_secctx_header {
189
    uint32_t    nr_secctx;
190
};
191
192
+/**
193
+ * struct fuse_ext_header - extension header
194
+ * @size: total size of this extension including this header
195
+ * @type: type of extension
196
+ *
197
+ * This is made compatible with fuse_secctx_header by using type values >
198
+ * FUSE_MAX_NR_SECCTX
199
+ */
200
+struct fuse_ext_header {
201
+    uint32_t    size;
202
+    uint32_t    type;
203
+};
204
+
205
+/**
206
+ * struct fuse_supp_groups - Supplementary group extension
207
+ * @nr_groups: number of supplementary groups
208
+ * @groups: flexible array of group IDs
209
+ */
210
+struct fuse_supp_groups {
211
+    uint32_t    nr_groups;
212
+    uint32_t    groups[];
213
+};
214
+
215
#endif /* _LINUX_FUSE_H */
216
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
217
index XXXXXXX..XXXXXXX 100644
218
--- a/include/standard-headers/linux/pci_regs.h
219
+++ b/include/standard-headers/linux/pci_regs.h
220
@@ -XXX,XX +XXX,XX @@
221
#define PCI_EXP_LNKCTL2_TX_MARGIN    0x0380 /* Transmit Margin */
222
#define PCI_EXP_LNKCTL2_HASD        0x0020 /* HW Autonomous Speed Disable */
223
#define PCI_EXP_LNKSTA2        0x32    /* Link Status 2 */
224
+#define PCI_EXP_LNKSTA2_FLIT        0x0400 /* Flit Mode Status */
225
#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2    0x32    /* end of v2 EPs w/ link */
226
#define PCI_EXP_SLTCAP2        0x34    /* Slot Capabilities 2 */
227
#define PCI_EXP_SLTCAP2_IBPD    0x00000001 /* In-band PD Disable Supported */
228
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
229
index XXXXXXX..XXXXXXX 100644
230
--- a/include/standard-headers/linux/vhost_types.h
231
+++ b/include/standard-headers/linux/vhost_types.h
232
@@ -XXX,XX +XXX,XX @@ struct vhost_vdpa_iova_range {
233
#define VHOST_BACKEND_F_IOTLB_ASID 0x3
234
/* Device can be suspended */
235
#define VHOST_BACKEND_F_SUSPEND 0x4
236
+/* Device can be resumed */
237
+#define VHOST_BACKEND_F_RESUME 0x5
238
239
#endif
240
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
241
index XXXXXXX..XXXXXXX 100644
242
--- a/include/standard-headers/linux/virtio_blk.h
243
+++ b/include/standard-headers/linux/virtio_blk.h
244
@@ -XXX,XX +XXX,XX @@
245
#define VIRTIO_BLK_F_DISCARD    13    /* DISCARD is supported */
246
#define VIRTIO_BLK_F_WRITE_ZEROES    14    /* WRITE ZEROES is supported */
247
#define VIRTIO_BLK_F_SECURE_ERASE    16 /* Secure Erase is supported */
248
+#define VIRTIO_BLK_F_ZONED        17    /* Zoned block device */
249
250
/* Legacy feature bits */
251
#ifndef VIRTIO_BLK_NO_LEGACY
252
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
253
    /* Secure erase commands must be aligned to this number of sectors. */
254
    __virtio32 secure_erase_sector_alignment;
255
256
+    /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
257
+    struct virtio_blk_zoned_characteristics {
258
+        uint32_t zone_sectors;
259
+        uint32_t max_open_zones;
260
+        uint32_t max_active_zones;
261
+        uint32_t max_append_sectors;
262
+        uint32_t write_granularity;
263
+        uint8_t model;
264
+        uint8_t unused2[3];
265
+    } zoned;
266
} QEMU_PACKED;
267
268
/*
269
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
270
/* Secure erase command */
271
#define VIRTIO_BLK_T_SECURE_ERASE    14
272
273
+/* Zone append command */
274
+#define VIRTIO_BLK_T_ZONE_APPEND 15
275
+
276
+/* Report zones command */
277
+#define VIRTIO_BLK_T_ZONE_REPORT 16
278
+
279
+/* Open zone command */
280
+#define VIRTIO_BLK_T_ZONE_OPEN 18
281
+
282
+/* Close zone command */
283
+#define VIRTIO_BLK_T_ZONE_CLOSE 20
284
+
285
+/* Finish zone command */
286
+#define VIRTIO_BLK_T_ZONE_FINISH 22
287
+
288
+/* Reset zone command */
289
+#define VIRTIO_BLK_T_ZONE_RESET 24
290
+
291
+/* Reset All zones command */
292
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
293
+
294
#ifndef VIRTIO_BLK_NO_LEGACY
295
/* Barrier before this op. */
296
#define VIRTIO_BLK_T_BARRIER    0x80000000
297
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_outhdr {
298
    __virtio64 sector;
299
};
300
301
+/*
302
+ * Supported zoned device models.
303
+ */
304
+
305
+/* Regular block device */
306
+#define VIRTIO_BLK_Z_NONE 0
307
+/* Host-managed zoned device */
308
+#define VIRTIO_BLK_Z_HM 1
309
+/* Host-aware zoned device */
310
+#define VIRTIO_BLK_Z_HA 2
311
+
312
+/*
313
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
314
+ */
315
+struct virtio_blk_zone_descriptor {
316
+    /* Zone capacity */
317
+    uint64_t z_cap;
318
+    /* The starting sector of the zone */
319
+    uint64_t z_start;
320
+    /* Zone write pointer position in sectors */
321
+    uint64_t z_wp;
322
+    /* Zone type */
323
+    uint8_t z_type;
324
+    /* Zone state */
325
+    uint8_t z_state;
326
+    uint8_t reserved[38];
327
+};
328
+
329
+struct virtio_blk_zone_report {
330
+    uint64_t nr_zones;
331
+    uint8_t reserved[56];
332
+    struct virtio_blk_zone_descriptor zones[];
333
+};
334
+
335
+/*
336
+ * Supported zone types.
337
+ */
338
+
339
+/* Conventional zone */
340
+#define VIRTIO_BLK_ZT_CONV 1
341
+/* Sequential Write Required zone */
342
+#define VIRTIO_BLK_ZT_SWR 2
343
+/* Sequential Write Preferred zone */
344
+#define VIRTIO_BLK_ZT_SWP 3
345
+
346
+/*
347
+ * Zone states that are available for zones of all types.
348
+ */
349
+
350
+/* Not a write pointer (conventional zones only) */
351
+#define VIRTIO_BLK_ZS_NOT_WP 0
352
+/* Empty */
353
+#define VIRTIO_BLK_ZS_EMPTY 1
354
+/* Implicitly Open */
355
+#define VIRTIO_BLK_ZS_IOPEN 2
356
+/* Explicitly Open */
357
+#define VIRTIO_BLK_ZS_EOPEN 3
358
+/* Closed */
359
+#define VIRTIO_BLK_ZS_CLOSED 4
360
+/* Read-Only */
361
+#define VIRTIO_BLK_ZS_RDONLY 13
362
+/* Full */
363
+#define VIRTIO_BLK_ZS_FULL 14
364
+/* Offline */
365
+#define VIRTIO_BLK_ZS_OFFLINE 15
366
+
367
/* Unmap this range (only valid for write zeroes command) */
368
#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP    0x00000001
369
370
@@ -XXX,XX +XXX,XX @@ struct virtio_scsi_inhdr {
371
#define VIRTIO_BLK_S_OK        0
372
#define VIRTIO_BLK_S_IOERR    1
373
#define VIRTIO_BLK_S_UNSUPP    2
374
+
375
+/* Error codes that are specific to zoned block devices */
376
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
377
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4
378
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5
379
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
380
+
381
#endif /* _LINUX_VIRTIO_BLK_H */
382
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
383
index XXXXXXX..XXXXXXX 100644
384
--- a/linux-headers/asm-arm64/kvm.h
385
+++ b/linux-headers/asm-arm64/kvm.h
386
@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
387
#define KVM_ARM_VCPU_SVE        4 /* enable SVE for this CPU */
388
#define KVM_ARM_VCPU_PTRAUTH_ADDRESS    5 /* VCPU uses address authentication */
389
#define KVM_ARM_VCPU_PTRAUTH_GENERIC    6 /* VCPU uses generic authentication */
390
+#define KVM_ARM_VCPU_HAS_EL2        7 /* Support nested virtualization */
391
392
struct kvm_vcpu_init {
393
    __u32 target;
394
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
395
index XXXXXXX..XXXXXXX 100644
396
--- a/linux-headers/asm-x86/kvm.h
397
+++ b/linux-headers/asm-x86/kvm.h
398
@@ -XXX,XX +XXX,XX @@
399
400
#include <linux/types.h>
401
#include <linux/ioctl.h>
402
+#include <linux/stddef.h>
403
404
#define KVM_PIO_PAGE_OFFSET 1
405
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2
406
@@ -XXX,XX +XXX,XX @@ struct kvm_nested_state {
407
     * KVM_{GET,PUT}_NESTED_STATE ioctl values.
408
     */
409
    union {
410
-        struct kvm_vmx_nested_state_data vmx[0];
411
-        struct kvm_svm_nested_state_data svm[0];
412
+        __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx);
413
+        __DECLARE_FLEX_ARRAY(struct kvm_svm_nested_state_data, svm);
414
    } data;
415
};
416
417
@@ -XXX,XX +XXX,XX @@ struct kvm_pmu_event_filter {
418
#define KVM_PMU_EVENT_ALLOW 0
419
#define KVM_PMU_EVENT_DENY 1
420
421
+#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
422
+#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
423
+
424
+/*
425
+ * Masked event layout.
426
+ * Bits Description
427
+ * ---- -----------
428
+ * 7:0 event select (low bits)
429
+ * 15:8 umask match
430
+ * 31:16 unused
431
+ * 35:32 event select (high bits)
432
+ * 36:54 unused
433
+ * 55 exclude bit
434
+ * 63:56 umask mask
435
+ */
436
+
437
+#define KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
438
+    (((event_select) & 0xFFULL) | (((event_select) & 0XF00ULL) << 24) | \
439
+    (((mask) & 0xFFULL) << 56) | \
440
+    (((match) & 0xFFULL) << 8) | \
441
+    ((__u64)(!!(exclude)) << 55))
442
+
443
+#define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \
444
+    (GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32))
445
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK        (GENMASK_ULL(63, 56))
446
+#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH    (GENMASK_ULL(15, 8))
447
+#define KVM_PMU_MASKED_ENTRY_EXCLUDE        (BIT_ULL(55))
448
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT    (56)
449
+
450
/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
451
#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
452
#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
453
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
454
index XXXXXXX..XXXXXXX 100644
455
--- a/linux-headers/linux/kvm.h
456
+++ b/linux-headers/linux/kvm.h
457
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
458
        struct {
459
            __u8 ar;    /* the access register number */
460
            __u8 key;    /* access key, ignored if flag unset */
461
+            __u8 pad1[6];    /* ignored */
462
+            __u64 old_addr;    /* ignored if cmpxchg flag unset */
463
        };
464
        __u32 sida_offset; /* offset into the sida */
465
        __u8 reserved[32]; /* ignored */
466
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
467
#define KVM_S390_MEMOP_SIDA_WRITE    3
468
#define KVM_S390_MEMOP_ABSOLUTE_READ    4
469
#define KVM_S390_MEMOP_ABSOLUTE_WRITE    5
470
+#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG    6
471
+
472
/* flags for kvm_s390_mem_op->flags */
473
#define KVM_S390_MEMOP_F_CHECK_ONLY        (1ULL << 0)
474
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION    (1ULL << 1)
475
#define KVM_S390_MEMOP_F_SKEY_PROTECTION    (1ULL << 2)
476
477
+/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
478
+#define KVM_S390_MEMOP_EXTENSION_CAP_BASE    (1 << 0)
479
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG    (1 << 1)
480
+
481
/* for KVM_INTERRUPT */
482
struct kvm_interrupt {
483
    /* in */
484
@@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt {
485
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
486
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
487
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
488
+#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
489
490
#ifdef KVM_CAP_IRQ_ROUTING
491
492
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
493
index XXXXXXX..XXXXXXX 100644
494
--- a/linux-headers/linux/vfio.h
495
+++ b/linux-headers/linux/vfio.h
496
@@ -XXX,XX +XXX,XX @@
497
/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
498
#define VFIO_UNMAP_ALL            9
499
500
-/* Supports the vaddr flag for DMA map and unmap */
501
+/*
502
+ * Supports the vaddr flag for DMA map and unmap. Not supported for mediated
503
+ * devices, so this capability is subject to change as groups are added or
504
+ * removed.
505
+ */
506
#define VFIO_UPDATE_VADDR        10
507
508
/*
509
@@ -XXX,XX +XXX,XX @@ struct vfio_iommu_type1_info_dma_avail {
510
* Map process virtual addresses to IO virtual addresses using the
511
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
512
*
513
- * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and
514
- * unblock translation of host virtual addresses in the iova range. The vaddr
515
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr
516
* must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR. To
517
* maintain memory consistency within the user application, the updated vaddr
518
* must address the same memory object as originally mapped. Failure to do so
519
@@ -XXX,XX +XXX,XX @@ struct vfio_bitmap {
520
* must be 0. This cannot be combined with the get-dirty-bitmap flag.
521
*
522
* If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
523
- * virtual addresses in the iova range. Tasks that attempt to translate an
524
- * iova's vaddr will block. DMA to already-mapped pages continues. This
525
- * cannot be combined with the get-dirty-bitmap flag.
526
+ * virtual addresses in the iova range. DMA to already-mapped pages continues.
527
+ * Groups may not be added to the container while any addresses are invalid.
528
+ * This cannot be combined with the get-dirty-bitmap flag.
529
*/
530
struct vfio_iommu_type1_dma_unmap {
531
    __u32    argsz;
532
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
533
index XXXXXXX..XXXXXXX 100644
534
--- a/linux-headers/linux/vhost.h
535
+++ b/linux-headers/linux/vhost.h
536
@@ -XXX,XX +XXX,XX @@
537
*/
538
#define VHOST_VDPA_SUSPEND        _IO(VHOST_VIRTIO, 0x7D)
539
540
+/* Resume a device so it can resume processing virtqueue requests
541
+ *
542
+ * After the return of this ioctl the device will have restored all the
543
+ * necessary states and it is fully operational to continue processing the
544
+ * virtqueue descriptors.
545
+ */
546
+#define VHOST_VDPA_RESUME        _IO(VHOST_VIRTIO, 0x7E)
547
+
548
#endif
549
--
550
2.39.2
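
Reviewer note on the masked PMU event layout in the asm-x86/kvm.h hunk above: the following is a minimal sketch, not kernel or QEMU code, of how an entry round-trips through the documented bit layout. The macro is a local copy of KVM_PMU_ENCODE_MASKED_ENTRY with uint64_t in place of __u64; pmu_encode() and the four decode helpers are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copy of KVM_PMU_ENCODE_MASKED_ENTRY from the hunk above, with
 * uint64_t standing in for __u64 so it builds without kernel headers.
 */
#define PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
    (((event_select) & 0xFFULL) | (((event_select) & 0xF00ULL) << 24) | \
     (((mask) & 0xFFULL) << 56) | \
     (((match) & 0xFFULL) << 8) | \
     ((uint64_t)(!!(exclude)) << 55))

/* Hypothetical wrapper: rejects event selects wider than 12 bits. */
static uint64_t pmu_encode(uint16_t event_select, uint8_t mask,
                           uint8_t match, int exclude)
{
    assert(event_select <= 0xFFF);
    return PMU_ENCODE_MASKED_ENTRY((uint64_t)event_select, (uint64_t)mask,
                                   (uint64_t)match, exclude);
}

/* Hypothetical decoders, one per field of the layout table. */
static uint16_t pmu_event_select(uint64_t e)
{
    /* bits 7:0 are the low byte, bits 35:32 are the high nibble */
    return (uint16_t)((e & 0xFF) | ((e >> 24) & 0xF00));
}

static uint8_t pmu_umask_mask(uint64_t e)
{
    return (uint8_t)(e >> 56);           /* bits 63:56 */
}

static uint8_t pmu_umask_match(uint64_t e)
{
    return (uint8_t)((e >> 8) & 0xFF);   /* bits 15:8 */
}

static int pmu_exclude(uint64_t e)
{
    return (int)((e >> 55) & 1);         /* bit 55 */
}
```

The split event select (low byte in bits 7:0, high nibble in bits 35:32) is the part that is easiest to get wrong, which is why the decoder shifts by 24 rather than 32.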
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
This patch extends virtio-blk emulation to handle zoned device commands
4
by calling the new block layer APIs to perform zoned device I/O on
5
behalf of the guest. It supports Report Zone, four zone operations (open,
6
close, finish, reset), and Append Zone.
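
As a rough illustration of how the zone management requests map onto block layer operations in this patch, here is a standalone sketch. It assumes the request type values from the virtio_blk.h update in this series; ZoneOp and zone_mgmt_args() are hypothetical stand-ins and are not part of QEMU. It mirrors two details of virtio_blk_handle_zone_mgmt(): RESET_ALL is a reset over the entire capacity, and the last zone may be smaller than zone_size.

```c
#include <assert.h>
#include <stdint.h>

/* Request type values as defined in the virtio_blk.h update in this series. */
#define VIRTIO_BLK_T_ZONE_OPEN      18
#define VIRTIO_BLK_T_ZONE_CLOSE     20
#define VIRTIO_BLK_T_ZONE_FINISH    22
#define VIRTIO_BLK_T_ZONE_RESET     24
#define VIRTIO_BLK_T_ZONE_RESET_ALL 26

/* Hypothetical stand-in for QEMU's BlockZoneOp. */
typedef enum { ZO_OPEN, ZO_CLOSE, ZO_FINISH, ZO_RESET } ZoneOp;

/*
 * Hypothetical helper: translate a request type and byte offset into the
 * (op, start, len) triple that would be handed to the block layer.
 * Returns 0 on success, -1 if the type is not a zone management request
 * or the offset lies beyond the device.
 */
static int zone_mgmt_args(uint32_t type, uint64_t offset,
                          uint64_t zone_size, uint64_t nr_zones,
                          uint64_t capacity,
                          ZoneOp *op, uint64_t *start, uint64_t *len)
{
    assert(op && start && len);

    switch (type) {
    case VIRTIO_BLK_T_ZONE_OPEN:      *op = ZO_OPEN;   break;
    case VIRTIO_BLK_T_ZONE_CLOSE:     *op = ZO_CLOSE;  break;
    case VIRTIO_BLK_T_ZONE_FINISH:    *op = ZO_FINISH; break;
    case VIRTIO_BLK_T_ZONE_RESET:
    case VIRTIO_BLK_T_ZONE_RESET_ALL: *op = ZO_RESET;  break;
    default:
        return -1;
    }

    if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
        /* Entire drive capacity, regardless of the sector in the request. */
        *start = 0;
        *len = capacity;
        return 0;
    }

    /*
     * The sketch rejects this early; in the patch the range is validated
     * afterwards by check_zoned_request().
     */
    if (offset > capacity || nr_zones == 0) {
        return -1;
    }

    *start = offset;
    if (zone_size > capacity - offset) {
        /* The last zone is allowed to be smaller than zone_size. */
        *len = capacity - zone_size * (nr_zones - 1);
    } else {
        *len = zone_size;
    }
    return 0;
}
```

For example, on a hypothetical 4608-byte device with 1024-byte zones, an open request at offset 4096 yields a 512-byte range for the short final zone.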
7
8
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
9
support zoned block devices. Regular block devices(conventional zones)
10
will not be set.
11
12
The guest OS can use blktests or fio to test those commands on zoned devices.
13
Furthermore, using zonefs to test zone append write is also supported.
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Message-id: 20230407082528.18841-3-faithilikerun@gmail.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
hw/block/virtio-blk-common.c | 2 +
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
21
hw/virtio/virtio-qmp.c | 2 +
22
3 files changed, 393 insertions(+)
23
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/hw/block/virtio-blk-common.c
27
+++ b/hw/block/virtio-blk-common.c
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
33
+ .end = endof(struct virtio_blk_config, zoned)},
34
{}
35
};
36
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/hw/block/virtio-blk.c
40
+++ b/hw/block/virtio-blk.c
41
@@ -XXX,XX +XXX,XX @@
42
#include "qemu/module.h"
43
#include "qemu/error-report.h"
44
#include "qemu/main-loop.h"
45
+#include "block/block_int.h"
46
#include "trace.h"
47
#include "hw/block/block.h"
48
#include "hw/qdev-properties.h"
49
@@ -XXX,XX +XXX,XX @@ err:
50
return err_status;
51
}
52
53
+typedef struct ZoneCmdData {
54
+ VirtIOBlockReq *req;
55
+ struct iovec *in_iov;
56
+ unsigned in_num;
57
+ union {
58
+ struct {
59
+ unsigned int nr_zones;
60
+ BlockZoneDescriptor *zones;
61
+ } zone_report_data;
62
+ struct {
63
+ int64_t offset;
64
+ } zone_append_data;
65
+ };
66
+} ZoneCmdData;
67
+
68
+/*
69
+ * check_zoned_request: error checking before issuing requests. If all checks
70
+ * pass, return true.
71
+ * append: true if the request is a zone append.
72
+ */
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
74
+ bool append, uint8_t *status) {
75
+ BlockDriverState *bs = blk_bs(s->blk);
76
+ int index;
77
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
79
+ *status = VIRTIO_BLK_S_UNSUPP;
80
+ return false;
81
+ }
82
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
89
+ if (append) {
90
+ if (bs->bl.write_granularity) {
91
+ if ((offset % bs->bl.write_granularity) != 0) {
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size);
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer less than size of "
335
+ "append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
383
{
384
uint32_t type;
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Pass out_iov/out_num and in_iov/in_num because it is not safe
417
+ * to access req->elem.out_sg directly because it may be
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
462
}
463
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
465
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
466
VirtIOBlock *s = VIRTIO_BLK(dev);
467
VirtIOBlkConf *conf = &s->conf;
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
469
Error *err = NULL;
470
unsigned i;
471
472
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
473
return;
474
}
475
476
+ if (bs->bl.zoned != BLK_Z_NONE) {
477
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
478
+ if (bs->bl.zoned == BLK_Z_HM) {
479
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
480
+ }
481
+ }
482
+
483
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
484
(!conf->max_discard_sectors ||
485
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
486
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
487
index XXXXXXX..XXXXXXX 100644
488
--- a/hw/virtio/virtio-qmp.c
489
+++ b/hw/virtio/virtio-qmp.c
490
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
491
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
492
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
493
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
494
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
495
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
496
#ifndef VIRTIO_BLK_NO_LEGACY
497
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
498
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
499
--
500
2.39.2
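
Reviewer note on the range validation in check_zoned_request() above: the check compares offset against capacity - len rather than computing offset + len, which avoids signed overflow for large guest-supplied values. A minimal standalone sketch of that pattern follows. SECTOR_BITS is a local stand-in for BDRV_SECTOR_BITS, and zoned_range_ok() is a hypothetical name, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* local stand-in for BDRV_SECTOR_BITS */

/*
 * Hypothetical helper: return true if [offset, offset + len) lies within a
 * device of total_sectors 512-byte sectors. Mirrors the comparison order in
 * check_zoned_request(): len is bounded by capacity first, so that
 * capacity - len cannot go negative, and offset + len is never computed.
 */
static bool zoned_range_ok(int64_t offset, int64_t len, int64_t total_sectors)
{
    int64_t capacity;

    /* Precondition of the sketch: the shift below must not overflow. */
    assert(total_sectors >= 0 &&
           total_sectors <= (INT64_MAX >> SECTOR_BITS));
    capacity = total_sectors << SECTOR_BITS;

    if (offset < 0 || len < 0 || len > capacity) {
        return false;
    }
    return offset <= capacity - len;
}
```

Note that a zero-length request at offset == capacity passes this check, as it does in the patch, where the zone report path calls check_zoned_request() with len set to 0.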
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Taking account of the new zone append write operation for zoned devices,
4
the BLOCK_ACCT_ZONE_APPEND enum is introduced alongside the other I/O request types (read,
5
write, flush).
6
7
Signed-off-by: Sam Li <faithilikerun@gmail.com>
8
Message-id: 20230407082528.18841-4-faithilikerun@gmail.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
12
qapi/block.json | 4 +++
13
include/block/accounting.h | 1 +
14
block/qapi-sysemu.c | 11 ++++++
15
block/qapi.c | 18 ++++++++++
16
hw/block/virtio-blk.c | 4 +++
17
6 files changed, 95 insertions(+), 11 deletions(-)
18
19
diff --git a/qapi/block-core.json b/qapi/block-core.json
20
index XXXXXXX..XXXXXXX 100644
21
--- a/qapi/block-core.json
22
+++ b/qapi/block-core.json
23
@@ -XXX,XX +XXX,XX @@
24
# @min_wr_latency_ns: Minimum latency of write operations in the
25
# defined interval, in nanoseconds.
26
#
27
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
28
+# in the defined interval, in nanoseconds
29
+# (since 8.1)
30
+#
31
# @min_flush_latency_ns: Minimum latency of flush operations in the
32
# defined interval, in nanoseconds.
33
#
34
@@ -XXX,XX +XXX,XX @@
35
# @max_wr_latency_ns: Maximum latency of write operations in the
36
# defined interval, in nanoseconds.
37
#
38
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
39
+# in the defined interval, in nanoseconds
40
+# (since 8.1)
41
+#
42
# @max_flush_latency_ns: Maximum latency of flush operations in the
43
# defined interval, in nanoseconds.
44
#
45
@@ -XXX,XX +XXX,XX @@
46
# @avg_wr_latency_ns: Average latency of write operations in the
47
# defined interval, in nanoseconds.
48
#
49
+# @avg_zone_append_latency_ns: Average latency of zone append operations
50
+# in the defined interval, in nanoseconds
51
+# (since 8.1)
52
+#
53
# @avg_flush_latency_ns: Average latency of flush operations in the
54
# defined interval, in nanoseconds.
55
#
56
@@ -XXX,XX +XXX,XX @@
57
# @avg_wr_queue_depth: Average number of pending write operations
58
# in the defined interval.
59
#
60
+# @avg_zone_append_queue_depth: Average number of pending zone append
61
+# operations in the defined interval
62
+# (since 8.1).
63
+#
64
# Since: 2.5
65
##
66
{ 'struct': 'BlockDeviceTimedStats',
67
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
68
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
69
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
70
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
71
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
72
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
73
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
74
+ 'max_zone_append_latency_ns': 'int',
75
+ 'avg_zone_append_latency_ns': 'int',
76
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
77
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
78
+ 'avg_wr_queue_depth': 'number',
79
+ 'avg_zone_append_queue_depth': 'number' } }
80
81
##
82
# @BlockDeviceStats:
83
@@ -XXX,XX +XXX,XX @@
84
#
85
# @wr_bytes: The number of bytes written by the device.
86
#
87
+# @zone_append_bytes: The number of bytes appended by the zoned devices
88
+# (since 8.1)
89
+#
90
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
91
#
92
# @rd_operations: The number of read operations performed by the device.
93
#
94
# @wr_operations: The number of write operations performed by the device.
95
#
96
+# @zone_append_operations: The number of zone append operations performed
97
+# by the zoned devices (since 8.1)
98
+#
99
# @flush_operations: The number of cache flush operations performed by the
100
# device (since 0.15)
101
#
102
@@ -XXX,XX +XXX,XX @@
103
#
104
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since 0.15).
105
#
106
+# @zone_append_total_time_ns: Total time spent on zone append writes
107
+# in nanoseconds (since 8.1)
108
+#
109
# @flush_total_time_ns: Total time spent on cache flushes in nanoseconds
110
# (since 0.15).
111
#
112
@@ -XXX,XX +XXX,XX @@
113
# @wr_merged: Number of write requests that have been merged into another
114
# request (Since 2.3).
115
#
116
+# @zone_append_merged: Number of zone append requests that have been merged
117
+# into another request (since 8.1)
118
+#
119
# @unmap_merged: Number of unmap requests that have been merged into another
120
# request (Since 4.2)
121
#
122
@@ -XXX,XX +XXX,XX @@
123
# @failed_wr_operations: The number of failed write operations
124
# performed by the device (Since 2.5)
125
#
126
+# @failed_zone_append_operations: The number of failed zone append write
127
+# operations performed by the zoned devices
128
+# (since 8.1)
129
+#
130
# @failed_flush_operations: The number of failed flush operations
131
# performed by the device (Since 2.5)
132
#
133
@@ -XXX,XX +XXX,XX @@
134
# @invalid_wr_operations: The number of invalid write operations
135
# performed by the device (Since 2.5)
136
#
137
+# @invalid_zone_append_operations: The number of invalid zone append operations
138
+# performed by the zoned device (since 8.1)
139
+#
140
# @invalid_flush_operations: The number of invalid flush operations
141
# performed by the device (Since 2.5)
142
#
143
@@ -XXX,XX +XXX,XX @@
144
#
145
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
146
#
147
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
148
+#
149
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
150
#
151
# Since: 0.14
152
##
153
{ 'struct': 'BlockDeviceStats',
154
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
155
- 'rd_operations': 'int', 'wr_operations': 'int',
156
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
157
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
158
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
159
'flush_operations': 'int', 'unmap_operations': 'int',
160
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
161
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
162
- 'wr_highest_offset': 'int',
163
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
164
- '*idle_time_ns': 'int',
165
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
166
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
167
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
168
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
169
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
170
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
171
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
172
+ 'failed_zone_append_operations': 'int',
173
+ 'failed_flush_operations': 'int',
174
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
175
+ 'invalid_wr_operations': 'int',
176
+ 'invalid_zone_append_operations': 'int',
177
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
178
'account_invalid': 'bool', 'account_failed': 'bool',
179
'timed_stats': ['BlockDeviceTimedStats'],
180
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
181
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
182
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
183
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
184
185
##
186
diff --git a/qapi/block.json b/qapi/block.json
187
index XXXXXXX..XXXXXXX 100644
188
--- a/qapi/block.json
189
+++ b/qapi/block.json
190
@@ -XXX,XX +XXX,XX @@
191
# @boundaries-write: list of interval boundary values for write latency
192
# histogram.
193
#
194
+# @boundaries-zap: list of interval boundary values for zone append write
195
+# latency histogram.
196
+#
197
# @boundaries-flush: list of interval boundary values for flush latency
198
# histogram.
199
#
200
@@ -XXX,XX +XXX,XX @@
201
'*boundaries': ['uint64'],
202
'*boundaries-read': ['uint64'],
203
'*boundaries-write': ['uint64'],
204
+ '*boundaries-zap': ['uint64'],
205
'*boundaries-flush': ['uint64'] },
206
'allow-preconfig': true }
207
diff --git a/include/block/accounting.h b/include/block/accounting.h
208
index XXXXXXX..XXXXXXX 100644
209
--- a/include/block/accounting.h
210
+++ b/include/block/accounting.h
211
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
212
BLOCK_ACCT_READ,
213
BLOCK_ACCT_WRITE,
214
BLOCK_ACCT_FLUSH,
215
+ BLOCK_ACCT_ZONE_APPEND,
216
BLOCK_ACCT_UNMAP,
217
BLOCK_MAX_IOTYPE,
218
};
219
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
220
index XXXXXXX..XXXXXXX 100644
221
--- a/block/qapi-sysemu.c
222
+++ b/block/qapi-sysemu.c
223
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
224
bool has_boundaries, uint64List *boundaries,
225
bool has_boundaries_read, uint64List *boundaries_read,
226
bool has_boundaries_write, uint64List *boundaries_write,
227
+ bool has_boundaries_append, uint64List *boundaries_append,
228
bool has_boundaries_flush, uint64List *boundaries_flush,
229
Error **errp)
230
{
231
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
232
}
233
}
234
235
+ if (has_boundaries || has_boundaries_append) {
236
+ ret = block_latency_histogram_set(
237
+ stats, BLOCK_ACCT_ZONE_APPEND,
238
+ has_boundaries_append ? boundaries_append : boundaries);
239
+ if (ret) {
240
+ error_setg(errp, "Device '%s' set append write boundaries fail", id);
241
+ return;
242
+ }
243
+ }
244
+
245
if (has_boundaries || has_boundaries_flush) {
246
ret = block_latency_histogram_set(
247
stats, BLOCK_ACCT_FLUSH,
248
diff --git a/block/qapi.c b/block/qapi.c
249
index XXXXXXX..XXXXXXX 100644
250
--- a/block/qapi.c
251
+++ b/block/qapi.c
252
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
253
254
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
255
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
256
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
257
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
258
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
259
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
260
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
261
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
262
263
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
264
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
265
+ ds->failed_zone_append_operations =
266
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
267
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
268
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
269
270
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
271
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
272
+ ds->invalid_zone_append_operations =
273
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
274
ds->invalid_flush_operations =
275
stats->invalid_ops[BLOCK_ACCT_FLUSH];
276
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
277
278
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
279
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
280
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
281
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
282
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
283
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
284
+ ds->zone_append_total_time_ns =
285
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
286
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
287
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
288
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
289
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
290
291
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
292
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
293
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
294
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
295
296
dev_stats->interval_length = ts->interval_length;
297
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
298
dev_stats->max_wr_latency_ns = timed_average_max(wr);
299
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
300
301
+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
302
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
303
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
304
+
305
dev_stats->min_flush_latency_ns = timed_average_min(fl);
306
dev_stats->max_flush_latency_ns = timed_average_max(fl);
307
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
308
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
309
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
310
dev_stats->avg_wr_queue_depth =
311
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
312
+ dev_stats->avg_zone_append_queue_depth =
313
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);
314
315
QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
316
}
317
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
318
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
319
ds->wr_latency_histogram
320
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
321
+ ds->zone_append_latency_histogram
322
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
323
ds->flush_latency_histogram
324
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
325
}
326
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
327
index XXXXXXX..XXXXXXX 100644
328
--- a/hw/block/virtio-blk.c
329
+++ b/hw/block/virtio-blk.c
330
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
331
data->in_num = in_num;
332
data->zone_append_data.offset = offset;
333
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
334
+
335
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
336
+ BLOCK_ACCT_ZONE_APPEND);
337
+
338
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
339
virtio_blk_zone_append_complete, data);
340
return 0;
341
--
342
2.39.2
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20230407082528.18841-5-faithilikerun@gmail.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
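For readers of the resulting trace output: the handlers log sectors rather than byte offsets, obtained by shifting with BDRV_SECTOR_BITS. A small Python sketch of that conversion follows, assuming 512-byte sectors (BDRV_SECTOR_BITS == 9). The helper names are hypothetical, and the formatted string only mimics the sector portion of the virtio_blk_handle_zone_append event.

```python
BDRV_SECTOR_BITS = 9  # assumption: 512-byte sectors


def bytes_to_sectors(offset):
    """Convert a byte offset to a sector number, as the trace calls do."""
    return offset >> BDRV_SECTOR_BITS


def format_zone_append_trace(offset):
    # Mimics only the "append sector 0x..." tail of the trace format string.
    return "append sector 0x%x" % bytes_to_sectors(offset)
```

For example, a byte offset of 1 MiB corresponds to sector 2048, which the sketch prints as `append sector 0x800`.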
8
hw/block/virtio-blk.c | 12 ++++++++++++
9
hw/block/trace-events | 7 +++++++
10
2 files changed, 19 insertions(+)
11
12
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/hw/block/virtio-blk.c
15
+++ b/hw/block/virtio-blk.c
16
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
17
int64_t nz = data->zone_report_data.nr_zones;
18
int8_t err_status = VIRTIO_BLK_S_OK;
19
20
+ trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
21
if (ret) {
22
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
23
goto out;
24
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
25
nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
26
sizeof(struct virtio_blk_zone_report)) /
27
sizeof(struct virtio_blk_zone_descriptor);
28
+ trace_virtio_blk_handle_zone_report(vdev, req,
29
+ offset >> BDRV_SECTOR_BITS, nr_zones);
30
31
zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
32
data = g_malloc(sizeof(ZoneCmdData));
33
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
34
{
35
VirtIOBlockReq *req = opaque;
36
VirtIOBlock *s = req->dev;
37
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
38
int8_t err_status = VIRTIO_BLK_S_OK;
39
+ trace_virtio_blk_zone_mgmt_complete(vdev, req, ret);
40
41
if (ret) {
42
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
43
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
44
/* Entire drive capacity */
45
offset = 0;
46
len = capacity;
47
+ trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
48
+ bs->total_sectors);
49
} else {
50
if (bs->bl.zone_size > capacity - offset) {
51
/* The zoned device allows the last smaller zone. */
52
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
53
} else {
54
len = bs->bl.zone_size;
55
}
56
+ trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
57
+ offset >> BDRV_SECTOR_BITS,
58
+ len >> BDRV_SECTOR_BITS);
59
}
60
61
if (!check_zoned_request(s, offset, len, false, &err_status)) {
62
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
63
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
64
goto out;
65
}
66
+ trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
67
68
out:
69
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
70
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
71
int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
72
int64_t len = iov_size(out_iov, out_num);
73
74
+ trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
75
if (!check_zoned_request(s, offset, len, true, &err_status)) {
76
goto out;
77
}
78
diff --git a/hw/block/trace-events b/hw/block/trace-events
79
index XXXXXXX..XXXXXXX 100644
80
--- a/hw/block/trace-events
81
+++ b/hw/block/trace-events
82
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
83
# virtio-blk.c
84
virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
85
virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
86
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
87
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
88
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
89
virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
90
virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
91
virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
92
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
93
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
94
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
95
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""
96
97
# hd-geometry.c
98
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
99
--
100
2.39.2
diff view generated by jsdifflib
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Add documentation with an example of using the virtio-blk driver
to pass zoned block devices through to the guest.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Message-id: 20230407082528.18841-6-faithilikerun@gmail.com
8
[Fix Sphinx indentation error by turning command-lines into
9
pre-formatted text.
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
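The -blockdev/-device pair documented by this patch can also be assembled programmatically, for instance from a test harness. The Python sketch below is only an illustration: the function name `zoned_virtio_blk_args` is hypothetical, the option strings are copied from the documentation hunk, and the `qemu-system-x86_64` binary name in the demo is an assumption.

```python
import shlex


def zoned_virtio_blk_args(filename="/dev/nullb0", node_name="drive0"):
    """Build the -blockdev/-device arguments shown in zoned-storage.rst."""
    blockdev = ",".join([
        f"node-name={node_name}",
        "driver=host_device",
        f"filename={filename}",
        "cache.direct=on",
    ])
    device = f"virtio-blk-pci,drive={node_name}"
    return ["-blockdev", blockdev, "-device", device]


if __name__ == "__main__":
    # Assumption: the system emulator binary is qemu-system-x86_64.
    print(shlex.join(["qemu-system-x86_64"] + zoned_virtio_blk_args()))
```

Keeping the node name in one variable makes sure that the `drive=` property of the device always refers to the block node defined just before it.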
13
docs/devel/zoned-storage.rst | 25 ++++++++++++++++++++++---
14
1 file changed, 22 insertions(+), 3 deletions(-)
15
16
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
17
index XXXXXXX..XXXXXXX 100644
18
--- a/docs/devel/zoned-storage.rst
19
+++ b/docs/devel/zoned-storage.rst
20
@@ -XXX,XX +XXX,XX @@ When the BlockBackend's BlockLimits model reports a zoned storage device, users
21
like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
22
APIs for zoned storage emulation or testing.
23
24
-For example, to test zone_report on a null_blk device using qemu-io is:
25
-$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
26
--c "zrp offset nr_zones"
27
+For example, to test zone_report on a null_blk device using qemu-io, run::
28
+
29
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
30
+
31
+To expose the host's zoned block device through virtio-blk, use a command line
+such as the following (with the -device parameter)::
33
+
34
+ -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
35
+ -device virtio-blk-pci,drive=drive0
36
+
37
+Or only use the -drive parameter::
38
+
39
+ -drive driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
40
+
41
+Additionally, QEMU has several ways of supporting zoned storage, including:
+(1) Using virtio-scsi: --device scsi-block allows SCSI ZBC devices to be passed
+through, enabling the attachment of ZBC or ZAC HDDs to QEMU.
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
+purposes, it cannot yet pass through a zoned device from the host. To pass an
+NVMe ZNS device to the guest, use VFIO PCI to pass the entire NVMe PCI adapter
+through to the guest. Likewise, an HDD HBA can be passed through to QEMU along
+with all HDDs attached to the HBA.
49
--
50
2.39.2
From: Carlos Santos <casantos@redhat.com>

It is not useful when configuring with --enable-trace-backends=nop.

Signed-off-by: Carlos Santos <casantos@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230408010410.281263-1-casantos@redhat.com>
---
trace/meson.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/trace/meson.build b/trace/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/trace/meson.build
+++ b/trace/meson.build
@@ -XXX,XX +XXX,XX @@ trace_events_all = custom_target('trace-events-all',
input: trace_events_files,
command: [ 'cat', '@INPUT@' ],
capture: true,
- install: true,
+ install: get_option('trace_backends') != [ 'nop' ],
install_dir: qemu_datadir)

if 'ust' in get_option('trace_backends')
--
2.39.2

Tests should place their files into the test directory. This includes
Unix sockets. 205 currently fails to do so, which prevents it from
being run concurrently.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190618210238.9524-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
tests/qemu-iotests/205 | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/205
+++ b/tests/qemu-iotests/205
@@ -XXX,XX +XXX,XX @@ import iotests
import time
from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive

-nbd_sock = 'nbd_sock'
+nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
disk = os.path.join(iotests.test_dir, 'disk')

--
2.21.0
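The iotests 205 change above can be illustrated with a short standalone Python sketch of why a socket path inside the per-run test directory avoids collisions. The helper names are hypothetical, and a caller-supplied directory stands in for `iotests.test_dir`; only the `nbd_sock` file name and the URI prefix are taken from the patch.

```python
import os


def nbd_socket_path(test_dir):
    """Place the NBD Unix socket inside the given per-run test directory."""
    return os.path.join(test_dir, "nbd_sock")


def nbd_uri(sock_path):
    # Same URI layout as in tests/qemu-iotests/205.
    return "nbd+unix:///exp?socket=" + sock_path
```

With a bare relative name such as 'nbd_sock', two runs started from the same working directory would compete for one socket file. With distinct test directories, each run gets its own path.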