1
The following changes since commit bfec359afba088aaacc7d316f43302f28c6e642a:
1
The following changes since commit 8844bb8d896595ee1d25d21c770e6e6f29803097:
2
2
3
Merge remote-tracking branch 'remotes/armbru/tags/pull-qdev-2017-04-21' into staging (2017-04-21 11:42:03 +0100)
3
Merge tag 'or1k-pull-request-20230513' of https://github.com/stffrdhrn/qemu into staging (2023-05-13 11:23:14 +0100)
4
4
5
are available in the git repository at:
5
are available in the Git repository at:
6
6
7
git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
8
8
9
for you to fetch changes up to 1507631e438930bc07f776f303af127a9cdb4d41:
9
for you to fetch changes up to 01562fee5f3ad4506d57dbcf4b1903b565eceec7:
10
10
11
qemu-iotests: _cleanup_qemu must be called on exit (2017-04-21 08:32:44 -0400)
11
docs/zoned-storage:add zoned emulation use case (2023-05-15 08:19:04 -0400)
12
13
----------------------------------------------------------------
14
Pull request
15
16
This pull request contain's Sam Li's zoned storage support in the QEMU block
17
layer and virtio-blk emulation.
18
19
v2:
20
- Sam fixed the CI failures. CI passes for me now. [Richard]
12
21
13
----------------------------------------------------------------
22
----------------------------------------------------------------
14
23
15
Block patches for 2.10
24
Sam Li (16):
25
block/block-common: add zoned device structs
26
block/file-posix: introduce helper functions for sysfs attributes
27
block/block-backend: add block layer APIs resembling Linux
28
ZonedBlockDevice ioctls
29
block/raw-format: add zone operations to pass through requests
30
block: add zoned BlockDriver check to block layer
31
iotests: test new zone operations
32
block: add some trace events for new block layer APIs
33
docs/zoned-storage: add zoned device documentation
34
file-posix: add tracking of the zone write pointers
35
block: introduce zone append write for zoned devices
36
qemu-iotests: test zone append operation
37
block: add some trace events for zone append
38
virtio-blk: add zoned storage emulation for zoned devices
39
block: add accounting for zone append operation
40
virtio-blk: add some trace events for zoned emulation
41
docs/zoned-storage:add zoned emulation use case
16
42
17
----------------------------------------------------------------
43
docs/devel/index-api.rst | 1 +
18
44
docs/devel/zoned-storage.rst | 62 +++
19
Ashish Mittal (2):
45
qapi/block-core.json | 68 ++-
20
block/vxhs.c: Add support for a new block device type called "vxhs"
46
qapi/block.json | 4 +
21
block/vxhs.c: Add qemu-iotests for new block device type "vxhs"
47
meson.build | 5 +
22
48
include/block/accounting.h | 1 +
23
Jeff Cody (10):
49
include/block/block-common.h | 57 ++
24
qemu-iotests: exclude vxhs from image creation via protocol
50
include/block/block-io.h | 13 +
25
block: add bdrv_set_read_only() helper function
51
include/block/block_int-common.h | 37 ++
26
block: do not set BDS read_only if copy_on_read enabled
52
include/block/raw-aio.h | 8 +-
27
block: honor BDRV_O_ALLOW_RDWR when clearing bs->read_only
53
include/sysemu/block-backend-io.h | 27 +
28
block: code movement
54
block.c | 19 +
29
block: introduce bdrv_can_set_read_only()
55
block/block-backend.c | 198 +++++++
30
block: use bdrv_can_set_read_only() during reopen
56
block/file-posix.c | 692 +++++++++++++++++++++++--
31
block/rbd - update variable names to more apt names
57
block/io.c | 68 +++
32
block/rbd: Add support for reopen()
58
block/io_uring.c | 4 +
33
qemu-iotests: _cleanup_qemu must be called on exit
59
block/linux-aio.c | 3 +
34
60
block/qapi-sysemu.c | 11 +
35
block.c | 56 +++-
61
block/qapi.c | 18 +
36
block/Makefile.objs | 2 +
62
block/raw-format.c | 26 +
37
block/bochs.c | 5 +-
63
hw/block/virtio-blk-common.c | 2 +
38
block/cloop.c | 5 +-
64
hw/block/virtio-blk.c | 405 +++++++++++++++
39
block/dmg.c | 6 +-
65
hw/virtio/virtio-qmp.c | 2 +
40
block/rbd.c | 65 +++--
66
qemu-io-cmds.c | 224 ++++++++
41
block/trace-events | 17 ++
67
block/trace-events | 4 +
42
block/vvfat.c | 19 +-
68
docs/system/qemu-block-drivers.rst.inc | 6 +
43
block/vxhs.c | 575 +++++++++++++++++++++++++++++++++++++++
69
hw/block/trace-events | 7 +
44
configure | 39 +++
70
tests/qemu-iotests/227.out | 18 +
45
include/block/block.h | 2 +
71
tests/qemu-iotests/tests/zoned | 105 ++++
46
qapi/block-core.json | 23 +-
72
tests/qemu-iotests/tests/zoned.out | 69 +++
47
tests/qemu-iotests/017 | 1 +
73
30 files changed, 2106 insertions(+), 58 deletions(-)
48
tests/qemu-iotests/020 | 1 +
74
create mode 100644 docs/devel/zoned-storage.rst
49
tests/qemu-iotests/028 | 1 +
75
create mode 100755 tests/qemu-iotests/tests/zoned
50
tests/qemu-iotests/029 | 1 +
76
create mode 100644 tests/qemu-iotests/tests/zoned.out
51
tests/qemu-iotests/073 | 1 +
52
tests/qemu-iotests/094 | 11 +-
53
tests/qemu-iotests/102 | 5 +-
54
tests/qemu-iotests/109 | 1 +
55
tests/qemu-iotests/114 | 1 +
56
tests/qemu-iotests/117 | 1 +
57
tests/qemu-iotests/130 | 2 +
58
tests/qemu-iotests/134 | 1 +
59
tests/qemu-iotests/140 | 1 +
60
tests/qemu-iotests/141 | 1 +
61
tests/qemu-iotests/143 | 1 +
62
tests/qemu-iotests/156 | 2 +
63
tests/qemu-iotests/158 | 1 +
64
tests/qemu-iotests/common | 6 +
65
tests/qemu-iotests/common.config | 13 +
66
tests/qemu-iotests/common.filter | 1 +
67
tests/qemu-iotests/common.rc | 19 ++
68
33 files changed, 844 insertions(+), 42 deletions(-)
69
create mode 100644 block/vxhs.c
70
77
71
--
78
--
72
2.9.3
79
2.40.1
73
74
diff view generated by jsdifflib
1
The protocol VXHS does not support image creation. Some tests expect
1
From: Sam Li <faithilikerun@gmail.com>
2
to be able to create images through the protocol. Exclude VXHS from
3
these tests.
4
2
5
Signed-off-by: Jeff Cody <jcody@redhat.com>
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
6
Reviewed-by: Hannes Reinecke <hare@suse.de>
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
8
Acked-by: Kevin Wolf <kwolf@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20230508045533.175575-2-faithilikerun@gmail.com
11
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
13
<philmd@linaro.org>.
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
16
---
7
tests/qemu-iotests/017 | 1 +
17
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
8
tests/qemu-iotests/020 | 1 +
18
1 file changed, 43 insertions(+)
9
tests/qemu-iotests/029 | 1 +
10
tests/qemu-iotests/073 | 1 +
11
tests/qemu-iotests/114 | 1 +
12
tests/qemu-iotests/130 | 1 +
13
tests/qemu-iotests/134 | 1 +
14
tests/qemu-iotests/156 | 1 +
15
tests/qemu-iotests/158 | 1 +
16
9 files changed, 9 insertions(+)
17
19
18
diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
20
diff --git a/include/block/block-common.h b/include/block/block-common.h
19
index XXXXXXX..XXXXXXX 100755
21
index XXXXXXX..XXXXXXX 100644
20
--- a/tests/qemu-iotests/017
22
--- a/include/block/block-common.h
21
+++ b/tests/qemu-iotests/017
23
+++ b/include/block/block-common.h
22
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
24
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
23
# Any format supporting backing files
25
typedef struct BdrvChild BdrvChild;
24
_supported_fmt qcow qcow2 vmdk qed
26
typedef struct BdrvChildClass BdrvChildClass;
25
_supported_proto generic
27
26
+_unsupported_proto vxhs
28
+typedef enum BlockZoneOp {
27
_supported_os Linux
29
+ BLK_ZO_OPEN,
28
_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
30
+ BLK_ZO_CLOSE,
29
31
+ BLK_ZO_FINISH,
30
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
32
+ BLK_ZO_RESET,
31
index XXXXXXX..XXXXXXX 100755
33
+} BlockZoneOp;
32
--- a/tests/qemu-iotests/020
34
+
33
+++ b/tests/qemu-iotests/020
35
+typedef enum BlockZoneModel {
34
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
36
+ BLK_Z_NONE = 0x0, /* Regular block device */
35
# Any format supporting backing files
37
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
36
_supported_fmt qcow qcow2 vmdk qed
38
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
37
_supported_proto generic
39
+} BlockZoneModel;
38
+_unsupported_proto vxhs
40
+
39
_supported_os Linux
41
+typedef enum BlockZoneState {
40
_unsupported_imgopts "subformat=monolithicFlat" \
42
+ BLK_ZS_NOT_WP = 0x0,
41
"subformat=twoGbMaxExtentFlat" \
43
+ BLK_ZS_EMPTY = 0x1,
42
diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029
44
+ BLK_ZS_IOPEN = 0x2,
43
index XXXXXXX..XXXXXXX 100755
45
+ BLK_ZS_EOPEN = 0x3,
44
--- a/tests/qemu-iotests/029
46
+ BLK_ZS_CLOSED = 0x4,
45
+++ b/tests/qemu-iotests/029
47
+ BLK_ZS_RDONLY = 0xD,
46
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
48
+ BLK_ZS_FULL = 0xE,
47
# Any format supporting intenal snapshots
49
+ BLK_ZS_OFFLINE = 0xF,
48
_supported_fmt qcow2
50
+} BlockZoneState;
49
_supported_proto generic
51
+
50
+_unsupported_proto vxhs
52
+typedef enum BlockZoneType {
51
_supported_os Linux
53
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
52
# Internal snapshots are (currently) impossible with refcount_bits=1
54
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
53
_unsupported_imgopts 'refcount_bits=1[^0-9]'
55
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
54
diff --git a/tests/qemu-iotests/073 b/tests/qemu-iotests/073
56
+} BlockZoneType;
55
index XXXXXXX..XXXXXXX 100755
57
+
56
--- a/tests/qemu-iotests/073
58
+/*
57
+++ b/tests/qemu-iotests/073
59
+ * Zone descriptor data structure.
58
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
60
+ * Provides information on a zone with all position and size values in bytes.
59
61
+ */
60
_supported_fmt qcow2
62
+typedef struct BlockZoneDescriptor {
61
_supported_proto generic
63
+ uint64_t start;
62
+_unsupported_proto vxhs
64
+ uint64_t length;
63
_supported_os Linux
65
+ uint64_t cap;
64
66
+ uint64_t wp;
65
CLUSTER_SIZE=64k
67
+ BlockZoneType type;
66
diff --git a/tests/qemu-iotests/114 b/tests/qemu-iotests/114
68
+ BlockZoneState state;
67
index XXXXXXX..XXXXXXX 100755
69
+} BlockZoneDescriptor;
68
--- a/tests/qemu-iotests/114
70
+
69
+++ b/tests/qemu-iotests/114
71
typedef struct BlockDriverInfo {
70
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
72
/* in bytes, 0 if irrelevant */
71
73
int cluster_size;
72
_supported_fmt qcow2
73
_supported_proto generic
74
+_unsupported_proto vxhs
75
_supported_os Linux
76
77
78
diff --git a/tests/qemu-iotests/130 b/tests/qemu-iotests/130
79
index XXXXXXX..XXXXXXX 100755
80
--- a/tests/qemu-iotests/130
81
+++ b/tests/qemu-iotests/130
82
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
83
84
_supported_fmt qcow2
85
_supported_proto generic
86
+_unsupported_proto vxhs
87
_supported_os Linux
88
89
qemu_comm_method="monitor"
90
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
91
index XXXXXXX..XXXXXXX 100755
92
--- a/tests/qemu-iotests/134
93
+++ b/tests/qemu-iotests/134
94
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
95
96
_supported_fmt qcow2
97
_supported_proto generic
98
+_unsupported_proto vxhs
99
_supported_os Linux
100
101
102
diff --git a/tests/qemu-iotests/156 b/tests/qemu-iotests/156
103
index XXXXXXX..XXXXXXX 100755
104
--- a/tests/qemu-iotests/156
105
+++ b/tests/qemu-iotests/156
106
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
107
108
_supported_fmt qcow2 qed
109
_supported_proto generic
110
+_unsupported_proto vxhs
111
_supported_os Linux
112
113
# Create source disk
114
diff --git a/tests/qemu-iotests/158 b/tests/qemu-iotests/158
115
index XXXXXXX..XXXXXXX 100755
116
--- a/tests/qemu-iotests/158
117
+++ b/tests/qemu-iotests/158
118
@@ -XXX,XX +XXX,XX @@ trap "_cleanup; exit \$status" 0 1 2 3 15
119
120
_supported_fmt qcow2
121
_supported_proto generic
122
+_unsupported_proto vxhs
123
_supported_os Linux
124
125
126
--
74
--
127
2.9.3
75
2.40.1
128
76
129
77
diff view generated by jsdifflib
1
A few block drivers will set the BDS read_only flag from their
1
From: Sam Li <faithilikerun@gmail.com>
2
.bdrv_open() function. This means the bs->read_only flag could
2
3
be set after we enable copy_on_read, as the BDRV_O_COPY_ON_READ
3
Use get_sysfs_str_val() to get the string value of device
4
flag check occurs prior to the call to bdrv->bdrv_open().
4
zoned model. Then get_sysfs_zoned_model() can convert it to
5
5
BlockZoneModel type of QEMU.
6
This adds an error return to bdrv_set_read_only(), and an error will be
6
7
return if we try to set the BDS to read_only while copy_on_read is
7
Use get_sysfs_long_val() to get the long value of zoned device
8
enabled.
8
information.
9
9
10
This patch also changes the behavior of vvfat. Before, vvfat could
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
override the drive 'readonly' flag with its own, internal 'rw' flag.
11
Reviewed-by: Hannes Reinecke <hare@suse.de>
12
13
For instance, this -drive parameter would result in a writable image:
14
15
"-drive format=vvfat,dir=/tmp/vvfat,rw,if=virtio,readonly=on"
16
17
This is not correct. Now, attempting to use the above -drive parameter
18
will result in an error (i.e., 'rw' is incompatible with 'readonly=on').
19
20
Signed-off-by: Jeff Cody <jcody@redhat.com>
21
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
22
Reviewed-by: John Snow <jsnow@redhat.com>
13
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
23
Message-id: 0c5b4c1cc2c651471b131f21376dfd5ea24d2196.1491597120.git.jcody@redhat.com
14
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
15
Acked-by: Kevin Wolf <kwolf@redhat.com>
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Message-id: 20230508045533.175575-3-faithilikerun@gmail.com
18
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
19
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
20
<philmd@linaro.org>.
21
--Stefan]
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
24
---
23
---
25
block.c | 10 +++++++++-
24
include/block/block_int-common.h | 3 +
26
block/bochs.c | 5 ++++-
25
block/file-posix.c | 135 ++++++++++++++++++++++---------
27
block/cloop.c | 5 ++++-
26
2 files changed, 100 insertions(+), 38 deletions(-)
28
block/dmg.c | 6 +++++-
27
29
block/rbd.c | 11 ++++++++++-
28
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
30
block/vvfat.c | 19 +++++++++++++++----
31
include/block/block.h | 2 +-
32
7 files changed, 48 insertions(+), 10 deletions(-)
33
34
diff --git a/block.c b/block.c
35
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
36
--- a/block.c
30
--- a/include/block/block_int-common.h
37
+++ b/block.c
31
+++ b/include/block/block_int-common.h
38
@@ -XXX,XX +XXX,XX @@ void path_combine(char *dest, int dest_size,
32
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
39
}
33
* an explicit monitor command to load the disk inside the guest).
34
*/
35
bool has_variable_length;
36
+
37
+ /* device zone model */
38
+ BlockZoneModel zoned;
39
} BlockLimits;
40
41
typedef struct BdrvOpBlocker BdrvOpBlocker;
42
diff --git a/block/file-posix.c b/block/file-posix.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/block/file-posix.c
45
+++ b/block/file-posix.c
46
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
47
#endif
40
}
48
}
41
49
42
-void bdrv_set_read_only(BlockDriverState *bs, bool read_only)
50
-static int hdev_get_max_segments(int fd, struct stat *st)
43
+int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
51
+/*
52
+ * Get a sysfs attribute value as character string.
53
+ */
54
+#ifdef CONFIG_LINUX
55
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
56
+ char **val) {
57
+ g_autofree char *sysfspath = NULL;
58
+ int ret;
59
+ size_t len;
60
+
61
+ if (!S_ISBLK(st->st_mode)) {
62
+ return -ENOTSUP;
63
+ }
64
+
65
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
66
+ major(st->st_rdev), minor(st->st_rdev),
67
+ attribute);
68
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
69
+ if (ret == -1) {
70
+ return -ENOENT;
71
+ }
72
+
73
+ /* The file is ended with '\n' */
74
+ char *p;
75
+ p = *val;
76
+ if (*(p + len - 1) == '\n') {
77
+ *(p + len - 1) = '\0';
78
+ }
79
+ return ret;
80
+}
81
+#endif
82
+
83
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
44
{
84
{
45
+ /* Do not set read_only if copy_on_read is enabled */
85
+ g_autofree char *val = NULL;
46
+ if (bs->copy_on_read && read_only) {
86
+ int ret;
47
+ error_setg(errp, "Can't set node '%s' to r/o with copy-on-read enabled",
87
+
48
+ bdrv_get_device_or_node_name(bs));
88
+ ret = get_sysfs_str_val(st, "zoned", &val);
49
+ return -EINVAL;
50
+ }
51
+
52
bs->read_only = read_only;
53
+ return 0;
54
}
55
56
void bdrv_get_full_backing_filename_from_filename(const char *backed,
57
diff --git a/block/bochs.c b/block/bochs.c
58
index XXXXXXX..XXXXXXX 100644
59
--- a/block/bochs.c
60
+++ b/block/bochs.c
61
@@ -XXX,XX +XXX,XX @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
62
return -EINVAL;
63
}
64
65
- bdrv_set_read_only(bs, true); /* no write support yet */
66
+ ret = bdrv_set_read_only(bs, true, errp); /* no write support yet */
67
+ if (ret < 0) {
89
+ if (ret < 0) {
68
+ return ret;
90
+ return ret;
69
+ }
91
+ }
70
92
+
71
ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
93
+ if (strcmp(val, "host-managed") == 0) {
72
if (ret < 0) {
94
+ *zoned = BLK_Z_HM;
73
diff --git a/block/cloop.c b/block/cloop.c
95
+ } else if (strcmp(val, "host-aware") == 0) {
74
index XXXXXXX..XXXXXXX 100644
96
+ *zoned = BLK_Z_HA;
75
--- a/block/cloop.c
97
+ } else if (strcmp(val, "none") == 0) {
76
+++ b/block/cloop.c
98
+ *zoned = BLK_Z_NONE;
77
@@ -XXX,XX +XXX,XX @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
99
+ } else {
78
return -EINVAL;
100
+ return -ENOTSUP;
79
}
101
+ }
80
102
+ return 0;
81
- bdrv_set_read_only(bs, true);
103
+}
82
+ ret = bdrv_set_read_only(bs, true, errp);
104
+
105
+/*
106
+ * Get a sysfs attribute value as a long integer.
107
+ */
108
#ifdef CONFIG_LINUX
109
- char buf[32];
110
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
111
+{
112
+ g_autofree char *str = NULL;
113
const char *end;
114
- char *sysfspath = NULL;
115
+ long val;
116
+ int ret;
117
+
118
+ ret = get_sysfs_str_val(st, attribute, &str);
83
+ if (ret < 0) {
119
+ if (ret < 0) {
84
+ return ret;
120
+ return ret;
85
+ }
121
+ }
86
122
+
87
/* read header */
123
+ /* The file is ended with '\n', pass 'end' to accept that. */
88
ret = bdrv_pread(bs->file, 128, &s->block_size, 4);
124
+ ret = qemu_strtol(str, &end, 10, &val);
89
diff --git a/block/dmg.c b/block/dmg.c
125
+ if (ret == 0 && end && *end == '\0') {
90
index XXXXXXX..XXXXXXX 100644
126
+ ret = val;
91
--- a/block/dmg.c
127
+ }
92
+++ b/block/dmg.c
128
+ return ret;
93
@@ -XXX,XX +XXX,XX @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
129
+}
94
return -EINVAL;
130
+#endif
131
+
132
+static int hdev_get_max_segments(int fd, struct stat *st)
133
+{
134
+#ifdef CONFIG_LINUX
135
int ret;
136
- int sysfd = -1;
137
- long max_segments;
138
139
if (S_ISCHR(st->st_mode)) {
140
if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
141
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
142
}
143
return -ENOTSUP;
95
}
144
}
96
145
-
97
+ ret = bdrv_set_read_only(bs, true, errp);
146
- if (!S_ISBLK(st->st_mode)) {
98
+ if (ret < 0) {
147
- return -ENOTSUP;
99
+ return ret;
148
- }
100
+ }
149
-
101
+
150
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
102
block_module_load_one("dmg-bz2");
151
- major(st->st_rdev), minor(st->st_rdev));
103
- bdrv_set_read_only(bs, true);
152
- sysfd = open(sysfspath, O_RDONLY);
104
153
- if (sysfd == -1) {
105
s->n_chunks = 0;
154
- ret = -errno;
106
s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
155
- goto out;
107
diff --git a/block/rbd.c b/block/rbd.c
156
- }
108
index XXXXXXX..XXXXXXX 100644
157
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
109
--- a/block/rbd.c
158
- if (ret < 0) {
110
+++ b/block/rbd.c
159
- ret = -errno;
111
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
160
- goto out;
112
goto failed_shutdown;
161
- } else if (ret == 0) {
162
- ret = -EIO;
163
- goto out;
164
- }
165
- buf[ret] = 0;
166
- /* The file is ended with '\n', pass 'end' to accept that. */
167
- ret = qemu_strtol(buf, &end, 10, &max_segments);
168
- if (ret == 0 && end && *end == '\n') {
169
- ret = max_segments;
170
- }
171
-
172
-out:
173
- if (sysfd != -1) {
174
- close(sysfd);
175
- }
176
- g_free(sysfspath);
177
- return ret;
178
+ return get_sysfs_long_val(st, "max_segments");
179
#else
180
return -ENOTSUP;
181
#endif
182
}
183
184
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
185
+ Error **errp)
186
+{
187
+ BlockZoneModel zoned;
188
+ int ret;
189
+
190
+ bs->bl.zoned = BLK_Z_NONE;
191
+
192
+ ret = get_sysfs_zoned_model(st, &zoned);
193
+ if (ret < 0 || zoned == BLK_Z_NONE) {
194
+ return;
195
+ }
196
+ bs->bl.zoned = zoned;
197
+}
198
+
199
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
200
{
201
BDRVRawState *s = bs->opaque;
202
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
203
bs->bl.max_hw_iov = ret;
204
}
113
}
205
}
114
206
+
115
+ /* rbd_open is always r/w */
207
+ raw_refresh_zoned_limits(bs, &st, errp);
116
r = rbd_open(s->io_ctx, s->name, &s->image, s->snap);
208
}
117
if (r < 0) {
209
118
error_setg_errno(errp, -r, "error reading header from %s", s->name);
210
static int check_for_dasd(int fd)
119
goto failed_open;
120
}
121
122
- bdrv_set_read_only(bs, (s->snap != NULL));
123
+ /* If we are using an rbd snapshot, we must be r/o, otherwise
124
+ * leave as-is */
125
+ if (s->snap != NULL) {
126
+ r = bdrv_set_read_only(bs, true, &local_err);
127
+ if (r < 0) {
128
+ error_propagate(errp, local_err);
129
+ goto failed_open;
130
+ }
131
+ }
132
133
qemu_opts_del(opts);
134
return 0;
135
diff --git a/block/vvfat.c b/block/vvfat.c
136
index XXXXXXX..XXXXXXX 100644
137
--- a/block/vvfat.c
138
+++ b/block/vvfat.c
139
@@ -XXX,XX +XXX,XX @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
140
141
s->current_cluster=0xffffffff;
142
143
- /* read only is the default for safety */
144
- bdrv_set_read_only(bs, true);
145
s->qcow = NULL;
146
s->qcow_filename = NULL;
147
s->fat2 = NULL;
148
@@ -XXX,XX +XXX,XX @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
149
s->sector_count = cyls * heads * secs - (s->first_sectors_number - 1);
150
151
if (qemu_opt_get_bool(opts, "rw", false)) {
152
- ret = enable_write_target(bs, errp);
153
+ if (!bdrv_is_read_only(bs)) {
154
+ ret = enable_write_target(bs, errp);
155
+ if (ret < 0) {
156
+ goto fail;
157
+ }
158
+ } else {
159
+ ret = -EPERM;
160
+ error_setg(errp,
161
+ "Unable to set VVFAT to 'rw' when drive is read-only");
162
+ goto fail;
163
+ }
164
+ } else {
165
+ /* read only is the default for safety */
166
+ ret = bdrv_set_read_only(bs, true, &local_err);
167
if (ret < 0) {
168
+ error_propagate(errp, local_err);
169
goto fail;
170
}
171
- bdrv_set_read_only(bs, false);
172
}
173
174
bs->total_sectors = cyls * heads * secs;
175
diff --git a/include/block/block.h b/include/block/block.h
176
index XXXXXXX..XXXXXXX 100644
177
--- a/include/block/block.h
178
+++ b/include/block/block.h
179
@@ -XXX,XX +XXX,XX @@ int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
180
int64_t sector_num, int nb_sectors, int *pnum);
181
182
bool bdrv_is_read_only(BlockDriverState *bs);
183
-void bdrv_set_read_only(BlockDriverState *bs, bool read_only);
184
+int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
185
bool bdrv_is_sg(BlockDriverState *bs);
186
bool bdrv_is_inserted(BlockDriverState *bs);
187
int bdrv_media_changed(BlockDriverState *bs);
188
--
211
--
189
2.9.3
212
2.40.1
190
213
191
214
diff view generated by jsdifflib
1
This adds support for reopen in rbd, for changing between r/w and r/o.
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Note, that this is only a flag change, but we will block a change from
3
Add zoned device option to host_device BlockDriver. It will be presented only
4
r/o to r/w if we are using an RBD internal snapshot.
4
for zoned host block devices. By adding zone management operations to the
5
host_block_device BlockDriver, users can use the new block layer APIs
6
including Report Zone and four zone management operations
7
(open, close, finish, reset, reset_all).
5
8
9
Qemu-io uses the new APIs to perform zoned storage commands of the device:
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
11
zone_finish(zf).
12
13
For example, to test zone_report, use following command:
14
$ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
15
-c "zrp offset nr_zones"
16
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Signed-off-by: Jeff Cody <jcody@redhat.com>
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
8
Reviewed-by: John Snow <jsnow@redhat.com>
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
9
Message-id: d4e87539167ec6527d44c97b164eabcccf96e4f3.1491597120.git.jcody@redhat.com
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
Message-id: 20230508045533.175575-4-faithilikerun@gmail.com
24
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
25
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
26
<philmd@linaro.org> and remove spurious ret = -errno in
27
raw_co_zone_mgmt().
28
--Stefan]
29
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
30
---
11
block/rbd.c | 21 +++++++++++++++++++++
31
meson.build | 5 +
12
1 file changed, 21 insertions(+)
32
include/block/block-io.h | 9 +
33
include/block/block_int-common.h | 21 ++
34
include/block/raw-aio.h | 6 +-
35
include/sysemu/block-backend-io.h | 18 ++
36
block/block-backend.c | 137 +++++++++++++
37
block/file-posix.c | 313 +++++++++++++++++++++++++++++-
38
block/io.c | 41 ++++
39
qemu-io-cmds.c | 149 ++++++++++++++
40
9 files changed, 696 insertions(+), 3 deletions(-)
13
41
14
diff --git a/block/rbd.c b/block/rbd.c
42
diff --git a/meson.build b/meson.build
15
index XXXXXXX..XXXXXXX 100644
43
index XXXXXXX..XXXXXXX 100644
16
--- a/block/rbd.c
44
--- a/meson.build
17
+++ b/block/rbd.c
45
+++ b/meson.build
18
@@ -XXX,XX +XXX,XX @@ failed_opts:
46
@@ -XXX,XX +XXX,XX @@ if rdma.found()
19
return r;
47
endif
48
49
# has_header_symbol
50
+config_host_data.set('CONFIG_BLKZONED',
51
+ cc.has_header_symbol('linux/blkzoned.h', 'BLKOPENZONE'))
52
config_host_data.set('CONFIG_EPOLL_CREATE1',
53
cc.has_header_symbol('sys/epoll.h', 'epoll_create1'))
54
config_host_data.set('CONFIG_FALLOCATE_PUNCH_HOLE',
55
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
56
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
57
cc.has_member('struct stat', 'st_atim',
58
prefix: '#include <sys/stat.h>'))
59
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
60
+ cc.has_member('struct blk_zone', 'capacity',
61
+ prefix: '#include <linux/blkzoned.h>'))
62
63
# has_type
64
config_host_data.set('CONFIG_IOVEC',
65
diff --git a/include/block/block-io.h b/include/block/block-io.h
66
index XXXXXXX..XXXXXXX 100644
67
--- a/include/block/block-io.h
68
+++ b/include/block/block-io.h
69
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
70
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
71
int64_t bytes);
72
73
+/* Report zone information of zone block device. */
74
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
75
+ int64_t offset,
76
+ unsigned int *nr_zones,
77
+ BlockZoneDescriptor *zones);
78
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
79
+ BlockZoneOp op,
80
+ int64_t offset, int64_t len);
81
+
82
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
83
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
84
int64_t bytes, int64_t *pnum, int64_t *map,
85
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
86
index XXXXXXX..XXXXXXX 100644
87
--- a/include/block/block_int-common.h
88
+++ b/include/block/block_int-common.h
89
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
90
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
91
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
92
93
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
94
+ int64_t offset, unsigned int *nr_zones,
95
+ BlockZoneDescriptor *zones);
96
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
97
+ int64_t offset, int64_t len);
98
+
99
/* removable device specific */
100
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
101
BlockDriverState *bs);
102
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
103
104
/* device zone model */
105
BlockZoneModel zoned;
106
+
107
+ /* zone size expressed in bytes */
108
+ uint32_t zone_size;
109
+
110
+ /* total number of zones */
111
+ uint32_t nr_zones;
112
+
113
+ /* maximum sectors of a zone append write operation */
114
+ uint32_t max_append_sectors;
115
+
116
+ /* maximum number of open zones */
117
+ uint32_t max_open_zones;
118
+
119
+ /* maximum number of active zones */
120
+ uint32_t max_active_zones;
121
} BlockLimits;
122
123
typedef struct BdrvOpBlocker BdrvOpBlocker;
124
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
125
index XXXXXXX..XXXXXXX 100644
126
--- a/include/block/raw-aio.h
127
+++ b/include/block/raw-aio.h
128
@@ -XXX,XX +XXX,XX @@
129
#define QEMU_AIO_WRITE_ZEROES 0x0020
130
#define QEMU_AIO_COPY_RANGE 0x0040
131
#define QEMU_AIO_TRUNCATE 0x0080
132
+#define QEMU_AIO_ZONE_REPORT 0x0100
133
+#define QEMU_AIO_ZONE_MGMT 0x0200
134
#define QEMU_AIO_TYPE_MASK \
135
(QEMU_AIO_READ | \
136
QEMU_AIO_WRITE | \
137
@@ -XXX,XX +XXX,XX @@
138
QEMU_AIO_DISCARD | \
139
QEMU_AIO_WRITE_ZEROES | \
140
QEMU_AIO_COPY_RANGE | \
141
- QEMU_AIO_TRUNCATE)
142
+ QEMU_AIO_TRUNCATE | \
143
+ QEMU_AIO_ZONE_REPORT | \
144
+ QEMU_AIO_ZONE_MGMT)
145
146
/* AIO flags */
147
#define QEMU_AIO_MISALIGNED 0x1000
148
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
149
index XXXXXXX..XXXXXXX 100644
150
--- a/include/sysemu/block-backend-io.h
151
+++ b/include/sysemu/block-backend-io.h
152
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
153
BlockCompletionFunc *cb, void *opaque);
154
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
155
BlockCompletionFunc *cb, void *opaque);
156
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
157
+ unsigned int *nr_zones,
158
+ BlockZoneDescriptor *zones,
159
+ BlockCompletionFunc *cb, void *opaque);
160
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
161
+ int64_t offset, int64_t len,
162
+ BlockCompletionFunc *cb, void *opaque);
163
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
164
BlockCompletionFunc *cb, void *opaque);
165
void blk_aio_cancel_async(BlockAIOCB *acb);
166
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
167
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
168
int64_t bytes, BdrvRequestFlags flags);
169
170
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
171
+ unsigned int *nr_zones,
172
+ BlockZoneDescriptor *zones);
173
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
174
+ unsigned int *nr_zones,
175
+ BlockZoneDescriptor *zones);
176
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
177
+ int64_t offset, int64_t len);
178
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
179
+ int64_t offset, int64_t len);
180
+
181
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
182
int64_t bytes);
183
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
184
diff --git a/block/block-backend.c b/block/block-backend.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/block/block-backend.c
187
+++ b/block/block-backend.c
188
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
189
return ret;
20
}
190
}
21
191
22
+
192
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
23
+/* Since RBD is currently always opened R/W via the API,
193
+{
24
+ * we just need to check if we are using a snapshot or not, in
194
+ BlkAioEmAIOCB *acb = opaque;
25
+ * order to determine if we will allow it to be R/W */
195
+ BlkRwCo *rwco = &acb->rwco;
26
+static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
196
+
27
+ BlockReopenQueue *queue, Error **errp)
197
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
28
+{
198
+ (unsigned int*)(uintptr_t)acb->bytes,
29
+ BDRVRBDState *s = state->bs->opaque;
199
+ rwco->iobuf);
30
+ int ret = 0;
200
+ blk_aio_complete(acb);
31
+
201
+}
32
+ if (s->snap && state->flags & BDRV_O_RDWR) {
202
+
33
+ error_setg(errp,
203
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
34
+ "Cannot change node '%s' to r/w when using RBD snapshot",
204
+ unsigned int *nr_zones,
35
+ bdrv_get_device_or_node_name(state->bs));
205
+ BlockZoneDescriptor *zones,
36
+ ret = -EINVAL;
206
+ BlockCompletionFunc *cb, void *opaque)
37
+ }
207
+{
38
+
208
+ BlkAioEmAIOCB *acb;
209
+ Coroutine *co;
210
+ IO_CODE();
211
+
212
+ blk_inc_in_flight(blk);
213
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
214
+ acb->rwco = (BlkRwCo) {
215
+ .blk = blk,
216
+ .offset = offset,
217
+ .iobuf = zones,
218
+ .ret = NOT_DONE,
219
+ };
220
+ acb->bytes = (int64_t)(uintptr_t)nr_zones,
221
+ acb->has_returned = false;
222
+
223
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
224
+ aio_co_enter(blk_get_aio_context(blk), co);
225
+
226
+ acb->has_returned = true;
227
+ if (acb->rwco.ret != NOT_DONE) {
228
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
229
+ blk_aio_complete_bh, acb);
230
+ }
231
+
232
+ return &acb->common;
233
+}
234
+
235
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
236
+{
237
+ BlkAioEmAIOCB *acb = opaque;
238
+ BlkRwCo *rwco = &acb->rwco;
239
+
240
+ rwco->ret = blk_co_zone_mgmt(rwco->blk,
241
+ (BlockZoneOp)(uintptr_t)rwco->iobuf,
242
+ rwco->offset, acb->bytes);
243
+ blk_aio_complete(acb);
244
+}
245
+
246
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
247
+ int64_t offset, int64_t len,
248
+ BlockCompletionFunc *cb, void *opaque) {
249
+ BlkAioEmAIOCB *acb;
250
+ Coroutine *co;
251
+ IO_CODE();
252
+
253
+ blk_inc_in_flight(blk);
254
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
255
+ acb->rwco = (BlkRwCo) {
256
+ .blk = blk,
257
+ .offset = offset,
258
+ .iobuf = (void *)(uintptr_t)op,
259
+ .ret = NOT_DONE,
260
+ };
261
+ acb->bytes = len;
262
+ acb->has_returned = false;
263
+
264
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
265
+ aio_co_enter(blk_get_aio_context(blk), co);
266
+
267
+ acb->has_returned = true;
268
+ if (acb->rwco.ret != NOT_DONE) {
269
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
270
+ blk_aio_complete_bh, acb);
271
+ }
272
+
273
+ return &acb->common;
274
+}
275
+
276
+/*
277
+ * Send a zone_report command.
278
+ * offset is a byte offset from the start of the device. No alignment
279
+ * required for offset.
280
+ * nr_zones represents IN maximum and OUT actual.
281
+ */
282
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
283
+ unsigned int *nr_zones,
284
+ BlockZoneDescriptor *zones)
285
+{
286
+ int ret;
287
+ IO_CODE();
288
+
289
+ blk_inc_in_flight(blk); /* increase before waiting */
290
+ blk_wait_while_drained(blk);
291
+ GRAPH_RDLOCK_GUARD();
292
+ if (!blk_is_available(blk)) {
293
+ blk_dec_in_flight(blk);
294
+ return -ENOMEDIUM;
295
+ }
296
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
297
+ blk_dec_in_flight(blk);
39
+ return ret;
298
+ return ret;
40
+}
299
+}
41
+
300
+
42
static void qemu_rbd_close(BlockDriverState *bs)
301
+/*
302
+ * Send a zone_management command.
303
+ * op is the zone operation;
304
+ * offset is the byte offset from the start of the zoned device;
305
+ * len is the maximum number of bytes the command should operate on. It
306
+ * should be aligned with the device zone size.
307
+ */
308
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
309
+ int64_t offset, int64_t len)
310
+{
311
+ int ret;
312
+ IO_CODE();
313
+
314
+ blk_inc_in_flight(blk);
315
+ blk_wait_while_drained(blk);
316
+ GRAPH_RDLOCK_GUARD();
317
+
318
+ ret = blk_check_byte_request(blk, offset, len);
319
+ if (ret < 0) {
320
+ blk_dec_in_flight(blk);
321
+ return ret;
322
+ }
323
+
324
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
325
+ blk_dec_in_flight(blk);
326
+ return ret;
327
+}
328
+
329
void blk_drain(BlockBackend *blk)
43
{
330
{
44
BDRVRBDState *s = bs->opaque;
331
BlockDriverState *bs = blk_bs(blk);
45
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_rbd = {
332
diff --git a/block/file-posix.c b/block/file-posix.c
46
.bdrv_parse_filename = qemu_rbd_parse_filename,
333
index XXXXXXX..XXXXXXX 100644
47
.bdrv_file_open = qemu_rbd_open,
334
--- a/block/file-posix.c
48
.bdrv_close = qemu_rbd_close,
335
+++ b/block/file-posix.c
49
+ .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
336
@@ -XXX,XX +XXX,XX @@
50
.bdrv_create = qemu_rbd_create,
337
#include <sys/param.h>
51
.bdrv_has_zero_init = bdrv_has_zero_init_1,
338
#include <sys/syscall.h>
52
.bdrv_get_info = qemu_rbd_getinfo,
339
#include <sys/vfs.h>
340
+#if defined(CONFIG_BLKZONED)
341
+#include <linux/blkzoned.h>
342
+#endif
343
#include <linux/cdrom.h>
344
#include <linux/fd.h>
345
#include <linux/fs.h>
346
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
347
PreallocMode prealloc;
348
Error **errp;
349
} truncate;
350
+ struct {
351
+ unsigned int *nr_zones;
352
+ BlockZoneDescriptor *zones;
353
+ } zone_report;
354
+ struct {
355
+ unsigned long op;
356
+ } zone_mgmt;
357
};
358
} RawPosixAIOData;
359
360
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
361
}
362
#endif
363
364
+#if defined(CONFIG_BLKZONED)
365
static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
366
{
367
g_autofree char *val = NULL;
368
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
369
}
370
return 0;
371
}
372
+#endif /* defined(CONFIG_BLKZONED) */
373
374
/*
375
* Get a sysfs attribute value as a long integer.
376
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
377
#endif
378
}
379
380
+#if defined(CONFIG_BLKZONED)
381
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
382
Error **errp)
383
{
384
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
385
return;
386
}
387
bs->bl.zoned = zoned;
388
+
389
+ ret = get_sysfs_long_val(st, "max_open_zones");
390
+ if (ret >= 0) {
391
+ bs->bl.max_open_zones = ret;
392
+ }
393
+
394
+ ret = get_sysfs_long_val(st, "max_active_zones");
395
+ if (ret >= 0) {
396
+ bs->bl.max_active_zones = ret;
397
+ }
398
+
399
+ /*
400
+ * The zoned device must at least have zone size and nr_zones fields.
401
+ */
402
+ ret = get_sysfs_long_val(st, "chunk_sectors");
403
+ if (ret < 0) {
404
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
405
+ "sysfs attribute");
406
+ return;
407
+ } else if (!ret) {
408
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
409
+ return;
410
+ }
411
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
412
+
413
+ ret = get_sysfs_long_val(st, "nr_zones");
414
+ if (ret < 0) {
415
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
416
+ "sysfs attribute");
417
+ return;
418
+ } else if (!ret) {
419
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
420
+ return;
421
+ }
422
+ bs->bl.nr_zones = ret;
423
+
424
+ ret = get_sysfs_long_val(st, "zone_append_max_bytes");
425
+ if (ret > 0) {
426
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
427
+ }
428
}
429
+#else /* !defined(CONFIG_BLKZONED) */
430
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
431
+ Error **errp)
432
+{
433
+ bs->bl.zoned = BLK_Z_NONE;
434
+}
435
+#endif /* !defined(CONFIG_BLKZONED) */
436
437
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
438
{
439
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
440
BDRVRawState *s = bs->opaque;
441
int ret;
442
443
- /* If DASD, get blocksizes */
444
+ /* If DASD or zoned devices, get blocksizes */
445
if (check_for_dasd(s->fd) < 0) {
446
- return -ENOTSUP;
447
+ /* zoned devices are not DASD */
448
+ if (bs->bl.zoned == BLK_Z_NONE) {
449
+ return -ENOTSUP;
450
+ }
451
}
452
ret = probe_logical_blocksize(s->fd, &bsz->log);
453
if (ret < 0) {
454
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
455
}
456
#endif
457
458
+/*
459
+ * parse_zone - Fill a zone descriptor
460
+ */
461
+#if defined(CONFIG_BLKZONED)
462
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
463
+ const struct blk_zone *blkz) {
464
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
465
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
466
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
467
+
468
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
469
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
470
+#else
471
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
472
+#endif
473
+
474
+ switch (blkz->type) {
475
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
476
+ zone->type = BLK_ZT_SWR;
477
+ break;
478
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
479
+ zone->type = BLK_ZT_SWP;
480
+ break;
481
+ case BLK_ZONE_TYPE_CONVENTIONAL:
482
+ zone->type = BLK_ZT_CONV;
483
+ break;
484
+ default:
485
+ error_report("Unsupported zone type: 0x%x", blkz->type);
486
+ return -ENOTSUP;
487
+ }
488
+
489
+ switch (blkz->cond) {
490
+ case BLK_ZONE_COND_NOT_WP:
491
+ zone->state = BLK_ZS_NOT_WP;
492
+ break;
493
+ case BLK_ZONE_COND_EMPTY:
494
+ zone->state = BLK_ZS_EMPTY;
495
+ break;
496
+ case BLK_ZONE_COND_IMP_OPEN:
497
+ zone->state = BLK_ZS_IOPEN;
498
+ break;
499
+ case BLK_ZONE_COND_EXP_OPEN:
500
+ zone->state = BLK_ZS_EOPEN;
501
+ break;
502
+ case BLK_ZONE_COND_CLOSED:
503
+ zone->state = BLK_ZS_CLOSED;
504
+ break;
505
+ case BLK_ZONE_COND_READONLY:
506
+ zone->state = BLK_ZS_RDONLY;
507
+ break;
508
+ case BLK_ZONE_COND_FULL:
509
+ zone->state = BLK_ZS_FULL;
510
+ break;
511
+ case BLK_ZONE_COND_OFFLINE:
512
+ zone->state = BLK_ZS_OFFLINE;
513
+ break;
514
+ default:
515
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
516
+ return -ENOTSUP;
517
+ }
518
+ return 0;
519
+}
520
+#endif
521
+
522
+#if defined(CONFIG_BLKZONED)
523
+static int handle_aiocb_zone_report(void *opaque)
524
+{
525
+ RawPosixAIOData *aiocb = opaque;
526
+ int fd = aiocb->aio_fildes;
527
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
528
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
529
+ /* zoned block devices use 512-byte sectors */
530
+ uint64_t sector = aiocb->aio_offset / 512;
531
+
532
+ struct blk_zone *blkz;
533
+ size_t rep_size;
534
+ unsigned int nrz;
535
+ int ret;
536
+ unsigned int n = 0, i = 0;
537
+
538
+ nrz = *nr_zones;
539
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
540
+ g_autofree struct blk_zone_report *rep = NULL;
541
+ rep = g_malloc(rep_size);
542
+
543
+ blkz = (struct blk_zone *)(rep + 1);
544
+ while (n < nrz) {
545
+ memset(rep, 0, rep_size);
546
+ rep->sector = sector;
547
+ rep->nr_zones = nrz - n;
548
+
549
+ do {
550
+ ret = ioctl(fd, BLKREPORTZONE, rep);
551
+ } while (ret != 0 && errno == EINTR);
552
+ if (ret != 0) {
553
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
554
+ fd, sector, errno);
555
+ return -errno;
556
+ }
557
+
558
+ if (!rep->nr_zones) {
559
+ break;
560
+ }
561
+
562
+ for (i = 0; i < rep->nr_zones; i++, n++) {
563
+ ret = parse_zone(&zones[n], &blkz[i]);
564
+ if (ret != 0) {
565
+ return ret;
566
+ }
567
+
568
+ /* The next report should start after the last zone reported */
569
+ sector = blkz[i].start + blkz[i].len;
570
+ }
571
+ }
572
+
573
+ *nr_zones = n;
574
+ return 0;
575
+}
576
+#endif
577
+
578
+#if defined(CONFIG_BLKZONED)
579
+static int handle_aiocb_zone_mgmt(void *opaque)
580
+{
581
+ RawPosixAIOData *aiocb = opaque;
582
+ int fd = aiocb->aio_fildes;
583
+ uint64_t sector = aiocb->aio_offset / 512;
584
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
585
+ struct blk_zone_range range;
586
+ int ret;
587
+
588
+ /* Execute the operation */
589
+ range.sector = sector;
590
+ range.nr_sectors = nr_sectors;
591
+ do {
592
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
593
+ } while (ret != 0 && errno == EINTR);
594
+
595
+ return ret;
596
+}
597
+#endif
598
+
599
static int handle_aiocb_copy_range(void *opaque)
600
{
601
RawPosixAIOData *aiocb = opaque;
602
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
603
}
604
}
605
606
+/*
607
+ * zone report - Get a zone block device's information in the form
608
+ * of an array of zone descriptors.
609
+ * zones is an array of zone descriptors to hold zone information on reply;
610
+ * offset can be any byte within the entire size of the device;
611
+ * nr_zones is the maxium number of sectors the command should operate on.
612
+ */
613
+#if defined(CONFIG_BLKZONED)
614
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
615
+ unsigned int *nr_zones,
616
+ BlockZoneDescriptor *zones) {
617
+ BDRVRawState *s = bs->opaque;
618
+ RawPosixAIOData acb = (RawPosixAIOData) {
619
+ .bs = bs,
620
+ .aio_fildes = s->fd,
621
+ .aio_type = QEMU_AIO_ZONE_REPORT,
622
+ .aio_offset = offset,
623
+ .zone_report = {
624
+ .nr_zones = nr_zones,
625
+ .zones = zones,
626
+ },
627
+ };
628
+
629
+ return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
630
+}
631
+#endif
632
+
633
+/*
634
+ * zone management operations - Execute an operation on a zone
635
+ */
636
+#if defined(CONFIG_BLKZONED)
637
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
638
+ int64_t offset, int64_t len) {
639
+ BDRVRawState *s = bs->opaque;
640
+ RawPosixAIOData acb;
641
+ int64_t zone_size, zone_size_mask;
642
+ const char *op_name;
643
+ unsigned long zo;
644
+ int ret;
645
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
646
+
647
+ zone_size = bs->bl.zone_size;
648
+ zone_size_mask = zone_size - 1;
649
+ if (offset & zone_size_mask) {
650
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
651
+ "%" PRId64 "", offset / 512, zone_size / 512);
652
+ return -EINVAL;
653
+ }
654
+
655
+ if (((offset + len) < capacity && len & zone_size_mask) ||
656
+ offset + len > capacity) {
657
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
658
+ " %" PRId64 "", len / 512, zone_size / 512);
659
+ return -EINVAL;
660
+ }
661
+
662
+ switch (op) {
663
+ case BLK_ZO_OPEN:
664
+ op_name = "BLKOPENZONE";
665
+ zo = BLKOPENZONE;
666
+ break;
667
+ case BLK_ZO_CLOSE:
668
+ op_name = "BLKCLOSEZONE";
669
+ zo = BLKCLOSEZONE;
670
+ break;
671
+ case BLK_ZO_FINISH:
672
+ op_name = "BLKFINISHZONE";
673
+ zo = BLKFINISHZONE;
674
+ break;
675
+ case BLK_ZO_RESET:
676
+ op_name = "BLKRESETZONE";
677
+ zo = BLKRESETZONE;
678
+ break;
679
+ default:
680
+ error_report("Unsupported zone op: 0x%x", op);
681
+ return -ENOTSUP;
682
+ }
683
+
684
+ acb = (RawPosixAIOData) {
685
+ .bs = bs,
686
+ .aio_fildes = s->fd,
687
+ .aio_type = QEMU_AIO_ZONE_MGMT,
688
+ .aio_offset = offset,
689
+ .aio_nbytes = len,
690
+ .zone_mgmt = {
691
+ .op = zo,
692
+ },
693
+ };
694
+
695
+ ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
696
+ if (ret != 0) {
697
+ error_report("ioctl %s failed %d", op_name, ret);
698
+ }
699
+
700
+ return ret;
701
+}
702
+#endif
703
+
704
static coroutine_fn int
705
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
706
bool blkdev)
707
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
708
#ifdef __linux__
709
.bdrv_co_ioctl = hdev_co_ioctl,
710
#endif
711
+
712
+ /* zoned device */
713
+#if defined(CONFIG_BLKZONED)
714
+ /* zone management operations */
715
+ .bdrv_co_zone_report = raw_co_zone_report,
716
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
717
+#endif
718
};
719
720
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
721
diff --git a/block/io.c b/block/io.c
722
index XXXXXXX..XXXXXXX 100644
723
--- a/block/io.c
724
+++ b/block/io.c
725
@@ -XXX,XX +XXX,XX @@ out:
726
return co.ret;
727
}
728
729
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
730
+ unsigned int *nr_zones,
731
+ BlockZoneDescriptor *zones)
732
+{
733
+ BlockDriver *drv = bs->drv;
734
+ CoroutineIOCompletion co = {
735
+ .coroutine = qemu_coroutine_self(),
736
+ };
737
+ IO_CODE();
738
+
739
+ bdrv_inc_in_flight(bs);
740
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
741
+ co.ret = -ENOTSUP;
742
+ goto out;
743
+ }
744
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
745
+out:
746
+ bdrv_dec_in_flight(bs);
747
+ return co.ret;
748
+}
749
+
750
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
751
+ int64_t offset, int64_t len)
752
+{
753
+ BlockDriver *drv = bs->drv;
754
+ CoroutineIOCompletion co = {
755
+ .coroutine = qemu_coroutine_self(),
756
+ };
757
+ IO_CODE();
758
+
759
+ bdrv_inc_in_flight(bs);
760
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
761
+ co.ret = -ENOTSUP;
762
+ goto out;
763
+ }
764
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
765
+out:
766
+ bdrv_dec_in_flight(bs);
767
+ return co.ret;
768
+}
769
+
770
void *qemu_blockalign(BlockDriverState *bs, size_t size)
771
{
772
IO_CODE();
773
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
774
index XXXXXXX..XXXXXXX 100644
775
--- a/qemu-io-cmds.c
776
+++ b/qemu-io-cmds.c
777
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
778
.oneline = "flush all in-core file state to disk",
779
};
780
781
+static inline int64_t tosector(int64_t bytes)
782
+{
783
+ return bytes >> BDRV_SECTOR_BITS;
784
+}
785
+
786
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
787
+{
788
+ int ret;
789
+ int64_t offset;
790
+ unsigned int nr_zones;
791
+
792
+ ++optind;
793
+ offset = cvtnum(argv[optind]);
794
+ ++optind;
795
+ nr_zones = cvtnum(argv[optind]);
796
+
797
+ g_autofree BlockZoneDescriptor *zones = NULL;
798
+ zones = g_new(BlockZoneDescriptor, nr_zones);
799
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
800
+ if (ret < 0) {
801
+ printf("zone report failed: %s\n", strerror(-ret));
802
+ } else {
803
+ for (int i = 0; i < nr_zones; ++i) {
804
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
805
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
806
+ "zcond:%u, [type: %u]\n",
807
+ tosector(zones[i].start), tosector(zones[i].length),
808
+ tosector(zones[i].cap), tosector(zones[i].wp),
809
+ zones[i].state, zones[i].type);
810
+ }
811
+ }
812
+ return ret;
813
+}
814
+
815
+static const cmdinfo_t zone_report_cmd = {
816
+ .name = "zone_report",
817
+ .altname = "zrp",
818
+ .cfunc = zone_report_f,
819
+ .argmin = 2,
820
+ .argmax = 2,
821
+ .args = "offset number",
822
+ .oneline = "report zone information",
823
+};
824
+
825
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
826
+{
827
+ int ret;
828
+ int64_t offset, len;
829
+ ++optind;
830
+ offset = cvtnum(argv[optind]);
831
+ ++optind;
832
+ len = cvtnum(argv[optind]);
833
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
834
+ if (ret < 0) {
835
+ printf("zone open failed: %s\n", strerror(-ret));
836
+ }
837
+ return ret;
838
+}
839
+
840
+static const cmdinfo_t zone_open_cmd = {
841
+ .name = "zone_open",
842
+ .altname = "zo",
843
+ .cfunc = zone_open_f,
844
+ .argmin = 2,
845
+ .argmax = 2,
846
+ .args = "offset len",
847
+ .oneline = "explicit open a range of zones in zone block device",
848
+};
849
+
850
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
851
+{
852
+ int ret;
853
+ int64_t offset, len;
854
+ ++optind;
855
+ offset = cvtnum(argv[optind]);
856
+ ++optind;
857
+ len = cvtnum(argv[optind]);
858
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
859
+ if (ret < 0) {
860
+ printf("zone close failed: %s\n", strerror(-ret));
861
+ }
862
+ return ret;
863
+}
864
+
865
+static const cmdinfo_t zone_close_cmd = {
866
+ .name = "zone_close",
867
+ .altname = "zc",
868
+ .cfunc = zone_close_f,
869
+ .argmin = 2,
870
+ .argmax = 2,
871
+ .args = "offset len",
872
+ .oneline = "close a range of zones in zone block device",
873
+};
874
+
875
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
876
+{
877
+ int ret;
878
+ int64_t offset, len;
879
+ ++optind;
880
+ offset = cvtnum(argv[optind]);
881
+ ++optind;
882
+ len = cvtnum(argv[optind]);
883
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
884
+ if (ret < 0) {
885
+ printf("zone finish failed: %s\n", strerror(-ret));
886
+ }
887
+ return ret;
888
+}
889
+
890
+static const cmdinfo_t zone_finish_cmd = {
891
+ .name = "zone_finish",
892
+ .altname = "zf",
893
+ .cfunc = zone_finish_f,
894
+ .argmin = 2,
895
+ .argmax = 2,
896
+ .args = "offset len",
897
+ .oneline = "finish a range of zones in zone block device",
898
+};
899
+
900
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
901
+{
902
+ int ret;
903
+ int64_t offset, len;
904
+ ++optind;
905
+ offset = cvtnum(argv[optind]);
906
+ ++optind;
907
+ len = cvtnum(argv[optind]);
908
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
909
+ if (ret < 0) {
910
+ printf("zone reset failed: %s\n", strerror(-ret));
911
+ }
912
+ return ret;
913
+}
914
+
915
+static const cmdinfo_t zone_reset_cmd = {
916
+ .name = "zone_reset",
917
+ .altname = "zrs",
918
+ .cfunc = zone_reset_f,
919
+ .argmin = 2,
920
+ .argmax = 2,
921
+ .args = "offset len",
922
+ .oneline = "reset a zone write pointer in zone block device",
923
+};
924
+
925
static int truncate_f(BlockBackend *blk, int argc, char **argv);
926
static const cmdinfo_t truncate_cmd = {
927
.name = "truncate",
928
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
929
qemuio_add_command(&aio_write_cmd);
930
qemuio_add_command(&aio_flush_cmd);
931
qemuio_add_command(&flush_cmd);
932
+ qemuio_add_command(&zone_report_cmd);
933
+ qemuio_add_command(&zone_open_cmd);
934
+ qemuio_add_command(&zone_close_cmd);
935
+ qemuio_add_command(&zone_finish_cmd);
936
+ qemuio_add_command(&zone_reset_cmd);
937
qemuio_add_command(&truncate_cmd);
938
qemuio_add_command(&length_cmd);
939
qemuio_add_command(&info_cmd);
53
--
940
--
54
2.9.3
941
2.40.1
55
942
56
943
diff view generated by jsdifflib
1
Move bdrv_is_read_only() up with its friends.
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
raw-format driver usually sits on top of file-posix driver. It needs to
4
pass through requests of zone commands.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
3
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Reviewed-by: John Snow <jsnow@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
5
Signed-off-by: Jeff Cody <jcody@redhat.com>
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
6
Message-id: 73b2399459760c32506f9407efb9dddb3a2789de.1491597120.git.jcody@redhat.com
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230508045533.175575-5-faithilikerun@gmail.com
14
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
15
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
<philmd@linaro.org>.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
19
---
8
block.c | 10 +++++-----
20
block/raw-format.c | 17 +++++++++++++++++
9
1 file changed, 5 insertions(+), 5 deletions(-)
21
1 file changed, 17 insertions(+)
10
22
11
diff --git a/block.c b/block.c
23
diff --git a/block/raw-format.c b/block/raw-format.c
12
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
13
--- a/block.c
25
--- a/block/raw-format.c
14
+++ b/block.c
26
+++ b/block/raw-format.c
15
@@ -XXX,XX +XXX,XX @@ void path_combine(char *dest, int dest_size,
27
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
16
}
28
return bdrv_co_pdiscard(bs->file, offset, bytes);
17
}
29
}
18
30
19
+bool bdrv_is_read_only(BlockDriverState *bs)
31
+static int coroutine_fn GRAPH_RDLOCK
32
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
33
+ unsigned int *nr_zones,
34
+ BlockZoneDescriptor *zones)
20
+{
35
+{
21
+ return bs->read_only;
36
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
22
+}
37
+}
23
+
38
+
24
int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
39
+static int coroutine_fn GRAPH_RDLOCK
40
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
41
+ int64_t offset, int64_t len)
42
+{
43
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
44
+}
45
+
46
static int64_t coroutine_fn GRAPH_RDLOCK
47
raw_co_getlength(BlockDriverState *bs)
25
{
48
{
26
/* Do not set read_only if copy_on_read is enabled */
49
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
27
@@ -XXX,XX +XXX,XX @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
50
.bdrv_co_pwritev = &raw_co_pwritev,
28
*nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors;
51
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
29
}
52
.bdrv_co_pdiscard = &raw_co_pdiscard,
30
53
+ .bdrv_co_zone_report = &raw_co_zone_report,
31
-bool bdrv_is_read_only(BlockDriverState *bs)
54
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
32
-{
55
.bdrv_co_block_status = &raw_co_block_status,
33
- return bs->read_only;
56
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
34
-}
57
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
35
-
36
bool bdrv_is_sg(BlockDriverState *bs)
37
{
38
return bs->sg;
39
--
58
--
40
2.9.3
59
2.40.1
41
60
42
61
diff view generated by jsdifflib
1
The BDRV_O_ALLOW_RDWR flag allows / prohibits the changing of
1
From: Sam Li <faithilikerun@gmail.com>
2
the BDS 'read_only' state, but there are a few places where it
3
is ignored. In the bdrv_set_read_only() helper, make sure to
4
honor the flag.
5
2
6
Signed-off-by: Jeff Cody <jcody@redhat.com>
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
4
allowed.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: John Snow <jsnow@redhat.com>
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
9
Message-id: be2e5fb2d285cbece2b6d06bed54a6f56520d251.1491597120.git.jcody@redhat.com
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-6-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org> and clarify that the check is about zoned
16
BlockDrivers.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
19
---
11
block.c | 7 +++++++
20
include/block/block_int-common.h | 5 +++++
12
1 file changed, 7 insertions(+)
21
block.c | 19 +++++++++++++++++++
22
block/file-posix.c | 12 ++++++++++++
23
block/raw-format.c | 1 +
24
4 files changed, 37 insertions(+)
13
25
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
27
index XXXXXXX..XXXXXXX 100644
28
--- a/include/block/block_int-common.h
29
+++ b/include/block/block_int-common.h
30
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
31
*/
32
bool is_format;
33
34
+ /*
35
+ * Set to true if the BlockDriver supports zoned children.
36
+ */
37
+ bool supports_zoned_children;
38
+
39
/*
40
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
41
* this field set to true, except ones that are defined only by their
14
diff --git a/block.c b/block.c
42
diff --git a/block.c b/block.c
15
index XXXXXXX..XXXXXXX 100644
43
index XXXXXXX..XXXXXXX 100644
16
--- a/block.c
44
--- a/block.c
17
+++ b/block.c
45
+++ b/block.c
18
@@ -XXX,XX +XXX,XX @@ int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
46
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
19
return -EINVAL;
47
return;
20
}
48
}
21
49
22
+ /* Do not clear read_only if it is prohibited */
50
+ /*
23
+ if (!read_only && !(bs->open_flags & BDRV_O_ALLOW_RDWR)) {
51
+ * Non-zoned block drivers do not follow zoned storage constraints
24
+ error_setg(errp, "Node '%s' is read only",
52
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
25
+ bdrv_get_device_or_node_name(bs));
53
+ * drivers in a graph.
26
+ return -EPERM;
54
+ */
55
+ if (!parent_bs->drv->supports_zoned_children &&
56
+ child_bs->bl.zoned == BLK_Z_HM) {
57
+ /*
58
+ * The host-aware model allows zoned storage constraints and random
59
+ * write. Allow mixing host-aware and non-zoned drivers. Using
60
+ * host-aware device as a regular device.
61
+ */
62
+ error_setg(errp, "Cannot add a %s child to a %s parent",
63
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
64
+ parent_bs->drv->supports_zoned_children ?
65
+ "support zoned children" : "not support zoned children");
66
+ return;
27
+ }
67
+ }
28
+
68
+
29
bs->read_only = read_only;
69
if (!QLIST_EMPTY(&child_bs->parents)) {
30
return 0;
70
error_setg(errp, "The node %s already has a parent",
31
}
71
child_bs->node_name);
72
diff --git a/block/file-posix.c b/block/file-posix.c
73
index XXXXXXX..XXXXXXX 100644
74
--- a/block/file-posix.c
75
+++ b/block/file-posix.c
76
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
77
goto fail;
78
}
79
}
80
+#ifdef CONFIG_BLKZONED
81
+ /*
82
+ * The kernel page cache does not reliably work for writes to SWR zones
83
+ * of zoned block device because it can not guarantee the order of writes.
84
+ */
85
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
86
+ (!(s->open_flags & O_DIRECT))) {
87
+ error_setg(errp, "The driver supports zoned devices, and it requires "
88
+ "cache.direct=on, which was not specified.");
89
+ return -EINVAL; /* No host kernel page cache */
90
+ }
91
+#endif
92
93
if (S_ISBLK(st.st_mode)) {
94
#ifdef __linux__
95
diff --git a/block/raw-format.c b/block/raw-format.c
96
index XXXXXXX..XXXXXXX 100644
97
--- a/block/raw-format.c
98
+++ b/block/raw-format.c
99
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
100
BlockDriver bdrv_raw = {
101
.format_name = "raw",
102
.instance_size = sizeof(BDRVRawState),
103
+ .supports_zoned_children = true,
104
.bdrv_probe = &raw_probe,
105
.bdrv_reopen_prepare = &raw_reopen_prepare,
106
.bdrv_reopen_commit = &raw_reopen_commit,
32
--
107
--
33
2.9.3
108
2.40.1
34
109
35
110
diff view generated by jsdifflib
1
For the tests that use the common.qemu functions for running a QEMU
1
From: Sam Li <faithilikerun@gmail.com>
2
process, _cleanup_qemu must be called in the exit function.
3
2
4
If it is not, if the qemu process aborts, then not all of the droppings
3
The new block layer APIs of zoned block devices can be tested by:
5
are cleaned up (e.g. pidfile, fifos).
4
$ tests/qemu-iotests/check zoned
5
Run each zone operation on a newly created null_blk device
6
and see whether it outputs the same zone information.
6
7
7
This updates those tests that did not have a cleanup in qemu-iotests.
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-7-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org>.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
20
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
21
2 files changed, 142 insertions(+)
22
create mode 100755 tests/qemu-iotests/tests/zoned
23
create mode 100644 tests/qemu-iotests/tests/zoned.out
8
24
9
(I swapped spaces for tabs in test 102 as well)
25
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
10
26
new file mode 100755
11
Reported-by: Eric Blake <eblake@redhat.com>
27
index XXXXXXX..XXXXXXX
12
Reviewed-by: Eric Blake <eblake@redhat.com>
28
--- /dev/null
13
Signed-off-by: Jeff Cody <jcody@redhat.com>
29
+++ b/tests/qemu-iotests/tests/zoned
14
Message-id: d59c2f6ad6c1da8b9b3c7f357c94a7122ccfc55a.1492544096.git.jcody@redhat.com
30
@@ -XXX,XX +XXX,XX @@
15
---
31
+#!/usr/bin/env bash
16
tests/qemu-iotests/028 | 1 +
32
+#
17
tests/qemu-iotests/094 | 11 ++++++++---
33
+# Test zone management operations.
18
tests/qemu-iotests/102 | 5 +++--
34
+#
19
tests/qemu-iotests/109 | 1 +
35
+
20
tests/qemu-iotests/117 | 1 +
36
+seq="$(basename $0)"
21
tests/qemu-iotests/130 | 1 +
37
+echo "QA output created by $seq"
22
tests/qemu-iotests/140 | 1 +
38
+status=1 # failure is the default!
23
tests/qemu-iotests/141 | 1 +
39
+
24
tests/qemu-iotests/143 | 1 +
25
tests/qemu-iotests/156 | 1 +
26
10 files changed, 19 insertions(+), 5 deletions(-)
27
28
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
29
index XXXXXXX..XXXXXXX 100755
30
--- a/tests/qemu-iotests/028
31
+++ b/tests/qemu-iotests/028
32
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
33
34
_cleanup()
35
{
36
+ _cleanup_qemu
37
rm -f "${TEST_IMG}.copy"
38
_cleanup_test_img
39
}
40
diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094
41
index XXXXXXX..XXXXXXX 100755
42
--- a/tests/qemu-iotests/094
43
+++ b/tests/qemu-iotests/094
44
@@ -XXX,XX +XXX,XX @@ echo "QA output created by $seq"
45
here="$PWD"
46
status=1    # failure is the default!
47
48
-trap "exit \$status" 0 1 2 3 15
49
+_cleanup()
40
+_cleanup()
50
+{
41
+{
51
+ _cleanup_qemu
42
+ _cleanup_test_img
52
+ _cleanup_test_img
43
+ sudo -n rmmod null_blk
53
+ rm -f "$TEST_DIR/source.$IMGFMT"
54
+}
44
+}
45
+trap "_cleanup; exit \$status" 0 1 2 3 15
55
+
46
+
56
+trap "_cleanup; exit \$status" 0 1 2 3 15
47
+# get standard environment, filters and checks
57
48
+. ../common.rc
58
# get standard environment, filters and checks
49
+. ../common.filter
59
. ./common.rc
50
+. ../common.qemu
60
@@ -XXX,XX +XXX,XX @@ _send_qemu_cmd $QEMU_HANDLE \
51
+
61
52
+# This test only runs on Linux hosts with raw image files.
62
wait=1 _cleanup_qemu
53
+_supported_fmt raw
63
54
+_supported_proto file
64
-_cleanup_test_img
55
+_supported_os Linux
65
-rm -f "$TEST_DIR/source.$IMGFMT"
56
+
66
57
+sudo -n true || \
67
# success, all done
58
+ _notrun 'Password-less sudo required'
68
echo '*** done'
59
+
69
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
60
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
70
index XXXXXXX..XXXXXXX 100755
61
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
71
--- a/tests/qemu-iotests/102
62
+
72
+++ b/tests/qemu-iotests/102
63
+echo "Testing a null_blk device:"
73
@@ -XXX,XX +XXX,XX @@ seq=$(basename $0)
64
+echo "case 1: if the operations work"
74
echo "QA output created by $seq"
65
+sudo -n modprobe null_blk nr_devices=1 zoned=1
75
66
+sudo -n chmod 0666 /dev/nullb0
76
here=$PWD
67
+
77
-status=1    # failure is the default!
68
+echo "(1) report the first zone:"
78
+status=1 # failure is the default!
69
+$QEMU_IO $IMG -c "zrp 0 1"
79
70
+echo
80
_cleanup()
71
+echo "report the first 10 zones"
81
{
72
+$QEMU_IO $IMG -c "zrp 0 10"
82
-    _cleanup_test_img
73
+echo
83
+ _cleanup_qemu
74
+echo "report the last zone:"
84
+ _cleanup_test_img
75
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
85
}
76
+echo
86
trap "_cleanup; exit \$status" 0 1 2 3 15
77
+echo
87
78
+echo "(2) opening the first zone"
88
diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
79
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
89
index XXXXXXX..XXXXXXX 100755
80
+echo "report after:"
90
--- a/tests/qemu-iotests/109
81
+$QEMU_IO $IMG -c "zrp 0 1"
91
+++ b/tests/qemu-iotests/109
82
+echo
92
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
83
+echo "opening the second zone"
93
84
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
94
_cleanup()
85
+echo "report after:"
95
{
86
+$QEMU_IO $IMG -c "zrp 268435456 1"
96
+ _cleanup_qemu
87
+echo
97
rm -f $TEST_IMG.src
88
+echo "opening the last zone"
98
    _cleanup_test_img
89
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
99
}
90
+echo "report after:"
100
diff --git a/tests/qemu-iotests/117 b/tests/qemu-iotests/117
91
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
101
index XXXXXXX..XXXXXXX 100755
92
+echo
102
--- a/tests/qemu-iotests/117
93
+echo
103
+++ b/tests/qemu-iotests/117
94
+echo "(3) closing the first zone"
104
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
95
+$QEMU_IO $IMG -c "zc 0 268435456"
105
96
+echo "report after:"
106
_cleanup()
97
+$QEMU_IO $IMG -c "zrp 0 1"
107
{
98
+echo
108
+ _cleanup_qemu
99
+echo "closing the last zone"
109
    _cleanup_test_img
100
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
110
}
101
+echo "report after:"
111
trap "_cleanup; exit \$status" 0 1 2 3 15
102
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
112
diff --git a/tests/qemu-iotests/130 b/tests/qemu-iotests/130
103
+echo
113
index XXXXXXX..XXXXXXX 100755
104
+echo
114
--- a/tests/qemu-iotests/130
105
+echo "(4) finishing the second zone"
115
+++ b/tests/qemu-iotests/130
106
+$QEMU_IO $IMG -c "zf 268435456 268435456"
116
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
107
+echo "After finishing a zone:"
117
108
+$QEMU_IO $IMG -c "zrp 268435456 1"
118
_cleanup()
109
+echo
119
{
110
+echo
120
+ _cleanup_qemu
111
+echo "(5) resetting the second zone"
121
_cleanup_test_img
112
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
122
}
113
+echo "After resetting a zone:"
123
trap "_cleanup; exit \$status" 0 1 2 3 15
114
+$QEMU_IO $IMG -c "zrp 268435456 1"
124
diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
115
+
125
index XXXXXXX..XXXXXXX 100755
116
+# success, all done
126
--- a/tests/qemu-iotests/140
117
+echo "*** done"
127
+++ b/tests/qemu-iotests/140
118
+rm -f $seq.full
128
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
119
+status=0
129
120
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
130
_cleanup()
121
new file mode 100644
131
{
122
index XXXXXXX..XXXXXXX
132
+ _cleanup_qemu
123
--- /dev/null
133
_cleanup_test_img
124
+++ b/tests/qemu-iotests/tests/zoned.out
134
rm -f "$TEST_DIR/nbd"
125
@@ -XXX,XX +XXX,XX @@
135
}
126
+QA output created by zoned
136
diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
127
+Testing a null_blk device:
137
index XXXXXXX..XXXXXXX 100755
128
+case 1: if the operations work
138
--- a/tests/qemu-iotests/141
129
+(1) report the first zone:
139
+++ b/tests/qemu-iotests/141
130
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
140
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
131
+
141
132
+report the first 10 zones
142
_cleanup()
133
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
143
{
134
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
144
+ _cleanup_qemu
135
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
145
_cleanup_test_img
136
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
146
rm -f "$TEST_DIR/{b,m,o}.$IMGFMT"
137
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
147
}
138
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
148
diff --git a/tests/qemu-iotests/143 b/tests/qemu-iotests/143
139
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
149
index XXXXXXX..XXXXXXX 100755
140
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
150
--- a/tests/qemu-iotests/143
141
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
151
+++ b/tests/qemu-iotests/143
142
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
152
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
143
+
153
144
+report the last zone:
154
_cleanup()
145
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
155
{
146
+
156
+ _cleanup_qemu
147
+
157
rm -f "$TEST_DIR/nbd"
148
+(2) opening the first zone
158
}
149
+report after:
159
trap "_cleanup; exit \$status" 0 1 2 3 15
150
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
160
diff --git a/tests/qemu-iotests/156 b/tests/qemu-iotests/156
151
+
161
index XXXXXXX..XXXXXXX 100755
152
+opening the second zone
162
--- a/tests/qemu-iotests/156
153
+report after:
163
+++ b/tests/qemu-iotests/156
154
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
164
@@ -XXX,XX +XXX,XX @@ status=1    # failure is the default!
155
+
165
156
+opening the last zone
166
_cleanup()
157
+report after:
167
{
158
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
168
+ _cleanup_qemu
159
+
169
rm -f "$TEST_IMG{,.target}{,.backing,.overlay}"
160
+
170
}
161
+(3) closing the first zone
171
trap "_cleanup; exit \$status" 0 1 2 3 15
162
+report after:
163
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
164
+
165
+closing the last zone
166
+report after:
167
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
168
+
169
+
170
+(4) finishing the second zone
171
+After finishing a zone:
172
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
173
+
174
+
175
+(5) resetting the second zone
176
+After resetting a zone:
177
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
178
+*** done
172
--
179
--
173
2.9.3
180
2.40.1
174
181
175
182
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20230508045533.175575-8-faithilikerun@gmail.com
9
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
block/file-posix.c | 3 +++
13
block/trace-events | 2 ++
14
2 files changed, 5 insertions(+)
15
16
diff --git a/block/file-posix.c b/block/file-posix.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/file-posix.c
19
+++ b/block/file-posix.c
20
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
21
},
22
};
23
24
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
25
return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
26
}
27
#endif
28
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
29
},
30
};
31
32
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
33
+ len >> BDRV_SECTOR_BITS);
34
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
35
if (ret != 0) {
36
error_report("ioctl %s failed %d", op_name, ret);
37
diff --git a/block/trace-events b/block/trace-events
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/trace-events
40
+++ b/block/trace-events
41
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
42
file_setup_cdrom(const char *partition) "Using %s as optical disc"
43
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
44
file_flush_fdatasync_failed(int err) "errno %d"
45
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
46
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
47
48
# ssh.c
49
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
50
--
51
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Add the documentation about the zoned device support to virtio-blk
4
emulation.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-9-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
14
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
15
any toctree" error and fix pre-formatted code syntax.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
docs/devel/index-api.rst | 1 +
20
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
21
docs/system/qemu-block-drivers.rst.inc | 6 ++++
22
3 files changed, 50 insertions(+)
23
create mode 100644 docs/devel/zoned-storage.rst
24
25
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
26
index XXXXXXX..XXXXXXX 100644
27
--- a/docs/devel/index-api.rst
28
+++ b/docs/devel/index-api.rst
29
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
30
memory
31
modules
32
ui
33
+ zoned-storage
34
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
35
new file mode 100644
36
index XXXXXXX..XXXXXXX
37
--- /dev/null
38
+++ b/docs/devel/zoned-storage.rst
39
@@ -XXX,XX +XXX,XX @@
40
+=============
41
+zoned-storage
42
+=============
43
+
44
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
45
+that are larger than the LBA size. They can only allow sequential writes, which
46
+can reduce write amplification in SSDs, and potentially lead to higher
47
+throughput and increased capacity. More details about ZBDs can be found at:
48
+
49
+https://zonedstorage.io/docs/introduction/zoned-storage
50
+
51
+1. Block layer APIs for zoned storage
52
+-------------------------------------
53
+QEMU block layer supports three zoned storage models:
54
+- BLK_Z_HM: The host-managed zoned model only allows sequential writes access
55
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
56
+manage the zones of a device.
57
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
58
+zones, making it backward compatible with regular block devices.
59
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
60
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
61
+supported.
62
+
63
+The block device information resides inside BlockDriverState. QEMU uses
64
+BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
65
+block layer while processing I/O requests. A BlockBackend has a root pointer to
66
+a BlockDriverState graph(for example, raw format on top of file-posix). The
67
+zoned storage information can be propagated from the leaf BlockDriverState all
68
+the way up to the BlockBackend. If the zoned storage model in file-posix is
69
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
70
+
71
+The block layer APIs support commands needed for zoned storage devices,
72
+including report zones, four zone operations, and zone append.
73
+
74
+2. Emulating zoned storage controllers
75
+--------------------------------------
76
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
77
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
78
+APIs for zoned storage emulation or testing.
79
+
80
+For example, to test zone_report on a null_blk device using qemu-io is::
81
+
82
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
83
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
84
index XXXXXXX..XXXXXXX 100644
85
--- a/docs/system/qemu-block-drivers.rst.inc
86
+++ b/docs/system/qemu-block-drivers.rst.inc
87
@@ -XXX,XX +XXX,XX @@ Hard disks
88
you may corrupt your host data (use the ``-snapshot`` command
89
line option or modify the device permissions accordingly).
90
91
+Zoned block devices
92
+ Zoned block devices can be passed through to the guest if the emulated storage
93
+ controller supports zoned storage. Use ``--blockdev host_device,
94
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
95
+ ``/dev/nullb0`` as ``drive0``.
96
+
97
Windows
98
^^^^^^^
99
100
--
101
2.40.1
diff view generated by jsdifflib
1
We have a helper wrapper for checking for the BDS read_only flag,
1
From: Sam Li <faithilikerun@gmail.com>
2
add a helper wrapper to set the read_only flag as well.
3
2
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
3
Since Linux doesn't have a user API to issue zone append operations to
5
Signed-off-by: Jeff Cody <jcody@redhat.com>
4
zoned devices from user space, the file-posix driver is modified to add
6
Reviewed-by: John Snow <jsnow@redhat.com>
5
zone append emulation using regular writes. To do this, the file-posix
7
Message-id: 9b18972d05f5fa2ac16c014f0af98d680553048d.1491597120.git.jcody@redhat.com
6
driver tracks the wp location of all zones of the device. It uses an
7
array of uint64_t. The most significant bit of each wp location indicates
8
if the zone type is conventional zones.
9
10
The zones wp can be changed due to the following operations issued:
11
- zone reset: change the wp to the start offset of that zone
12
- zone finish: change to the end location of that zone
13
- write to a zone
14
- zone append
15
16
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
Message-id: 20230508051510.177850-2-faithilikerun@gmail.com
18
[Fix errno propagation from handle_aiocb_zone_mgmt()
19
--Stefan]
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
21
---
9
block.c | 5 +++++
22
include/block/block-common.h | 14 +++
10
block/bochs.c | 2 +-
23
include/block/block_int-common.h | 5 +
11
block/cloop.c | 2 +-
24
block/file-posix.c | 178 ++++++++++++++++++++++++++++++-
12
block/dmg.c | 2 +-
25
3 files changed, 193 insertions(+), 4 deletions(-)
13
block/rbd.c | 2 +-
14
block/vvfat.c | 4 ++--
15
include/block/block.h | 1 +
16
7 files changed, 12 insertions(+), 6 deletions(-)
17
26
18
diff --git a/block.c b/block.c
27
diff --git a/include/block/block-common.h b/include/block/block-common.h
19
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
20
--- a/block.c
29
--- a/include/block/block-common.h
21
+++ b/block.c
30
+++ b/include/block/block-common.h
22
@@ -XXX,XX +XXX,XX @@ void path_combine(char *dest, int dest_size,
31
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
23
}
32
BlockZoneState state;
33
} BlockZoneDescriptor;
34
35
+/*
36
+ * Track write pointers of a zone in bytes.
37
+ */
38
+typedef struct BlockZoneWps {
39
+ CoMutex colock;
40
+ uint64_t wp[];
41
+} BlockZoneWps;
42
+
43
typedef struct BlockDriverInfo {
44
/* in bytes, 0 if irrelevant */
45
int cluster_size;
46
@@ -XXX,XX +XXX,XX @@ typedef enum {
47
#define BDRV_SECTOR_BITS 9
48
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
49
50
+/*
51
+ * Get the first most significant bit of wp. If it is zero, then
52
+ * the zone type is SWR.
53
+ */
54
+#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63))
55
+
56
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
57
INT_MAX >> BDRV_SECTOR_BITS)
58
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
59
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
60
index XXXXXXX..XXXXXXX 100644
61
--- a/include/block/block_int-common.h
62
+++ b/include/block/block_int-common.h
63
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
64
65
/* maximum number of active zones */
66
uint32_t max_active_zones;
67
+
68
+ uint32_t write_granularity;
69
} BlockLimits;
70
71
typedef struct BdrvOpBlocker BdrvOpBlocker;
72
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
73
CoMutex bsc_modify_lock;
74
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
75
BdrvBlockStatusCache *block_status_cache;
76
+
77
+ /* array of write pointers' location of each zone in the zoned device. */
78
+ BlockZoneWps *wps;
79
};
80
81
struct BlockBackendRootState {
82
diff --git a/block/file-posix.c b/block/file-posix.c
83
index XXXXXXX..XXXXXXX 100644
84
--- a/block/file-posix.c
85
+++ b/block/file-posix.c
86
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
24
}
87
}
25
88
26
+void bdrv_set_read_only(BlockDriverState *bs, bool read_only)
89
#if defined(CONFIG_BLKZONED)
90
+/*
91
+ * If the reset_all flag is true, then the wps of zone whose state is
92
+ * not readonly or offline should be all reset to the start sector.
93
+ * Else, take the real wp of the device.
94
+ */
95
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
96
+ unsigned int nrz, bool reset_all)
27
+{
97
+{
28
+ bs->read_only = read_only;
98
+ struct blk_zone *blkz;
99
+ size_t rep_size;
100
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
101
+ BlockZoneWps *wps = bs->wps;
102
+ unsigned int j = offset / bs->bl.zone_size;
103
+ unsigned int n = 0, i = 0;
104
+ int ret;
105
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
106
+ g_autofree struct blk_zone_report *rep = NULL;
107
+
108
+ rep = g_malloc(rep_size);
109
+ blkz = (struct blk_zone *)(rep + 1);
110
+ while (n < nrz) {
111
+ memset(rep, 0, rep_size);
112
+ rep->sector = sector;
113
+ rep->nr_zones = nrz - n;
114
+
115
+ do {
116
+ ret = ioctl(fd, BLKREPORTZONE, rep);
117
+ } while (ret != 0 && errno == EINTR);
118
+ if (ret != 0) {
119
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
120
+ fd, offset, errno);
121
+ return -errno;
122
+ }
123
+
124
+ if (!rep->nr_zones) {
125
+ break;
126
+ }
127
+
128
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
129
+ /*
130
+ * The wp tracking cares only about sequential writes required and
131
+ * sequential write preferred zones so that the wp can advance to
132
+ * the right location.
133
+ * Use the most significant bit of the wp location to indicate the
134
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
135
+ */
136
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
137
+ wps->wp[j] |= 1ULL << 63;
138
+ } else {
139
+ switch(blkz[i].cond) {
140
+ case BLK_ZONE_COND_FULL:
141
+ case BLK_ZONE_COND_READONLY:
142
+ /* Zone not writable */
143
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
144
+ break;
145
+ case BLK_ZONE_COND_OFFLINE:
146
+ /* Zone not writable nor readable */
147
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
148
+ break;
149
+ default:
150
+ if (reset_all) {
151
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
152
+ } else {
153
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
154
+ }
155
+ break;
156
+ }
157
+ }
158
+ }
159
+ sector = blkz[i - 1].start + blkz[i - 1].len;
160
+ }
161
+
162
+ return 0;
29
+}
163
+}
30
+
164
+
31
void bdrv_get_full_backing_filename_from_filename(const char *backed,
165
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
32
const char *backing,
166
+ unsigned int nrz)
33
char *dest, size_t sz,
167
+{
34
diff --git a/block/bochs.c b/block/bochs.c
168
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
35
index XXXXXXX..XXXXXXX 100644
169
+ error_report("update zone wp failed");
36
--- a/block/bochs.c
170
+ }
37
+++ b/block/bochs.c
171
+}
38
@@ -XXX,XX +XXX,XX @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
172
+
173
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
174
Error **errp)
175
{
176
+ BDRVRawState *s = bs->opaque;
177
BlockZoneModel zoned;
178
int ret;
179
180
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
181
if (ret > 0) {
182
bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
183
}
184
+
185
+ ret = get_sysfs_long_val(st, "physical_block_size");
186
+ if (ret >= 0) {
187
+ bs->bl.write_granularity = ret;
188
+ }
189
+
190
+ /* The refresh_limits() function can be called multiple times. */
191
+ g_free(bs->wps);
192
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
193
+ sizeof(int64_t) * bs->bl.nr_zones);
194
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
195
+ if (ret < 0) {
196
+ error_setg_errno(errp, -ret, "report wps failed");
197
+ bs->wps = NULL;
198
+ return;
199
+ }
200
+ qemu_co_mutex_init(&bs->wps->colock);
201
}
202
#else /* !defined(CONFIG_BLKZONED) */
203
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
204
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_zone_mgmt(void *opaque)
205
ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
206
} while (ret != 0 && errno == EINTR);
207
208
- return ret;
209
+ return ret < 0 ? -errno : ret;
210
}
211
#endif
212
213
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
214
{
215
BDRVRawState *s = bs->opaque;
216
RawPosixAIOData acb;
217
+ int ret;
218
219
if (fd_open(bs) < 0)
220
return -EIO;
221
+#if defined(CONFIG_BLKZONED)
222
+ if (type & QEMU_AIO_WRITE && bs->wps) {
223
+ qemu_co_mutex_lock(&bs->wps->colock);
224
+ }
225
+#endif
226
227
/*
228
* When using O_DIRECT, the request must be aligned to be able to use
229
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
230
#ifdef CONFIG_LINUX_IO_URING
231
} else if (s->use_linux_io_uring) {
232
assert(qiov->size == bytes);
233
- return luring_co_submit(bs, s->fd, offset, qiov, type);
234
+ ret = luring_co_submit(bs, s->fd, offset, qiov, type);
235
+ goto out;
236
#endif
237
#ifdef CONFIG_LINUX_AIO
238
} else if (s->use_linux_aio) {
239
assert(qiov->size == bytes);
240
- return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
241
+ ret = laio_co_submit(s->fd, offset, qiov, type,
242
+ s->aio_max_batch);
243
+ goto out;
244
#endif
245
}
246
247
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
248
};
249
250
assert(qiov->size == bytes);
251
- return raw_thread_pool_submit(handle_aiocb_rw, &acb);
252
+ ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
253
+ goto out; /* Avoid the compiler err of unused label */
254
+
255
+out:
256
+#if defined(CONFIG_BLKZONED)
257
+{
258
+ BlockZoneWps *wps = bs->wps;
259
+ if (ret == 0) {
260
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
261
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
262
+ if (!BDRV_ZT_IS_CONV(*wp)) {
263
+ /* Advance the wp if needed */
264
+ if (offset + bytes > *wp) {
265
+ *wp = offset + bytes;
266
+ }
267
+ }
268
+ }
269
+ } else {
270
+ if (type & QEMU_AIO_WRITE) {
271
+ update_zones_wp(bs, s->fd, 0, 1);
272
+ }
273
+ }
274
+
275
+ if (type & QEMU_AIO_WRITE && wps) {
276
+ qemu_co_mutex_unlock(&wps->colock);
277
+ }
278
+}
279
+#endif
280
+ return ret;
281
}
282
283
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
284
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
285
BDRVRawState *s = bs->opaque;
286
287
if (s->fd >= 0) {
288
+#if defined(CONFIG_BLKZONED)
289
+ g_free(bs->wps);
290
+#endif
291
qemu_close(s->fd);
292
s->fd = -1;
293
}
294
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
295
const char *op_name;
296
unsigned long zo;
297
int ret;
298
+ BlockZoneWps *wps = bs->wps;
299
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
300
301
zone_size = bs->bl.zone_size;
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
39
return -EINVAL;
303
return -EINVAL;
40
}
304
}
41
305
42
- bs->read_only = true; /* no write support yet */
306
+ uint32_t i = offset / bs->bl.zone_size;
43
+ bdrv_set_read_only(bs, true); /* no write support yet */
307
+ uint32_t nrz = len / bs->bl.zone_size;
44
308
+ uint64_t *wp = &wps->wp[i];
45
ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
309
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
46
if (ret < 0) {
310
+ error_report("zone mgmt operations are not allowed for conventional zones");
47
diff --git a/block/cloop.c b/block/cloop.c
311
+ return -EIO;
48
index XXXXXXX..XXXXXXX 100644
312
+ }
49
--- a/block/cloop.c
313
+
50
+++ b/block/cloop.c
314
switch (op) {
51
@@ -XXX,XX +XXX,XX @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
315
case BLK_ZO_OPEN:
52
return -EINVAL;
316
op_name = "BLKOPENZONE";
53
}
317
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
54
318
len >> BDRV_SECTOR_BITS);
55
- bs->read_only = true;
319
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
56
+ bdrv_set_read_only(bs, true);
320
if (ret != 0) {
57
321
+ update_zones_wp(bs, s->fd, offset, i);
58
/* read header */
322
error_report("ioctl %s failed %d", op_name, ret);
59
ret = bdrv_pread(bs->file, 128, &s->block_size, 4);
323
+ return ret;
60
diff --git a/block/dmg.c b/block/dmg.c
324
+ }
61
index XXXXXXX..XXXXXXX 100644
325
+
62
--- a/block/dmg.c
326
+ if (zo == BLKRESETZONE && len == capacity) {
63
+++ b/block/dmg.c
327
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
64
@@ -XXX,XX +XXX,XX @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
328
+ if (ret < 0) {
65
}
329
+ error_report("reporting single wp failed");
66
330
+ return ret;
67
block_module_load_one("dmg-bz2");
331
+ }
68
- bs->read_only = true;
332
+ } else if (zo == BLKRESETZONE) {
69
+ bdrv_set_read_only(bs, true);
333
+ for (unsigned int j = 0; j < nrz; ++j) {
70
334
+ wp[j] = offset + j * zone_size;
71
s->n_chunks = 0;
335
+ }
72
s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
336
+ } else if (zo == BLKFINISHZONE) {
73
diff --git a/block/rbd.c b/block/rbd.c
337
+ for (unsigned int j = 0; j < nrz; ++j) {
74
index XXXXXXX..XXXXXXX 100644
338
+ /* The zoned device allows the last zone smaller that the
75
--- a/block/rbd.c
339
+ * zone size. */
76
+++ b/block/rbd.c
340
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
77
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
341
+ }
78
goto failed_open;
342
}
79
}
343
80
344
return ret;
81
- bs->read_only = (s->snap != NULL);
82
+ bdrv_set_read_only(bs, (s->snap != NULL));
83
84
qemu_opts_del(opts);
85
return 0;
86
diff --git a/block/vvfat.c b/block/vvfat.c
87
index XXXXXXX..XXXXXXX 100644
88
--- a/block/vvfat.c
89
+++ b/block/vvfat.c
90
@@ -XXX,XX +XXX,XX @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
91
s->current_cluster=0xffffffff;
92
93
/* read only is the default for safety */
94
- bs->read_only = true;
95
+ bdrv_set_read_only(bs, true);
96
s->qcow = NULL;
97
s->qcow_filename = NULL;
98
s->fat2 = NULL;
99
@@ -XXX,XX +XXX,XX @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
100
if (ret < 0) {
101
goto fail;
102
}
103
- bs->read_only = false;
104
+ bdrv_set_read_only(bs, false);
105
}
106
107
bs->total_sectors = cyls * heads * secs;
108
diff --git a/include/block/block.h b/include/block/block.h
109
index XXXXXXX..XXXXXXX 100644
110
--- a/include/block/block.h
111
+++ b/include/block/block.h
112
@@ -XXX,XX +XXX,XX @@ int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
113
int64_t sector_num, int nb_sectors, int *pnum);
114
115
bool bdrv_is_read_only(BlockDriverState *bs);
116
+void bdrv_set_read_only(BlockDriverState *bs, bool read_only);
117
bool bdrv_is_sg(BlockDriverState *bs);
118
bool bdrv_is_inserted(BlockDriverState *bs);
119
int bdrv_media_changed(BlockDriverState *bs);
120
--
345
--
121
2.9.3
346
2.40.1
122
123
diff view generated by jsdifflib
1
Introduce check function for setting read_only flags. Will return < 0 on
1
From: Sam Li <faithilikerun@gmail.com>
2
error, with appropriate Error value set. Does not alter any flags.
3
2
4
Signed-off-by: Jeff Cody <jcody@redhat.com>
3
A zone append command is a write operation that specifies the first
4
logical block of a zone as the write position. When writing to a zoned
5
block device using zone append, the byte offset of the call may point at
6
any position within the zone to which the data is being appended. Upon
7
completion the device will respond with the position where the data has
8
been written in the zone.
9
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: John Snow <jsnow@redhat.com>
13
Message-id: 20230508051510.177850-3-faithilikerun@gmail.com
7
Message-id: e2bba34ac3bc76a0c42adc390413f358ae0566e8.1491597120.git.jcody@redhat.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
15
---
9
block.c | 14 +++++++++++++-
16
include/block/block-io.h | 4 ++
10
include/block/block.h | 1 +
17
include/block/block_int-common.h | 3 ++
11
2 files changed, 14 insertions(+), 1 deletion(-)
18
include/block/raw-aio.h | 4 +-
19
include/sysemu/block-backend-io.h | 9 +++++
20
block/block-backend.c | 61 +++++++++++++++++++++++++++++++
21
block/file-posix.c | 58 +++++++++++++++++++++++++----
22
block/io.c | 27 ++++++++++++++
23
block/io_uring.c | 4 ++
24
block/linux-aio.c | 3 ++
25
block/raw-format.c | 8 ++++
26
10 files changed, 173 insertions(+), 8 deletions(-)
12
27
13
diff --git a/block.c b/block.c
28
diff --git a/include/block/block-io.h b/include/block/block-io.h
14
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
15
--- a/block.c
30
--- a/include/block/block-io.h
16
+++ b/block.c
31
+++ b/include/block/block-io.h
17
@@ -XXX,XX +XXX,XX @@ bool bdrv_is_read_only(BlockDriverState *bs)
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
18
return bs->read_only;
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
19
}
34
BlockZoneOp op,
20
35
int64_t offset, int64_t len);
21
-int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
22
+int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)(uintptr_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ GRAPH_RDLOCK_GUARD();
172
+ if (!blk_is_available(blk)) {
173
+ blk_dec_in_flight(blk);
174
+ return -ENOMEDIUM;
175
+ }
176
+
177
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
178
+ blk_dec_in_flight(blk);
179
+ return ret;
180
+}
181
+
182
void blk_drain(BlockBackend *blk)
23
{
183
{
24
/* Do not set read_only if copy_on_read is enabled */
184
BlockDriverState *bs = blk_bs(blk);
25
if (bs->copy_on_read && read_only) {
185
diff --git a/block/file-posix.c b/block/file-posix.c
26
@@ -XXX,XX +XXX,XX @@ int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
186
index XXXXXXX..XXXXXXX 100644
27
return -EPERM;
187
--- a/block/file-posix.c
188
+++ b/block/file-posix.c
189
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
190
bool has_write_zeroes:1;
191
bool use_linux_aio:1;
192
bool use_linux_io_uring:1;
193
+ int64_t *offset; /* offset of zone append operation */
194
int page_cache_inconsistent; /* errno from fdatasync failure */
195
bool has_fallocate;
196
bool needs_alignment;
197
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
198
ssize_t len;
199
200
len = RETRY_ON_EINTR(
201
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
202
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
203
qemu_pwritev(aiocb->aio_fildes,
204
aiocb->io.iov,
205
aiocb->io.niov,
206
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
207
ssize_t len;
208
209
while (offset < aiocb->aio_nbytes) {
210
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
211
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
212
len = pwrite(aiocb->aio_fildes,
213
(const char *)buf + offset,
214
aiocb->aio_nbytes - offset,
215
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
28
}
216
}
29
217
30
+ return 0;
218
nbytes = handle_aiocb_rw_linear(aiocb, buf);
31
+}
219
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
32
+
220
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
33
+int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
221
char *p = buf;
222
size_t count = aiocb->aio_nbytes, copy;
223
int i;
224
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
225
if (fd_open(bs) < 0)
226
return -EIO;
227
#if defined(CONFIG_BLKZONED)
228
- if (type & QEMU_AIO_WRITE && bs->wps) {
229
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
230
qemu_co_mutex_lock(&bs->wps->colock);
231
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
232
+ int index = offset / bs->bl.zone_size;
233
+ offset = bs->wps->wp[index];
234
+ }
235
}
236
#endif
237
238
@@ -XXX,XX +XXX,XX @@ out:
239
{
240
BlockZoneWps *wps = bs->wps;
241
if (ret == 0) {
242
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
243
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
244
+ && wps && bs->bl.zone_size) {
245
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
246
if (!BDRV_ZT_IS_CONV(*wp)) {
247
+ if (type & QEMU_AIO_ZONE_APPEND) {
248
+ *s->offset = *wp;
249
+ }
250
/* Advance the wp if needed */
251
if (offset + bytes > *wp) {
252
*wp = offset + bytes;
253
@@ -XXX,XX +XXX,XX @@ out:
254
}
255
}
256
} else {
257
- if (type & QEMU_AIO_WRITE) {
258
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
259
update_zones_wp(bs, s->fd, 0, 1);
260
}
261
}
262
263
- if (type & QEMU_AIO_WRITE && wps) {
264
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
265
qemu_co_mutex_unlock(&wps->colock);
266
}
267
}
268
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
269
}
270
#endif
271
272
+#if defined(CONFIG_BLKZONED)
273
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
274
+ int64_t *offset,
275
+ QEMUIOVector *qiov,
276
+ BdrvRequestFlags flags) {
277
+ assert(flags == 0);
278
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
279
+ int64_t iov_len = 0;
280
+ int64_t len = 0;
281
+ BDRVRawState *s = bs->opaque;
282
+ s->offset = offset;
283
+
284
+ if (*offset & zone_size_mask) {
285
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
286
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
287
+ return -EINVAL;
288
+ }
289
+
290
+ int64_t wg = bs->bl.write_granularity;
291
+ int64_t wg_mask = wg - 1;
292
+ for (int i = 0; i < qiov->niov; i++) {
293
+ iov_len = qiov->iov[i].iov_len;
294
+ if (iov_len & wg_mask) {
295
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
296
+ "block size %" PRId64 "", i, iov_len, wg);
297
+ return -EINVAL;
298
+ }
299
+ len += iov_len;
300
+ }
301
+
302
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
303
+}
304
+#endif
305
+
306
static coroutine_fn int
307
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
308
bool blkdev)
309
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
310
/* zone management operations */
311
.bdrv_co_zone_report = raw_co_zone_report,
312
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
313
+ .bdrv_co_zone_append = raw_co_zone_append,
314
#endif
315
};
316
317
diff --git a/block/io.c b/block/io.c
318
index XXXXXXX..XXXXXXX 100644
319
--- a/block/io.c
320
+++ b/block/io.c
321
@@ -XXX,XX +XXX,XX @@ out:
322
return co.ret;
323
}
324
325
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
326
+ QEMUIOVector *qiov,
327
+ BdrvRequestFlags flags)
34
+{
328
+{
35
+ int ret = 0;
329
+ int ret;
36
+
330
+ BlockDriver *drv = bs->drv;
37
+ ret = bdrv_can_set_read_only(bs, read_only, errp);
331
+ CoroutineIOCompletion co = {
332
+ .coroutine = qemu_coroutine_self(),
333
+ };
334
+ IO_CODE();
335
+
336
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
38
+ if (ret < 0) {
337
+ if (ret < 0) {
39
+ return ret;
338
+ return ret;
40
+ }
339
+ }
41
+
340
+
42
bs->read_only = read_only;
341
+ bdrv_inc_in_flight(bs);
43
return 0;
342
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
44
}
343
+ co.ret = -ENOTSUP;
45
diff --git a/include/block/block.h b/include/block/block.h
344
+ goto out;
46
index XXXXXXX..XXXXXXX 100644
345
+ }
47
--- a/include/block/block.h
346
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
48
+++ b/include/block/block.h
347
+out:
49
@@ -XXX,XX +XXX,XX @@ int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
348
+ bdrv_dec_in_flight(bs);
50
int64_t sector_num, int nb_sectors, int *pnum);
349
+ return co.ret;
51
350
+}
52
bool bdrv_is_read_only(BlockDriverState *bs);
351
+
53
+int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
352
void *qemu_blockalign(BlockDriverState *bs, size_t size)
54
int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
353
{
55
bool bdrv_is_sg(BlockDriverState *bs);
354
IO_CODE();
56
bool bdrv_is_inserted(BlockDriverState *bs);
355
diff --git a/block/io_uring.c b/block/io_uring.c
356
index XXXXXXX..XXXXXXX 100644
357
--- a/block/io_uring.c
358
+++ b/block/io_uring.c
359
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
360
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
361
luringcb->qiov->niov, offset);
362
break;
363
+ case QEMU_AIO_ZONE_APPEND:
364
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
365
+ luringcb->qiov->niov, offset);
366
+ break;
367
case QEMU_AIO_READ:
368
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
369
luringcb->qiov->niov, offset);
370
diff --git a/block/linux-aio.c b/block/linux-aio.c
371
index XXXXXXX..XXXXXXX 100644
372
--- a/block/linux-aio.c
373
+++ b/block/linux-aio.c
374
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
375
case QEMU_AIO_WRITE:
376
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
377
break;
378
+ case QEMU_AIO_ZONE_APPEND:
379
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
380
+ break;
381
case QEMU_AIO_READ:
382
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
383
break;
384
diff --git a/block/raw-format.c b/block/raw-format.c
385
index XXXXXXX..XXXXXXX 100644
386
--- a/block/raw-format.c
387
+++ b/block/raw-format.c
388
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
389
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
390
}
391
392
+static int coroutine_fn GRAPH_RDLOCK
393
+raw_co_zone_append(BlockDriverState *bs,int64_t *offset, QEMUIOVector *qiov,
394
+ BdrvRequestFlags flags)
395
+{
396
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
397
+}
398
+
399
static int64_t coroutine_fn GRAPH_RDLOCK
400
raw_co_getlength(BlockDriverState *bs)
401
{
402
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
403
.bdrv_co_pdiscard = &raw_co_pdiscard,
404
.bdrv_co_zone_report = &raw_co_zone_report,
405
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
406
+ .bdrv_co_zone_append = &raw_co_zone_append,
407
.bdrv_co_block_status = &raw_co_block_status,
408
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
409
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
57
--
410
--
58
2.9.3
411
2.40.1
59
60
diff view generated by jsdifflib
1
From: Ashish Mittal <ashmit602@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Source code for the qnio library that this code loads can be downloaded from:
3
The patch tests zone append writes by reporting the zone wp after
4
https://github.com/VeritasHyperScale/libqnio.git
4
the completion of the call. "zap -p" option can print the sector
5
offset value after completion, which should be the start sector
6
where the append write begins.
5
7
6
Sample command line using JSON syntax:
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
./x86_64-softmmu/qemu-system-x86_64 -name instance-00000008 -S -vnc 0.0.0.0:0
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
-k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
10
Message-id: 20230508051510.177850-4-faithilikerun@gmail.com
9
-msg timestamp=on
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410",
12
---
11
"server":{"host":"172.172.17.4","port":"9999"}}'
13
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
14
tests/qemu-iotests/tests/zoned | 16 +++++++
15
tests/qemu-iotests/tests/zoned.out | 16 +++++++
16
3 files changed, 107 insertions(+)
12
17
13
Sample command line using URI syntax:
18
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
14
qemu-img convert -f raw -O raw -n
15
/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad
16
vxhs://192.168.0.1:9999/c6718f6b-0401-441d-a8c3-1f0064d75ee0
17
18
Sample command line using TLS credentials (run in secure mode):
19
./qemu-io --object
20
tls-creds-x509,id=tls0,dir=/etc/pki/qemu/vxhs,endpoint=client -c 'read
21
-v 66000 2.5k' 'json:{"server.host": "127.0.0.1", "server.port": "9999",
22
"vdisk-id": "/test.raw", "driver": "vxhs", "tls-creds":"tls0"}'
23
24
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
25
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
26
Reviewed-by: Jeff Cody <jcody@redhat.com>
27
Signed-off-by: Jeff Cody <jcody@redhat.com>
28
Message-id: 1491277689-24949-2-git-send-email-Ashish.Mittal@veritas.com
29
---
30
block/Makefile.objs | 2 +
31
block/trace-events | 17 ++
32
block/vxhs.c | 575 +++++++++++++++++++++++++++++++++++++++++++++++++++
33
configure | 39 ++++
34
qapi/block-core.json | 23 ++-
35
5 files changed, 654 insertions(+), 2 deletions(-)
36
create mode 100644 block/vxhs.c
37
38
diff --git a/block/Makefile.objs b/block/Makefile.objs
39
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
40
--- a/block/Makefile.objs
20
--- a/qemu-io-cmds.c
41
+++ b/block/Makefile.objs
21
+++ b/qemu-io-cmds.c
42
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
43
block-obj-$(CONFIG_CURL) += curl.o
23
.oneline = "reset a zone write pointer in zone block device",
44
block-obj-$(CONFIG_RBD) += rbd.o
24
};
45
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
25
46
+block-obj-$(CONFIG_VXHS) += vxhs.o
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
47
block-obj-$(CONFIG_LIBSSH2) += ssh.o
27
+ int64_t *offset, int flags, int *total)
48
block-obj-y += accounting.o dirty-bitmap.o
28
+{
49
block-obj-y += write-threshold.o
29
+ int async_ret = NOT_DONE;
50
@@ -XXX,XX +XXX,XX @@ rbd.o-cflags := $(RBD_CFLAGS)
51
rbd.o-libs := $(RBD_LIBS)
52
gluster.o-cflags := $(GLUSTERFS_CFLAGS)
53
gluster.o-libs := $(GLUSTERFS_LIBS)
54
+vxhs.o-libs := $(VXHS_LIBS)
55
ssh.o-cflags := $(LIBSSH2_CFLAGS)
56
ssh.o-libs := $(LIBSSH2_LIBS)
57
block-obj-$(if $(CONFIG_BZIP2),m,n) += dmg-bz2.o
58
diff --git a/block/trace-events b/block/trace-events
59
index XXXXXXX..XXXXXXX 100644
60
--- a/block/trace-events
61
+++ b/block/trace-events
62
@@ -XXX,XX +XXX,XX @@ qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s
63
qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
64
qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
65
qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
66
+
30
+
67
+# block/vxhs.c
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
68
+vxhs_iio_callback(int error) "ctx is NULL: error %d"
32
+ while (async_ret == NOT_DONE) {
69
+vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no i/o %d, %d"
33
+ main_loop_wait(false);
70
+vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, errno %d"
71
+vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d"
72
+vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, void *acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write = %d size = %lu offset = %lu ACB = %p. Error = %d, errno = %d"
73
+vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat ioctl failed, ret = %d, errno = %d"
74
+vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s stat ioctl returned size %lu"
75
+vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %ld"
76
+vxhs_parse_uri_filename(const char *filename) "URI passed via bdrv_parse_filename %s"
77
+vxhs_open_vdiskid(const char *vdisk_id) "Opening vdisk-id %s"
78
+vxhs_open_hostinfo(char *of_vsa_addr, int port) "Adding host %s:%d to BDRVVXHSState"
79
+vxhs_open_iio_open(const char *host) "Failed to connect to storage agent on host %s"
80
+vxhs_parse_uri_hostinfo(char *host, int port) "Host: IP %s, Port %d"
81
+vxhs_close(char *vdisk_guid) "Closing vdisk %s"
82
+vxhs_get_creds(const char *cacert, const char *client_key, const char *client_cert) "cacert %s, client_key %s, client_cert %s"
83
diff --git a/block/vxhs.c b/block/vxhs.c
84
new file mode 100644
85
index XXXXXXX..XXXXXXX
86
--- /dev/null
87
+++ b/block/vxhs.c
88
@@ -XXX,XX +XXX,XX @@
89
+/*
90
+ * QEMU Block driver for Veritas HyperScale (VxHS)
91
+ *
92
+ * Copyright (c) 2017 Veritas Technologies LLC.
93
+ *
94
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
95
+ * See the COPYING file in the top-level directory.
96
+ *
97
+ */
98
+
99
+#include "qemu/osdep.h"
100
+#include <qnio/qnio_api.h>
101
+#include <sys/param.h>
102
+#include "block/block_int.h"
103
+#include "qapi/qmp/qerror.h"
104
+#include "qapi/qmp/qdict.h"
105
+#include "qapi/qmp/qstring.h"
106
+#include "trace.h"
107
+#include "qemu/uri.h"
108
+#include "qapi/error.h"
109
+#include "qemu/uuid.h"
110
+#include "crypto/tlscredsx509.h"
111
+
112
+#define VXHS_OPT_FILENAME "filename"
113
+#define VXHS_OPT_VDISK_ID "vdisk-id"
114
+#define VXHS_OPT_SERVER "server"
115
+#define VXHS_OPT_HOST "host"
116
+#define VXHS_OPT_PORT "port"
117
+
118
+/* Only accessed under QEMU global mutex */
119
+static uint32_t vxhs_ref;
120
+
121
+typedef enum {
122
+ VDISK_AIO_READ,
123
+ VDISK_AIO_WRITE,
124
+} VDISKAIOCmd;
125
+
126
+/*
127
+ * HyperScale AIO callbacks structure
128
+ */
129
+typedef struct VXHSAIOCB {
130
+ BlockAIOCB common;
131
+ int err;
132
+} VXHSAIOCB;
133
+
134
+typedef struct VXHSvDiskHostsInfo {
135
+ void *dev_handle; /* Device handle */
136
+ char *host; /* Host name or IP */
137
+ int port; /* Host's port number */
138
+} VXHSvDiskHostsInfo;
139
+
140
+/*
141
+ * Structure per vDisk maintained for state
142
+ */
143
+typedef struct BDRVVXHSState {
144
+ VXHSvDiskHostsInfo vdisk_hostinfo; /* Per host info */
145
+ char *vdisk_guid;
146
+ char *tlscredsid; /* tlscredsid */
147
+} BDRVVXHSState;
148
+
149
+static void vxhs_complete_aio_bh(void *opaque)
150
+{
151
+ VXHSAIOCB *acb = opaque;
152
+ BlockCompletionFunc *cb = acb->common.cb;
153
+ void *cb_opaque = acb->common.opaque;
154
+ int ret = 0;
155
+
156
+ if (acb->err != 0) {
157
+ trace_vxhs_complete_aio(acb, acb->err);
158
+ ret = (-EIO);
159
+ }
34
+ }
160
+
35
+
161
+ qemu_aio_unref(acb);
36
+ *total = qiov->size;
162
+ cb(cb_opaque, ret);
37
+ return async_ret < 0 ? async_ret : 1;
163
+}
38
+}
164
+
39
+
165
+/*
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
166
+ * Called from a libqnio thread
167
+ */
168
+static void vxhs_iio_callback(void *ctx, uint32_t opcode, uint32_t error)
169
+{
41
+{
170
+ VXHSAIOCB *acb = NULL;
42
+ int ret;
43
+ bool pflag = false;
44
+ int flags = 0;
45
+ int total = 0;
46
+ int64_t offset;
47
+ char *buf;
48
+ int c, nr_iov;
49
+ int pattern = 0xcd;
50
+ QEMUIOVector qiov;
171
+
51
+
172
+ switch (opcode) {
52
+ if (optind > argc - 3) {
173
+ case IRP_READ_REQUEST:
174
+ case IRP_WRITE_REQUEST:
175
+
176
+ /*
177
+ * ctx is VXHSAIOCB*
178
+ * ctx is NULL if error is QNIOERROR_CHANNEL_HUP
179
+ */
180
+ if (ctx) {
181
+ acb = ctx;
182
+ } else {
183
+ trace_vxhs_iio_callback(error);
184
+ goto out;
185
+ }
186
+
187
+ if (error) {
188
+ if (!acb->err) {
189
+ acb->err = error;
190
+ }
191
+ trace_vxhs_iio_callback(error);
192
+ }
193
+
194
+ aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
195
+ vxhs_complete_aio_bh, acb);
196
+ break;
197
+
198
+ default:
199
+ if (error == QNIOERROR_HUP) {
200
+ /*
201
+ * Channel failed, spontaneous notification,
202
+ * not in response to I/O
203
+ */
204
+ trace_vxhs_iio_callback_chnfail(error, errno);
205
+ } else {
206
+ trace_vxhs_iio_callback_unknwn(opcode, error);
207
+ }
208
+ break;
209
+ }
210
+out:
211
+ return;
212
+}
213
+
214
+static QemuOptsList runtime_opts = {
215
+ .name = "vxhs",
216
+ .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
217
+ .desc = {
218
+ {
219
+ .name = VXHS_OPT_FILENAME,
220
+ .type = QEMU_OPT_STRING,
221
+ .help = "URI to the Veritas HyperScale image",
222
+ },
223
+ {
224
+ .name = VXHS_OPT_VDISK_ID,
225
+ .type = QEMU_OPT_STRING,
226
+ .help = "UUID of the VxHS vdisk",
227
+ },
228
+ {
229
+ .name = "tls-creds",
230
+ .type = QEMU_OPT_STRING,
231
+ .help = "ID of the TLS/SSL credentials to use",
232
+ },
233
+ { /* end of list */ }
234
+ },
235
+};
236
+
237
+static QemuOptsList runtime_tcp_opts = {
238
+ .name = "vxhs_tcp",
239
+ .head = QTAILQ_HEAD_INITIALIZER(runtime_tcp_opts.head),
240
+ .desc = {
241
+ {
242
+ .name = VXHS_OPT_HOST,
243
+ .type = QEMU_OPT_STRING,
244
+ .help = "host address (ipv4 addresses)",
245
+ },
246
+ {
247
+ .name = VXHS_OPT_PORT,
248
+ .type = QEMU_OPT_NUMBER,
249
+ .help = "port number on which VxHSD is listening (default 9999)",
250
+ .def_value_str = "9999"
251
+ },
252
+ { /* end of list */ }
253
+ },
254
+};
255
+
256
+/*
257
+ * Parse incoming URI and populate *options with the host
258
+ * and device information
259
+ */
260
+static int vxhs_parse_uri(const char *filename, QDict *options)
261
+{
262
+ URI *uri = NULL;
263
+ char *port;
264
+ int ret = 0;
265
+
266
+ trace_vxhs_parse_uri_filename(filename);
267
+ uri = uri_parse(filename);
268
+ if (!uri || !uri->server || !uri->path) {
269
+ uri_free(uri);
270
+ return -EINVAL;
53
+ return -EINVAL;
271
+ }
54
+ }
272
+
55
+
273
+ qdict_put(options, VXHS_OPT_SERVER".host", qstring_from_str(uri->server));
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
274
+
57
+ pflag = true;
275
+ if (uri->port) {
276
+ port = g_strdup_printf("%d", uri->port);
277
+ qdict_put(options, VXHS_OPT_SERVER".port", qstring_from_str(port));
278
+ g_free(port);
279
+ }
58
+ }
280
+
59
+
281
+ qdict_put(options, "vdisk-id", qstring_from_str(uri->path));
60
+ offset = cvtnum(argv[optind]);
61
+ if (offset < 0) {
62
+ print_cvtnum_err(offset, argv[optind]);
63
+ return offset;
64
+ }
65
+ optind++;
66
+ nr_iov = argc - optind;
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
68
+ flags & BDRV_REQ_REGISTERED_BUF);
69
+ if (buf == NULL) {
70
+ return -EINVAL;
71
+ }
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
73
+ if (ret < 0) {
74
+ printf("zone append failed: %s\n", strerror(-ret));
75
+ goto out;
76
+ }
282
+
77
+
283
+ trace_vxhs_parse_uri_hostinfo(uri->server, uri->port);
78
+ if (pflag) {
284
+ uri_free(uri);
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
80
+ tosector(offset));
81
+ }
285
+
82
+
83
+out:
84
+ qemu_io_free(blk, buf, qiov.size,
85
+ flags & BDRV_REQ_REGISTERED_BUF);
86
+ qemu_iovec_destroy(&qiov);
286
+ return ret;
87
+ return ret;
287
+}
88
+}
288
+
89
+
289
+static void vxhs_parse_filename(const char *filename, QDict *options,
90
+static const cmdinfo_t zone_append_cmd = {
290
+ Error **errp)
91
+ .name = "zone_append",
291
+{
92
+ .altname = "zap",
292
+ if (qdict_haskey(options, "vdisk-id") || qdict_haskey(options, "server")) {
93
+ .cfunc = zone_append_f,
293
+ error_setg(errp, "vdisk-id/server and a file name may not be specified "
94
+ .argmin = 3,
294
+ "at the same time");
95
+ .argmax = 4,
295
+ return;
96
+ .args = "offset len [len..]",
296
+ }
97
+ .oneline = "append write a number of bytes at a specified offset",
297
+
298
+ if (strstr(filename, "://")) {
299
+ int ret = vxhs_parse_uri(filename, options);
300
+ if (ret < 0) {
301
+ error_setg(errp, "Invalid URI. URI should be of the form "
302
+ " vxhs://<host_ip>:<port>/<vdisk-id>");
303
+ }
304
+ }
305
+}
306
+
307
+static int vxhs_init_and_ref(void)
308
+{
309
+ if (vxhs_ref++ == 0) {
310
+ if (iio_init(QNIO_VERSION, vxhs_iio_callback)) {
311
+ return -ENODEV;
312
+ }
313
+ }
314
+ return 0;
315
+}
316
+
317
+static void vxhs_unref(void)
318
+{
319
+ if (--vxhs_ref == 0) {
320
+ iio_fini();
321
+ }
322
+}
323
+
324
+static void vxhs_get_tls_creds(const char *id, char **cacert,
325
+ char **key, char **cert, Error **errp)
326
+{
327
+ Object *obj;
328
+ QCryptoTLSCreds *creds;
329
+ QCryptoTLSCredsX509 *creds_x509;
330
+
331
+ obj = object_resolve_path_component(
332
+ object_get_objects_root(), id);
333
+
334
+ if (!obj) {
335
+ error_setg(errp, "No TLS credentials with id '%s'",
336
+ id);
337
+ return;
338
+ }
339
+
340
+ creds_x509 = (QCryptoTLSCredsX509 *)
341
+ object_dynamic_cast(obj, TYPE_QCRYPTO_TLS_CREDS_X509);
342
+
343
+ if (!creds_x509) {
344
+ error_setg(errp, "Object with id '%s' is not TLS credentials",
345
+ id);
346
+ return;
347
+ }
348
+
349
+ creds = &creds_x509->parent_obj;
350
+
351
+ if (creds->endpoint != QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT) {
352
+ error_setg(errp,
353
+ "Expecting TLS credentials with a client endpoint");
354
+ return;
355
+ }
356
+
357
+ /*
358
+ * Get the cacert, client_cert and client_key file names.
359
+ */
360
+ if (!creds->dir) {
361
+ error_setg(errp, "TLS object missing 'dir' property value");
362
+ return;
363
+ }
364
+
365
+ *cacert = g_strdup_printf("%s/%s", creds->dir,
366
+ QCRYPTO_TLS_CREDS_X509_CA_CERT);
367
+ *cert = g_strdup_printf("%s/%s", creds->dir,
368
+ QCRYPTO_TLS_CREDS_X509_CLIENT_CERT);
369
+ *key = g_strdup_printf("%s/%s", creds->dir,
370
+ QCRYPTO_TLS_CREDS_X509_CLIENT_KEY);
371
+}
372
+
373
+static int vxhs_open(BlockDriverState *bs, QDict *options,
374
+ int bdrv_flags, Error **errp)
375
+{
376
+ BDRVVXHSState *s = bs->opaque;
377
+ void *dev_handlep;
378
+ QDict *backing_options = NULL;
379
+ QemuOpts *opts = NULL;
380
+ QemuOpts *tcp_opts = NULL;
381
+ char *of_vsa_addr = NULL;
382
+ Error *local_err = NULL;
383
+ const char *vdisk_id_opt;
384
+ const char *server_host_opt;
385
+ int ret = 0;
386
+ char *cacert = NULL;
387
+ char *client_key = NULL;
388
+ char *client_cert = NULL;
389
+
390
+ ret = vxhs_init_and_ref();
391
+ if (ret < 0) {
392
+ ret = -EINVAL;
393
+ goto out;
394
+ }
395
+
396
+ /* Create opts info from runtime_opts and runtime_tcp_opts list */
397
+ opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
398
+ tcp_opts = qemu_opts_create(&runtime_tcp_opts, NULL, 0, &error_abort);
399
+
400
+ qemu_opts_absorb_qdict(opts, options, &local_err);
401
+ if (local_err) {
402
+ ret = -EINVAL;
403
+ goto out;
404
+ }
405
+
406
+ /* vdisk-id is the disk UUID */
407
+ vdisk_id_opt = qemu_opt_get(opts, VXHS_OPT_VDISK_ID);
408
+ if (!vdisk_id_opt) {
409
+ error_setg(&local_err, QERR_MISSING_PARAMETER, VXHS_OPT_VDISK_ID);
410
+ ret = -EINVAL;
411
+ goto out;
412
+ }
413
+
414
+ /* vdisk-id may contain a leading '/' */
415
+ if (strlen(vdisk_id_opt) > UUID_FMT_LEN + 1) {
416
+ error_setg(&local_err, "vdisk-id cannot be more than %d characters",
417
+ UUID_FMT_LEN);
418
+ ret = -EINVAL;
419
+ goto out;
420
+ }
421
+
422
+ s->vdisk_guid = g_strdup(vdisk_id_opt);
423
+ trace_vxhs_open_vdiskid(vdisk_id_opt);
424
+
425
+ /* get the 'server.' arguments */
426
+ qdict_extract_subqdict(options, &backing_options, VXHS_OPT_SERVER".");
427
+
428
+ qemu_opts_absorb_qdict(tcp_opts, backing_options, &local_err);
429
+ if (local_err != NULL) {
430
+ ret = -EINVAL;
431
+ goto out;
432
+ }
433
+
434
+ server_host_opt = qemu_opt_get(tcp_opts, VXHS_OPT_HOST);
435
+ if (!server_host_opt) {
436
+ error_setg(&local_err, QERR_MISSING_PARAMETER,
437
+ VXHS_OPT_SERVER"."VXHS_OPT_HOST);
438
+ ret = -EINVAL;
439
+ goto out;
440
+ }
441
+
442
+ if (strlen(server_host_opt) > MAXHOSTNAMELEN) {
443
+ error_setg(&local_err, "server.host cannot be more than %d characters",
444
+ MAXHOSTNAMELEN);
445
+ ret = -EINVAL;
446
+ goto out;
447
+ }
448
+
449
+ /* check if we got tls-creds via the --object argument */
450
+ s->tlscredsid = g_strdup(qemu_opt_get(opts, "tls-creds"));
451
+ if (s->tlscredsid) {
452
+ vxhs_get_tls_creds(s->tlscredsid, &cacert, &client_key,
453
+ &client_cert, &local_err);
454
+ if (local_err != NULL) {
455
+ ret = -EINVAL;
456
+ goto out;
457
+ }
458
+ trace_vxhs_get_creds(cacert, client_key, client_cert);
459
+ }
460
+
461
+ s->vdisk_hostinfo.host = g_strdup(server_host_opt);
462
+ s->vdisk_hostinfo.port = g_ascii_strtoll(qemu_opt_get(tcp_opts,
463
+ VXHS_OPT_PORT),
464
+ NULL, 0);
465
+
466
+ trace_vxhs_open_hostinfo(s->vdisk_hostinfo.host,
467
+ s->vdisk_hostinfo.port);
468
+
469
+ of_vsa_addr = g_strdup_printf("of://%s:%d",
470
+ s->vdisk_hostinfo.host,
471
+ s->vdisk_hostinfo.port);
472
+
473
+ /*
474
+ * Open qnio channel to storage agent if not opened before
475
+ */
476
+ dev_handlep = iio_open(of_vsa_addr, s->vdisk_guid, 0,
477
+ cacert, client_key, client_cert);
478
+ if (dev_handlep == NULL) {
479
+ trace_vxhs_open_iio_open(of_vsa_addr);
480
+ ret = -ENODEV;
481
+ goto out;
482
+ }
483
+ s->vdisk_hostinfo.dev_handle = dev_handlep;
484
+
485
+out:
486
+ g_free(of_vsa_addr);
487
+ QDECREF(backing_options);
488
+ qemu_opts_del(tcp_opts);
489
+ qemu_opts_del(opts);
490
+ g_free(cacert);
491
+ g_free(client_key);
492
+ g_free(client_cert);
493
+
494
+ if (ret < 0) {
495
+ vxhs_unref();
496
+ error_propagate(errp, local_err);
497
+ g_free(s->vdisk_hostinfo.host);
498
+ g_free(s->vdisk_guid);
499
+ g_free(s->tlscredsid);
500
+ s->vdisk_guid = NULL;
501
+ }
502
+
503
+ return ret;
504
+}
505
+
506
+static const AIOCBInfo vxhs_aiocb_info = {
507
+ .aiocb_size = sizeof(VXHSAIOCB)
508
+};
98
+};
509
+
99
+
510
+/*
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
511
+ * This allocates QEMU-VXHS callback for each IO
101
static const cmdinfo_t truncate_cmd = {
512
+ * and is passed to QNIO. When QNIO completes the work,
102
.name = "truncate",
513
+ * it will be passed back through the callback.
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
514
+ */
104
qemuio_add_command(&zone_close_cmd);
515
+static BlockAIOCB *vxhs_aio_rw(BlockDriverState *bs, int64_t sector_num,
105
qemuio_add_command(&zone_finish_cmd);
516
+ QEMUIOVector *qiov, int nb_sectors,
106
qemuio_add_command(&zone_reset_cmd);
517
+ BlockCompletionFunc *cb, void *opaque,
107
+ qemuio_add_command(&zone_append_cmd);
518
+ VDISKAIOCmd iodir)
108
qemuio_add_command(&truncate_cmd);
519
+{
109
qemuio_add_command(&length_cmd);
520
+ VXHSAIOCB *acb = NULL;
110
qemuio_add_command(&info_cmd);
521
+ BDRVVXHSState *s = bs->opaque;
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
522
+ size_t size;
112
index XXXXXXX..XXXXXXX 100755
523
+ uint64_t offset;
113
--- a/tests/qemu-iotests/tests/zoned
524
+ int iio_flags = 0;
114
+++ b/tests/qemu-iotests/tests/zoned
525
+ int ret = 0;
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
526
+ void *dev_handle = s->vdisk_hostinfo.dev_handle;
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
527
+
146
+
528
+ offset = sector_num * BDRV_SECTOR_SIZE;
529
+ size = nb_sectors * BDRV_SECTOR_SIZE;
530
+ acb = qemu_aio_get(&vxhs_aiocb_info, bs, cb, opaque);
531
+
147
+
532
+ /*
148
+(6) append write
533
+ * Initialize VXHSAIOCB.
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
534
+ */
150
+After zap done, the append sector is 0x0
535
+ acb->err = 0;
151
+After appending the first zone firstly:
536
+
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
537
+ iio_flags = IIO_FLAG_ASYNC;
153
+After zap done, the append sector is 0x18
538
+
154
+After appending the first zone secondly:
539
+ switch (iodir) {
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
540
+ case VDISK_AIO_WRITE:
156
+After zap done, the append sector is 0x80000
541
+ ret = iio_writev(dev_handle, acb, qiov->iov, qiov->niov,
157
+After appending the second zone firstly:
542
+ offset, (uint64_t)size, iio_flags);
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
543
+ break;
159
+After zap done, the append sector is 0x80018
544
+ case VDISK_AIO_READ:
160
+After appending the second zone secondly:
545
+ ret = iio_readv(dev_handle, acb, qiov->iov, qiov->niov,
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
546
+ offset, (uint64_t)size, iio_flags);
162
*** done
547
+ break;
548
+ default:
549
+ trace_vxhs_aio_rw_invalid(iodir);
550
+ goto errout;
551
+ }
552
+
553
+ if (ret != 0) {
554
+ trace_vxhs_aio_rw_ioerr(s->vdisk_guid, iodir, size, offset,
555
+ acb, ret, errno);
556
+ goto errout;
557
+ }
558
+ return &acb->common;
559
+
560
+errout:
561
+ qemu_aio_unref(acb);
562
+ return NULL;
563
+}
564
+
565
+static BlockAIOCB *vxhs_aio_readv(BlockDriverState *bs,
566
+ int64_t sector_num, QEMUIOVector *qiov,
567
+ int nb_sectors,
568
+ BlockCompletionFunc *cb, void *opaque)
569
+{
570
+ return vxhs_aio_rw(bs, sector_num, qiov, nb_sectors, cb,
571
+ opaque, VDISK_AIO_READ);
572
+}
573
+
574
+static BlockAIOCB *vxhs_aio_writev(BlockDriverState *bs,
575
+ int64_t sector_num, QEMUIOVector *qiov,
576
+ int nb_sectors,
577
+ BlockCompletionFunc *cb, void *opaque)
578
+{
579
+ return vxhs_aio_rw(bs, sector_num, qiov, nb_sectors,
580
+ cb, opaque, VDISK_AIO_WRITE);
581
+}
582
+
583
+static void vxhs_close(BlockDriverState *bs)
584
+{
585
+ BDRVVXHSState *s = bs->opaque;
586
+
587
+ trace_vxhs_close(s->vdisk_guid);
588
+
589
+ g_free(s->vdisk_guid);
590
+ s->vdisk_guid = NULL;
591
+
592
+ /*
593
+ * Close vDisk device
594
+ */
595
+ if (s->vdisk_hostinfo.dev_handle) {
596
+ iio_close(s->vdisk_hostinfo.dev_handle);
597
+ s->vdisk_hostinfo.dev_handle = NULL;
598
+ }
599
+
600
+ vxhs_unref();
601
+
602
+ /*
603
+ * Free the dynamically allocated host string etc
604
+ */
605
+ g_free(s->vdisk_hostinfo.host);
606
+ g_free(s->tlscredsid);
607
+ s->tlscredsid = NULL;
608
+ s->vdisk_hostinfo.host = NULL;
609
+ s->vdisk_hostinfo.port = 0;
610
+}
611
+
612
+static int64_t vxhs_get_vdisk_stat(BDRVVXHSState *s)
613
+{
614
+ int64_t vdisk_size = -1;
615
+ int ret = 0;
616
+ void *dev_handle = s->vdisk_hostinfo.dev_handle;
617
+
618
+ ret = iio_ioctl(dev_handle, IOR_VDISK_STAT, &vdisk_size, 0);
619
+ if (ret < 0) {
620
+ trace_vxhs_get_vdisk_stat_err(s->vdisk_guid, ret, errno);
621
+ return -EIO;
622
+ }
623
+
624
+ trace_vxhs_get_vdisk_stat(s->vdisk_guid, vdisk_size);
625
+ return vdisk_size;
626
+}
627
+
628
+/*
629
+ * Returns the size of vDisk in bytes. This is required
630
+ * by QEMU block upper block layer so that it is visible
631
+ * to guest.
632
+ */
633
+static int64_t vxhs_getlength(BlockDriverState *bs)
634
+{
635
+ BDRVVXHSState *s = bs->opaque;
636
+ int64_t vdisk_size;
637
+
638
+ vdisk_size = vxhs_get_vdisk_stat(s);
639
+ if (vdisk_size < 0) {
640
+ return -EIO;
641
+ }
642
+
643
+ return vdisk_size;
644
+}
645
+
646
+static BlockDriver bdrv_vxhs = {
647
+ .format_name = "vxhs",
648
+ .protocol_name = "vxhs",
649
+ .instance_size = sizeof(BDRVVXHSState),
650
+ .bdrv_file_open = vxhs_open,
651
+ .bdrv_parse_filename = vxhs_parse_filename,
652
+ .bdrv_close = vxhs_close,
653
+ .bdrv_getlength = vxhs_getlength,
654
+ .bdrv_aio_readv = vxhs_aio_readv,
655
+ .bdrv_aio_writev = vxhs_aio_writev,
656
+};
657
+
658
+static void bdrv_vxhs_init(void)
659
+{
660
+ bdrv_register(&bdrv_vxhs);
661
+}
662
+
663
+block_init(bdrv_vxhs_init);
664
diff --git a/configure b/configure
665
index XXXXXXX..XXXXXXX 100755
666
--- a/configure
667
+++ b/configure
668
@@ -XXX,XX +XXX,XX @@ numa=""
669
tcmalloc="no"
670
jemalloc="no"
671
replication="yes"
672
+vxhs=""
673
674
supported_cpu="no"
675
supported_os="no"
676
@@ -XXX,XX +XXX,XX @@ for opt do
677
;;
678
--enable-replication) replication="yes"
679
;;
680
+ --disable-vxhs) vxhs="no"
681
+ ;;
682
+ --enable-vxhs) vxhs="yes"
683
+ ;;
684
*)
685
echo "ERROR: unknown option $opt"
686
echo "Try '$0 --help' for more information"
687
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
688
xfsctl xfsctl support
689
qom-cast-debug cast debugging support
690
tools build qemu-io, qemu-nbd and qemu-image tools
691
+ vxhs Veritas HyperScale vDisk backend support
692
693
NOTE: The object files are built at the place where configure is launched
694
EOF
695
@@ -XXX,XX +XXX,XX @@ if compile_prog "" "" ; then
696
fi
697
698
##########################################
699
+# Veritas HyperScale block driver VxHS
700
+# Check if libvxhs is installed
701
+
702
+if test "$vxhs" != "no" ; then
703
+ cat > $TMPC <<EOF
704
+#include <stdint.h>
705
+#include <qnio/qnio_api.h>
706
+
707
+void *vxhs_callback;
708
+
709
+int main(void) {
710
+ iio_init(QNIO_VERSION, vxhs_callback);
711
+ return 0;
712
+}
713
+EOF
714
+ vxhs_libs="-lvxhs -lssl"
715
+ if compile_prog "" "$vxhs_libs" ; then
716
+ vxhs=yes
717
+ else
718
+ if test "$vxhs" = "yes" ; then
719
+ feature_not_found "vxhs block device" "Install libvxhs See github"
720
+ fi
721
+ vxhs=no
722
+ fi
723
+fi
724
+
725
+##########################################
726
# End of CC checks
727
# After here, no more $cc or $ld runs
728
729
@@ -XXX,XX +XXX,XX @@ echo "tcmalloc support $tcmalloc"
730
echo "jemalloc support $jemalloc"
731
echo "avx2 optimization $avx2_opt"
732
echo "replication support $replication"
733
+echo "VxHS block device $vxhs"
734
735
if test "$sdl_too_old" = "yes"; then
736
echo "-> Your SDL version is too old - please upgrade to have SDL support"
737
@@ -XXX,XX +XXX,XX @@ if test "$pthread_setname_np" = "yes" ; then
738
echo "CONFIG_PTHREAD_SETNAME_NP=y" >> $config_host_mak
739
fi
740
741
+if test "$vxhs" = "yes" ; then
742
+ echo "CONFIG_VXHS=y" >> $config_host_mak
743
+ echo "VXHS_LIBS=$vxhs_libs" >> $config_host_mak
744
+fi
745
+
746
if test "$tcg_interpreter" = "yes"; then
747
QEMU_INCLUDES="-I\$(SRC_PATH)/tcg/tci $QEMU_INCLUDES"
748
elif test "$ARCH" = "sparc64" ; then
749
diff --git a/qapi/block-core.json b/qapi/block-core.json
750
index XXXXXXX..XXXXXXX 100644
751
--- a/qapi/block-core.json
752
+++ b/qapi/block-core.json
753
@@ -XXX,XX +XXX,XX @@
754
#
755
# Drivers that are supported in block device operations.
756
#
757
+# @vxhs: Since 2.10
758
+#
759
# Since: 2.9
760
##
761
{ 'enum': 'BlockdevDriver',
762
@@ -XXX,XX +XXX,XX @@
763
'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
764
'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
765
'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
766
- 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
767
+ 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
768
769
##
770
# @BlockdevOptionsFile:
771
@@ -XXX,XX +XXX,XX @@
772
'data': { '*offset': 'int', '*size': 'int' } }
773
774
##
775
+# @BlockdevOptionsVxHS:
776
+#
777
+# Driver specific block device options for VxHS
778
+#
779
+# @vdisk-id: UUID of VxHS volume
780
+# @server: vxhs server IP, port
781
+# @tls-creds: TLS credentials ID
782
+#
783
+# Since: 2.10
784
+##
785
+{ 'struct': 'BlockdevOptionsVxHS',
786
+ 'data': { 'vdisk-id': 'str',
787
+ 'server': 'InetSocketAddressBase',
788
+ '*tls-creds': 'str' } }
789
+
790
+##
791
# @BlockdevOptions:
792
#
793
# Options for creating a block device. Many options are available for all
794
@@ -XXX,XX +XXX,XX @@
795
'vhdx': 'BlockdevOptionsGenericFormat',
796
'vmdk': 'BlockdevOptionsGenericCOWFormat',
797
'vpc': 'BlockdevOptionsGenericFormat',
798
- 'vvfat': 'BlockdevOptionsVVFAT'
799
+ 'vvfat': 'BlockdevOptionsVVFAT',
800
+ 'vxhs': 'BlockdevOptionsVxHS'
801
} }
802
803
##
804
--
163
--
805
2.9.3
164
2.40.1
806
807
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20230508051510.177850-5-faithilikerun@gmail.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
block/file-posix.c | 3 +++
10
block/trace-events | 2 ++
11
2 files changed, 5 insertions(+)
12
13
diff --git a/block/file-posix.c b/block/file-posix.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
16
+++ b/block/file-posix.c
17
@@ -XXX,XX +XXX,XX @@ out:
18
if (!BDRV_ZT_IS_CONV(*wp)) {
19
if (type & QEMU_AIO_ZONE_APPEND) {
20
*s->offset = *wp;
21
+ trace_zbd_zone_append_complete(bs, *s->offset
22
+ >> BDRV_SECTOR_BITS);
23
}
24
/* Advance the wp if needed */
25
if (offset + bytes > *wp) {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
27
len += iov_len;
28
}
29
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
32
}
33
#endif
34
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
37
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
39
file_flush_fdatasync_failed(int err) "errno %d"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
44
45
# ssh.c
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
47
--
48
2.40.1
diff view generated by jsdifflib
1
From: Ashish Mittal <ashmit602@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
These changes use a vxhs test server that is a part of the following
3
This patch extends virtio-blk emulation to handle zoned device commands
4
repository:
4
by calling the new block layer APIs to perform zoned device I/O on
5
https://github.com/VeritasHyperScale/libqnio.git
5
behalf of the guest. It supports Report Zone, four zone oparations (open,
6
close, finish, reset), and Append Zone.
6
7
7
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
8
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
support zoned block devices. Regular block devices(conventional zones)
9
Reviewed-by: Jeff Cody <jcody@redhat.com>
10
will not be set.
10
Signed-off-by: Jeff Cody <jcody@redhat.com>
11
11
Message-id: 1491277689-24949-3-git-send-email-Ashish.Mittal@veritas.com
12
The guest os can use blktests, fio to test those commands on zoned devices.
13
Furthermore, using zonefs to test zone append write is also supported.
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
18
---
13
tests/qemu-iotests/common | 6 ++++++
19
hw/block/virtio-blk-common.c | 2 +
14
tests/qemu-iotests/common.config | 13 +++++++++++++
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
15
tests/qemu-iotests/common.filter | 1 +
21
hw/virtio/virtio-qmp.c | 2 +
16
tests/qemu-iotests/common.rc | 19 +++++++++++++++++++
22
3 files changed, 393 insertions(+)
17
4 files changed, 39 insertions(+)
18
23
19
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
20
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX 100644
21
--- a/tests/qemu-iotests/common
26
--- a/hw/block/virtio-blk-common.c
22
+++ b/tests/qemu-iotests/common
27
+++ b/hw/block/virtio-blk-common.c
23
@@ -XXX,XX +XXX,XX @@ check options
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
24
-ssh test ssh
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
25
-nfs test nfs
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
26
-luks test luks
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
27
+ -vxhs test vxhs
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
28
-xdiff graphical mode diff
33
+ .end = endof(struct virtio_blk_config, zoned)},
29
-nocache use O_DIRECT on backing file
34
{}
30
-misalign misalign memory allocations
35
};
31
@@ -XXX,XX +XXX,XX @@ testlist options
36
32
xpand=false
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
33
;;
34
35
+ -vxhs)
36
+ IMGPROTO=vxhs
37
+ xpand=false
38
+ ;;
39
+
40
-ssh)
41
IMGPROTO=ssh
42
xpand=false
43
diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
44
index XXXXXXX..XXXXXXX 100644
38
index XXXXXXX..XXXXXXX 100644
45
--- a/tests/qemu-iotests/common.config
39
--- a/hw/block/virtio-blk.c
46
+++ b/tests/qemu-iotests/common.config
40
+++ b/hw/block/virtio-blk.c
47
@@ -XXX,XX +XXX,XX @@ if [ -z "$QEMU_NBD_PROG" ]; then
41
@@ -XXX,XX +XXX,XX @@
48
export QEMU_NBD_PROG="`set_prog_path qemu-nbd`"
42
#include "qemu/module.h"
49
fi
43
#include "qemu/error-report.h"
50
44
#include "qemu/main-loop.h"
51
+if [ -z "$QEMU_VXHS_PROG" ]; then
45
+#include "block/block_int.h"
52
+ export QEMU_VXHS_PROG="`set_prog_path qnio_server`"
46
#include "trace.h"
53
+fi
47
#include "hw/block/block.h"
54
+
48
#include "hw/qdev-properties.h"
55
_qemu_wrapper()
49
@@ -XXX,XX +XXX,XX @@ err:
50
return err_status;
51
}
52
53
+typedef struct ZoneCmdData {
54
+ VirtIOBlockReq *req;
55
+ struct iovec *in_iov;
56
+ unsigned in_num;
57
+ union {
58
+ struct {
59
+ unsigned int nr_zones;
60
+ BlockZoneDescriptor *zones;
61
+ } zone_report_data;
62
+ struct {
63
+ int64_t offset;
64
+ } zone_append_data;
65
+ };
66
+} ZoneCmdData;
67
+
68
+/*
69
+ * check zoned_request: error checking before issuing requests. If all checks
70
+ * passed, return true.
71
+ * append: true if only zone append requests issued.
72
+ */
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
74
+ bool append, uint8_t *status) {
75
+ BlockDriverState *bs = blk_bs(s->blk);
76
+ int index;
77
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
79
+ *status = VIRTIO_BLK_S_UNSUPP;
80
+ return false;
81
+ }
82
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
89
+ if (append) {
90
+ if (bs->bl.write_granularity) {
91
+ if ((offset % bs->bl.write_granularity) != 0) {
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size),
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer less than size of "
335
+ "append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
56
{
383
{
57
(
384
uint32_t type;
58
@@ -XXX,XX +XXX,XX @@ _qemu_nbd_wrapper()
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
59
)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Passing out_iov/out_num and in_iov/in_num is not safe
417
+ * to access req->elem.out_sg directly because it may be
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
60
}
462
}
61
463
62
+_qemu_vxhs_wrapper()
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
63
+{
465
return;
64
+ (
466
}
65
+ echo $BASHPID > "${TEST_DIR}/qemu-vxhs.pid"
467
66
+ exec "$QEMU_VXHS_PROG" $QEMU_VXHS_OPTIONS "$@"
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
67
+ )
469
+ if (bs->bl.zoned != BLK_Z_NONE) {
68
+}
470
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
69
+
471
+ if (bs->bl.zoned == BLK_Z_HM) {
70
export QEMU=_qemu_wrapper
472
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
71
export QEMU_IMG=_qemu_img_wrapper
473
+ }
72
export QEMU_IO=_qemu_io_wrapper
474
+ }
73
export QEMU_NBD=_qemu_nbd_wrapper
475
+
74
+export QEMU_VXHS=_qemu_vxhs_wrapper
476
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
75
477
(!conf->max_discard_sectors ||
76
QEMU_IMG_EXTRA_ARGS=
478
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
77
if [ "$IMGOPTSSYNTAX" = "true" ]; then
479
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
78
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
79
index XXXXXXX..XXXXXXX 100644
480
index XXXXXXX..XXXXXXX 100644
80
--- a/tests/qemu-iotests/common.filter
481
--- a/hw/virtio/virtio-qmp.c
81
+++ b/tests/qemu-iotests/common.filter
482
+++ b/hw/virtio/virtio-qmp.c
82
@@ -XXX,XX +XXX,XX @@ _filter_img_info()
483
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
83
-e "s#$TEST_DIR#TEST_DIR#g" \
484
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
84
-e "s#$IMGFMT#IMGFMT#g" \
485
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
85
-e 's#nbd://127.0.0.1:10810$#TEST_DIR/t.IMGFMT#g' \
486
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
86
+ -e 's#json.*vdisk-id.*vxhs"}}#TEST_DIR/t.IMGFMT#' \
487
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
87
-e "/encrypted: yes/d" \
488
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
88
-e "/cluster_size: [0-9]\\+/d" \
489
#ifndef VIRTIO_BLK_NO_LEGACY
89
-e "/table_size: [0-9]\\+/d" \
490
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
90
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
491
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
91
index XXXXXXX..XXXXXXX 100644
92
--- a/tests/qemu-iotests/common.rc
93
+++ b/tests/qemu-iotests/common.rc
94
@@ -XXX,XX +XXX,XX @@ else
95
elif [ "$IMGPROTO" = "nfs" ]; then
96
TEST_DIR="nfs://127.0.0.1/$TEST_DIR"
97
TEST_IMG=$TEST_DIR/t.$IMGFMT
98
+ elif [ "$IMGPROTO" = "vxhs" ]; then
99
+ TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
100
+ TEST_IMG="vxhs://127.0.0.1:9999/t.$IMGFMT"
101
else
102
TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
103
fi
104
@@ -XXX,XX +XXX,XX @@ _make_test_img()
105
eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT $TEST_IMG_FILE >/dev/null &"
106
sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
107
fi
108
+
109
+ # Start QNIO server on image directory for vxhs protocol
110
+ if [ $IMGPROTO = "vxhs" ]; then
111
+ eval "$QEMU_VXHS -d $TEST_DIR > /dev/null &"
112
+ sleep 1 # Wait for server to come up.
113
+ fi
114
}
115
116
_rm_test_img()
117
@@ -XXX,XX +XXX,XX @@ _cleanup_test_img()
118
fi
119
rm -f "$TEST_IMG_FILE"
120
;;
121
+ vxhs)
122
+ if [ -f "${TEST_DIR}/qemu-vxhs.pid" ]; then
123
+ local QEMU_VXHS_PID
124
+ read QEMU_VXHS_PID < "${TEST_DIR}/qemu-vxhs.pid"
125
+ kill ${QEMU_VXHS_PID} >/dev/null 2>&1
126
+ rm -f "${TEST_DIR}/qemu-vxhs.pid"
127
+ fi
128
+ rm -f "$TEST_IMG_FILE"
129
+ ;;
130
+
131
file)
132
_rm_test_img "$TEST_DIR/t.$IMGFMT"
133
_rm_test_img "$TEST_DIR/t.$IMGFMT.orig"
134
--
492
--
135
2.9.3
493
2.40.1
136
137
diff view generated by jsdifflib
1
Update 'clientname' to be 'user', which tracks better with both
1
From: Sam Li <faithilikerun@gmail.com>
2
the QAPI and rados variable naming.
3
2
4
Update 'name' to be 'image_name', as it indicates the rbd image.
3
Taking account of the new zone append write operation for zoned devices,
5
Naming it 'image' would have been ideal, but we are using that for
4
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
6
the rados_image_t value returned by rbd_open().
5
write, flush).
7
6
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Signed-off-by: Jeff Cody <jcody@redhat.com>
8
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
10
Reviewed-by: John Snow <jsnow@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Message-id: b7ec1fb2e1cf36f9b6911631447a5b0422590b7d.1491597120.git.jcody@redhat.com
12
---
10
---
13
block/rbd.c | 33 +++++++++++++++++----------------
11
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
14
1 file changed, 17 insertions(+), 16 deletions(-)
12
qapi/block.json | 4 +++
13
include/block/accounting.h | 1 +
14
block/qapi-sysemu.c | 11 ++++++
15
block/qapi.c | 18 ++++++++++
16
hw/block/virtio-blk.c | 4 +++
17
tests/qemu-iotests/227.out | 18 ++++++++++
18
7 files changed, 113 insertions(+), 11 deletions(-)
15
19
16
diff --git a/block/rbd.c b/block/rbd.c
20
diff --git a/qapi/block-core.json b/qapi/block-core.json
17
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
18
--- a/block/rbd.c
22
--- a/qapi/block-core.json
19
+++ b/block/rbd.c
23
+++ b/qapi/block-core.json
20
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRBDState {
24
@@ -XXX,XX +XXX,XX @@
21
rados_t cluster;
25
# @min_wr_latency_ns: Minimum latency of write operations in the
22
rados_ioctx_t io_ctx;
26
# defined interval, in nanoseconds.
23
rbd_image_t image;
27
#
24
- char *name;
28
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
25
+ char *image_name;
29
+# in the defined interval, in nanoseconds
26
char *snap;
30
+# (since 8.1)
27
} BDRVRBDState;
31
+#
28
32
# @min_flush_latency_ns: Minimum latency of flush operations in the
29
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
33
# defined interval, in nanoseconds.
30
int64_t bytes = 0;
34
#
31
int64_t objsize;
35
@@ -XXX,XX +XXX,XX @@
32
int obj_order = 0;
36
# @max_wr_latency_ns: Maximum latency of write operations in the
33
- const char *pool, *name, *conf, *clientname, *keypairs;
37
# defined interval, in nanoseconds.
34
+ const char *pool, *image_name, *conf, *user, *keypairs;
38
#
35
const char *secretid;
39
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
36
rados_t cluster;
40
+# in the defined interval, in nanoseconds
37
rados_ioctx_t io_ctx;
41
+# (since 8.1)
38
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
42
+#
39
*/
43
# @max_flush_latency_ns: Maximum latency of flush operations in the
40
pool = qdict_get_try_str(options, "pool");
44
# defined interval, in nanoseconds.
41
conf = qdict_get_try_str(options, "conf");
45
#
42
- clientname = qdict_get_try_str(options, "user");
46
@@ -XXX,XX +XXX,XX @@
43
- name = qdict_get_try_str(options, "image");
47
# @avg_wr_latency_ns: Average latency of write operations in the
44
+ user = qdict_get_try_str(options, "user");
48
# defined interval, in nanoseconds.
45
+ image_name = qdict_get_try_str(options, "image");
49
#
46
keypairs = qdict_get_try_str(options, "=keyvalue-pairs");
50
+# @avg_zone_append_latency_ns: Average latency of zone append operations
47
51
+# in the defined interval, in nanoseconds
48
- ret = rados_create(&cluster, clientname);
52
+# (since 8.1)
49
+ ret = rados_create(&cluster, user);
53
+#
50
if (ret < 0) {
54
# @avg_flush_latency_ns: Average latency of flush operations in the
51
error_setg_errno(errp, -ret, "error initializing");
55
# defined interval, in nanoseconds.
52
goto exit;
56
#
53
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
57
@@ -XXX,XX +XXX,XX @@
54
goto shutdown;
58
# @avg_wr_queue_depth: Average number of pending write operations in
59
# the defined interval.
60
#
61
+# @avg_zone_append_queue_depth: Average number of pending zone append
62
+# operations in the defined interval
63
+# (since 8.1).
64
+#
65
# Since: 2.5
66
##
67
{ 'struct': 'BlockDeviceTimedStats',
68
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
69
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
70
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
71
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
72
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
73
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
74
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
75
+ 'max_zone_append_latency_ns': 'int',
76
+ 'avg_zone_append_latency_ns': 'int',
77
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
78
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
79
+ 'avg_wr_queue_depth': 'number',
80
+ 'avg_zone_append_queue_depth': 'number' } }
81
82
##
83
# @BlockDeviceStats:
84
@@ -XXX,XX +XXX,XX @@
85
#
86
# @wr_bytes: The number of bytes written by the device.
87
#
88
+# @zone_append_bytes: The number of bytes appended by the zoned devices
89
+# (since 8.1)
90
+#
91
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
92
#
93
# @rd_operations: The number of read operations performed by the
94
@@ -XXX,XX +XXX,XX @@
95
# @wr_operations: The number of write operations performed by the
96
# device.
97
#
98
+# @zone_append_operations: The number of zone append operations performed
99
+# by the zoned devices (since 8.1)
100
+#
101
# @flush_operations: The number of cache flush operations performed by
102
# the device (since 0.15)
103
#
104
@@ -XXX,XX +XXX,XX @@
105
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since
106
# 0.15).
107
#
108
+# @zone_append_total_time_ns: Total time spent on zone append writes
109
+# in nanoseconds (since 8.1)
110
+#
111
# @flush_total_time_ns: Total time spent on cache flushes in
112
# nanoseconds (since 0.15).
113
#
114
@@ -XXX,XX +XXX,XX @@
115
# @wr_merged: Number of write requests that have been merged into
116
# another request (Since 2.3).
117
#
118
+# @zone_append_merged: Number of zone append requests that have been merged
119
+# into another request (since 8.1)
120
+#
121
# @unmap_merged: Number of unmap requests that have been merged into
122
# another request (Since 4.2)
123
#
124
@@ -XXX,XX +XXX,XX @@
125
# @failed_wr_operations: The number of failed write operations
126
# performed by the device (Since 2.5)
127
#
128
+# @failed_zone_append_operations: The number of failed zone append write
129
+# operations performed by the zoned devices
130
+# (since 8.1)
131
+#
132
# @failed_flush_operations: The number of failed flush operations
133
# performed by the device (Since 2.5)
134
#
135
@@ -XXX,XX +XXX,XX @@
136
# @invalid_wr_operations: The number of invalid write operations
137
# performed by the device (Since 2.5)
138
#
139
+# @invalid_zone_append_operations: The number of invalid zone append operations
140
+# performed by the zoned device (since 8.1)
141
+#
142
# @invalid_flush_operations: The number of invalid flush operations
143
# performed by the device (Since 2.5)
144
#
145
@@ -XXX,XX +XXX,XX @@
146
#
147
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
148
#
149
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
150
+#
151
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
152
#
153
# Since: 0.14
154
##
155
{ 'struct': 'BlockDeviceStats',
156
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
157
- 'rd_operations': 'int', 'wr_operations': 'int',
158
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
159
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
160
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
161
'flush_operations': 'int', 'unmap_operations': 'int',
162
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
163
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
164
- 'wr_highest_offset': 'int',
165
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
166
- '*idle_time_ns': 'int',
167
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
168
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
169
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
170
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
171
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
172
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
173
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
174
+ 'failed_zone_append_operations': 'int',
175
+ 'failed_flush_operations': 'int',
176
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
177
+ 'invalid_wr_operations': 'int',
178
+ 'invalid_zone_append_operations': 'int',
179
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
180
'account_invalid': 'bool', 'account_failed': 'bool',
181
'timed_stats': ['BlockDeviceTimedStats'],
182
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
183
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
184
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
185
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
186
187
##
188
diff --git a/qapi/block.json b/qapi/block.json
189
index XXXXXXX..XXXXXXX 100644
190
--- a/qapi/block.json
191
+++ b/qapi/block.json
192
@@ -XXX,XX +XXX,XX @@
193
# @boundaries-write: list of interval boundary values for write
194
# latency histogram.
195
#
196
+# @boundaries-zap: list of interval boundary values for zone append write
197
+# latency histogram.
198
+#
199
# @boundaries-flush: list of interval boundary values for flush
200
# latency histogram.
201
#
202
@@ -XXX,XX +XXX,XX @@
203
'*boundaries': ['uint64'],
204
'*boundaries-read': ['uint64'],
205
'*boundaries-write': ['uint64'],
206
+ '*boundaries-zap': ['uint64'],
207
'*boundaries-flush': ['uint64'] },
208
'allow-preconfig': true }
209
diff --git a/include/block/accounting.h b/include/block/accounting.h
210
index XXXXXXX..XXXXXXX 100644
211
--- a/include/block/accounting.h
212
+++ b/include/block/accounting.h
213
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
214
BLOCK_ACCT_READ,
215
BLOCK_ACCT_WRITE,
216
BLOCK_ACCT_FLUSH,
217
+ BLOCK_ACCT_ZONE_APPEND,
218
BLOCK_ACCT_UNMAP,
219
BLOCK_MAX_IOTYPE,
220
};
221
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
222
index XXXXXXX..XXXXXXX 100644
223
--- a/block/qapi-sysemu.c
224
+++ b/block/qapi-sysemu.c
225
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
226
bool has_boundaries, uint64List *boundaries,
227
bool has_boundaries_read, uint64List *boundaries_read,
228
bool has_boundaries_write, uint64List *boundaries_write,
229
+ bool has_boundaries_append, uint64List *boundaries_append,
230
bool has_boundaries_flush, uint64List *boundaries_flush,
231
Error **errp)
232
{
233
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
234
}
55
}
235
}
56
236
57
- ret = rbd_create(io_ctx, name, bytes, &obj_order);
237
+ if (has_boundaries || has_boundaries_append) {
58
+ ret = rbd_create(io_ctx, image_name, bytes, &obj_order);
238
+ ret = block_latency_histogram_set(
59
if (ret < 0) {
239
+ stats, BLOCK_ACCT_ZONE_APPEND,
60
error_setg_errno(errp, -ret, "error rbd create");
240
+ has_boundaries_append ? boundaries_append : boundaries);
241
+ if (ret) {
242
+ error_setg(errp, "Device '%s' set append write boundaries fail", id);
243
+ return;
244
+ }
245
+ }
246
+
247
if (has_boundaries || has_boundaries_flush) {
248
ret = block_latency_histogram_set(
249
stats, BLOCK_ACCT_FLUSH,
250
diff --git a/block/qapi.c b/block/qapi.c
251
index XXXXXXX..XXXXXXX 100644
252
--- a/block/qapi.c
253
+++ b/block/qapi.c
254
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
255
256
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
257
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
258
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
259
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
260
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
261
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
262
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
263
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
264
265
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
266
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
267
+ ds->failed_zone_append_operations =
268
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
269
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
270
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
271
272
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
273
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
274
+ ds->invalid_zone_append_operations =
275
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
276
ds->invalid_flush_operations =
277
stats->invalid_ops[BLOCK_ACCT_FLUSH];
278
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
279
280
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
281
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
282
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
283
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
284
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
285
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
286
+ ds->zone_append_total_time_ns =
287
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
288
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
289
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
290
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
291
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
292
293
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
294
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
295
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
296
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
297
298
dev_stats->interval_length = ts->interval_length;
299
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
300
dev_stats->max_wr_latency_ns = timed_average_max(wr);
301
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
302
303
+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
304
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
305
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
306
+
307
dev_stats->min_flush_latency_ns = timed_average_min(fl);
308
dev_stats->max_flush_latency_ns = timed_average_max(fl);
309
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
310
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
311
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
312
dev_stats->avg_wr_queue_depth =
313
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
314
+ dev_stats->avg_zone_append_queue_depth =
315
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);
316
317
QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
61
}
318
}
62
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
319
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
63
Error **errp)
320
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
64
{
321
ds->wr_latency_histogram
65
BDRVRBDState *s = bs->opaque;
322
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
66
- const char *pool, *snap, *conf, *clientname, *name, *keypairs;
323
+ ds->zone_append_latency_histogram
67
+ const char *pool, *snap, *conf, *user, *image_name, *keypairs;
324
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
68
const char *secretid;
325
ds->flush_latency_histogram
69
QemuOpts *opts;
326
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
70
Error *local_err = NULL;
71
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
72
pool = qemu_opt_get(opts, "pool");
73
conf = qemu_opt_get(opts, "conf");
74
snap = qemu_opt_get(opts, "snapshot");
75
- clientname = qemu_opt_get(opts, "user");
76
- name = qemu_opt_get(opts, "image");
77
+ user = qemu_opt_get(opts, "user");
78
+ image_name = qemu_opt_get(opts, "image");
79
keypairs = qemu_opt_get(opts, "=keyvalue-pairs");
80
81
- if (!pool || !name) {
82
+ if (!pool || !image_name) {
83
error_setg(errp, "Parameters 'pool' and 'image' are required");
84
r = -EINVAL;
85
goto failed_opts;
86
}
87
88
- r = rados_create(&s->cluster, clientname);
89
+ r = rados_create(&s->cluster, user);
90
if (r < 0) {
91
error_setg_errno(errp, -r, "error initializing");
92
goto failed_opts;
93
}
94
95
s->snap = g_strdup(snap);
96
- s->name = g_strdup(name);
97
+ s->image_name = g_strdup(image_name);
98
99
/* try default location when conf=NULL, but ignore failure */
100
r = rados_conf_read_file(s->cluster, conf);
101
@@ -XXX,XX +XXX,XX @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
102
}
103
104
/* rbd_open is always r/w */
105
- r = rbd_open(s->io_ctx, s->name, &s->image, s->snap);
106
+ r = rbd_open(s->io_ctx, s->image_name, &s->image, s->snap);
107
if (r < 0) {
108
- error_setg_errno(errp, -r, "error reading header from %s", s->name);
109
+ error_setg_errno(errp, -r, "error reading header from %s",
110
+ s->image_name);
111
goto failed_open;
112
}
113
114
@@ -XXX,XX +XXX,XX @@ failed_open:
115
failed_shutdown:
116
rados_shutdown(s->cluster);
117
g_free(s->snap);
118
- g_free(s->name);
119
+ g_free(s->image_name);
120
failed_opts:
121
qemu_opts_del(opts);
122
g_free(mon_host);
123
@@ -XXX,XX +XXX,XX @@ static void qemu_rbd_close(BlockDriverState *bs)
124
rbd_close(s->image);
125
rados_ioctx_destroy(s->io_ctx);
126
g_free(s->snap);
127
- g_free(s->name);
128
+ g_free(s->image_name);
129
rados_shutdown(s->cluster);
130
}
327
}
131
328
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
329
index XXXXXXX..XXXXXXX 100644
330
--- a/hw/block/virtio-blk.c
331
+++ b/hw/block/virtio-blk.c
332
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
333
data->in_num = in_num;
334
data->zone_append_data.offset = offset;
335
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
336
+
337
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
338
+ BLOCK_ACCT_ZONE_APPEND);
339
+
340
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
341
virtio_blk_zone_append_complete, data);
342
return 0;
343
diff --git a/tests/qemu-iotests/227.out b/tests/qemu-iotests/227.out
344
index XXXXXXX..XXXXXXX 100644
345
--- a/tests/qemu-iotests/227.out
346
+++ b/tests/qemu-iotests/227.out
347
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
348
"stats": {
349
"unmap_operations": 0,
350
"unmap_merged": 0,
351
+ "failed_zone_append_operations": 0,
352
"flush_total_time_ns": 0,
353
"wr_highest_offset": 0,
354
"wr_total_time_ns": 0,
355
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
356
"timed_stats": [
357
],
358
"failed_unmap_operations": 0,
359
+ "zone_append_merged": 0,
360
"failed_flush_operations": 0,
361
"account_invalid": true,
362
"rd_total_time_ns": 0,
363
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
364
"unmap_total_time_ns": 0,
365
"invalid_flush_operations": 0,
366
"account_failed": true,
367
+ "zone_append_total_time_ns": 0,
368
+ "zone_append_operations": 0,
369
"rd_operations": 0,
370
+ "zone_append_bytes": 0,
371
+ "invalid_zone_append_operations": 0,
372
"invalid_wr_operations": 0,
373
"invalid_rd_operations": 0
374
},
375
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
376
"stats": {
377
"unmap_operations": 0,
378
"unmap_merged": 0,
379
+ "failed_zone_append_operations": 0,
380
"flush_total_time_ns": 0,
381
"wr_highest_offset": 0,
382
"wr_total_time_ns": 0,
383
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
384
"timed_stats": [
385
],
386
"failed_unmap_operations": 0,
387
+ "zone_append_merged": 0,
388
"failed_flush_operations": 0,
389
"account_invalid": true,
390
"rd_total_time_ns": 0,
391
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
392
"unmap_total_time_ns": 0,
393
"invalid_flush_operations": 0,
394
"account_failed": true,
395
+ "zone_append_total_time_ns": 0,
396
+ "zone_append_operations": 0,
397
"rd_operations": 0,
398
+ "zone_append_bytes": 0,
399
+ "invalid_zone_append_operations": 0,
400
"invalid_wr_operations": 0,
401
"invalid_rd_operations": 0
402
},
403
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
404
"stats": {
405
"unmap_operations": 0,
406
"unmap_merged": 0,
407
+ "failed_zone_append_operations": 0,
408
"flush_total_time_ns": 0,
409
"wr_highest_offset": 0,
410
"wr_total_time_ns": 0,
411
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
412
"timed_stats": [
413
],
414
"failed_unmap_operations": 0,
415
+ "zone_append_merged": 0,
416
"failed_flush_operations": 0,
417
"account_invalid": true,
418
"rd_total_time_ns": 0,
419
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
420
"unmap_total_time_ns": 0,
421
"invalid_flush_operations": 0,
422
"account_failed": true,
423
+ "zone_append_total_time_ns": 0,
424
+ "zone_append_operations": 0,
425
"rd_operations": 0,
426
+ "zone_append_bytes": 0,
427
+ "invalid_zone_append_operations": 0,
428
"invalid_wr_operations": 0,
429
"invalid_rd_operations": 0
430
},
132
--
431
--
133
2.9.3
432
2.40.1
134
135
diff view generated by jsdifflib
1
Signed-off-by: Jeff Cody <jcody@redhat.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
2
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
3
Reviewed-by: John Snow <jsnow@redhat.com>
5
Message-id: 20230508051916.178322-4-faithilikerun@gmail.com
4
Message-id: 00aed7ffdd7be4b9ed9ce1007d50028a72b34ebe.1491597120.git.jcody@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
---
7
---
6
block.c | 14 ++++++++------
8
hw/block/virtio-blk.c | 12 ++++++++++++
7
1 file changed, 8 insertions(+), 6 deletions(-)
9
hw/block/trace-events | 7 +++++++
10
2 files changed, 19 insertions(+)
8
11
9
diff --git a/block.c b/block.c
12
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
10
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
11
--- a/block.c
14
--- a/hw/block/virtio-blk.c
12
+++ b/block.c
15
+++ b/hw/block/virtio-blk.c
13
@@ -XXX,XX +XXX,XX @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
16
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
14
BlockDriver *drv;
17
int64_t nz = data->zone_report_data.nr_zones;
15
QemuOpts *opts;
18
int8_t err_status = VIRTIO_BLK_S_OK;
16
const char *value;
19
17
+ bool read_only;
20
+ trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
18
21
if (ret) {
19
assert(reopen_state != NULL);
22
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
20
assert(reopen_state->bs->drv != NULL);
23
goto out;
21
@@ -XXX,XX +XXX,XX @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
24
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
22
qdict_put(reopen_state->options, "driver", qstring_from_str(value));
25
nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
26
sizeof(struct virtio_blk_zone_report)) /
27
sizeof(struct virtio_blk_zone_descriptor);
28
+ trace_virtio_blk_handle_zone_report(vdev, req,
29
+ offset >> BDRV_SECTOR_BITS, nr_zones);
30
31
zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
32
data = g_malloc(sizeof(ZoneCmdData));
33
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
34
{
35
VirtIOBlockReq *req = opaque;
36
VirtIOBlock *s = req->dev;
37
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
38
int8_t err_status = VIRTIO_BLK_S_OK;
39
+ trace_virtio_blk_zone_mgmt_complete(vdev, req,ret);
40
41
if (ret) {
42
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
43
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
44
/* Entire drive capacity */
45
offset = 0;
46
len = capacity;
47
+ trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
48
+ bs->total_sectors);
49
} else {
50
if (bs->bl.zone_size > capacity - offset) {
51
/* The zoned device allows the last smaller zone. */
52
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
53
} else {
54
len = bs->bl.zone_size;
55
}
56
+ trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
57
+ offset >> BDRV_SECTOR_BITS,
58
+ len >> BDRV_SECTOR_BITS);
23
}
59
}
24
60
25
- /* if we are to stay read-only, do not allow permission change
61
if (!check_zoned_request(s, offset, len, false, &err_status)) {
26
- * to r/w */
62
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
27
- if (!(reopen_state->bs->open_flags & BDRV_O_ALLOW_RDWR) &&
63
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
28
- reopen_state->flags & BDRV_O_RDWR) {
64
goto out;
29
- error_setg(errp, "Node '%s' is read only",
30
- bdrv_get_device_or_node_name(reopen_state->bs));
31
+ /* If we are to stay read-only, do not allow permission change
32
+ * to r/w. Attempting to set to r/w may fail if either BDRV_O_ALLOW_RDWR is
33
+ * not set, or if the BDS still has copy_on_read enabled */
34
+ read_only = !(reopen_state->flags & BDRV_O_RDWR);
35
+ ret = bdrv_can_set_read_only(reopen_state->bs, read_only, &local_err);
36
+ if (local_err) {
37
+ error_propagate(errp, local_err);
38
goto error;
39
}
65
}
40
66
+ trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
67
68
out:
69
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
70
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
71
int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
72
int64_t len = iov_size(out_iov, out_num);
73
74
+ trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
75
if (!check_zoned_request(s, offset, len, true, &err_status)) {
76
goto out;
77
}
78
diff --git a/hw/block/trace-events b/hw/block/trace-events
79
index XXXXXXX..XXXXXXX 100644
80
--- a/hw/block/trace-events
81
+++ b/hw/block/trace-events
82
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
83
# virtio-blk.c
84
virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
85
virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
86
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
87
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
88
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
89
virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
90
virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
91
virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
92
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
93
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
94
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
95
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""
96
97
# hd-geometry.c
98
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
41
--
99
--
42
2.9.3
100
2.40.1
43
44
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Add the documentation about the example of using virtio-blk driver
4
to pass the zoned block devices through to the guest.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Message-id: 20230508051916.178322-5-faithilikerun@gmail.com
8
[Fix pre-formatted code syntax
9
--Stefan]
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
docs/devel/zoned-storage.rst | 19 +++++++++++++++++++
13
1 file changed, 19 insertions(+)
14
15
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
16
index XXXXXXX..XXXXXXX 100644
17
--- a/docs/devel/zoned-storage.rst
18
+++ b/docs/devel/zoned-storage.rst
19
@@ -XXX,XX +XXX,XX @@ APIs for zoned storage emulation or testing.
20
For example, to test zone_report on a null_blk device using qemu-io is::
21
22
$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
23
+
24
+To expose the host's zoned block device through virtio-blk, the command line
25
+can be (includes the -device parameter)::
26
+
27
+ -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
28
+ -device virtio-blk-pci,drive=drive0
29
+
30
+Or only use the -drive parameter::
31
+
32
+ -driver driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
33
+
34
+Additionally, QEMU has several ways of supporting zoned storage, including:
35
+(1) Using virtio-scsi: --device scsi-block allows for the passing through of
36
+SCSI ZBC devices, enabling the attachment of ZBC or ZAC HDDs to QEMU.
37
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
38
+purposes, it cannot yet pass through a zoned device from the host. To pass on
39
+the NVMe ZNS device to the guest, use VFIO PCI pass the entire NVMe PCI adapter
40
+through to the guest. Likewise, an HDD HBA can be passed on to QEMU all HDDs
41
+attached to the HBA.
42
--
43
2.40.1
diff view generated by jsdifflib