1
The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:
1
The following changes since commit 8844bb8d896595ee1d25d21c770e6e6f29803097:
2
2
3
Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)
3
Merge tag 'or1k-pull-request-20230513' of https://github.com/stffrdhrn/qemu into staging (2023-05-13 11:23:14 +0100)
4
4
5
are available in the Git repository at:
5
are available in the Git repository at:
6
6
7
https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
8
8
9
for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:
9
for you to fetch changes up to 01562fee5f3ad4506d57dbcf4b1903b565eceec7:
10
10
11
iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)
11
docs/zoned-storage:add zoned emulation use case (2023-05-15 08:19:04 -0400)
12
12
13
----------------------------------------------------------------
13
----------------------------------------------------------------
14
Block patches:
14
Pull request
15
- The SSH block driver now uses libssh instead of libssh2
16
- The VMDK block driver gets read-only support for the seSparse
17
subformat
18
- Various fixes
19
15
20
---
16
This pull request contain's Sam Li's zoned storage support in the QEMU block
17
layer and virtio-blk emulation.
21
18
22
v2:
19
v2:
23
- Squashed Pino's fix for pre-0.8 libssh into the libssh patch
20
- Sam fixed the CI failures. CI passes for me now. [Richard]
24
21
25
----------------------------------------------------------------
22
----------------------------------------------------------------
26
Anton Nefedov (1):
27
iotest 134: test cluster-misaligned encrypted write
28
23
29
Klaus Birkelund Jensen (1):
24
Sam Li (16):
30
nvme: do not advertise support for unsupported arbitration mechanism
25
block/block-common: add zoned device structs
26
block/file-posix: introduce helper functions for sysfs attributes
27
block/block-backend: add block layer APIs resembling Linux
28
ZonedBlockDevice ioctls
29
block/raw-format: add zone operations to pass through requests
30
block: add zoned BlockDriver check to block layer
31
iotests: test new zone operations
32
block: add some trace events for new block layer APIs
33
docs/zoned-storage: add zoned device documentation
34
file-posix: add tracking of the zone write pointers
35
block: introduce zone append write for zoned devices
36
qemu-iotests: test zone append operation
37
block: add some trace events for zone append
38
virtio-blk: add zoned storage emulation for zoned devices
39
block: add accounting for zone append operation
40
virtio-blk: add some trace events for zoned emulation
41
docs/zoned-storage:add zoned emulation use case
31
42
32
Max Reitz (1):
43
docs/devel/index-api.rst | 1 +
33
iotests: Fix 205 for concurrent runs
44
docs/devel/zoned-storage.rst | 62 +++
34
45
qapi/block-core.json | 68 ++-
35
Pino Toscano (1):
46
qapi/block.json | 4 +
36
ssh: switch from libssh2 to libssh
47
meson.build | 5 +
37
48
include/block/accounting.h | 1 +
38
Sam Eiderman (3):
49
include/block/block-common.h | 57 ++
39
vmdk: Fix comment regarding max l1_size coverage
50
include/block/block-io.h | 13 +
40
vmdk: Reduce the max bound for L1 table size
51
include/block/block_int-common.h | 37 ++
41
vmdk: Add read-only support for seSparse snapshots
52
include/block/raw-aio.h | 8 +-
42
53
include/sysemu/block-backend-io.h | 27 +
43
Vladimir Sementsov-Ogievskiy (1):
54
block.c | 19 +
44
blockdev: enable non-root nodes for transaction drive-backup source
55
block/block-backend.c | 198 +++++++
45
56
block/file-posix.c | 692 +++++++++++++++++++++++--
46
configure | 65 +-
57
block/io.c | 68 +++
47
block/Makefile.objs | 6 +-
58
block/io_uring.c | 4 +
48
block/ssh.c | 652 ++++++++++--------
59
block/linux-aio.c | 3 +
49
block/vmdk.c | 372 +++++++++-
60
block/qapi-sysemu.c | 11 +
50
blockdev.c | 2 +-
61
block/qapi.c | 18 +
51
hw/block/nvme.c | 1 -
62
block/raw-format.c | 26 +
52
.travis.yml | 4 +-
63
hw/block/virtio-blk-common.c | 2 +
53
block/trace-events | 14 +-
64
hw/block/virtio-blk.c | 405 +++++++++++++++
54
docs/qemu-block-drivers.texi | 2 +-
65
hw/virtio/virtio-qmp.c | 2 +
55
.../dockerfiles/debian-win32-cross.docker | 1 -
66
qemu-io-cmds.c | 224 ++++++++
56
.../dockerfiles/debian-win64-cross.docker | 1 -
67
block/trace-events | 4 +
57
tests/docker/dockerfiles/fedora.docker | 4 +-
68
docs/system/qemu-block-drivers.rst.inc | 6 +
58
tests/docker/dockerfiles/ubuntu.docker | 2 +-
69
hw/block/trace-events | 7 +
59
tests/docker/dockerfiles/ubuntu1804.docker | 2 +-
70
tests/qemu-iotests/227.out | 18 +
60
tests/qemu-iotests/059.out | 2 +-
71
tests/qemu-iotests/tests/zoned | 105 ++++
61
tests/qemu-iotests/134 | 9 +
72
tests/qemu-iotests/tests/zoned.out | 69 +++
62
tests/qemu-iotests/134.out | 10 +
73
30 files changed, 2106 insertions(+), 58 deletions(-)
63
tests/qemu-iotests/205 | 2 +-
74
create mode 100644 docs/devel/zoned-storage.rst
64
tests/qemu-iotests/207 | 54 +-
75
create mode 100755 tests/qemu-iotests/tests/zoned
65
tests/qemu-iotests/207.out | 2 +-
76
create mode 100644 tests/qemu-iotests/tests/zoned.out
66
20 files changed, 823 insertions(+), 384 deletions(-)
67
77
68
--
78
--
69
2.21.0
79
2.40.1
70
71
diff view generated by jsdifflib
1
Tests should place their files into the test directory. This includes
1
From: Sam Li <faithilikerun@gmail.com>
2
Unix sockets. 205 currently fails to do so, which prevents it from
3
being run concurrently.
4
2
5
Signed-off-by: Max Reitz <mreitz@redhat.com>
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
6
Message-id: 20190618210238.9524-1-mreitz@redhat.com
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
5
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
8
Signed-off-by: Max Reitz <mreitz@redhat.com>
6
Reviewed-by: Hannes Reinecke <hare@suse.de>
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
8
Acked-by: Kevin Wolf <kwolf@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20230508045533.175575-2-faithilikerun@gmail.com
11
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
13
<philmd@linaro.org>.
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
16
---
10
tests/qemu-iotests/205 | 2 +-
17
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
11
1 file changed, 1 insertion(+), 1 deletion(-)
18
1 file changed, 43 insertions(+)
12
19
13
diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
20
diff --git a/include/block/block-common.h b/include/block/block-common.h
14
index XXXXXXX..XXXXXXX 100755
21
index XXXXXXX..XXXXXXX 100644
15
--- a/tests/qemu-iotests/205
22
--- a/include/block/block-common.h
16
+++ b/tests/qemu-iotests/205
23
+++ b/include/block/block-common.h
17
@@ -XXX,XX +XXX,XX @@ import iotests
24
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
18
import time
25
typedef struct BdrvChild BdrvChild;
19
from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive
26
typedef struct BdrvChildClass BdrvChildClass;
20
27
21
-nbd_sock = 'nbd_sock'
28
+typedef enum BlockZoneOp {
22
+nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
29
+ BLK_ZO_OPEN,
23
nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
30
+ BLK_ZO_CLOSE,
24
disk = os.path.join(iotests.test_dir, 'disk')
31
+ BLK_ZO_FINISH,
25
32
+ BLK_ZO_RESET,
33
+} BlockZoneOp;
34
+
35
+typedef enum BlockZoneModel {
36
+ BLK_Z_NONE = 0x0, /* Regular block device */
37
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
38
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
39
+} BlockZoneModel;
40
+
41
+typedef enum BlockZoneState {
42
+ BLK_ZS_NOT_WP = 0x0,
43
+ BLK_ZS_EMPTY = 0x1,
44
+ BLK_ZS_IOPEN = 0x2,
45
+ BLK_ZS_EOPEN = 0x3,
46
+ BLK_ZS_CLOSED = 0x4,
47
+ BLK_ZS_RDONLY = 0xD,
48
+ BLK_ZS_FULL = 0xE,
49
+ BLK_ZS_OFFLINE = 0xF,
50
+} BlockZoneState;
51
+
52
+typedef enum BlockZoneType {
53
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
54
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
55
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
56
+} BlockZoneType;
57
+
58
+/*
59
+ * Zone descriptor data structure.
60
+ * Provides information on a zone with all position and size values in bytes.
61
+ */
62
+typedef struct BlockZoneDescriptor {
63
+ uint64_t start;
64
+ uint64_t length;
65
+ uint64_t cap;
66
+ uint64_t wp;
67
+ BlockZoneType type;
68
+ BlockZoneState state;
69
+} BlockZoneDescriptor;
70
+
71
typedef struct BlockDriverInfo {
72
/* in bytes, 0 if irrelevant */
73
int cluster_size;
26
--
74
--
27
2.21.0
75
2.40.1
28
76
29
77
diff view generated by jsdifflib
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
3
Use get_sysfs_str_val() to get the string value of device
4
QEMU).
4
zoned model. Then get_sysfs_zoned_model() can convert it to
5
5
BlockZoneModel type of QEMU.
6
This format was lacking in the following:
6
7
7
Use get_sysfs_long_val() to get the long value of zoned device
8
* Grain directory (L1) and grain table (L2) entries were 32-bit,
8
information.
9
allowing access to only 2TB (slightly less) of data.
9
10
* The grain size (default) was 512 bytes - leading to data
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
fragmentation and many grain tables.
11
Reviewed-by: Hannes Reinecke <hare@suse.de>
12
* For space reclamation purposes, it was necessary to find all the
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
grains which are not pointed to by any grain table - so a reverse
13
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
14
mapping of "offset of grain in vmdk" to "grain table" must be
14
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
15
constructed - which takes large amounts of CPU/RAM.
15
Acked-by: Kevin Wolf <kwolf@redhat.com>
16
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
The format specification can be found in VMware's documentation:
17
Message-id: 20230508045533.175575-3-faithilikerun@gmail.com
18
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
18
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
19
19
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
20
In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
20
<philmd@linaro.org>.
21
introduced: SESparse (Space Efficient).
21
--Stefan]
22
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
This format fixes the above issues:
24
25
* All entries are now 64-bit.
26
* The grain size (default) is 4KB.
27
* Grain directory and grain tables are now located at the beginning
28
of the file.
29
+ seSparse format reserves space for all grain tables.
30
+ Grain tables can be addressed using an index.
31
+ Grains are located in the end of the file and can also be
32
addressed with an index.
33
- seSparse vmdks of large disks (64TB) have huge preallocated
34
headers - mainly due to L2 tables, even for empty snapshots.
35
* The header contains a reverse mapping ("backmap") of "offset of
36
grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
37
specifies for each grain - whether it is allocated or not.
38
Using these data structures we can implement space reclamation
39
efficiently.
40
* Due to the fact that the header now maintains two mappings:
41
* The regular one (grain directory & grain tables)
42
* A reverse one (backmap and free bitmap)
43
These data structures can lose consistency upon crash and result
44
in a corrupted VMDK.
45
Therefore, a journal is also added to the VMDK and is replayed
46
when the VMware reopens the file after a crash.
47
48
Since ESXi 6.7 - SESparse is the only snapshot format available.
49
50
Unfortunately, VMware does not provide documentation regarding the new
51
seSparse format.
52
53
This commit is based on black-box research of the seSparse format.
54
Various in-guest block operations and their effect on the snapshot file
55
were tested.
56
57
The only VMware provided source of information (regarding the underlying
58
implementation) was a log file on the ESXi:
59
60
/var/log/hostd.log
61
62
Whenever an seSparse snapshot is created - the log is being populated
63
with seSparse records.
64
65
Relevant log records are of the form:
66
67
[...] Const Header:
68
[...] constMagic = 0xcafebabe
69
[...] version = 2.1
70
[...] capacity = 204800
71
[...] grainSize = 8
72
[...] grainTableSize = 64
73
[...] flags = 0
74
[...] Extents:
75
[...] Header : <1 : 1>
76
[...] JournalHdr : <2 : 2>
77
[...] Journal : <2048 : 2048>
78
[...] GrainDirectory : <4096 : 2048>
79
[...] GrainTables : <6144 : 2048>
80
[...] FreeBitmap : <8192 : 2048>
81
[...] BackMap : <10240 : 2048>
82
[...] Grain : <12288 : 204800>
83
[...] Volatile Header:
84
[...] volatileMagic = 0xcafecafe
85
[...] FreeGTNumber = 0
86
[...] nextTxnSeqNumber = 0
87
[...] replayJournal = 0
88
89
The sizes that are seen in the log file are in sectors.
90
Extents are of the following format: <offset : size>
91
92
This commit is a strict implementation which enforces:
93
* magics
94
* version number 2.1
95
* grain size of 8 sectors (4KB)
96
* grain table size of 64 sectors
97
* zero flags
98
* extent locations
99
100
Additionally, this commit proivdes only a subset of the functionality
101
offered by seSparse's format:
102
* Read-only
103
* No journal replay
104
* No space reclamation
105
* No unmap support
106
107
Hence, journal header, journal, free bitmap and backmap extents are
108
unused, only the "classic" (L1 -> L2 -> data) grain access is
109
implemented.
110
111
However there are several differences in the grain access itself.
112
Grain directory (L1):
113
* Grain directory entries are indexes (not offsets) to grain
114
tables.
115
* Valid grain directory entries have their highest nibble set to
116
0x1.
117
* Since grain tables are always located in the beginning of the
118
file - the index can fit into 32 bits - so we can use its low
119
part if it's valid.
120
Grain table (L2):
121
* Grain table entries are indexes (not offsets) to grains.
122
* If the highest nibble of the entry is:
123
0x0:
124
The grain in not allocated.
125
The rest of the bytes are 0.
126
0x1:
127
The grain is unmapped - guest sees a zero grain.
128
The rest of the bits point to the previously mapped grain,
129
see 0x3 case.
130
0x2:
131
The grain is zero.
132
0x3:
133
The grain is allocated - to get the index calculate:
134
((entry & 0x0fff000000000000) >> 48) |
135
((entry & 0x0000ffffffffffff) << 12)
136
* The difference between 0x1 and 0x2 is that 0x1 is an unallocated
137
grain which results from the guest using sg_unmap to unmap the
138
grain - but the grain itself still exists in the grain extent - a
139
space reclamation procedure should delete it.
140
Unmapping a zero grain has no effect (0x2 will not change to 0x1)
141
but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
142
143
In order to implement seSparse some fields had to be changed to support
144
both 32-bit and 64-bit entry sizes.
145
146
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
147
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
148
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
149
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
150
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
151
Signed-off-by: Max Reitz <mreitz@redhat.com>
152
---
23
---
153
block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
24
include/block/block_int-common.h | 3 +
154
1 file changed, 342 insertions(+), 16 deletions(-)
25
block/file-posix.c | 135 ++++++++++++++++++++++---------
155
26
2 files changed, 100 insertions(+), 38 deletions(-)
156
diff --git a/block/vmdk.c b/block/vmdk.c
27
28
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
157
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
158
--- a/block/vmdk.c
30
--- a/include/block/block_int-common.h
159
+++ b/block/vmdk.c
31
+++ b/include/block/block_int-common.h
160
@@ -XXX,XX +XXX,XX @@ typedef struct {
32
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
161
uint16_t compressAlgorithm;
33
* an explicit monitor command to load the disk inside the guest).
162
} QEMU_PACKED VMDK4Header;
34
*/
163
35
bool has_variable_length;
164
+typedef struct VMDKSESparseConstHeader {
36
+
165
+ uint64_t magic;
37
+ /* device zone model */
166
+ uint64_t version;
38
+ BlockZoneModel zoned;
167
+ uint64_t capacity;
39
} BlockLimits;
168
+ uint64_t grain_size;
40
169
+ uint64_t grain_table_size;
41
typedef struct BdrvOpBlocker BdrvOpBlocker;
170
+ uint64_t flags;
42
diff --git a/block/file-posix.c b/block/file-posix.c
171
+ uint64_t reserved1;
43
index XXXXXXX..XXXXXXX 100644
172
+ uint64_t reserved2;
44
--- a/block/file-posix.c
173
+ uint64_t reserved3;
45
+++ b/block/file-posix.c
174
+ uint64_t reserved4;
46
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
175
+ uint64_t volatile_header_offset;
47
#endif
176
+ uint64_t volatile_header_size;
177
+ uint64_t journal_header_offset;
178
+ uint64_t journal_header_size;
179
+ uint64_t journal_offset;
180
+ uint64_t journal_size;
181
+ uint64_t grain_dir_offset;
182
+ uint64_t grain_dir_size;
183
+ uint64_t grain_tables_offset;
184
+ uint64_t grain_tables_size;
185
+ uint64_t free_bitmap_offset;
186
+ uint64_t free_bitmap_size;
187
+ uint64_t backmap_offset;
188
+ uint64_t backmap_size;
189
+ uint64_t grains_offset;
190
+ uint64_t grains_size;
191
+ uint8_t pad[304];
192
+} QEMU_PACKED VMDKSESparseConstHeader;
193
+
194
+typedef struct VMDKSESparseVolatileHeader {
195
+ uint64_t magic;
196
+ uint64_t free_gt_number;
197
+ uint64_t next_txn_seq_number;
198
+ uint64_t replay_journal;
199
+ uint8_t pad[480];
200
+} QEMU_PACKED VMDKSESparseVolatileHeader;
201
+
202
#define L2_CACHE_SIZE 16
203
204
typedef struct VmdkExtent {
205
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
206
bool compressed;
207
bool has_marker;
208
bool has_zero_grain;
209
+ bool sesparse;
210
+ uint64_t sesparse_l2_tables_offset;
211
+ uint64_t sesparse_clusters_offset;
212
+ int32_t entry_size;
213
int version;
214
int64_t sectors;
215
int64_t end_sector;
216
int64_t flat_start_offset;
217
int64_t l1_table_offset;
218
int64_t l1_backup_table_offset;
219
- uint32_t *l1_table;
220
+ void *l1_table;
221
uint32_t *l1_backup_table;
222
unsigned int l1_size;
223
uint32_t l1_entry_sectors;
224
225
unsigned int l2_size;
226
- uint32_t *l2_cache;
227
+ void *l2_cache;
228
uint32_t l2_cache_offsets[L2_CACHE_SIZE];
229
uint32_t l2_cache_counts[L2_CACHE_SIZE];
230
231
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
232
* minimal L2 table size: 512 entries
233
* 8 TB is still more than the maximal value supported for
234
* VMDK3 & VMDK4 which is 2TB.
235
+ * 64TB - for "ESXi seSparse Extent"
236
+ * minimal cluster size: 512B (default is 4KB)
237
+ * L2 table size: 4096 entries (const).
238
+ * 64TB is more than the maximal value supported for
239
+ * seSparse VMDKs (which is slightly less than 64TB)
240
*/
241
error_setg(errp, "L1 size too big");
242
return -EFBIG;
243
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
244
extent->l2_size = l2_size;
245
extent->cluster_sectors = flat ? sectors : cluster_sectors;
246
extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
247
+ extent->entry_size = sizeof(uint32_t);
248
249
if (s->num_extents > 1) {
250
extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
251
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
252
int i;
253
254
/* read the L1 table */
255
- l1_size = extent->l1_size * sizeof(uint32_t);
256
+ l1_size = extent->l1_size * extent->entry_size;
257
extent->l1_table = g_try_malloc(l1_size);
258
if (l1_size && extent->l1_table == NULL) {
259
return -ENOMEM;
260
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
261
goto fail_l1;
262
}
263
for (i = 0; i < extent->l1_size; i++) {
264
- le32_to_cpus(&extent->l1_table[i]);
265
+ if (extent->entry_size == sizeof(uint64_t)) {
266
+ le64_to_cpus((uint64_t *)extent->l1_table + i);
267
+ } else {
268
+ assert(extent->entry_size == sizeof(uint32_t));
269
+ le32_to_cpus((uint32_t *)extent->l1_table + i);
270
+ }
271
}
272
273
if (extent->l1_backup_table_offset) {
274
+ assert(!extent->sesparse);
275
extent->l1_backup_table = g_try_malloc(l1_size);
276
if (l1_size && extent->l1_backup_table == NULL) {
277
ret = -ENOMEM;
278
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
279
}
280
281
extent->l2_cache =
282
- g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
283
+ g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
284
return 0;
285
fail_l1b:
286
g_free(extent->l1_backup_table);
287
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
288
return ret;
289
}
48
}
290
49
291
+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
50
-static int hdev_get_max_segments(int fd, struct stat *st)
292
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
51
+/*
293
+
52
+ * Get a sysfs attribute value as character string.
294
+/* Strict checks - format not officially documented */
53
+ */
295
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
54
+#ifdef CONFIG_LINUX
296
+ Error **errp)
55
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
297
+{
56
+ char **val) {
298
+ header->magic = le64_to_cpu(header->magic);
57
+ g_autofree char *sysfspath = NULL;
299
+ header->version = le64_to_cpu(header->version);
58
+ int ret;
300
+ header->grain_size = le64_to_cpu(header->grain_size);
59
+ size_t len;
301
+ header->grain_table_size = le64_to_cpu(header->grain_table_size);
60
+
302
+ header->flags = le64_to_cpu(header->flags);
61
+ if (!S_ISBLK(st->st_mode)) {
303
+ header->reserved1 = le64_to_cpu(header->reserved1);
304
+ header->reserved2 = le64_to_cpu(header->reserved2);
305
+ header->reserved3 = le64_to_cpu(header->reserved3);
306
+ header->reserved4 = le64_to_cpu(header->reserved4);
307
+
308
+ header->volatile_header_offset =
309
+ le64_to_cpu(header->volatile_header_offset);
310
+ header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
311
+
312
+ header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
313
+ header->journal_header_size = le64_to_cpu(header->journal_header_size);
314
+
315
+ header->journal_offset = le64_to_cpu(header->journal_offset);
316
+ header->journal_size = le64_to_cpu(header->journal_size);
317
+
318
+ header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
319
+ header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
320
+
321
+ header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
322
+ header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
323
+
324
+ header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
325
+ header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
326
+
327
+ header->backmap_offset = le64_to_cpu(header->backmap_offset);
328
+ header->backmap_size = le64_to_cpu(header->backmap_size);
329
+
330
+ header->grains_offset = le64_to_cpu(header->grains_offset);
331
+ header->grains_size = le64_to_cpu(header->grains_size);
332
+
333
+ if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
334
+ error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
335
+ header->magic);
336
+ return -EINVAL;
337
+ }
338
+
339
+ if (header->version != 0x0000000200000001) {
340
+ error_setg(errp, "Unsupported version: 0x%016" PRIx64,
341
+ header->version);
342
+ return -ENOTSUP;
62
+ return -ENOTSUP;
343
+ }
63
+ }
344
+
64
+
345
+ if (header->grain_size != 8) {
65
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
346
+ error_setg(errp, "Unsupported grain size: %" PRIu64,
66
+ major(st->st_rdev), minor(st->st_rdev),
347
+ header->grain_size);
67
+ attribute);
348
+ return -ENOTSUP;
68
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
349
+ }
69
+ if (ret == -1) {
350
+
70
+ return -ENOENT;
351
+ if (header->grain_table_size != 64) {
71
+ }
352
+ error_setg(errp, "Unsupported grain table size: %" PRIu64,
72
+
353
+ header->grain_table_size);
73
+ /* The file is ended with '\n' */
354
+ return -ENOTSUP;
74
+ char *p;
355
+ }
75
+ p = *val;
356
+
76
+ if (*(p + len - 1) == '\n') {
357
+ if (header->flags != 0) {
77
+ *(p + len - 1) = '\0';
358
+ error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
78
+ }
359
+ header->flags);
79
+ return ret;
360
+ return -ENOTSUP;
80
+}
361
+ }
81
+#endif
362
+
82
+
363
+ if (header->reserved1 != 0 || header->reserved2 != 0 ||
83
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
364
+ header->reserved3 != 0 || header->reserved4 != 0) {
84
{
365
+ error_setg(errp, "Unsupported reserved bits:"
85
+ g_autofree char *val = NULL;
366
+ " 0x%016" PRIx64 " 0x%016" PRIx64
86
+ int ret;
367
+ " 0x%016" PRIx64 " 0x%016" PRIx64,
87
+
368
+ header->reserved1, header->reserved2,
88
+ ret = get_sysfs_str_val(st, "zoned", &val);
369
+ header->reserved3, header->reserved4);
370
+ return -ENOTSUP;
371
+ }
372
+
373
+ /* check that padding is 0 */
374
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
375
+ error_setg(errp, "Unsupported non-zero const header padding");
376
+ return -ENOTSUP;
377
+ }
378
+
379
+ return 0;
380
+}
381
+
382
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
383
+ Error **errp)
384
+{
385
+ header->magic = le64_to_cpu(header->magic);
386
+ header->free_gt_number = le64_to_cpu(header->free_gt_number);
387
+ header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
388
+ header->replay_journal = le64_to_cpu(header->replay_journal);
389
+
390
+ if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
391
+ error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
392
+ header->magic);
393
+ return -EINVAL;
394
+ }
395
+
396
+ if (header->replay_journal) {
397
+ error_setg(errp, "Image is dirty, Replaying journal not supported");
398
+ return -ENOTSUP;
399
+ }
400
+
401
+ /* check that padding is 0 */
402
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
403
+ error_setg(errp, "Unsupported non-zero volatile header padding");
404
+ return -ENOTSUP;
405
+ }
406
+
407
+ return 0;
408
+}
409
+
410
+static int vmdk_open_se_sparse(BlockDriverState *bs,
411
+ BdrvChild *file,
412
+ int flags, Error **errp)
413
+{
414
+ int ret;
415
+ VMDKSESparseConstHeader const_header;
416
+ VMDKSESparseVolatileHeader volatile_header;
417
+ VmdkExtent *extent;
418
+
419
+ ret = bdrv_apply_auto_read_only(bs,
420
+ "No write support for seSparse images available", errp);
421
+ if (ret < 0) {
89
+ if (ret < 0) {
422
+ return ret;
90
+ return ret;
423
+ }
91
+ }
424
+
92
+
425
+ assert(sizeof(const_header) == SECTOR_SIZE);
93
+ if (strcmp(val, "host-managed") == 0) {
426
+
94
+ *zoned = BLK_Z_HM;
427
+ ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
95
+ } else if (strcmp(val, "host-aware") == 0) {
428
+ if (ret < 0) {
96
+ *zoned = BLK_Z_HA;
429
+ bdrv_refresh_filename(file->bs);
97
+ } else if (strcmp(val, "none") == 0) {
430
+ error_setg_errno(errp, -ret,
98
+ *zoned = BLK_Z_NONE;
431
+ "Could not read const header from file '%s'",
99
+ } else {
432
+ file->bs->filename);
100
+ return -ENOTSUP;
433
+ return ret;
101
+ }
434
+ }
102
+ return 0;
435
+
103
+}
436
+ /* check const header */
104
+
437
+ ret = check_se_sparse_const_header(&const_header, errp);
105
+/*
106
+ * Get a sysfs attribute value as a long integer.
107
+ */
108
#ifdef CONFIG_LINUX
109
- char buf[32];
110
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
111
+{
112
+ g_autofree char *str = NULL;
113
const char *end;
114
- char *sysfspath = NULL;
115
+ long val;
116
+ int ret;
117
+
118
+ ret = get_sysfs_str_val(st, attribute, &str);
438
+ if (ret < 0) {
119
+ if (ret < 0) {
439
+ return ret;
120
+ return ret;
440
+ }
121
+ }
441
+
122
+
442
+ assert(sizeof(volatile_header) == SECTOR_SIZE);
123
+ /* The file is ended with '\n', pass 'end' to accept that. */
443
+
124
+ ret = qemu_strtol(str, &end, 10, &val);
444
+ ret = bdrv_pread(file,
125
+ if (ret == 0 && end && *end == '\0') {
445
+ const_header.volatile_header_offset * SECTOR_SIZE,
126
+ ret = val;
446
+ &volatile_header, sizeof(volatile_header));
127
+ }
447
+ if (ret < 0) {
448
+ bdrv_refresh_filename(file->bs);
449
+ error_setg_errno(errp, -ret,
450
+ "Could not read volatile header from file '%s'",
451
+ file->bs->filename);
452
+ return ret;
453
+ }
454
+
455
+ /* check volatile header */
456
+ ret = check_se_sparse_volatile_header(&volatile_header, errp);
457
+ if (ret < 0) {
458
+ return ret;
459
+ }
460
+
461
+ ret = vmdk_add_extent(bs, file, false,
462
+ const_header.capacity,
463
+ const_header.grain_dir_offset * SECTOR_SIZE,
464
+ 0,
465
+ const_header.grain_dir_size *
466
+ SECTOR_SIZE / sizeof(uint64_t),
467
+ const_header.grain_table_size *
468
+ SECTOR_SIZE / sizeof(uint64_t),
469
+ const_header.grain_size,
470
+ &extent,
471
+ errp);
472
+ if (ret < 0) {
473
+ return ret;
474
+ }
475
+
476
+ extent->sesparse = true;
477
+ extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
478
+ extent->sesparse_clusters_offset = const_header.grains_offset;
479
+ extent->entry_size = sizeof(uint64_t);
480
+
481
+ ret = vmdk_init_tables(bs, extent, errp);
482
+ if (ret) {
483
+ /* free extent allocated by vmdk_add_extent */
484
+ vmdk_free_last_extent(bs);
485
+ }
486
+
487
+ return ret;
128
+ return ret;
488
+}
129
+}
489
+
130
+#endif
490
static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
131
+
491
QDict *options, Error **errp);
132
+static int hdev_get_max_segments(int fd, struct stat *st)
492
133
+{
493
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
134
+#ifdef CONFIG_LINUX
494
* RW [size in sectors] SPARSE "file-name.vmdk"
135
int ret;
495
* RW [size in sectors] VMFS "file-name.vmdk"
136
- int sysfd = -1;
496
* RW [size in sectors] VMFSSPARSE "file-name.vmdk"
137
- long max_segments;
497
+ * RW [size in sectors] SESPARSE "file-name.vmdk"
138
498
*/
139
if (S_ISCHR(st->st_mode)) {
499
flat_offset = -1;
140
if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
500
matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
141
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
501
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
502
503
if (sectors <= 0 ||
504
(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
505
- strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
506
+ strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
507
+ strcmp(type, "SESPARSE")) ||
508
(strcmp(access, "RW"))) {
509
continue;
510
}
142
}
511
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
143
return -ENOTSUP;
512
return ret;
144
}
513
}
145
-
514
extent = &s->extents[s->num_extents - 1];
146
- if (!S_ISBLK(st->st_mode)) {
515
+ } else if (!strcmp(type, "SESPARSE")) {
147
- return -ENOTSUP;
516
+ ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
148
- }
517
+ if (ret) {
149
-
518
+ bdrv_unref_child(bs, extent_file);
150
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
519
+ return ret;
151
- major(st->st_rdev), minor(st->st_rdev));
520
+ }
152
- sysfd = open(sysfspath, O_RDONLY);
521
+ extent = &s->extents[s->num_extents - 1];
153
- if (sysfd == -1) {
522
} else {
154
- ret = -errno;
523
error_setg(errp, "Unsupported extent type '%s'", type);
155
- goto out;
524
bdrv_unref_child(bs, extent_file);
156
- }
525
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
157
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
526
if (strcmp(ct, "monolithicFlat") &&
158
- if (ret < 0) {
527
strcmp(ct, "vmfs") &&
159
- ret = -errno;
528
strcmp(ct, "vmfsSparse") &&
160
- goto out;
529
+ strcmp(ct, "seSparse") &&
161
- } else if (ret == 0) {
530
strcmp(ct, "twoGbMaxExtentSparse") &&
162
- ret = -EIO;
531
strcmp(ct, "twoGbMaxExtentFlat")) {
163
- goto out;
532
error_setg(errp, "Unsupported image type '%s'", ct);
164
- }
533
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
165
- buf[ret] = 0;
166
- /* The file is ended with '\n', pass 'end' to accept that. */
167
- ret = qemu_strtol(buf, &end, 10, &max_segments);
168
- if (ret == 0 && end && *end == '\n') {
169
- ret = max_segments;
170
- }
171
-
172
-out:
173
- if (sysfd != -1) {
174
- close(sysfd);
175
- }
176
- g_free(sysfspath);
177
- return ret;
178
+ return get_sysfs_long_val(st, "max_segments");
179
#else
180
return -ENOTSUP;
181
#endif
182
}
183
184
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
185
+ Error **errp)
186
+{
187
+ BlockZoneModel zoned;
188
+ int ret;
189
+
190
+ bs->bl.zoned = BLK_Z_NONE;
191
+
192
+ ret = get_sysfs_zoned_model(st, &zoned);
193
+ if (ret < 0 || zoned == BLK_Z_NONE) {
194
+ return;
195
+ }
196
+ bs->bl.zoned = zoned;
197
+}
198
+
199
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
534
{
200
{
535
unsigned int l1_index, l2_offset, l2_index;
201
BDRVRawState *s = bs->opaque;
536
int min_index, i, j;
202
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
537
- uint32_t min_count, *l2_table;
203
bs->bl.max_hw_iov = ret;
538
+ uint32_t min_count;
539
+ void *l2_table;
540
bool zeroed = false;
541
int64_t ret;
542
int64_t cluster_sector;
543
+ unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
544
545
if (m_data) {
546
m_data->valid = 0;
547
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
548
if (l1_index >= extent->l1_size) {
549
return VMDK_ERROR;
550
}
551
- l2_offset = extent->l1_table[l1_index];
552
+ if (extent->sesparse) {
553
+ uint64_t l2_offset_u64;
554
+
555
+ assert(extent->entry_size == sizeof(uint64_t));
556
+
557
+ l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
558
+ if (l2_offset_u64 == 0) {
559
+ l2_offset = 0;
560
+ } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
561
+ /*
562
+ * Top most nibble is 0x1 if grain table is allocated.
563
+ * strict check - top most 4 bytes must be 0x10000000 since max
564
+ * supported size is 64TB for disk - so no more than 64TB / 16MB
565
+ * grain directories which is smaller than uint32,
566
+ * where 16MB is the only supported default grain table coverage.
567
+ */
568
+ return VMDK_ERROR;
569
+ } else {
570
+ l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
571
+ l2_offset_u64 = extent->sesparse_l2_tables_offset +
572
+ l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
573
+ if (l2_offset_u64 > 0x00000000ffffffff) {
574
+ return VMDK_ERROR;
575
+ }
576
+ l2_offset = (unsigned int)(l2_offset_u64);
577
+ }
578
+ } else {
579
+ assert(extent->entry_size == sizeof(uint32_t));
580
+ l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
581
+ }
582
if (!l2_offset) {
583
return VMDK_UNALLOC;
584
}
585
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
586
extent->l2_cache_counts[j] >>= 1;
587
}
588
}
589
- l2_table = extent->l2_cache + (i * extent->l2_size);
590
+ l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
591
goto found;
592
}
204
}
593
}
205
}
594
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
206
+
595
min_index = i;
207
+ raw_refresh_zoned_limits(bs, &st, errp);
596
}
208
}
597
}
209
598
- l2_table = extent->l2_cache + (min_index * extent->l2_size);
210
static int check_for_dasd(int fd)
599
+ l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
600
BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
601
if (bdrv_pread(extent->file,
602
(int64_t)l2_offset * 512,
603
l2_table,
604
- extent->l2_size * sizeof(uint32_t)
605
- ) != extent->l2_size * sizeof(uint32_t)) {
606
+ l2_size_bytes
607
+ ) != l2_size_bytes) {
608
return VMDK_ERROR;
609
}
610
611
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
612
extent->l2_cache_counts[min_index] = 1;
613
found:
614
l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
615
- cluster_sector = le32_to_cpu(l2_table[l2_index]);
616
617
- if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
618
- zeroed = true;
619
+ if (extent->sesparse) {
620
+ cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
621
+ switch (cluster_sector & 0xf000000000000000) {
622
+ case 0x0000000000000000:
623
+ /* unallocated grain */
624
+ if (cluster_sector != 0) {
625
+ return VMDK_ERROR;
626
+ }
627
+ break;
628
+ case 0x1000000000000000:
629
+ /* scsi-unmapped grain - fallthrough */
630
+ case 0x2000000000000000:
631
+ /* zero grain */
632
+ zeroed = true;
633
+ break;
634
+ case 0x3000000000000000:
635
+ /* allocated grain */
636
+ cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
637
+ ((cluster_sector & 0x0000ffffffffffff) << 12));
638
+ cluster_sector = extent->sesparse_clusters_offset +
639
+ cluster_sector * extent->cluster_sectors;
640
+ break;
641
+ default:
642
+ return VMDK_ERROR;
643
+ }
644
+ } else {
645
+ cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
646
+
647
+ if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
648
+ zeroed = true;
649
+ }
650
}
651
652
if (!cluster_sector || zeroed) {
653
if (!allocate) {
654
return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
655
}
656
+ assert(!extent->sesparse);
657
658
if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
659
return VMDK_ERROR;
660
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
661
m_data->l1_index = l1_index;
662
m_data->l2_index = l2_index;
663
m_data->l2_offset = l2_offset;
664
- m_data->l2_cache_entry = &l2_table[l2_index];
665
+ m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
666
}
667
}
668
*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
669
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
670
if (!extent) {
671
return -EIO;
672
}
673
+ if (extent->sesparse) {
674
+ return -ENOTSUP;
675
+ }
676
offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
677
n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
678
- offset_in_cluster);
679
--
211
--
680
2.21.0
212
2.40.1
681
213
682
214
diff view generated by jsdifflib
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
512M of L1 entries is a very loose bound, only 32M are required to store
3
Add zoned device option to host_device BlockDriver. It will be presented only
4
the maximal supported VMDK file size of 2TB.
4
for zoned host block devices. By adding zone management operations to the
5
host_block_device BlockDriver, users can use the new block layer APIs
6
including Report Zone and four zone management operations
7
(open, close, finish, reset, reset_all).
5
8
6
Fixed qemu-iotest 59# - now failure occures before on impossible L1
9
Qemu-io uses the new APIs to perform zoned storage commands of the device:
7
table size.
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
11
zone_finish(zf).
8
12
9
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
13
For example, to test zone_report, use following command:
10
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
14
$ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
11
Reviewed-by: Liran Alon <liran.alon@oracle.com>
15
-c "zrp offset nr_zones"
12
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
16
13
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
14
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
15
Reviewed-by: Max Reitz <mreitz@redhat.com>
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
16
Signed-off-by: Max Reitz <mreitz@redhat.com>
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
Message-id: 20230508045533.175575-4-faithilikerun@gmail.com
24
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
25
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
26
<philmd@linaro.org> and remove spurious ret = -errno in
27
raw_co_zone_mgmt().
28
--Stefan]
29
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
30
---
18
block/vmdk.c | 13 +++++++------
31
meson.build | 5 +
19
tests/qemu-iotests/059.out | 2 +-
32
include/block/block-io.h | 9 +
20
2 files changed, 8 insertions(+), 7 deletions(-)
33
include/block/block_int-common.h | 21 ++
34
include/block/raw-aio.h | 6 +-
35
include/sysemu/block-backend-io.h | 18 ++
36
block/block-backend.c | 137 +++++++++++++
37
block/file-posix.c | 313 +++++++++++++++++++++++++++++-
38
block/io.c | 41 ++++
39
qemu-io-cmds.c | 149 ++++++++++++++
40
9 files changed, 696 insertions(+), 3 deletions(-)
21
41
22
diff --git a/block/vmdk.c b/block/vmdk.c
42
diff --git a/meson.build b/meson.build
23
index XXXXXXX..XXXXXXX 100644
43
index XXXXXXX..XXXXXXX 100644
24
--- a/block/vmdk.c
44
--- a/meson.build
25
+++ b/block/vmdk.c
45
+++ b/meson.build
26
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
46
@@ -XXX,XX +XXX,XX @@ if rdma.found()
27
error_setg(errp, "Invalid granularity, image may be corrupt");
47
endif
28
return -EFBIG;
48
49
# has_header_symbol
50
+config_host_data.set('CONFIG_BLKZONED',
51
+ cc.has_header_symbol('linux/blkzoned.h', 'BLKOPENZONE'))
52
config_host_data.set('CONFIG_EPOLL_CREATE1',
53
cc.has_header_symbol('sys/epoll.h', 'epoll_create1'))
54
config_host_data.set('CONFIG_FALLOCATE_PUNCH_HOLE',
55
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
56
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
57
cc.has_member('struct stat', 'st_atim',
58
prefix: '#include <sys/stat.h>'))
59
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
60
+ cc.has_member('struct blk_zone', 'capacity',
61
+ prefix: '#include <linux/blkzoned.h>'))
62
63
# has_type
64
config_host_data.set('CONFIG_IOVEC',
65
diff --git a/include/block/block-io.h b/include/block/block-io.h
66
index XXXXXXX..XXXXXXX 100644
67
--- a/include/block/block-io.h
68
+++ b/include/block/block-io.h
69
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
70
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
71
int64_t bytes);
72
73
+/* Report zone information of zone block device. */
74
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
75
+ int64_t offset,
76
+ unsigned int *nr_zones,
77
+ BlockZoneDescriptor *zones);
78
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
79
+ BlockZoneOp op,
80
+ int64_t offset, int64_t len);
81
+
82
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
83
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
84
int64_t bytes, int64_t *pnum, int64_t *map,
85
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
86
index XXXXXXX..XXXXXXX 100644
87
--- a/include/block/block_int-common.h
88
+++ b/include/block/block_int-common.h
89
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
90
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
91
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
92
93
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
94
+ int64_t offset, unsigned int *nr_zones,
95
+ BlockZoneDescriptor *zones);
96
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
97
+ int64_t offset, int64_t len);
98
+
99
/* removable device specific */
100
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
101
BlockDriverState *bs);
102
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
103
104
/* device zone model */
105
BlockZoneModel zoned;
106
+
107
+ /* zone size expressed in bytes */
108
+ uint32_t zone_size;
109
+
110
+ /* total number of zones */
111
+ uint32_t nr_zones;
112
+
113
+ /* maximum sectors of a zone append write operation */
114
+ uint32_t max_append_sectors;
115
+
116
+ /* maximum number of open zones */
117
+ uint32_t max_open_zones;
118
+
119
+ /* maximum number of active zones */
120
+ uint32_t max_active_zones;
121
} BlockLimits;
122
123
typedef struct BdrvOpBlocker BdrvOpBlocker;
124
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
125
index XXXXXXX..XXXXXXX 100644
126
--- a/include/block/raw-aio.h
127
+++ b/include/block/raw-aio.h
128
@@ -XXX,XX +XXX,XX @@
129
#define QEMU_AIO_WRITE_ZEROES 0x0020
130
#define QEMU_AIO_COPY_RANGE 0x0040
131
#define QEMU_AIO_TRUNCATE 0x0080
132
+#define QEMU_AIO_ZONE_REPORT 0x0100
133
+#define QEMU_AIO_ZONE_MGMT 0x0200
134
#define QEMU_AIO_TYPE_MASK \
135
(QEMU_AIO_READ | \
136
QEMU_AIO_WRITE | \
137
@@ -XXX,XX +XXX,XX @@
138
QEMU_AIO_DISCARD | \
139
QEMU_AIO_WRITE_ZEROES | \
140
QEMU_AIO_COPY_RANGE | \
141
- QEMU_AIO_TRUNCATE)
142
+ QEMU_AIO_TRUNCATE | \
143
+ QEMU_AIO_ZONE_REPORT | \
144
+ QEMU_AIO_ZONE_MGMT)
145
146
/* AIO flags */
147
#define QEMU_AIO_MISALIGNED 0x1000
148
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
149
index XXXXXXX..XXXXXXX 100644
150
--- a/include/sysemu/block-backend-io.h
151
+++ b/include/sysemu/block-backend-io.h
152
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
153
BlockCompletionFunc *cb, void *opaque);
154
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
155
BlockCompletionFunc *cb, void *opaque);
156
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
157
+ unsigned int *nr_zones,
158
+ BlockZoneDescriptor *zones,
159
+ BlockCompletionFunc *cb, void *opaque);
160
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
161
+ int64_t offset, int64_t len,
162
+ BlockCompletionFunc *cb, void *opaque);
163
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
164
BlockCompletionFunc *cb, void *opaque);
165
void blk_aio_cancel_async(BlockAIOCB *acb);
166
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
167
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
168
int64_t bytes, BdrvRequestFlags flags);
169
170
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
171
+ unsigned int *nr_zones,
172
+ BlockZoneDescriptor *zones);
173
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
174
+ unsigned int *nr_zones,
175
+ BlockZoneDescriptor *zones);
176
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
177
+ int64_t offset, int64_t len);
178
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
179
+ int64_t offset, int64_t len);
180
+
181
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
182
int64_t bytes);
183
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
184
diff --git a/block/block-backend.c b/block/block-backend.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/block/block-backend.c
187
+++ b/block/block-backend.c
188
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
189
return ret;
190
}
191
192
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
193
+{
194
+ BlkAioEmAIOCB *acb = opaque;
195
+ BlkRwCo *rwco = &acb->rwco;
196
+
197
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
198
+ (unsigned int*)(uintptr_t)acb->bytes,
199
+ rwco->iobuf);
200
+ blk_aio_complete(acb);
201
+}
202
+
203
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
204
+ unsigned int *nr_zones,
205
+ BlockZoneDescriptor *zones,
206
+ BlockCompletionFunc *cb, void *opaque)
207
+{
208
+ BlkAioEmAIOCB *acb;
209
+ Coroutine *co;
210
+ IO_CODE();
211
+
212
+ blk_inc_in_flight(blk);
213
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
214
+ acb->rwco = (BlkRwCo) {
215
+ .blk = blk,
216
+ .offset = offset,
217
+ .iobuf = zones,
218
+ .ret = NOT_DONE,
219
+ };
220
+ acb->bytes = (int64_t)(uintptr_t)nr_zones,
221
+ acb->has_returned = false;
222
+
223
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
224
+ aio_co_enter(blk_get_aio_context(blk), co);
225
+
226
+ acb->has_returned = true;
227
+ if (acb->rwco.ret != NOT_DONE) {
228
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
229
+ blk_aio_complete_bh, acb);
230
+ }
231
+
232
+ return &acb->common;
233
+}
234
+
235
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
236
+{
237
+ BlkAioEmAIOCB *acb = opaque;
238
+ BlkRwCo *rwco = &acb->rwco;
239
+
240
+ rwco->ret = blk_co_zone_mgmt(rwco->blk,
241
+ (BlockZoneOp)(uintptr_t)rwco->iobuf,
242
+ rwco->offset, acb->bytes);
243
+ blk_aio_complete(acb);
244
+}
245
+
246
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
247
+ int64_t offset, int64_t len,
248
+ BlockCompletionFunc *cb, void *opaque) {
249
+ BlkAioEmAIOCB *acb;
250
+ Coroutine *co;
251
+ IO_CODE();
252
+
253
+ blk_inc_in_flight(blk);
254
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
255
+ acb->rwco = (BlkRwCo) {
256
+ .blk = blk,
257
+ .offset = offset,
258
+ .iobuf = (void *)(uintptr_t)op,
259
+ .ret = NOT_DONE,
260
+ };
261
+ acb->bytes = len;
262
+ acb->has_returned = false;
263
+
264
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
265
+ aio_co_enter(blk_get_aio_context(blk), co);
266
+
267
+ acb->has_returned = true;
268
+ if (acb->rwco.ret != NOT_DONE) {
269
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
270
+ blk_aio_complete_bh, acb);
271
+ }
272
+
273
+ return &acb->common;
274
+}
275
+
276
+/*
277
+ * Send a zone_report command.
278
+ * offset is a byte offset from the start of the device. No alignment
279
+ * required for offset.
280
+ * nr_zones represents IN maximum and OUT actual.
281
+ */
282
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
283
+ unsigned int *nr_zones,
284
+ BlockZoneDescriptor *zones)
285
+{
286
+ int ret;
287
+ IO_CODE();
288
+
289
+ blk_inc_in_flight(blk); /* increase before waiting */
290
+ blk_wait_while_drained(blk);
291
+ GRAPH_RDLOCK_GUARD();
292
+ if (!blk_is_available(blk)) {
293
+ blk_dec_in_flight(blk);
294
+ return -ENOMEDIUM;
295
+ }
296
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
297
+ blk_dec_in_flight(blk);
298
+ return ret;
299
+}
300
+
301
+/*
302
+ * Send a zone_management command.
303
+ * op is the zone operation;
304
+ * offset is the byte offset from the start of the zoned device;
305
+ * len is the maximum number of bytes the command should operate on. It
306
+ * should be aligned with the device zone size.
307
+ */
308
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
309
+ int64_t offset, int64_t len)
310
+{
311
+ int ret;
312
+ IO_CODE();
313
+
314
+ blk_inc_in_flight(blk);
315
+ blk_wait_while_drained(blk);
316
+ GRAPH_RDLOCK_GUARD();
317
+
318
+ ret = blk_check_byte_request(blk, offset, len);
319
+ if (ret < 0) {
320
+ blk_dec_in_flight(blk);
321
+ return ret;
322
+ }
323
+
324
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
325
+ blk_dec_in_flight(blk);
326
+ return ret;
327
+}
328
+
329
void blk_drain(BlockBackend *blk)
330
{
331
BlockDriverState *bs = blk_bs(blk);
332
diff --git a/block/file-posix.c b/block/file-posix.c
333
index XXXXXXX..XXXXXXX 100644
334
--- a/block/file-posix.c
335
+++ b/block/file-posix.c
336
@@ -XXX,XX +XXX,XX @@
337
#include <sys/param.h>
338
#include <sys/syscall.h>
339
#include <sys/vfs.h>
340
+#if defined(CONFIG_BLKZONED)
341
+#include <linux/blkzoned.h>
342
+#endif
343
#include <linux/cdrom.h>
344
#include <linux/fd.h>
345
#include <linux/fs.h>
346
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
347
PreallocMode prealloc;
348
Error **errp;
349
} truncate;
350
+ struct {
351
+ unsigned int *nr_zones;
352
+ BlockZoneDescriptor *zones;
353
+ } zone_report;
354
+ struct {
355
+ unsigned long op;
356
+ } zone_mgmt;
357
};
358
} RawPosixAIOData;
359
360
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
361
}
362
#endif
363
364
+#if defined(CONFIG_BLKZONED)
365
static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
366
{
367
g_autofree char *val = NULL;
368
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
29
}
369
}
30
- if (l1_size > 512 * 1024 * 1024) {
370
return 0;
31
+ if (l1_size > 32 * 1024 * 1024) {
371
}
32
/*
372
+#endif /* defined(CONFIG_BLKZONED) */
33
* Although with big capacity and small l1_entry_sectors, we can get a
373
34
* big l1_size, we don't want unbounded value to allocate the table.
374
/*
35
- * Limit it to 512M, which is:
375
* Get a sysfs attribute value as a long integer.
36
- * 16PB - for default "Hosted Sparse Extent" (VMDK4)
376
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
37
- * cluster size: 64KB, L2 table size: 512 entries
377
#endif
38
- * 1PB - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
378
}
39
- * cluster size: 512B, L2 table size: 4096 entries
379
40
+ * Limit it to 32M, which is enough to store:
380
+#if defined(CONFIG_BLKZONED)
41
+ * 8TB - for both VMDK3 & VMDK4 with
381
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
42
+ * minimal cluster size: 512B
382
Error **errp)
43
+ * minimal L2 table size: 512 entries
383
{
44
+ * 8 TB is still more than the maximal value supported for
384
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
45
+ * VMDK3 & VMDK4 which is 2TB.
385
return;
46
*/
386
}
47
error_setg(errp, "L1 size too big");
387
bs->bl.zoned = zoned;
48
return -EFBIG;
388
+
49
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
389
+ ret = get_sysfs_long_val(st, "max_open_zones");
390
+ if (ret >= 0) {
391
+ bs->bl.max_open_zones = ret;
392
+ }
393
+
394
+ ret = get_sysfs_long_val(st, "max_active_zones");
395
+ if (ret >= 0) {
396
+ bs->bl.max_active_zones = ret;
397
+ }
398
+
399
+ /*
400
+ * The zoned device must at least have zone size and nr_zones fields.
401
+ */
402
+ ret = get_sysfs_long_val(st, "chunk_sectors");
403
+ if (ret < 0) {
404
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
405
+ "sysfs attribute");
406
+ return;
407
+ } else if (!ret) {
408
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
409
+ return;
410
+ }
411
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
412
+
413
+ ret = get_sysfs_long_val(st, "nr_zones");
414
+ if (ret < 0) {
415
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
416
+ "sysfs attribute");
417
+ return;
418
+ } else if (!ret) {
419
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
420
+ return;
421
+ }
422
+ bs->bl.nr_zones = ret;
423
+
424
+ ret = get_sysfs_long_val(st, "zone_append_max_bytes");
425
+ if (ret > 0) {
426
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
427
+ }
428
}
429
+#else /* !defined(CONFIG_BLKZONED) */
430
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
431
+ Error **errp)
432
+{
433
+ bs->bl.zoned = BLK_Z_NONE;
434
+}
435
+#endif /* !defined(CONFIG_BLKZONED) */
436
437
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
438
{
439
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
440
BDRVRawState *s = bs->opaque;
441
int ret;
442
443
- /* If DASD, get blocksizes */
444
+ /* If DASD or zoned devices, get blocksizes */
445
if (check_for_dasd(s->fd) < 0) {
446
- return -ENOTSUP;
447
+ /* zoned devices are not DASD */
448
+ if (bs->bl.zoned == BLK_Z_NONE) {
449
+ return -ENOTSUP;
450
+ }
451
}
452
ret = probe_logical_blocksize(s->fd, &bsz->log);
453
if (ret < 0) {
454
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
455
}
456
#endif
457
458
+/*
459
+ * parse_zone - Fill a zone descriptor
460
+ */
461
+#if defined(CONFIG_BLKZONED)
462
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
463
+ const struct blk_zone *blkz) {
464
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
465
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
466
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
467
+
468
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
469
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
470
+#else
471
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
472
+#endif
473
+
474
+ switch (blkz->type) {
475
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
476
+ zone->type = BLK_ZT_SWR;
477
+ break;
478
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
479
+ zone->type = BLK_ZT_SWP;
480
+ break;
481
+ case BLK_ZONE_TYPE_CONVENTIONAL:
482
+ zone->type = BLK_ZT_CONV;
483
+ break;
484
+ default:
485
+ error_report("Unsupported zone type: 0x%x", blkz->type);
486
+ return -ENOTSUP;
487
+ }
488
+
489
+ switch (blkz->cond) {
490
+ case BLK_ZONE_COND_NOT_WP:
491
+ zone->state = BLK_ZS_NOT_WP;
492
+ break;
493
+ case BLK_ZONE_COND_EMPTY:
494
+ zone->state = BLK_ZS_EMPTY;
495
+ break;
496
+ case BLK_ZONE_COND_IMP_OPEN:
497
+ zone->state = BLK_ZS_IOPEN;
498
+ break;
499
+ case BLK_ZONE_COND_EXP_OPEN:
500
+ zone->state = BLK_ZS_EOPEN;
501
+ break;
502
+ case BLK_ZONE_COND_CLOSED:
503
+ zone->state = BLK_ZS_CLOSED;
504
+ break;
505
+ case BLK_ZONE_COND_READONLY:
506
+ zone->state = BLK_ZS_RDONLY;
507
+ break;
508
+ case BLK_ZONE_COND_FULL:
509
+ zone->state = BLK_ZS_FULL;
510
+ break;
511
+ case BLK_ZONE_COND_OFFLINE:
512
+ zone->state = BLK_ZS_OFFLINE;
513
+ break;
514
+ default:
515
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
516
+ return -ENOTSUP;
517
+ }
518
+ return 0;
519
+}
520
+#endif
521
+
522
+#if defined(CONFIG_BLKZONED)
523
+static int handle_aiocb_zone_report(void *opaque)
524
+{
525
+ RawPosixAIOData *aiocb = opaque;
526
+ int fd = aiocb->aio_fildes;
527
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
528
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
529
+ /* zoned block devices use 512-byte sectors */
530
+ uint64_t sector = aiocb->aio_offset / 512;
531
+
532
+ struct blk_zone *blkz;
533
+ size_t rep_size;
534
+ unsigned int nrz;
535
+ int ret;
536
+ unsigned int n = 0, i = 0;
537
+
538
+ nrz = *nr_zones;
539
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
540
+ g_autofree struct blk_zone_report *rep = NULL;
541
+ rep = g_malloc(rep_size);
542
+
543
+ blkz = (struct blk_zone *)(rep + 1);
544
+ while (n < nrz) {
545
+ memset(rep, 0, rep_size);
546
+ rep->sector = sector;
547
+ rep->nr_zones = nrz - n;
548
+
549
+ do {
550
+ ret = ioctl(fd, BLKREPORTZONE, rep);
551
+ } while (ret != 0 && errno == EINTR);
552
+ if (ret != 0) {
553
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
554
+ fd, sector, errno);
555
+ return -errno;
556
+ }
557
+
558
+ if (!rep->nr_zones) {
559
+ break;
560
+ }
561
+
562
+ for (i = 0; i < rep->nr_zones; i++, n++) {
563
+ ret = parse_zone(&zones[n], &blkz[i]);
564
+ if (ret != 0) {
565
+ return ret;
566
+ }
567
+
568
+ /* The next report should start after the last zone reported */
569
+ sector = blkz[i].start + blkz[i].len;
570
+ }
571
+ }
572
+
573
+ *nr_zones = n;
574
+ return 0;
575
+}
576
+#endif
577
+
578
+#if defined(CONFIG_BLKZONED)
579
+static int handle_aiocb_zone_mgmt(void *opaque)
580
+{
581
+ RawPosixAIOData *aiocb = opaque;
582
+ int fd = aiocb->aio_fildes;
583
+ uint64_t sector = aiocb->aio_offset / 512;
584
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
585
+ struct blk_zone_range range;
586
+ int ret;
587
+
588
+ /* Execute the operation */
589
+ range.sector = sector;
590
+ range.nr_sectors = nr_sectors;
591
+ do {
592
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
593
+ } while (ret != 0 && errno == EINTR);
594
+
595
+ return ret;
596
+}
597
+#endif
598
+
599
static int handle_aiocb_copy_range(void *opaque)
600
{
601
RawPosixAIOData *aiocb = opaque;
602
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
603
}
604
}
605
606
+/*
607
+ * zone report - Get a zone block device's information in the form
608
+ * of an array of zone descriptors.
609
+ * zones is an array of zone descriptors to hold zone information on reply;
610
+ * offset can be any byte within the entire size of the device;
611
+ * nr_zones is the maxium number of sectors the command should operate on.
612
+ */
613
+#if defined(CONFIG_BLKZONED)
614
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
615
+ unsigned int *nr_zones,
616
+ BlockZoneDescriptor *zones) {
617
+ BDRVRawState *s = bs->opaque;
618
+ RawPosixAIOData acb = (RawPosixAIOData) {
619
+ .bs = bs,
620
+ .aio_fildes = s->fd,
621
+ .aio_type = QEMU_AIO_ZONE_REPORT,
622
+ .aio_offset = offset,
623
+ .zone_report = {
624
+ .nr_zones = nr_zones,
625
+ .zones = zones,
626
+ },
627
+ };
628
+
629
+ return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
630
+}
631
+#endif
632
+
633
+/*
634
+ * zone management operations - Execute an operation on a zone
635
+ */
636
+#if defined(CONFIG_BLKZONED)
637
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
638
+ int64_t offset, int64_t len) {
639
+ BDRVRawState *s = bs->opaque;
640
+ RawPosixAIOData acb;
641
+ int64_t zone_size, zone_size_mask;
642
+ const char *op_name;
643
+ unsigned long zo;
644
+ int ret;
645
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
646
+
647
+ zone_size = bs->bl.zone_size;
648
+ zone_size_mask = zone_size - 1;
649
+ if (offset & zone_size_mask) {
650
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
651
+ "%" PRId64 "", offset / 512, zone_size / 512);
652
+ return -EINVAL;
653
+ }
654
+
655
+ if (((offset + len) < capacity && len & zone_size_mask) ||
656
+ offset + len > capacity) {
657
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
658
+ " %" PRId64 "", len / 512, zone_size / 512);
659
+ return -EINVAL;
660
+ }
661
+
662
+ switch (op) {
663
+ case BLK_ZO_OPEN:
664
+ op_name = "BLKOPENZONE";
665
+ zo = BLKOPENZONE;
666
+ break;
667
+ case BLK_ZO_CLOSE:
668
+ op_name = "BLKCLOSEZONE";
669
+ zo = BLKCLOSEZONE;
670
+ break;
671
+ case BLK_ZO_FINISH:
672
+ op_name = "BLKFINISHZONE";
673
+ zo = BLKFINISHZONE;
674
+ break;
675
+ case BLK_ZO_RESET:
676
+ op_name = "BLKRESETZONE";
677
+ zo = BLKRESETZONE;
678
+ break;
679
+ default:
680
+ error_report("Unsupported zone op: 0x%x", op);
681
+ return -ENOTSUP;
682
+ }
683
+
684
+ acb = (RawPosixAIOData) {
685
+ .bs = bs,
686
+ .aio_fildes = s->fd,
687
+ .aio_type = QEMU_AIO_ZONE_MGMT,
688
+ .aio_offset = offset,
689
+ .aio_nbytes = len,
690
+ .zone_mgmt = {
691
+ .op = zo,
692
+ },
693
+ };
694
+
695
+ ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
696
+ if (ret != 0) {
697
+ error_report("ioctl %s failed %d", op_name, ret);
698
+ }
699
+
700
+ return ret;
701
+}
702
+#endif
703
+
704
static coroutine_fn int
705
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
706
bool blkdev)
707
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
708
#ifdef __linux__
709
.bdrv_co_ioctl = hdev_co_ioctl,
710
#endif
711
+
712
+ /* zoned device */
713
+#if defined(CONFIG_BLKZONED)
714
+ /* zone management operations */
715
+ .bdrv_co_zone_report = raw_co_zone_report,
716
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
717
+#endif
718
};
719
720
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
721
diff --git a/block/io.c b/block/io.c
50
index XXXXXXX..XXXXXXX 100644
722
index XXXXXXX..XXXXXXX 100644
51
--- a/tests/qemu-iotests/059.out
723
--- a/block/io.c
52
+++ b/tests/qemu-iotests/059.out
724
+++ b/block/io.c
53
@@ -XXX,XX +XXX,XX @@ Offset Length Mapped to File
725
@@ -XXX,XX +XXX,XX @@ out:
54
0x140000000 0x10000 0x50000 TEST_DIR/t-s003.vmdk
726
return co.ret;
55
727
}
56
=== Testing afl image with a very large capacity ===
728
57
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
729
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
58
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
730
+ unsigned int *nr_zones,
59
*** done
731
+ BlockZoneDescriptor *zones)
732
+{
733
+ BlockDriver *drv = bs->drv;
734
+ CoroutineIOCompletion co = {
735
+ .coroutine = qemu_coroutine_self(),
736
+ };
737
+ IO_CODE();
738
+
739
+ bdrv_inc_in_flight(bs);
740
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
741
+ co.ret = -ENOTSUP;
742
+ goto out;
743
+ }
744
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
745
+out:
746
+ bdrv_dec_in_flight(bs);
747
+ return co.ret;
748
+}
749
+
750
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
751
+ int64_t offset, int64_t len)
752
+{
753
+ BlockDriver *drv = bs->drv;
754
+ CoroutineIOCompletion co = {
755
+ .coroutine = qemu_coroutine_self(),
756
+ };
757
+ IO_CODE();
758
+
759
+ bdrv_inc_in_flight(bs);
760
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
761
+ co.ret = -ENOTSUP;
762
+ goto out;
763
+ }
764
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
765
+out:
766
+ bdrv_dec_in_flight(bs);
767
+ return co.ret;
768
+}
769
+
770
void *qemu_blockalign(BlockDriverState *bs, size_t size)
771
{
772
IO_CODE();
773
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
774
index XXXXXXX..XXXXXXX 100644
775
--- a/qemu-io-cmds.c
776
+++ b/qemu-io-cmds.c
777
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
778
.oneline = "flush all in-core file state to disk",
779
};
780
781
+static inline int64_t tosector(int64_t bytes)
782
+{
783
+ return bytes >> BDRV_SECTOR_BITS;
784
+}
785
+
786
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
787
+{
788
+ int ret;
789
+ int64_t offset;
790
+ unsigned int nr_zones;
791
+
792
+ ++optind;
793
+ offset = cvtnum(argv[optind]);
794
+ ++optind;
795
+ nr_zones = cvtnum(argv[optind]);
796
+
797
+ g_autofree BlockZoneDescriptor *zones = NULL;
798
+ zones = g_new(BlockZoneDescriptor, nr_zones);
799
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
800
+ if (ret < 0) {
801
+ printf("zone report failed: %s\n", strerror(-ret));
802
+ } else {
803
+ for (int i = 0; i < nr_zones; ++i) {
804
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
805
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
806
+ "zcond:%u, [type: %u]\n",
807
+ tosector(zones[i].start), tosector(zones[i].length),
808
+ tosector(zones[i].cap), tosector(zones[i].wp),
809
+ zones[i].state, zones[i].type);
810
+ }
811
+ }
812
+ return ret;
813
+}
814
+
815
+static const cmdinfo_t zone_report_cmd = {
816
+ .name = "zone_report",
817
+ .altname = "zrp",
818
+ .cfunc = zone_report_f,
819
+ .argmin = 2,
820
+ .argmax = 2,
821
+ .args = "offset number",
822
+ .oneline = "report zone information",
823
+};
824
+
825
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
826
+{
827
+ int ret;
828
+ int64_t offset, len;
829
+ ++optind;
830
+ offset = cvtnum(argv[optind]);
831
+ ++optind;
832
+ len = cvtnum(argv[optind]);
833
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
834
+ if (ret < 0) {
835
+ printf("zone open failed: %s\n", strerror(-ret));
836
+ }
837
+ return ret;
838
+}
839
+
840
+static const cmdinfo_t zone_open_cmd = {
841
+ .name = "zone_open",
842
+ .altname = "zo",
843
+ .cfunc = zone_open_f,
844
+ .argmin = 2,
845
+ .argmax = 2,
846
+ .args = "offset len",
847
+ .oneline = "explicit open a range of zones in zone block device",
848
+};
849
+
850
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
851
+{
852
+ int ret;
853
+ int64_t offset, len;
854
+ ++optind;
855
+ offset = cvtnum(argv[optind]);
856
+ ++optind;
857
+ len = cvtnum(argv[optind]);
858
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
859
+ if (ret < 0) {
860
+ printf("zone close failed: %s\n", strerror(-ret));
861
+ }
862
+ return ret;
863
+}
864
+
865
+static const cmdinfo_t zone_close_cmd = {
866
+ .name = "zone_close",
867
+ .altname = "zc",
868
+ .cfunc = zone_close_f,
869
+ .argmin = 2,
870
+ .argmax = 2,
871
+ .args = "offset len",
872
+ .oneline = "close a range of zones in zone block device",
873
+};
874
+
875
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
876
+{
877
+ int ret;
878
+ int64_t offset, len;
879
+ ++optind;
880
+ offset = cvtnum(argv[optind]);
881
+ ++optind;
882
+ len = cvtnum(argv[optind]);
883
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
884
+ if (ret < 0) {
885
+ printf("zone finish failed: %s\n", strerror(-ret));
886
+ }
887
+ return ret;
888
+}
889
+
890
+static const cmdinfo_t zone_finish_cmd = {
891
+ .name = "zone_finish",
892
+ .altname = "zf",
893
+ .cfunc = zone_finish_f,
894
+ .argmin = 2,
895
+ .argmax = 2,
896
+ .args = "offset len",
897
+ .oneline = "finish a range of zones in zone block device",
898
+};
899
+
900
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
901
+{
902
+ int ret;
903
+ int64_t offset, len;
904
+ ++optind;
905
+ offset = cvtnum(argv[optind]);
906
+ ++optind;
907
+ len = cvtnum(argv[optind]);
908
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
909
+ if (ret < 0) {
910
+ printf("zone reset failed: %s\n", strerror(-ret));
911
+ }
912
+ return ret;
913
+}
914
+
915
+static const cmdinfo_t zone_reset_cmd = {
916
+ .name = "zone_reset",
917
+ .altname = "zrs",
918
+ .cfunc = zone_reset_f,
919
+ .argmin = 2,
920
+ .argmax = 2,
921
+ .args = "offset len",
922
+ .oneline = "reset a zone write pointer in zone block device",
923
+};
924
+
925
static int truncate_f(BlockBackend *blk, int argc, char **argv);
926
static const cmdinfo_t truncate_cmd = {
927
.name = "truncate",
928
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
929
qemuio_add_command(&aio_write_cmd);
930
qemuio_add_command(&aio_flush_cmd);
931
qemuio_add_command(&flush_cmd);
932
+ qemuio_add_command(&zone_report_cmd);
933
+ qemuio_add_command(&zone_open_cmd);
934
+ qemuio_add_command(&zone_close_cmd);
935
+ qemuio_add_command(&zone_finish_cmd);
936
+ qemuio_add_command(&zone_reset_cmd);
937
qemuio_add_command(&truncate_cmd);
938
qemuio_add_command(&length_cmd);
939
qemuio_add_command(&info_cmd);
60
--
940
--
61
2.21.0
941
2.40.1
62
942
63
943
diff view generated by jsdifflib
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
3
raw-format driver usually sits on top of file-posix driver. It needs to
4
extended the l1_size check from VMDK4 to VMDK3 but did not update the
4
pass through requests of zone commands.
5
default coverage in the moved comment.
6
5
7
The previous vmdk4 calculation:
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230508045533.175575-5-faithilikerun@gmail.com
14
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
15
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
<philmd@linaro.org>.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
block/raw-format.c | 17 +++++++++++++++++
21
1 file changed, 17 insertions(+)
8
22
9
(512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB
23
diff --git a/block/raw-format.c b/block/raw-format.c
10
11
The added vmdk3 calculation:
12
13
(512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB
14
15
Adding the calculation of vmdk3 to the comment.
16
17
In any case, VMware does not offer virtual disks more than 2TB for
18
vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is
19
not implemented yet in qemu.
20
21
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
22
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
23
Reviewed-by: Liran Alon <liran.alon@oracle.com>
24
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
25
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
26
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
27
Reviewed-by: yuchenlin <yuchenlin@synology.com>
28
Reviewed-by: Max Reitz <mreitz@redhat.com>
29
Signed-off-by: Max Reitz <mreitz@redhat.com>
30
---
31
block/vmdk.c | 11 ++++++++---
32
1 file changed, 8 insertions(+), 3 deletions(-)
33
34
diff --git a/block/vmdk.c b/block/vmdk.c
35
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
36
--- a/block/vmdk.c
25
--- a/block/raw-format.c
37
+++ b/block/vmdk.c
26
+++ b/block/raw-format.c
38
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
27
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
39
return -EFBIG;
28
return bdrv_co_pdiscard(bs->file, offset, bytes);
40
}
29
}
41
if (l1_size > 512 * 1024 * 1024) {
30
42
- /* Although with big capacity and small l1_entry_sectors, we can get a
31
+static int coroutine_fn GRAPH_RDLOCK
43
+ /*
32
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
44
+ * Although with big capacity and small l1_entry_sectors, we can get a
33
+ unsigned int *nr_zones,
45
* big l1_size, we don't want unbounded value to allocate the table.
34
+ BlockZoneDescriptor *zones)
46
- * Limit it to 512M, which is 16PB for default cluster and L2 table
35
+{
47
- * size */
36
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
48
+ * Limit it to 512M, which is:
37
+}
49
+ * 16PB - for default "Hosted Sparse Extent" (VMDK4)
38
+
50
+ * cluster size: 64KB, L2 table size: 512 entries
39
+static int coroutine_fn GRAPH_RDLOCK
51
+ * 1PB - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
40
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
52
+ * cluster size: 512B, L2 table size: 4096 entries
41
+ int64_t offset, int64_t len)
53
+ */
42
+{
54
error_setg(errp, "L1 size too big");
43
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
55
return -EFBIG;
44
+}
56
}
45
+
46
static int64_t coroutine_fn GRAPH_RDLOCK
47
raw_co_getlength(BlockDriverState *bs)
48
{
49
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
50
.bdrv_co_pwritev = &raw_co_pwritev,
51
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
52
.bdrv_co_pdiscard = &raw_co_pdiscard,
53
+ .bdrv_co_zone_report = &raw_co_zone_report,
54
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
55
.bdrv_co_block_status = &raw_co_block_status,
56
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
57
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
57
--
58
--
58
2.21.0
59
2.40.1
59
60
60
61
diff view generated by jsdifflib
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
We forget to enable it for transaction .prepare, while it is already
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
4
enabled in do_drive_backup since commit a2d665c1bc362
4
allowed.
5
"blockdev: loosen restrictions on drive-backup source node"
6
5
7
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
8
Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: John Snow <jsnow@redhat.com>
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-6-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org> and clarify that the check is about zoned
16
BlockDrivers.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
19
---
12
blockdev.c | 2 +-
20
include/block/block_int-common.h | 5 +++++
13
1 file changed, 1 insertion(+), 1 deletion(-)
21
block.c | 19 +++++++++++++++++++
22
block/file-posix.c | 12 ++++++++++++
23
block/raw-format.c | 1 +
24
4 files changed, 37 insertions(+)
14
25
15
diff --git a/blockdev.c b/blockdev.c
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
16
index XXXXXXX..XXXXXXX 100644
27
index XXXXXXX..XXXXXXX 100644
17
--- a/blockdev.c
28
--- a/include/block/block_int-common.h
18
+++ b/blockdev.c
29
+++ b/include/block/block_int-common.h
19
@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
30
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
20
assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
31
*/
21
backup = common->action->u.drive_backup.data;
32
bool is_format;
22
33
23
- bs = qmp_get_root_bs(backup->device, errp);
34
+ /*
24
+ bs = bdrv_lookup_bs(backup->device, backup->device, errp);
35
+ * Set to true if the BlockDriver supports zoned children.
25
if (!bs) {
36
+ */
37
+ bool supports_zoned_children;
38
+
39
/*
40
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
41
* this field set to true, except ones that are defined only by their
42
diff --git a/block.c b/block.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/block.c
45
+++ b/block.c
46
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
26
return;
47
return;
27
}
48
}
49
50
+ /*
51
+ * Non-zoned block drivers do not follow zoned storage constraints
52
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
53
+ * drivers in a graph.
54
+ */
55
+ if (!parent_bs->drv->supports_zoned_children &&
56
+ child_bs->bl.zoned == BLK_Z_HM) {
57
+ /*
58
+ * The host-aware model allows zoned storage constraints and random
59
+ * write. Allow mixing host-aware and non-zoned drivers. Using
60
+ * host-aware device as a regular device.
61
+ */
62
+ error_setg(errp, "Cannot add a %s child to a %s parent",
63
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
64
+ parent_bs->drv->supports_zoned_children ?
65
+ "support zoned children" : "not support zoned children");
66
+ return;
67
+ }
68
+
69
if (!QLIST_EMPTY(&child_bs->parents)) {
70
error_setg(errp, "The node %s already has a parent",
71
child_bs->node_name);
72
diff --git a/block/file-posix.c b/block/file-posix.c
73
index XXXXXXX..XXXXXXX 100644
74
--- a/block/file-posix.c
75
+++ b/block/file-posix.c
76
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
77
goto fail;
78
}
79
}
80
+#ifdef CONFIG_BLKZONED
81
+ /*
82
+ * The kernel page cache does not reliably work for writes to SWR zones
83
+ * of zoned block device because it can not guarantee the order of writes.
84
+ */
85
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
86
+ (!(s->open_flags & O_DIRECT))) {
87
+ error_setg(errp, "The driver supports zoned devices, and it requires "
88
+ "cache.direct=on, which was not specified.");
89
+ return -EINVAL; /* No host kernel page cache */
90
+ }
91
+#endif
92
93
if (S_ISBLK(st.st_mode)) {
94
#ifdef __linux__
95
diff --git a/block/raw-format.c b/block/raw-format.c
96
index XXXXXXX..XXXXXXX 100644
97
--- a/block/raw-format.c
98
+++ b/block/raw-format.c
99
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
100
BlockDriver bdrv_raw = {
101
.format_name = "raw",
102
.instance_size = sizeof(BDRVRawState),
103
+ .supports_zoned_children = true,
104
.bdrv_probe = &raw_probe,
105
.bdrv_reopen_prepare = &raw_reopen_prepare,
106
.bdrv_reopen_commit = &raw_reopen_commit,
28
--
107
--
29
2.21.0
108
2.40.1
30
109
31
110
diff view generated by jsdifflib
1
From: Klaus Birkelund Jensen <klaus@birkelund.eu>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
The device mistakenly reports that the Weighted Round Robin with Urgent
3
The new block layer APIs of zoned block devices can be tested by:
4
Priority Class arbitration mechanism is supported.
4
$ tests/qemu-iotests/check zoned
5
Run each zone operation on a newly created null_blk device
6
and see whether it outputs the same zone information.
5
7
6
It is not.
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-7-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org>.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
20
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
21
2 files changed, 142 insertions(+)
22
create mode 100755 tests/qemu-iotests/tests/zoned
23
create mode 100644 tests/qemu-iotests/tests/zoned.out
7
24
8
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
25
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
9
Message-id: 20190606092530.14206-1-klaus@birkelund.eu
26
new file mode 100755
10
Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
27
index XXXXXXX..XXXXXXX
11
Signed-off-by: Max Reitz <mreitz@redhat.com>
28
--- /dev/null
12
---
29
+++ b/tests/qemu-iotests/tests/zoned
13
hw/block/nvme.c | 1 -
30
@@ -XXX,XX +XXX,XX @@
14
1 file changed, 1 deletion(-)
31
+#!/usr/bin/env bash
15
32
+#
16
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
33
+# Test zone management operations.
17
index XXXXXXX..XXXXXXX 100644
34
+#
18
--- a/hw/block/nvme.c
35
+
19
+++ b/hw/block/nvme.c
36
+seq="$(basename $0)"
20
@@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
37
+echo "QA output created by $seq"
21
n->bar.cap = 0;
38
+status=1 # failure is the default!
22
NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
39
+
23
NVME_CAP_SET_CQR(n->bar.cap, 1);
40
+_cleanup()
24
- NVME_CAP_SET_AMS(n->bar.cap, 1);
41
+{
25
NVME_CAP_SET_TO(n->bar.cap, 0xf);
42
+ _cleanup_test_img
26
NVME_CAP_SET_CSS(n->bar.cap, 1);
43
+ sudo -n rmmod null_blk
27
NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
44
+}
45
+trap "_cleanup; exit \$status" 0 1 2 3 15
46
+
47
+# get standard environment, filters and checks
48
+. ../common.rc
49
+. ../common.filter
50
+. ../common.qemu
51
+
52
+# This test only runs on Linux hosts with raw image files.
53
+_supported_fmt raw
54
+_supported_proto file
55
+_supported_os Linux
56
+
57
+sudo -n true || \
58
+ _notrun 'Password-less sudo required'
59
+
60
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
61
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
62
+
63
+echo "Testing a null_blk device:"
64
+echo "case 1: if the operations work"
65
+sudo -n modprobe null_blk nr_devices=1 zoned=1
66
+sudo -n chmod 0666 /dev/nullb0
67
+
68
+echo "(1) report the first zone:"
69
+$QEMU_IO $IMG -c "zrp 0 1"
70
+echo
71
+echo "report the first 10 zones"
72
+$QEMU_IO $IMG -c "zrp 0 10"
73
+echo
74
+echo "report the last zone:"
75
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
76
+echo
77
+echo
78
+echo "(2) opening the first zone"
79
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
80
+echo "report after:"
81
+$QEMU_IO $IMG -c "zrp 0 1"
82
+echo
83
+echo "opening the second zone"
84
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
85
+echo "report after:"
86
+$QEMU_IO $IMG -c "zrp 268435456 1"
87
+echo
88
+echo "opening the last zone"
89
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
90
+echo "report after:"
91
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
92
+echo
93
+echo
94
+echo "(3) closing the first zone"
95
+$QEMU_IO $IMG -c "zc 0 268435456"
96
+echo "report after:"
97
+$QEMU_IO $IMG -c "zrp 0 1"
98
+echo
99
+echo "closing the last zone"
100
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
101
+echo "report after:"
102
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
103
+echo
104
+echo
105
+echo "(4) finishing the second zone"
106
+$QEMU_IO $IMG -c "zf 268435456 268435456"
107
+echo "After finishing a zone:"
108
+$QEMU_IO $IMG -c "zrp 268435456 1"
109
+echo
110
+echo
111
+echo "(5) resetting the second zone"
112
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
113
+echo "After resetting a zone:"
114
+$QEMU_IO $IMG -c "zrp 268435456 1"
115
+
116
+# success, all done
117
+echo "*** done"
118
+rm -f $seq.full
119
+status=0
120
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
121
new file mode 100644
122
index XXXXXXX..XXXXXXX
123
--- /dev/null
124
+++ b/tests/qemu-iotests/tests/zoned.out
125
@@ -XXX,XX +XXX,XX @@
126
+QA output created by zoned
127
+Testing a null_blk device:
128
+case 1: if the operations work
129
+(1) report the first zone:
130
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
131
+
132
+report the first 10 zones
133
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
134
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
135
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
136
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
137
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
138
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
139
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
140
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
141
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
142
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
143
+
144
+report the last zone:
145
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
146
+
147
+
148
+(2) opening the first zone
149
+report after:
150
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
151
+
152
+opening the second zone
153
+report after:
154
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
155
+
156
+opening the last zone
157
+report after:
158
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
159
+
160
+
161
+(3) closing the first zone
162
+report after:
163
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
164
+
165
+closing the last zone
166
+report after:
167
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
168
+
169
+
170
+(4) finishing the second zone
171
+After finishing a zone:
172
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
173
+
174
+
175
+(5) resetting the second zone
176
+After resetting a zone:
177
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
178
+*** done
28
--
179
--
29
2.21.0
180
2.40.1
30
181
31
182
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20230508045533.175575-8-faithilikerun@gmail.com
9
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
block/file-posix.c | 3 +++
13
block/trace-events | 2 ++
14
2 files changed, 5 insertions(+)
15
16
diff --git a/block/file-posix.c b/block/file-posix.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/file-posix.c
19
+++ b/block/file-posix.c
20
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
21
},
22
};
23
24
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
25
return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
26
}
27
#endif
28
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
29
},
30
};
31
32
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
33
+ len >> BDRV_SECTOR_BITS);
34
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
35
if (ret != 0) {
36
error_report("ioctl %s failed %d", op_name, ret);
37
diff --git a/block/trace-events b/block/trace-events
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/trace-events
40
+++ b/block/trace-events
41
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
42
file_setup_cdrom(const char *partition) "Using %s as optical disc"
43
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
44
file_flush_fdatasync_failed(int err) "errno %d"
45
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
46
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
47
48
# ssh.c
49
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
50
--
51
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Add the documentation about the zoned device support to virtio-blk
4
emulation.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-9-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
14
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
15
any toctree" error and fix pre-formatted code syntax.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
docs/devel/index-api.rst | 1 +
20
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
21
docs/system/qemu-block-drivers.rst.inc | 6 ++++
22
3 files changed, 50 insertions(+)
23
create mode 100644 docs/devel/zoned-storage.rst
24
25
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
26
index XXXXXXX..XXXXXXX 100644
27
--- a/docs/devel/index-api.rst
28
+++ b/docs/devel/index-api.rst
29
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
30
memory
31
modules
32
ui
33
+ zoned-storage
34
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
35
new file mode 100644
36
index XXXXXXX..XXXXXXX
37
--- /dev/null
38
+++ b/docs/devel/zoned-storage.rst
39
@@ -XXX,XX +XXX,XX @@
40
+=============
41
+zoned-storage
42
+=============
43
+
44
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
45
+that are larger than the LBA size. They can only allow sequential writes, which
46
+can reduce write amplification in SSDs, and potentially lead to higher
47
+throughput and increased capacity. More details about ZBDs can be found at:
48
+
49
+https://zonedstorage.io/docs/introduction/zoned-storage
50
+
51
+1. Block layer APIs for zoned storage
52
+-------------------------------------
53
+QEMU block layer supports three zoned storage models:
54
+- BLK_Z_HM: The host-managed zoned model only allows sequential writes access
55
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
56
+manage the zones of a device.
57
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
58
+zones, making it backward compatible with regular block devices.
59
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
60
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
61
+supported.
62
+
63
+The block device information resides inside BlockDriverState. QEMU uses
64
+BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
65
+block layer while processing I/O requests. A BlockBackend has a root pointer to
66
+a BlockDriverState graph(for example, raw format on top of file-posix). The
67
+zoned storage information can be propagated from the leaf BlockDriverState all
68
+the way up to the BlockBackend. If the zoned storage model in file-posix is
69
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
70
+
71
+The block layer APIs support commands needed for zoned storage devices,
72
+including report zones, four zone operations, and zone append.
73
+
74
+2. Emulating zoned storage controllers
75
+--------------------------------------
76
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
77
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
78
+APIs for zoned storage emulation or testing.
79
+
80
+For example, to test zone_report on a null_blk device using qemu-io is::
81
+
82
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
83
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
84
index XXXXXXX..XXXXXXX 100644
85
--- a/docs/system/qemu-block-drivers.rst.inc
86
+++ b/docs/system/qemu-block-drivers.rst.inc
87
@@ -XXX,XX +XXX,XX @@ Hard disks
88
you may corrupt your host data (use the ``-snapshot`` command
89
line option or modify the device permissions accordingly).
90
91
+Zoned block devices
92
+ Zoned block devices can be passed through to the guest if the emulated storage
93
+ controller supports zoned storage. Use ``--blockdev host_device,
94
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
95
+ ``/dev/nullb0`` as ``drive0``.
96
+
97
Windows
98
^^^^^^^
99
100
--
101
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Since Linux doesn't have a user API to issue zone append operations to
4
zoned devices from user space, the file-posix driver is modified to add
5
zone append emulation using regular writes. To do this, the file-posix
6
driver tracks the wp location of all zones of the device. It uses an
7
array of uint64_t. The most significant bit of each wp location indicates
8
if the zone type is conventional zones.
9
10
The zones wp can be changed due to the following operations issued:
11
- zone reset: change the wp to the start offset of that zone
12
- zone finish: change to the end location of that zone
13
- write to a zone
14
- zone append
15
16
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
Message-id: 20230508051510.177850-2-faithilikerun@gmail.com
18
[Fix errno propagation from handle_aiocb_zone_mgmt()
19
--Stefan]
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
---
22
include/block/block-common.h | 14 +++
23
include/block/block_int-common.h | 5 +
24
block/file-posix.c | 178 ++++++++++++++++++++++++++++++-
25
3 files changed, 193 insertions(+), 4 deletions(-)
26
27
diff --git a/include/block/block-common.h b/include/block/block-common.h
28
index XXXXXXX..XXXXXXX 100644
29
--- a/include/block/block-common.h
30
+++ b/include/block/block-common.h
31
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
32
BlockZoneState state;
33
} BlockZoneDescriptor;
34
35
+/*
36
+ * Track write pointers of a zone in bytes.
37
+ */
38
+typedef struct BlockZoneWps {
39
+ CoMutex colock;
40
+ uint64_t wp[];
41
+} BlockZoneWps;
42
+
43
typedef struct BlockDriverInfo {
44
/* in bytes, 0 if irrelevant */
45
int cluster_size;
46
@@ -XXX,XX +XXX,XX @@ typedef enum {
47
#define BDRV_SECTOR_BITS 9
48
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
49
50
+/*
51
+ * Get the first most significant bit of wp. If it is zero, then
52
+ * the zone type is SWR.
53
+ */
54
+#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63))
55
+
56
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
57
INT_MAX >> BDRV_SECTOR_BITS)
58
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
59
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
60
index XXXXXXX..XXXXXXX 100644
61
--- a/include/block/block_int-common.h
62
+++ b/include/block/block_int-common.h
63
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
64
65
/* maximum number of active zones */
66
uint32_t max_active_zones;
67
+
68
+ uint32_t write_granularity;
69
} BlockLimits;
70
71
typedef struct BdrvOpBlocker BdrvOpBlocker;
72
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
73
CoMutex bsc_modify_lock;
74
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
75
BdrvBlockStatusCache *block_status_cache;
76
+
77
+ /* array of write pointers' location of each zone in the zoned device. */
78
+ BlockZoneWps *wps;
79
};
80
81
struct BlockBackendRootState {
82
diff --git a/block/file-posix.c b/block/file-posix.c
83
index XXXXXXX..XXXXXXX 100644
84
--- a/block/file-posix.c
85
+++ b/block/file-posix.c
86
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
87
}
88
89
#if defined(CONFIG_BLKZONED)
90
+/*
91
+ * If the reset_all flag is true, then the wps of zone whose state is
92
+ * not readonly or offline should be all reset to the start sector.
93
+ * Else, take the real wp of the device.
94
+ */
95
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
96
+ unsigned int nrz, bool reset_all)
97
+{
98
+ struct blk_zone *blkz;
99
+ size_t rep_size;
100
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
101
+ BlockZoneWps *wps = bs->wps;
102
+ unsigned int j = offset / bs->bl.zone_size;
103
+ unsigned int n = 0, i = 0;
104
+ int ret;
105
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
106
+ g_autofree struct blk_zone_report *rep = NULL;
107
+
108
+ rep = g_malloc(rep_size);
109
+ blkz = (struct blk_zone *)(rep + 1);
110
+ while (n < nrz) {
111
+ memset(rep, 0, rep_size);
112
+ rep->sector = sector;
113
+ rep->nr_zones = nrz - n;
114
+
115
+ do {
116
+ ret = ioctl(fd, BLKREPORTZONE, rep);
117
+ } while (ret != 0 && errno == EINTR);
118
+ if (ret != 0) {
119
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
120
+ fd, offset, errno);
121
+ return -errno;
122
+ }
123
+
124
+ if (!rep->nr_zones) {
125
+ break;
126
+ }
127
+
128
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
129
+ /*
130
+ * The wp tracking cares only about sequential writes required and
131
+ * sequential write preferred zones so that the wp can advance to
132
+ * the right location.
133
+ * Use the most significant bit of the wp location to indicate the
134
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
135
+ */
136
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
137
+ wps->wp[j] |= 1ULL << 63;
138
+ } else {
139
+ switch(blkz[i].cond) {
140
+ case BLK_ZONE_COND_FULL:
141
+ case BLK_ZONE_COND_READONLY:
142
+ /* Zone not writable */
143
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
144
+ break;
145
+ case BLK_ZONE_COND_OFFLINE:
146
+ /* Zone not writable nor readable */
147
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
148
+ break;
149
+ default:
150
+ if (reset_all) {
151
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
152
+ } else {
153
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
154
+ }
155
+ break;
156
+ }
157
+ }
158
+ }
159
+ sector = blkz[i - 1].start + blkz[i - 1].len;
160
+ }
161
+
162
+ return 0;
163
+}
164
+
165
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
166
+ unsigned int nrz)
167
+{
168
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
169
+ error_report("update zone wp failed");
170
+ }
171
+}
172
+
173
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
174
Error **errp)
175
{
176
+ BDRVRawState *s = bs->opaque;
177
BlockZoneModel zoned;
178
int ret;
179
180
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
181
if (ret > 0) {
182
bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
183
}
184
+
185
+ ret = get_sysfs_long_val(st, "physical_block_size");
186
+ if (ret >= 0) {
187
+ bs->bl.write_granularity = ret;
188
+ }
189
+
190
+ /* The refresh_limits() function can be called multiple times. */
191
+ g_free(bs->wps);
192
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
193
+ sizeof(int64_t) * bs->bl.nr_zones);
194
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
195
+ if (ret < 0) {
196
+ error_setg_errno(errp, -ret, "report wps failed");
197
+ bs->wps = NULL;
198
+ return;
199
+ }
200
+ qemu_co_mutex_init(&bs->wps->colock);
201
}
202
#else /* !defined(CONFIG_BLKZONED) */
203
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
204
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_zone_mgmt(void *opaque)
205
ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
206
} while (ret != 0 && errno == EINTR);
207
208
- return ret;
209
+ return ret < 0 ? -errno : ret;
210
}
211
#endif
212
213
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
214
{
215
BDRVRawState *s = bs->opaque;
216
RawPosixAIOData acb;
217
+ int ret;
218
219
if (fd_open(bs) < 0)
220
return -EIO;
221
+#if defined(CONFIG_BLKZONED)
222
+ if (type & QEMU_AIO_WRITE && bs->wps) {
223
+ qemu_co_mutex_lock(&bs->wps->colock);
224
+ }
225
+#endif
226
227
/*
228
* When using O_DIRECT, the request must be aligned to be able to use
229
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
230
#ifdef CONFIG_LINUX_IO_URING
231
} else if (s->use_linux_io_uring) {
232
assert(qiov->size == bytes);
233
- return luring_co_submit(bs, s->fd, offset, qiov, type);
234
+ ret = luring_co_submit(bs, s->fd, offset, qiov, type);
235
+ goto out;
236
#endif
237
#ifdef CONFIG_LINUX_AIO
238
} else if (s->use_linux_aio) {
239
assert(qiov->size == bytes);
240
- return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
241
+ ret = laio_co_submit(s->fd, offset, qiov, type,
242
+ s->aio_max_batch);
243
+ goto out;
244
#endif
245
}
246
247
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
248
};
249
250
assert(qiov->size == bytes);
251
- return raw_thread_pool_submit(handle_aiocb_rw, &acb);
252
+ ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
253
+ goto out; /* Avoid the compiler err of unused label */
254
+
255
+out:
256
+#if defined(CONFIG_BLKZONED)
257
+{
258
+ BlockZoneWps *wps = bs->wps;
259
+ if (ret == 0) {
260
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
261
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
262
+ if (!BDRV_ZT_IS_CONV(*wp)) {
263
+ /* Advance the wp if needed */
264
+ if (offset + bytes > *wp) {
265
+ *wp = offset + bytes;
266
+ }
267
+ }
268
+ }
269
+ } else {
270
+ if (type & QEMU_AIO_WRITE) {
271
+ update_zones_wp(bs, s->fd, 0, 1);
272
+ }
273
+ }
274
+
275
+ if (type & QEMU_AIO_WRITE && wps) {
276
+ qemu_co_mutex_unlock(&wps->colock);
277
+ }
278
+}
279
+#endif
280
+ return ret;
281
}
282
283
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
284
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
285
BDRVRawState *s = bs->opaque;
286
287
if (s->fd >= 0) {
288
+#if defined(CONFIG_BLKZONED)
289
+ g_free(bs->wps);
290
+#endif
291
qemu_close(s->fd);
292
s->fd = -1;
293
}
294
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
295
const char *op_name;
296
unsigned long zo;
297
int ret;
298
+ BlockZoneWps *wps = bs->wps;
299
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
300
301
zone_size = bs->bl.zone_size;
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
303
return -EINVAL;
304
}
305
306
+ uint32_t i = offset / bs->bl.zone_size;
307
+ uint32_t nrz = len / bs->bl.zone_size;
308
+ uint64_t *wp = &wps->wp[i];
309
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
310
+ error_report("zone mgmt operations are not allowed for conventional zones");
311
+ return -EIO;
312
+ }
313
+
314
switch (op) {
315
case BLK_ZO_OPEN:
316
op_name = "BLKOPENZONE";
317
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
318
len >> BDRV_SECTOR_BITS);
319
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
320
if (ret != 0) {
321
+ update_zones_wp(bs, s->fd, offset, i);
322
error_report("ioctl %s failed %d", op_name, ret);
323
+ return ret;
324
+ }
325
+
326
+ if (zo == BLKRESETZONE && len == capacity) {
327
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
328
+ if (ret < 0) {
329
+ error_report("reporting single wp failed");
330
+ return ret;
331
+ }
332
+ } else if (zo == BLKRESETZONE) {
333
+ for (unsigned int j = 0; j < nrz; ++j) {
334
+ wp[j] = offset + j * zone_size;
335
+ }
336
+ } else if (zo == BLKFINISHZONE) {
337
+ for (unsigned int j = 0; j < nrz; ++j) {
338
+ /* The zoned device allows the last zone smaller that the
339
+ * zone size. */
340
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
341
+ }
342
}
343
344
return ret;
345
--
346
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
A zone append command is a write operation that specifies the first
4
logical block of a zone as the write position. When writing to a zoned
5
block device using zone append, the byte offset of the call may point at
6
any position within the zone to which the data is being appended. Upon
7
completion the device will respond with the position where the data has
8
been written in the zone.
9
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230508051510.177850-3-faithilikerun@gmail.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
16
include/block/block-io.h | 4 ++
17
include/block/block_int-common.h | 3 ++
18
include/block/raw-aio.h | 4 +-
19
include/sysemu/block-backend-io.h | 9 +++++
20
block/block-backend.c | 61 +++++++++++++++++++++++++++++++
21
block/file-posix.c | 58 +++++++++++++++++++++++++----
22
block/io.c | 27 ++++++++++++++
23
block/io_uring.c | 4 ++
24
block/linux-aio.c | 3 ++
25
block/raw-format.c | 8 ++++
26
10 files changed, 173 insertions(+), 8 deletions(-)
27
28
diff --git a/include/block/block-io.h b/include/block/block-io.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block-io.h
31
+++ b/include/block/block-io.h
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
34
BlockZoneOp op,
35
int64_t offset, int64_t len);
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)(uintptr_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ GRAPH_RDLOCK_GUARD();
172
+ if (!blk_is_available(blk)) {
173
+ blk_dec_in_flight(blk);
174
+ return -ENOMEDIUM;
175
+ }
176
+
177
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
178
+ blk_dec_in_flight(blk);
179
+ return ret;
180
+}
181
+
182
void blk_drain(BlockBackend *blk)
183
{
184
BlockDriverState *bs = blk_bs(blk);
185
diff --git a/block/file-posix.c b/block/file-posix.c
186
index XXXXXXX..XXXXXXX 100644
187
--- a/block/file-posix.c
188
+++ b/block/file-posix.c
189
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
190
bool has_write_zeroes:1;
191
bool use_linux_aio:1;
192
bool use_linux_io_uring:1;
193
+ int64_t *offset; /* offset of zone append operation */
194
int page_cache_inconsistent; /* errno from fdatasync failure */
195
bool has_fallocate;
196
bool needs_alignment;
197
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
198
ssize_t len;
199
200
len = RETRY_ON_EINTR(
201
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
202
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
203
qemu_pwritev(aiocb->aio_fildes,
204
aiocb->io.iov,
205
aiocb->io.niov,
206
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
207
ssize_t len;
208
209
while (offset < aiocb->aio_nbytes) {
210
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
211
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
212
len = pwrite(aiocb->aio_fildes,
213
(const char *)buf + offset,
214
aiocb->aio_nbytes - offset,
215
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
216
}
217
218
nbytes = handle_aiocb_rw_linear(aiocb, buf);
219
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
220
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
221
char *p = buf;
222
size_t count = aiocb->aio_nbytes, copy;
223
int i;
224
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
225
if (fd_open(bs) < 0)
226
return -EIO;
227
#if defined(CONFIG_BLKZONED)
228
- if (type & QEMU_AIO_WRITE && bs->wps) {
229
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
230
qemu_co_mutex_lock(&bs->wps->colock);
231
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
232
+ int index = offset / bs->bl.zone_size;
233
+ offset = bs->wps->wp[index];
234
+ }
235
}
236
#endif
237
238
@@ -XXX,XX +XXX,XX @@ out:
239
{
240
BlockZoneWps *wps = bs->wps;
241
if (ret == 0) {
242
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
243
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
244
+ && wps && bs->bl.zone_size) {
245
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
246
if (!BDRV_ZT_IS_CONV(*wp)) {
247
+ if (type & QEMU_AIO_ZONE_APPEND) {
248
+ *s->offset = *wp;
249
+ }
250
/* Advance the wp if needed */
251
if (offset + bytes > *wp) {
252
*wp = offset + bytes;
253
@@ -XXX,XX +XXX,XX @@ out:
254
}
255
}
256
} else {
257
- if (type & QEMU_AIO_WRITE) {
258
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
259
update_zones_wp(bs, s->fd, 0, 1);
260
}
261
}
262
263
- if (type & QEMU_AIO_WRITE && wps) {
264
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
265
qemu_co_mutex_unlock(&wps->colock);
266
}
267
}
268
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
269
}
270
#endif
271
272
+#if defined(CONFIG_BLKZONED)
273
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
274
+ int64_t *offset,
275
+ QEMUIOVector *qiov,
276
+ BdrvRequestFlags flags) {
277
+ assert(flags == 0);
278
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
279
+ int64_t iov_len = 0;
280
+ int64_t len = 0;
281
+ BDRVRawState *s = bs->opaque;
282
+ s->offset = offset;
283
+
284
+ if (*offset & zone_size_mask) {
285
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
286
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
287
+ return -EINVAL;
288
+ }
289
+
290
+ int64_t wg = bs->bl.write_granularity;
291
+ int64_t wg_mask = wg - 1;
292
+ for (int i = 0; i < qiov->niov; i++) {
293
+ iov_len = qiov->iov[i].iov_len;
294
+ if (iov_len & wg_mask) {
295
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
296
+ "block size %" PRId64 "", i, iov_len, wg);
297
+ return -EINVAL;
298
+ }
299
+ len += iov_len;
300
+ }
301
+
302
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
303
+}
304
+#endif
305
+
306
static coroutine_fn int
307
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
308
bool blkdev)
309
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
310
/* zone management operations */
311
.bdrv_co_zone_report = raw_co_zone_report,
312
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
313
+ .bdrv_co_zone_append = raw_co_zone_append,
314
#endif
315
};
316
317
diff --git a/block/io.c b/block/io.c
318
index XXXXXXX..XXXXXXX 100644
319
--- a/block/io.c
320
+++ b/block/io.c
321
@@ -XXX,XX +XXX,XX @@ out:
322
return co.ret;
323
}
324
325
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
326
+ QEMUIOVector *qiov,
327
+ BdrvRequestFlags flags)
328
+{
329
+ int ret;
330
+ BlockDriver *drv = bs->drv;
331
+ CoroutineIOCompletion co = {
332
+ .coroutine = qemu_coroutine_self(),
333
+ };
334
+ IO_CODE();
335
+
336
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
337
+ if (ret < 0) {
338
+ return ret;
339
+ }
340
+
341
+ bdrv_inc_in_flight(bs);
342
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
343
+ co.ret = -ENOTSUP;
344
+ goto out;
345
+ }
346
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
347
+out:
348
+ bdrv_dec_in_flight(bs);
349
+ return co.ret;
350
+}
351
+
352
void *qemu_blockalign(BlockDriverState *bs, size_t size)
353
{
354
IO_CODE();
355
diff --git a/block/io_uring.c b/block/io_uring.c
356
index XXXXXXX..XXXXXXX 100644
357
--- a/block/io_uring.c
358
+++ b/block/io_uring.c
359
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
360
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
361
luringcb->qiov->niov, offset);
362
break;
363
+ case QEMU_AIO_ZONE_APPEND:
364
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
365
+ luringcb->qiov->niov, offset);
366
+ break;
367
case QEMU_AIO_READ:
368
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
369
luringcb->qiov->niov, offset);
370
diff --git a/block/linux-aio.c b/block/linux-aio.c
371
index XXXXXXX..XXXXXXX 100644
372
--- a/block/linux-aio.c
373
+++ b/block/linux-aio.c
374
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
375
case QEMU_AIO_WRITE:
376
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
377
break;
378
+ case QEMU_AIO_ZONE_APPEND:
379
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
380
+ break;
381
case QEMU_AIO_READ:
382
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
383
break;
384
diff --git a/block/raw-format.c b/block/raw-format.c
385
index XXXXXXX..XXXXXXX 100644
386
--- a/block/raw-format.c
387
+++ b/block/raw-format.c
388
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
389
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
390
}
391
392
+static int coroutine_fn GRAPH_RDLOCK
393
+raw_co_zone_append(BlockDriverState *bs,int64_t *offset, QEMUIOVector *qiov,
394
+ BdrvRequestFlags flags)
395
+{
396
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
397
+}
398
+
399
static int64_t coroutine_fn GRAPH_RDLOCK
400
raw_co_getlength(BlockDriverState *bs)
401
{
402
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
403
.bdrv_co_pdiscard = &raw_co_pdiscard,
404
.bdrv_co_zone_report = &raw_co_zone_report,
405
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
406
+ .bdrv_co_zone_append = &raw_co_zone_append,
407
.bdrv_co_block_status = &raw_co_block_status,
408
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
409
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
410
--
411
2.40.1
diff view generated by jsdifflib
1
From: Pino Toscano <ptoscano@redhat.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Rewrite the implementation of the ssh block driver to use libssh instead
3
The patch tests zone append writes by reporting the zone wp after
4
of libssh2. The libssh library has various advantages over libssh2:
4
the completion of the call. "zap -p" option can print the sector
5
- easier API for authentication (for example for using ssh-agent)
5
offset value after completion, which should be the start sector
6
- easier API for known_hosts handling
6
where the append write begins.
7
- supports newer types of keys in known_hosts
8
7
9
Use APIs/features available in libssh 0.8 conditionally, to support
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
10
older versions (which are not recommended though).
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20230508051510.177850-4-faithilikerun@gmail.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
14
tests/qemu-iotests/tests/zoned | 16 +++++++
15
tests/qemu-iotests/tests/zoned.out | 16 +++++++
16
3 files changed, 107 insertions(+)
11
17
12
Adjust the iotest 207 according to the different error message, and to
18
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
13
find the default key type for localhost (to properly compare the
14
fingerprint with).
15
Contributed-by: Max Reitz <mreitz@redhat.com>
16
17
Adjust the various Docker/Travis scripts to use libssh when available
18
instead of libssh2. The mingw/mxe testing is dropped for now, as there
19
are no packages for it.
20
21
Signed-off-by: Pino Toscano <ptoscano@redhat.com>
22
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
23
Acked-by: Alex Bennée <alex.bennee@linaro.org>
24
Message-id: 20190620200840.17655-1-ptoscano@redhat.com
25
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
26
Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
27
Signed-off-by: Max Reitz <mreitz@redhat.com>
28
---
29
configure | 65 +-
30
block/Makefile.objs | 6 +-
31
block/ssh.c | 652 ++++++++++--------
32
.travis.yml | 4 +-
33
block/trace-events | 14 +-
34
docs/qemu-block-drivers.texi | 2 +-
35
.../dockerfiles/debian-win32-cross.docker | 1 -
36
.../dockerfiles/debian-win64-cross.docker | 1 -
37
tests/docker/dockerfiles/fedora.docker | 4 +-
38
tests/docker/dockerfiles/ubuntu.docker | 2 +-
39
tests/docker/dockerfiles/ubuntu1804.docker | 2 +-
40
tests/qemu-iotests/207 | 54 +-
41
tests/qemu-iotests/207.out | 2 +-
42
13 files changed, 449 insertions(+), 360 deletions(-)
43
44
diff --git a/configure b/configure
45
index XXXXXXX..XXXXXXX 100755
46
--- a/configure
47
+++ b/configure
48
@@ -XXX,XX +XXX,XX @@ auth_pam=""
49
vte=""
50
virglrenderer=""
51
tpm=""
52
-libssh2=""
53
+libssh=""
54
live_block_migration="yes"
55
numa=""
56
tcmalloc="no"
57
@@ -XXX,XX +XXX,XX @@ for opt do
58
;;
59
--enable-tpm) tpm="yes"
60
;;
61
- --disable-libssh2) libssh2="no"
62
+ --disable-libssh) libssh="no"
63
;;
64
- --enable-libssh2) libssh2="yes"
65
+ --enable-libssh) libssh="yes"
66
;;
67
--disable-live-block-migration) live_block_migration="no"
68
;;
69
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
70
coroutine-pool coroutine freelist (better performance)
71
glusterfs GlusterFS backend
72
tpm TPM support
73
- libssh2 ssh block device support
74
+ libssh ssh block device support
75
numa libnuma support
76
libxml2 for Parallels image format
77
tcmalloc tcmalloc support
78
@@ -XXX,XX +XXX,XX @@ EOF
79
fi
80
81
##########################################
82
-# libssh2 probe
83
-min_libssh2_version=1.2.8
84
-if test "$libssh2" != "no" ; then
85
- if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
86
- libssh2_cflags=$($pkg_config libssh2 --cflags)
87
- libssh2_libs=$($pkg_config libssh2 --libs)
88
- libssh2=yes
89
+# libssh probe
90
+if test "$libssh" != "no" ; then
91
+ if $pkg_config --exists libssh; then
92
+ libssh_cflags=$($pkg_config libssh --cflags)
93
+ libssh_libs=$($pkg_config libssh --libs)
94
+ libssh=yes
95
else
96
- if test "$libssh2" = "yes" ; then
97
- error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2"
98
+ if test "$libssh" = "yes" ; then
99
+ error_exit "libssh required for --enable-libssh"
100
fi
101
- libssh2=no
102
+ libssh=no
103
fi
104
fi
105
106
##########################################
107
-# libssh2_sftp_fsync probe
108
+# Check for libssh 0.8
109
+# This is done like this instead of using the LIBSSH_VERSION_* and
110
+# SSH_VERSION_* macros because some distributions in the past shipped
111
+# snapshots of the future 0.8 from Git, and those snapshots did not
112
+# have updated version numbers (still referring to 0.7.0).
113
114
-if test "$libssh2" = "yes"; then
115
+if test "$libssh" = "yes"; then
116
cat > $TMPC <<EOF
117
-#include <stdio.h>
118
-#include <libssh2.h>
119
-#include <libssh2_sftp.h>
120
-int main(void) {
121
- LIBSSH2_SESSION *session;
122
- LIBSSH2_SFTP *sftp;
123
- LIBSSH2_SFTP_HANDLE *sftp_handle;
124
- session = libssh2_session_init ();
125
- sftp = libssh2_sftp_init (session);
126
- sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0);
127
- libssh2_sftp_fsync (sftp_handle);
128
- return 0;
129
-}
130
+#include <libssh/libssh.h>
131
+int main(void) { return ssh_get_server_publickey(NULL, NULL); }
132
EOF
133
- # libssh2_cflags/libssh2_libs defined in previous test.
134
- if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then
135
- QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS"
136
+ if compile_prog "$libssh_cflags" "$libssh_libs"; then
137
+ libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags"
138
fi
139
fi
140
141
@@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs"
142
echo "gcov $gcov_tool"
143
echo "gcov enabled $gcov"
144
echo "TPM support $tpm"
145
-echo "libssh2 support $libssh2"
146
+echo "libssh support $libssh"
147
echo "QOM debugging $qom_cast_debug"
148
echo "Live block migration $live_block_migration"
149
echo "lzo support $lzo"
150
@@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then
151
echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak
152
fi
153
154
-if test "$libssh2" = "yes" ; then
155
- echo "CONFIG_LIBSSH2=m" >> $config_host_mak
156
- echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
157
- echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
158
+if test "$libssh" = "yes" ; then
159
+ echo "CONFIG_LIBSSH=m" >> $config_host_mak
160
+ echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
161
+ echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
162
fi
163
164
if test "$live_block_migration" = "yes" ; then
165
diff --git a/block/Makefile.objs b/block/Makefile.objs
166
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
167
--- a/block/Makefile.objs
20
--- a/qemu-io-cmds.c
168
+++ b/block/Makefile.objs
21
+++ b/qemu-io-cmds.c
169
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
170
block-obj-$(CONFIG_RBD) += rbd.o
23
.oneline = "reset a zone write pointer in zone block device",
171
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
24
};
172
block-obj-$(CONFIG_VXHS) += vxhs.o
25
173
-block-obj-$(CONFIG_LIBSSH2) += ssh.o
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
174
+block-obj-$(CONFIG_LIBSSH) += ssh.o
27
+ int64_t *offset, int flags, int *total)
175
block-obj-y += accounting.o dirty-bitmap.o
28
+{
176
block-obj-y += write-threshold.o
29
+ int async_ret = NOT_DONE;
177
block-obj-y += backup.o
30
+
178
@@ -XXX,XX +XXX,XX @@ rbd.o-libs := $(RBD_LIBS)
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
179
gluster.o-cflags := $(GLUSTERFS_CFLAGS)
32
+ while (async_ret == NOT_DONE) {
180
gluster.o-libs := $(GLUSTERFS_LIBS)
33
+ main_loop_wait(false);
181
vxhs.o-libs := $(VXHS_LIBS)
182
-ssh.o-cflags := $(LIBSSH2_CFLAGS)
183
-ssh.o-libs := $(LIBSSH2_LIBS)
184
+ssh.o-cflags := $(LIBSSH_CFLAGS)
185
+ssh.o-libs := $(LIBSSH_LIBS)
186
block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o
187
block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y)
188
dmg-bz2.o-libs := $(BZIP2_LIBS)
189
diff --git a/block/ssh.c b/block/ssh.c
190
index XXXXXXX..XXXXXXX 100644
191
--- a/block/ssh.c
192
+++ b/block/ssh.c
193
@@ -XXX,XX +XXX,XX @@
194
195
#include "qemu/osdep.h"
196
197
-#include <libssh2.h>
198
-#include <libssh2_sftp.h>
199
+#include <libssh/libssh.h>
200
+#include <libssh/sftp.h>
201
202
#include "block/block_int.h"
203
#include "block/qdict.h"
204
@@ -XXX,XX +XXX,XX @@
205
#include "trace.h"
206
207
/*
208
- * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself. Note
209
- * that this requires that libssh2 was specially compiled with the
210
- * `./configure --enable-debug' option, so most likely you will have
211
- * to compile it yourself. The meaning of <bitmask> is described
212
- * here: http://www.libssh2.org/libssh2_trace.html
213
+ * TRACE_LIBSSH=<level> enables tracing in libssh itself.
214
+ * The meaning of <level> is described here:
215
+ * http://api.libssh.org/master/group__libssh__log.html
216
*/
217
-#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */
218
+#define TRACE_LIBSSH 0 /* see: SSH_LOG_* */
219
220
typedef struct BDRVSSHState {
221
/* Coroutine. */
222
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState {
223
224
/* SSH connection. */
225
int sock; /* socket */
226
- LIBSSH2_SESSION *session; /* ssh session */
227
- LIBSSH2_SFTP *sftp; /* sftp session */
228
- LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */
229
+ ssh_session session; /* ssh session */
230
+ sftp_session sftp; /* sftp session */
231
+ sftp_file sftp_handle; /* sftp remote file handle */
232
233
- /* See ssh_seek() function below. */
234
- int64_t offset;
235
- bool offset_op_read;
236
-
237
- /* File attributes at open. We try to keep the .filesize field
238
+ /*
239
+ * File attributes at open. We try to keep the .size field
240
* updated if it changes (eg by writing at the end of the file).
241
*/
242
- LIBSSH2_SFTP_ATTRIBUTES attrs;
243
+ sftp_attributes attrs;
244
245
InetSocketAddress *inet;
246
247
@@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s)
248
{
249
memset(s, 0, sizeof *s);
250
s->sock = -1;
251
- s->offset = -1;
252
qemu_co_mutex_init(&s->lock);
253
}
254
255
@@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s)
256
{
257
g_free(s->user);
258
259
+ if (s->attrs) {
260
+ sftp_attributes_free(s->attrs);
261
+ }
34
+ }
262
if (s->sftp_handle) {
35
+
263
- libssh2_sftp_close(s->sftp_handle);
36
+ *total = qiov->size;
264
+ sftp_close(s->sftp_handle);
37
+ return async_ret < 0 ? async_ret : 1;
265
}
38
+}
266
if (s->sftp) {
39
+
267
- libssh2_sftp_shutdown(s->sftp);
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
268
+ sftp_free(s->sftp);
41
+{
269
}
270
if (s->session) {
271
- libssh2_session_disconnect(s->session,
272
- "from qemu ssh client: "
273
- "user closed the connection");
274
- libssh2_session_free(s->session);
275
- }
276
- if (s->sock >= 0) {
277
- close(s->sock);
278
+ ssh_disconnect(s->session);
279
+ ssh_free(s->session); /* This frees s->sock */
280
}
281
}
282
283
@@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
284
va_end(args);
285
286
if (s->session) {
287
- char *ssh_err;
288
+ const char *ssh_err;
289
int ssh_err_code;
290
291
- /* This is not an errno. See <libssh2.h>. */
292
- ssh_err_code = libssh2_session_last_error(s->session,
293
- &ssh_err, NULL, 0);
294
- error_setg(errp, "%s: %s (libssh2 error code: %d)",
295
+ /* This is not an errno. See <libssh/libssh.h>. */
296
+ ssh_err = ssh_get_error(s->session);
297
+ ssh_err_code = ssh_get_error_code(s->session);
298
+ error_setg(errp, "%s: %s (libssh error code: %d)",
299
msg, ssh_err, ssh_err_code);
300
} else {
301
error_setg(errp, "%s", msg);
302
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
303
va_end(args);
304
305
if (s->sftp) {
306
- char *ssh_err;
307
+ const char *ssh_err;
308
int ssh_err_code;
309
- unsigned long sftp_err_code;
310
+ int sftp_err_code;
311
312
- /* This is not an errno. See <libssh2.h>. */
313
- ssh_err_code = libssh2_session_last_error(s->session,
314
- &ssh_err, NULL, 0);
315
- /* See <libssh2_sftp.h>. */
316
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
317
+ /* This is not an errno. See <libssh/libssh.h>. */
318
+ ssh_err = ssh_get_error(s->session);
319
+ ssh_err_code = ssh_get_error_code(s->session);
320
+ /* See <libssh/sftp.h>. */
321
+ sftp_err_code = sftp_get_error(s->sftp);
322
323
error_setg(errp,
324
- "%s: %s (libssh2 error code: %d, sftp error code: %lu)",
325
+ "%s: %s (libssh error code: %d, sftp error code: %d)",
326
msg, ssh_err, ssh_err_code, sftp_err_code);
327
} else {
328
error_setg(errp, "%s", msg);
329
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
330
331
static void sftp_error_trace(BDRVSSHState *s, const char *op)
332
{
333
- char *ssh_err;
334
+ const char *ssh_err;
335
int ssh_err_code;
336
- unsigned long sftp_err_code;
337
+ int sftp_err_code;
338
339
- /* This is not an errno. See <libssh2.h>. */
340
- ssh_err_code = libssh2_session_last_error(s->session,
341
- &ssh_err, NULL, 0);
342
- /* See <libssh2_sftp.h>. */
343
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
344
+ /* This is not an errno. See <libssh/libssh.h>. */
345
+ ssh_err = ssh_get_error(s->session);
346
+ ssh_err_code = ssh_get_error_code(s->session);
347
+ /* See <libssh/sftp.h>. */
348
+ sftp_err_code = sftp_get_error(s->sftp);
349
350
trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code);
351
}
352
@@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options,
353
parse_uri(filename, options, errp);
354
}
355
356
-static int check_host_key_knownhosts(BDRVSSHState *s,
357
- const char *host, int port, Error **errp)
358
+static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp)
359
{
360
- const char *home;
361
- char *knh_file = NULL;
362
- LIBSSH2_KNOWNHOSTS *knh = NULL;
363
- struct libssh2_knownhost *found;
364
- int ret, r;
365
- const char *hostkey;
366
- size_t len;
367
- int type;
368
-
369
- hostkey = libssh2_session_hostkey(s->session, &len, &type);
370
- if (!hostkey) {
371
+ int ret;
42
+ int ret;
372
+#ifdef HAVE_LIBSSH_0_8
43
+ bool pflag = false;
373
+ enum ssh_known_hosts_e state;
44
+ int flags = 0;
374
+ int r;
45
+ int total = 0;
375
+ ssh_key pubkey;
46
+ int64_t offset;
376
+ enum ssh_keytypes_e pubkey_type;
47
+ char *buf;
377
+ unsigned char *server_hash = NULL;
48
+ int c, nr_iov;
378
+ size_t server_hash_len;
49
+ int pattern = 0xcd;
379
+ char *fingerprint = NULL;
50
+ QEMUIOVector qiov;
380
+
51
+
381
+ state = ssh_session_is_known_server(s->session);
52
+ if (optind > argc - 3) {
382
+ trace_ssh_server_status(state);
383
+
384
+ switch (state) {
385
+ case SSH_KNOWN_HOSTS_OK:
386
+ /* OK */
387
+ trace_ssh_check_host_key_knownhosts();
388
+ break;
389
+ case SSH_KNOWN_HOSTS_CHANGED:
390
ret = -EINVAL;
391
- session_error_setg(errp, s, "failed to read remote host key");
392
+ r = ssh_get_server_publickey(s->session, &pubkey);
393
+ if (r == 0) {
394
+ r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256,
395
+ &server_hash, &server_hash_len);
396
+ pubkey_type = ssh_key_type(pubkey);
397
+ ssh_key_free(pubkey);
398
+ }
399
+ if (r == 0) {
400
+ fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256,
401
+ server_hash,
402
+ server_hash_len);
403
+ ssh_clean_pubkey_hash(&server_hash);
404
+ }
405
+ if (fingerprint) {
406
+ error_setg(errp,
407
+ "host key (%s key with fingerprint %s) does not match "
408
+ "the one in known_hosts; this may be a possible attack",
409
+ ssh_key_type_to_char(pubkey_type), fingerprint);
410
+ ssh_string_free_char(fingerprint);
411
+ } else {
412
+ error_setg(errp,
413
+ "host key does not match the one in known_hosts; this "
414
+ "may be a possible attack");
415
+ }
416
goto out;
417
- }
418
-
419
- knh = libssh2_knownhost_init(s->session);
420
- if (!knh) {
421
+ case SSH_KNOWN_HOSTS_OTHER:
422
ret = -EINVAL;
423
- session_error_setg(errp, s,
424
- "failed to initialize known hosts support");
425
+ error_setg(errp,
426
+ "host key for this server not found, another type exists");
427
+ goto out;
428
+ case SSH_KNOWN_HOSTS_UNKNOWN:
429
+ ret = -EINVAL;
430
+ error_setg(errp, "no host key was found in known_hosts");
431
+ goto out;
432
+ case SSH_KNOWN_HOSTS_NOT_FOUND:
433
+ ret = -ENOENT;
434
+ error_setg(errp, "known_hosts file not found");
435
+ goto out;
436
+ case SSH_KNOWN_HOSTS_ERROR:
437
+ ret = -EINVAL;
438
+ error_setg(errp, "error while checking the host");
439
+ goto out;
440
+ default:
441
+ ret = -EINVAL;
442
+ error_setg(errp, "error while checking for known server (%d)", state);
443
goto out;
444
}
445
+#else /* !HAVE_LIBSSH_0_8 */
446
+ int state;
447
448
- home = getenv("HOME");
449
- if (home) {
450
- knh_file = g_strdup_printf("%s/.ssh/known_hosts", home);
451
- } else {
452
- knh_file = g_strdup_printf("/root/.ssh/known_hosts");
453
- }
454
-
455
- /* Read all known hosts from OpenSSH-style known_hosts file. */
456
- libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH);
457
+ state = ssh_is_server_known(s->session);
458
+ trace_ssh_server_status(state);
459
460
- r = libssh2_knownhost_checkp(knh, host, port, hostkey, len,
461
- LIBSSH2_KNOWNHOST_TYPE_PLAIN|
462
- LIBSSH2_KNOWNHOST_KEYENC_RAW,
463
- &found);
464
- switch (r) {
465
- case LIBSSH2_KNOWNHOST_CHECK_MATCH:
466
+ switch (state) {
467
+ case SSH_SERVER_KNOWN_OK:
468
/* OK */
469
- trace_ssh_check_host_key_knownhosts(found->key);
470
+ trace_ssh_check_host_key_knownhosts();
471
break;
472
- case LIBSSH2_KNOWNHOST_CHECK_MISMATCH:
473
+ case SSH_SERVER_KNOWN_CHANGED:
474
ret = -EINVAL;
475
- session_error_setg(errp, s,
476
- "host key does not match the one in known_hosts"
477
- " (found key %s)", found->key);
478
+ error_setg(errp,
479
+ "host key does not match the one in known_hosts; this "
480
+ "may be a possible attack");
481
goto out;
482
- case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND:
483
+ case SSH_SERVER_FOUND_OTHER:
484
ret = -EINVAL;
485
- session_error_setg(errp, s, "no host key was found in known_hosts");
486
+ error_setg(errp,
487
+ "host key for this server not found, another type exists");
488
+ goto out;
489
+ case SSH_SERVER_FILE_NOT_FOUND:
490
+ ret = -ENOENT;
491
+ error_setg(errp, "known_hosts file not found");
492
goto out;
493
- case LIBSSH2_KNOWNHOST_CHECK_FAILURE:
494
+ case SSH_SERVER_NOT_KNOWN:
495
ret = -EINVAL;
496
- session_error_setg(errp, s,
497
- "failure matching the host key with known_hosts");
498
+ error_setg(errp, "no host key was found in known_hosts");
499
+ goto out;
500
+ case SSH_SERVER_ERROR:
501
+ ret = -EINVAL;
502
+ error_setg(errp, "server error");
503
goto out;
504
default:
505
ret = -EINVAL;
506
- session_error_setg(errp, s, "unknown error matching the host key"
507
- " with known_hosts (%d)", r);
508
+ error_setg(errp, "error while checking for known server (%d)", state);
509
goto out;
510
}
511
+#endif /* !HAVE_LIBSSH_0_8 */
512
513
/* known_hosts checking successful. */
514
ret = 0;
515
516
out:
517
- if (knh != NULL) {
518
- libssh2_knownhost_free(knh);
519
- }
520
- g_free(knh_file);
521
return ret;
522
}
523
524
@@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len,
525
526
static int
527
check_host_key_hash(BDRVSSHState *s, const char *hash,
528
- int hash_type, size_t fingerprint_len, Error **errp)
529
+ enum ssh_publickey_hash_type type, Error **errp)
530
{
531
- const char *fingerprint;
532
-
533
- fingerprint = libssh2_hostkey_hash(s->session, hash_type);
534
- if (!fingerprint) {
535
+ int r;
536
+ ssh_key pubkey;
537
+ unsigned char *server_hash;
538
+ size_t server_hash_len;
539
+
540
+#ifdef HAVE_LIBSSH_0_8
541
+ r = ssh_get_server_publickey(s->session, &pubkey);
542
+#else
543
+ r = ssh_get_publickey(s->session, &pubkey);
544
+#endif
545
+ if (r != SSH_OK) {
546
session_error_setg(errp, s, "failed to read remote host key");
547
return -EINVAL;
548
}
549
550
- if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len,
551
- hash) != 0) {
552
+ r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
553
+ ssh_key_free(pubkey);
554
+ if (r != 0) {
555
+ session_error_setg(errp, s,
556
+ "failed reading the hash of the server SSH key");
557
+ return -EINVAL;
53
+ return -EINVAL;
558
+ }
54
+ }
559
+
55
+
560
+ r = compare_fingerprint(server_hash, server_hash_len, hash);
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
561
+ ssh_clean_pubkey_hash(&server_hash);
57
+ pflag = true;
562
+ if (r != 0) {
563
error_setg(errp, "remote host key does not match host_key_check '%s'",
564
hash);
565
return -EPERM;
566
@@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
567
return 0;
568
}
569
570
-static int check_host_key(BDRVSSHState *s, const char *host, int port,
571
- SshHostKeyCheck *hkc, Error **errp)
572
+static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp)
573
{
574
SshHostKeyCheckMode mode;
575
576
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
577
case SSH_HOST_KEY_CHECK_MODE_HASH:
578
if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
579
return check_host_key_hash(s, hkc->u.hash.hash,
580
- LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
581
+ SSH_PUBLICKEY_HASH_MD5, errp);
582
} else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
583
return check_host_key_hash(s, hkc->u.hash.hash,
584
- LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
585
+ SSH_PUBLICKEY_HASH_SHA1, errp);
586
}
587
g_assert_not_reached();
588
break;
589
case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
590
- return check_host_key_knownhosts(s, host, port, errp);
591
+ return check_host_key_knownhosts(s, errp);
592
default:
593
g_assert_not_reached();
594
}
595
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
596
return -EINVAL;
597
}
598
599
-static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
600
+static int authenticate(BDRVSSHState *s, Error **errp)
601
{
602
int r, ret;
603
- const char *userauthlist;
604
- LIBSSH2_AGENT *agent = NULL;
605
- struct libssh2_agent_publickey *identity;
606
- struct libssh2_agent_publickey *prev_identity = NULL;
607
+ int method;
608
609
- userauthlist = libssh2_userauth_list(s->session, user, strlen(user));
610
- if (strstr(userauthlist, "publickey") == NULL) {
611
+ /* Try to authenticate with the "none" method. */
612
+ r = ssh_userauth_none(s->session, NULL);
613
+ if (r == SSH_AUTH_ERROR) {
614
ret = -EPERM;
615
- error_setg(errp,
616
- "remote server does not support \"publickey\" authentication");
617
+ session_error_setg(errp, s, "failed to authenticate using none "
618
+ "authentication");
619
goto out;
620
- }
621
-
622
- /* Connect to ssh-agent and try each identity in turn. */
623
- agent = libssh2_agent_init(s->session);
624
- if (!agent) {
625
- ret = -EINVAL;
626
- session_error_setg(errp, s, "failed to initialize ssh-agent support");
627
- goto out;
628
- }
629
- if (libssh2_agent_connect(agent)) {
630
- ret = -ECONNREFUSED;
631
- session_error_setg(errp, s, "failed to connect to ssh-agent");
632
- goto out;
633
- }
634
- if (libssh2_agent_list_identities(agent)) {
635
- ret = -EINVAL;
636
- session_error_setg(errp, s,
637
- "failed requesting identities from ssh-agent");
638
+ } else if (r == SSH_AUTH_SUCCESS) {
639
+ /* Authenticated! */
640
+ ret = 0;
641
goto out;
642
}
643
644
- for(;;) {
645
- r = libssh2_agent_get_identity(agent, &identity, prev_identity);
646
- if (r == 1) { /* end of list */
647
- break;
648
- }
649
- if (r < 0) {
650
+ method = ssh_userauth_list(s->session, NULL);
651
+ trace_ssh_auth_methods(method);
652
+
653
+ /*
654
+ * Try to authenticate with publickey, using the ssh-agent
655
+ * if available.
656
+ */
657
+ if (method & SSH_AUTH_METHOD_PUBLICKEY) {
658
+ r = ssh_userauth_publickey_auto(s->session, NULL, NULL);
659
+ if (r == SSH_AUTH_ERROR) {
660
ret = -EINVAL;
661
- session_error_setg(errp, s,
662
- "failed to obtain identity from ssh-agent");
663
+ session_error_setg(errp, s, "failed to authenticate using "
664
+ "publickey authentication");
665
goto out;
666
- }
667
- r = libssh2_agent_userauth(agent, user, identity);
668
- if (r == 0) {
669
+ } else if (r == SSH_AUTH_SUCCESS) {
670
/* Authenticated! */
671
ret = 0;
672
goto out;
673
}
674
- /* Failed to authenticate with this identity, try the next one. */
675
- prev_identity = identity;
676
}
677
678
ret = -EPERM;
679
@@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
680
"and the identities held by your ssh-agent");
681
682
out:
683
- if (agent != NULL) {
684
- /* Note: libssh2 implementation implicitly calls
685
- * libssh2_agent_disconnect if necessary.
686
- */
687
- libssh2_agent_free(agent);
688
- }
689
-
690
return ret;
691
}
692
693
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
694
int ssh_flags, int creat_mode, Error **errp)
695
{
696
int r, ret;
697
- long port = 0;
698
+ unsigned int port = 0;
699
+ int new_sock = -1;
700
701
if (opts->has_user) {
702
s->user = g_strdup(opts->user);
703
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
704
s->inet = opts->server;
705
opts->server = NULL;
706
707
- if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
708
+ if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) {
709
error_setg(errp, "Use only numeric port value");
710
ret = -EINVAL;
711
goto err;
712
}
713
714
/* Open the socket and connect. */
715
- s->sock = inet_connect_saddr(s->inet, errp);
716
- if (s->sock < 0) {
717
+ new_sock = inet_connect_saddr(s->inet, errp);
718
+ if (new_sock < 0) {
719
ret = -EIO;
720
goto err;
721
}
722
723
+ /*
724
+ * Try to disable the Nagle algorithm on TCP sockets to reduce latency,
725
+ * but do not fail if it cannot be disabled.
726
+ */
727
+ r = socket_set_nodelay(new_sock);
728
+ if (r < 0) {
729
+ warn_report("can't set TCP_NODELAY for the ssh server %s: %s",
730
+ s->inet->host, strerror(errno));
731
+ }
58
+ }
732
+
59
+
733
/* Create SSH session. */
60
+ offset = cvtnum(argv[optind]);
734
- s->session = libssh2_session_init();
61
+ if (offset < 0) {
735
+ s->session = ssh_new();
62
+ print_cvtnum_err(offset, argv[optind]);
736
if (!s->session) {
63
+ return offset;
737
ret = -EINVAL;
64
+ }
738
- session_error_setg(errp, s, "failed to initialize libssh2 session");
65
+ optind++;
739
+ session_error_setg(errp, s, "failed to initialize libssh session");
66
+ nr_iov = argc - optind;
740
goto err;
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
741
}
68
+ flags & BDRV_REQ_REGISTERED_BUF);
742
69
+ if (buf == NULL) {
743
-#if TRACE_LIBSSH2 != 0
70
+ return -EINVAL;
744
- libssh2_trace(s->session, TRACE_LIBSSH2);
71
+ }
745
-#endif
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
746
+ /*
73
+ if (ret < 0) {
747
+ * Make sure we are in blocking mode during the connection and
74
+ printf("zone append failed: %s\n", strerror(-ret));
748
+ * authentication phases.
75
+ goto out;
749
+ */
750
+ ssh_set_blocking(s->session, 1);
751
752
- r = libssh2_session_handshake(s->session, s->sock);
753
- if (r != 0) {
754
+ r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user);
755
+ if (r < 0) {
756
+ ret = -EINVAL;
757
+ session_error_setg(errp, s,
758
+ "failed to set the user in the libssh session");
759
+ goto err;
760
+ }
76
+ }
761
+
77
+
762
+ r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host);
78
+ if (pflag) {
763
+ if (r < 0) {
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
764
+ ret = -EINVAL;
80
+ tosector(offset));
765
+ session_error_setg(errp, s,
766
+ "failed to set the host in the libssh session");
767
+ goto err;
768
+ }
81
+ }
769
+
82
+
770
+ if (port > 0) {
83
+out:
771
+ r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port);
84
+ qemu_io_free(blk, buf, qiov.size,
772
+ if (r < 0) {
85
+ flags & BDRV_REQ_REGISTERED_BUF);
773
+ ret = -EINVAL;
86
+ qemu_iovec_destroy(&qiov);
774
+ session_error_setg(errp, s,
87
+ return ret;
775
+ "failed to set the port in the libssh session");
88
+}
776
+ goto err;
777
+ }
778
+ }
779
+
89
+
780
+ r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none");
90
+static const cmdinfo_t zone_append_cmd = {
781
+ if (r < 0) {
91
+ .name = "zone_append",
782
+ ret = -EINVAL;
92
+ .altname = "zap",
783
+ session_error_setg(errp, s,
93
+ .cfunc = zone_append_f,
784
+ "failed to disable the compression in the libssh "
94
+ .argmin = 3,
785
+ "session");
95
+ .argmax = 4,
786
+ goto err;
96
+ .args = "offset len [len..]",
787
+ }
97
+ .oneline = "append write a number of bytes at a specified offset",
98
+};
788
+
99
+
789
+ /* Read ~/.ssh/config. */
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
790
+ r = ssh_options_parse_config(s->session, NULL);
101
static const cmdinfo_t truncate_cmd = {
791
+ if (r < 0) {
102
.name = "truncate",
792
+ ret = -EINVAL;
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
793
+ session_error_setg(errp, s, "failed to parse ~/.ssh/config");
104
qemuio_add_command(&zone_close_cmd);
794
+ goto err;
105
qemuio_add_command(&zone_finish_cmd);
795
+ }
106
qemuio_add_command(&zone_reset_cmd);
107
+ qemuio_add_command(&zone_append_cmd);
108
qemuio_add_command(&truncate_cmd);
109
qemuio_add_command(&length_cmd);
110
qemuio_add_command(&info_cmd);
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
112
index XXXXXXX..XXXXXXX 100755
113
--- a/tests/qemu-iotests/tests/zoned
114
+++ b/tests/qemu-iotests/tests/zoned
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
796
+
146
+
797
+ r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock);
798
+ if (r < 0) {
799
+ ret = -EINVAL;
800
+ session_error_setg(errp, s,
801
+ "failed to set the socket in the libssh session");
802
+ goto err;
803
+ }
804
+ /* libssh took ownership of the socket. */
805
+ s->sock = new_sock;
806
+ new_sock = -1;
807
+
147
+
808
+ /* Connect. */
148
+(6) append write
809
+ r = ssh_connect(s->session);
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
810
+ if (r != SSH_OK) {
150
+After zap done, the append sector is 0x0
811
ret = -EINVAL;
151
+After appending the first zone firstly:
812
session_error_setg(errp, s, "failed to establish SSH session");
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
813
goto err;
153
+After zap done, the append sector is 0x18
814
}
154
+After appending the first zone secondly:
815
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
816
/* Check the remote host's key against known_hosts. */
156
+After zap done, the append sector is 0x80000
817
- ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp);
157
+After appending the second zone firstly:
818
+ ret = check_host_key(s, opts->host_key_check, errp);
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
819
if (ret < 0) {
159
+After zap done, the append sector is 0x80018
820
goto err;
160
+After appending the second zone secondly:
821
}
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
822
162
*** done
823
/* Authenticate. */
824
- ret = authenticate(s, s->user, errp);
825
+ ret = authenticate(s, errp);
826
if (ret < 0) {
827
goto err;
828
}
829
830
/* Start SFTP. */
831
- s->sftp = libssh2_sftp_init(s->session);
832
+ s->sftp = sftp_new(s->session);
833
if (!s->sftp) {
834
- session_error_setg(errp, s, "failed to initialize sftp handle");
835
+ session_error_setg(errp, s, "failed to create sftp handle");
836
+ ret = -EINVAL;
837
+ goto err;
838
+ }
839
+
840
+ r = sftp_init(s->sftp);
841
+ if (r < 0) {
842
+ sftp_error_setg(errp, s, "failed to initialize sftp handle");
843
ret = -EINVAL;
844
goto err;
845
}
846
847
/* Open the remote file. */
848
trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode);
849
- s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags,
850
- creat_mode);
851
+ s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode);
852
if (!s->sftp_handle) {
853
- session_error_setg(errp, s, "failed to open remote file '%s'",
854
- opts->path);
855
+ sftp_error_setg(errp, s, "failed to open remote file '%s'",
856
+ opts->path);
857
ret = -EINVAL;
858
goto err;
859
}
860
861
- r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
862
- if (r < 0) {
863
+ /* Make sure the SFTP file is handled in blocking mode. */
864
+ sftp_file_set_blocking(s->sftp_handle);
865
+
866
+ s->attrs = sftp_fstat(s->sftp_handle);
867
+ if (!s->attrs) {
868
sftp_error_setg(errp, s, "failed to read file attributes");
869
return -EINVAL;
870
}
871
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
872
return 0;
873
874
err:
875
+ if (s->attrs) {
876
+ sftp_attributes_free(s->attrs);
877
+ }
878
+ s->attrs = NULL;
879
if (s->sftp_handle) {
880
- libssh2_sftp_close(s->sftp_handle);
881
+ sftp_close(s->sftp_handle);
882
}
883
s->sftp_handle = NULL;
884
if (s->sftp) {
885
- libssh2_sftp_shutdown(s->sftp);
886
+ sftp_free(s->sftp);
887
}
888
s->sftp = NULL;
889
if (s->session) {
890
- libssh2_session_disconnect(s->session,
891
- "from qemu ssh client: "
892
- "error opening connection");
893
- libssh2_session_free(s->session);
894
+ ssh_disconnect(s->session);
895
+ ssh_free(s->session);
896
}
897
s->session = NULL;
898
+ s->sock = -1;
899
+ if (new_sock >= 0) {
900
+ close(new_sock);
901
+ }
902
903
return ret;
904
}
905
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
906
907
ssh_state_init(s);
908
909
- ssh_flags = LIBSSH2_FXF_READ;
910
+ ssh_flags = 0;
911
if (bdrv_flags & BDRV_O_RDWR) {
912
- ssh_flags |= LIBSSH2_FXF_WRITE;
913
+ ssh_flags |= O_RDWR;
914
+ } else {
915
+ ssh_flags |= O_RDONLY;
916
}
917
918
opts = ssh_parse_options(options, errp);
919
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
920
}
921
922
/* Go non-blocking. */
923
- libssh2_session_set_blocking(s->session, 0);
924
+ ssh_set_blocking(s->session, 0);
925
926
qapi_free_BlockdevOptionsSsh(opts);
927
928
return 0;
929
930
err:
931
- if (s->sock >= 0) {
932
- close(s->sock);
933
- }
934
- s->sock = -1;
935
-
936
qapi_free_BlockdevOptionsSsh(opts);
937
938
return ret;
939
@@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
940
{
941
ssize_t ret;
942
char c[1] = { '\0' };
943
- int was_blocking = libssh2_session_get_blocking(s->session);
944
+ int was_blocking = ssh_is_blocking(s->session);
945
946
/* offset must be strictly greater than the current size so we do
947
* not overwrite anything */
948
- assert(offset > 0 && offset > s->attrs.filesize);
949
+ assert(offset > 0 && offset > s->attrs->size);
950
951
- libssh2_session_set_blocking(s->session, 1);
952
+ ssh_set_blocking(s->session, 1);
953
954
- libssh2_sftp_seek64(s->sftp_handle, offset - 1);
955
- ret = libssh2_sftp_write(s->sftp_handle, c, 1);
956
+ sftp_seek64(s->sftp_handle, offset - 1);
957
+ ret = sftp_write(s->sftp_handle, c, 1);
958
959
- libssh2_session_set_blocking(s->session, was_blocking);
960
+ ssh_set_blocking(s->session, was_blocking);
961
962
if (ret < 0) {
963
sftp_error_setg(errp, s, "Failed to grow file");
964
return -EIO;
965
}
966
967
- s->attrs.filesize = offset;
968
+ s->attrs->size = offset;
969
return 0;
970
}
971
972
@@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
973
ssh_state_init(&s);
974
975
ret = connect_to_ssh(&s, opts->location,
976
- LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
977
- LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
978
+ O_RDWR | O_CREAT | O_TRUNC,
979
0644, errp);
980
if (ret < 0) {
981
goto fail;
982
@@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs)
983
/* Assume false, unless we can positively prove it's true. */
984
int has_zero_init = 0;
985
986
- if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
987
- if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
988
- has_zero_init = 1;
989
- }
990
+ if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
991
+ has_zero_init = 1;
992
}
993
994
return has_zero_init;
995
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
996
.co = qemu_coroutine_self()
997
};
998
999
- r = libssh2_session_block_directions(s->session);
1000
+ r = ssh_get_poll_flags(s->session);
1001
1002
- if (r & LIBSSH2_SESSION_BLOCK_INBOUND) {
1003
+ if (r & SSH_READ_PENDING) {
1004
rd_handler = restart_coroutine;
1005
}
1006
- if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) {
1007
+ if (r & SSH_WRITE_PENDING) {
1008
wr_handler = restart_coroutine;
1009
}
1010
1011
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
1012
trace_ssh_co_yield_back(s->sock);
1013
}
1014
1015
-/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
1016
- * in the remote file. Notice that it just updates a field in the
1017
- * sftp_handle structure, so there is no network traffic and it cannot
1018
- * fail.
1019
- *
1020
- * However, `libssh2_sftp_seek64' does have a catastrophic effect on
1021
- * performance since it causes the handle to throw away all in-flight
1022
- * reads and buffered readahead data. Therefore this function tries
1023
- * to be intelligent about when to call the underlying libssh2 function.
1024
- */
1025
-#define SSH_SEEK_WRITE 0
1026
-#define SSH_SEEK_READ 1
1027
-#define SSH_SEEK_FORCE 2
1028
-
1029
-static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
1030
-{
1031
- bool op_read = (flags & SSH_SEEK_READ) != 0;
1032
- bool force = (flags & SSH_SEEK_FORCE) != 0;
1033
-
1034
- if (force || op_read != s->offset_op_read || offset != s->offset) {
1035
- trace_ssh_seek(offset);
1036
- libssh2_sftp_seek64(s->sftp_handle, offset);
1037
- s->offset = offset;
1038
- s->offset_op_read = op_read;
1039
- }
1040
-}
1041
-
1042
static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1043
int64_t offset, size_t size,
1044
QEMUIOVector *qiov)
1045
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1046
1047
trace_ssh_read(offset, size);
1048
1049
- ssh_seek(s, offset, SSH_SEEK_READ);
1050
+ trace_ssh_seek(offset);
1051
+ sftp_seek64(s->sftp_handle, offset);
1052
1053
/* This keeps track of the current iovec element ('i'), where we
1054
* will write to next ('buf'), and the end of the current iovec
1055
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1056
buf = i->iov_base;
1057
end_of_vec = i->iov_base + i->iov_len;
1058
1059
- /* libssh2 has a hard-coded limit of 2000 bytes per request,
1060
- * although it will also do readahead behind our backs. Therefore
1061
- * we may have to do repeated reads here until we have read 'size'
1062
- * bytes.
1063
- */
1064
for (got = 0; got < size; ) {
1065
+ size_t request_read_size;
1066
again:
1067
- trace_ssh_read_buf(buf, end_of_vec - buf);
1068
- r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf);
1069
- trace_ssh_read_return(r);
1070
+ /*
1071
+ * The size of SFTP packets is limited to 32K bytes, so limit
1072
+ * the amount of data requested to 16K, as libssh currently
1073
+ * does not handle multiple requests on its own.
1074
+ */
1075
+ request_read_size = MIN(end_of_vec - buf, 16384);
1076
+ trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size);
1077
+ r = sftp_read(s->sftp_handle, buf, request_read_size);
1078
+ trace_ssh_read_return(r, sftp_get_error(s->sftp));
1079
1080
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1081
+ if (r == SSH_AGAIN) {
1082
co_yield(s, bs);
1083
goto again;
1084
}
1085
- if (r < 0) {
1086
- sftp_error_trace(s, "read");
1087
- s->offset = -1;
1088
- return -EIO;
1089
- }
1090
- if (r == 0) {
1091
+ if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) {
1092
/* EOF: Short read so pad the buffer with zeroes and return it. */
1093
qemu_iovec_memset(qiov, got, 0, size - got);
1094
return 0;
1095
}
1096
+ if (r <= 0) {
1097
+ sftp_error_trace(s, "read");
1098
+ return -EIO;
1099
+ }
1100
1101
got += r;
1102
buf += r;
1103
- s->offset += r;
1104
if (buf >= end_of_vec && got < size) {
1105
i++;
1106
buf = i->iov_base;
1107
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1108
1109
trace_ssh_write(offset, size);
1110
1111
- ssh_seek(s, offset, SSH_SEEK_WRITE);
1112
+ trace_ssh_seek(offset);
1113
+ sftp_seek64(s->sftp_handle, offset);
1114
1115
/* This keeps track of the current iovec element ('i'), where we
1116
* will read from next ('buf'), and the end of the current iovec
1117
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1118
end_of_vec = i->iov_base + i->iov_len;
1119
1120
for (written = 0; written < size; ) {
1121
+ size_t request_write_size;
1122
again:
1123
- trace_ssh_write_buf(buf, end_of_vec - buf);
1124
- r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf);
1125
- trace_ssh_write_return(r);
1126
+ /*
1127
+ * Avoid too large data packets, as libssh currently does not
1128
+ * handle multiple requests on its own.
1129
+ */
1130
+ request_write_size = MIN(end_of_vec - buf, 131072);
1131
+ trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size);
1132
+ r = sftp_write(s->sftp_handle, buf, request_write_size);
1133
+ trace_ssh_write_return(r, sftp_get_error(s->sftp));
1134
1135
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1136
+ if (r == SSH_AGAIN) {
1137
co_yield(s, bs);
1138
goto again;
1139
}
1140
if (r < 0) {
1141
sftp_error_trace(s, "write");
1142
- s->offset = -1;
1143
return -EIO;
1144
}
1145
- /* The libssh2 API is very unclear about this. A comment in
1146
- * the code says "nothing was acked, and no EAGAIN was
1147
- * received!" which apparently means that no data got sent
1148
- * out, and the underlying channel didn't return any EAGAIN
1149
- * indication. I think this is a bug in either libssh2 or
1150
- * OpenSSH (server-side). In any case, forcing a seek (to
1151
- * discard libssh2 internal buffers), and then trying again
1152
- * works for me.
1153
- */
1154
- if (r == 0) {
1155
- ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
1156
- co_yield(s, bs);
1157
- goto again;
1158
- }
1159
1160
written += r;
1161
buf += r;
1162
- s->offset += r;
1163
if (buf >= end_of_vec && written < size) {
1164
i++;
1165
buf = i->iov_base;
1166
end_of_vec = i->iov_base + i->iov_len;
1167
}
1168
1169
- if (offset + written > s->attrs.filesize)
1170
- s->attrs.filesize = offset + written;
1171
+ if (offset + written > s->attrs->size) {
1172
+ s->attrs->size = offset + written;
1173
+ }
1174
}
1175
1176
return 0;
1177
@@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
1178
}
1179
}
1180
1181
-#ifdef HAS_LIBSSH2_SFTP_FSYNC
1182
+#ifdef HAVE_LIBSSH_0_8
1183
1184
static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
1185
{
1186
int r;
1187
1188
trace_ssh_flush();
1189
+
1190
+ if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) {
1191
+ unsafe_flush_warning(s, "OpenSSH >= 6.3");
1192
+ return 0;
1193
+ }
1194
again:
1195
- r = libssh2_sftp_fsync(s->sftp_handle);
1196
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1197
+ r = sftp_fsync(s->sftp_handle);
1198
+ if (r == SSH_AGAIN) {
1199
co_yield(s, bs);
1200
goto again;
1201
}
1202
- if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
1203
- libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) {
1204
- unsafe_flush_warning(s, "OpenSSH >= 6.3");
1205
- return 0;
1206
- }
1207
if (r < 0) {
1208
sftp_error_trace(s, "fsync");
1209
return -EIO;
1210
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
1211
return ret;
1212
}
1213
1214
-#else /* !HAS_LIBSSH2_SFTP_FSYNC */
1215
+#else /* !HAVE_LIBSSH_0_8 */
1216
1217
static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
1218
{
1219
BDRVSSHState *s = bs->opaque;
1220
1221
- unsafe_flush_warning(s, "libssh2 >= 1.4.4");
1222
+ unsafe_flush_warning(s, "libssh >= 0.8.0");
1223
return 0;
1224
}
1225
1226
-#endif /* !HAS_LIBSSH2_SFTP_FSYNC */
1227
+#endif /* !HAVE_LIBSSH_0_8 */
1228
1229
static int64_t ssh_getlength(BlockDriverState *bs)
1230
{
1231
BDRVSSHState *s = bs->opaque;
1232
int64_t length;
1233
1234
- /* Note we cannot make a libssh2 call here. */
1235
- length = (int64_t) s->attrs.filesize;
1236
+ /* Note we cannot make a libssh call here. */
1237
+ length = (int64_t) s->attrs->size;
1238
trace_ssh_getlength(length);
1239
1240
return length;
1241
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
1242
return -ENOTSUP;
1243
}
1244
1245
- if (offset < s->attrs.filesize) {
1246
+ if (offset < s->attrs->size) {
1247
error_setg(errp, "ssh driver does not support shrinking files");
1248
return -ENOTSUP;
1249
}
1250
1251
- if (offset == s->attrs.filesize) {
1252
+ if (offset == s->attrs->size) {
1253
return 0;
1254
}
1255
1256
@@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void)
1257
{
1258
int r;
1259
1260
- r = libssh2_init(0);
1261
+ r = ssh_init();
1262
if (r != 0) {
1263
- fprintf(stderr, "libssh2 initialization failed, %d\n", r);
1264
+ fprintf(stderr, "libssh initialization failed, %d\n", r);
1265
exit(EXIT_FAILURE);
1266
}
1267
1268
+#if TRACE_LIBSSH != 0
1269
+ ssh_set_log_level(TRACE_LIBSSH);
1270
+#endif
1271
+
1272
bdrv_register(&bdrv_ssh);
1273
}
1274
1275
diff --git a/.travis.yml b/.travis.yml
1276
index XXXXXXX..XXXXXXX 100644
1277
--- a/.travis.yml
1278
+++ b/.travis.yml
1279
@@ -XXX,XX +XXX,XX @@ addons:
1280
- libseccomp-dev
1281
- libspice-protocol-dev
1282
- libspice-server-dev
1283
- - libssh2-1-dev
1284
+ - libssh-dev
1285
- liburcu-dev
1286
- libusb-1.0-0-dev
1287
- libvte-2.91-dev
1288
@@ -XXX,XX +XXX,XX @@ matrix:
1289
- libseccomp-dev
1290
- libspice-protocol-dev
1291
- libspice-server-dev
1292
- - libssh2-1-dev
1293
+ - libssh-dev
1294
- liburcu-dev
1295
- libusb-1.0-0-dev
1296
- libvte-2.91-dev
1297
diff --git a/block/trace-events b/block/trace-events
1298
index XXXXXXX..XXXXXXX 100644
1299
--- a/block/trace-events
1300
+++ b/block/trace-events
1301
@@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'"
1302
# ssh.c
1303
ssh_restart_coroutine(void *co) "co=%p"
1304
ssh_flush(void) "fsync"
1305
-ssh_check_host_key_knownhosts(const char *key) "host key OK: %s"
1306
+ssh_check_host_key_knownhosts(void) "host key OK"
1307
ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o"
1308
ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p"
1309
ssh_co_yield_back(int sock) "s->sock=%d - back"
1310
ssh_getlength(int64_t length) "length=%" PRIi64
1311
ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64
1312
ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
1313
-ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu"
1314
-ssh_read_return(ssize_t ret) "sftp_read returned %zd"
1315
+ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)"
1316
+ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)"
1317
ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
1318
-ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu"
1319
-ssh_write_return(ssize_t ret) "sftp_write returned %zd"
1320
+ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)"
1321
+ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)"
1322
ssh_seek(int64_t offset) "seeking to offset=%" PRIi64
1323
+ssh_auth_methods(int methods) "auth methods=0x%x"
1324
+ssh_server_status(int status) "server status=%d"
1325
1326
# curl.c
1327
curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld"
1328
@@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s"
1329
sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32
1330
1331
# ssh.c
1332
-sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)"
1333
+sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
1334
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
1335
index XXXXXXX..XXXXXXX 100644
1336
--- a/docs/qemu-block-drivers.texi
1337
+++ b/docs/qemu-block-drivers.texi
1338
@@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported:
1339
1340
warning: ssh server @code{ssh.example.com:22} does not support fsync
1341
1342
-With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
1343
+With sufficiently new versions of libssh and OpenSSH, @code{fsync} is
1344
supported.
1345
1346
@node disk_images_nvme
1347
diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker
1348
index XXXXXXX..XXXXXXX 100644
1349
--- a/tests/docker/dockerfiles/debian-win32-cross.docker
1350
+++ b/tests/docker/dockerfiles/debian-win32-cross.docker
1351
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
1352
mxe-$TARGET-w64-mingw32.shared-curl \
1353
mxe-$TARGET-w64-mingw32.shared-glib \
1354
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
1355
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
1356
mxe-$TARGET-w64-mingw32.shared-libusb1 \
1357
mxe-$TARGET-w64-mingw32.shared-lzo \
1358
mxe-$TARGET-w64-mingw32.shared-nettle \
1359
diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker
1360
index XXXXXXX..XXXXXXX 100644
1361
--- a/tests/docker/dockerfiles/debian-win64-cross.docker
1362
+++ b/tests/docker/dockerfiles/debian-win64-cross.docker
1363
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
1364
mxe-$TARGET-w64-mingw32.shared-curl \
1365
mxe-$TARGET-w64-mingw32.shared-glib \
1366
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
1367
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
1368
mxe-$TARGET-w64-mingw32.shared-libusb1 \
1369
mxe-$TARGET-w64-mingw32.shared-lzo \
1370
mxe-$TARGET-w64-mingw32.shared-nettle \
1371
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
1372
index XXXXXXX..XXXXXXX 100644
1373
--- a/tests/docker/dockerfiles/fedora.docker
1374
+++ b/tests/docker/dockerfiles/fedora.docker
1375
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
1376
libpng-devel \
1377
librbd-devel \
1378
libseccomp-devel \
1379
- libssh2-devel \
1380
+ libssh-devel \
1381
libubsan \
1382
libusbx-devel \
1383
libxml2-devel \
1384
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
1385
mingw32-gtk3 \
1386
mingw32-libjpeg-turbo \
1387
mingw32-libpng \
1388
- mingw32-libssh2 \
1389
mingw32-libtasn1 \
1390
mingw32-nettle \
1391
mingw32-pixman \
1392
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
1393
mingw64-gtk3 \
1394
mingw64-libjpeg-turbo \
1395
mingw64-libpng \
1396
- mingw64-libssh2 \
1397
mingw64-libtasn1 \
1398
mingw64-nettle \
1399
mingw64-pixman \
1400
diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker
1401
index XXXXXXX..XXXXXXX 100644
1402
--- a/tests/docker/dockerfiles/ubuntu.docker
1403
+++ b/tests/docker/dockerfiles/ubuntu.docker
1404
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
1405
libsnappy-dev \
1406
libspice-protocol-dev \
1407
libspice-server-dev \
1408
- libssh2-1-dev \
1409
+ libssh-dev \
1410
libusb-1.0-0-dev \
1411
libusbredirhost-dev \
1412
libvdeplug-dev \
1413
diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker
1414
index XXXXXXX..XXXXXXX 100644
1415
--- a/tests/docker/dockerfiles/ubuntu1804.docker
1416
+++ b/tests/docker/dockerfiles/ubuntu1804.docker
1417
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
1418
libsnappy-dev \
1419
libspice-protocol-dev \
1420
libspice-server-dev \
1421
- libssh2-1-dev \
1422
+ libssh-dev \
1423
libusb-1.0-0-dev \
1424
libusbredirhost-dev \
1425
libvdeplug-dev \
1426
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
1427
index XXXXXXX..XXXXXXX 100755
1428
--- a/tests/qemu-iotests/207
1429
+++ b/tests/qemu-iotests/207
1430
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1431
1432
iotests.img_info_log(remote_path)
1433
1434
- md5_key = subprocess.check_output(
1435
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1436
- 'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1',
1437
- shell=True).rstrip().decode('ascii')
1438
+ keys = subprocess.check_output(
1439
+ 'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1440
+ 'cut -d" " -f3',
1441
+ shell=True).rstrip().decode('ascii').split('\n')
1442
+
1443
+ # Mappings of base64 representations to digests
1444
+ md5_keys = {}
1445
+ sha1_keys = {}
1446
+
1447
+ for key in keys:
1448
+ md5_keys[key] = subprocess.check_output(
1449
+ 'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key,
1450
+ shell=True).rstrip().decode('ascii')
1451
+
1452
+ sha1_keys[key] = subprocess.check_output(
1453
+ 'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key,
1454
+ shell=True).rstrip().decode('ascii')
1455
1456
vm.launch()
1457
+
1458
+ # Find correct key first
1459
+ matching_key = None
1460
+ for key in keys:
1461
+ result = vm.qmp('blockdev-add',
1462
+ driver='ssh', node_name='node0', path=disk_path,
1463
+ server={
1464
+ 'host': '127.0.0.1',
1465
+ 'port': '22',
1466
+ }, host_key_check={
1467
+ 'mode': 'hash',
1468
+ 'type': 'md5',
1469
+ 'hash': md5_keys[key],
1470
+ })
1471
+
1472
+ if 'error' not in result:
1473
+ vm.qmp('blockdev-del', node_name='node0')
1474
+ matching_key = key
1475
+ break
1476
+
1477
+ if matching_key is None:
1478
+ vm.shutdown()
1479
+ iotests.notrun('Did not find a key that fits 127.0.0.1')
1480
+
1481
blockdev_create(vm, { 'driver': 'ssh',
1482
'location': {
1483
'path': disk_path,
1484
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1485
'host-key-check': {
1486
'mode': 'hash',
1487
'type': 'md5',
1488
- 'hash': md5_key,
1489
+ 'hash': md5_keys[matching_key],
1490
}
1491
},
1492
'size': 8388608 })
1493
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1494
1495
iotests.img_info_log(remote_path)
1496
1497
- sha1_key = subprocess.check_output(
1498
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1499
- 'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1',
1500
- shell=True).rstrip().decode('ascii')
1501
-
1502
vm.launch()
1503
blockdev_create(vm, { 'driver': 'ssh',
1504
'location': {
1505
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1506
'host-key-check': {
1507
'mode': 'hash',
1508
'type': 'sha1',
1509
- 'hash': sha1_key,
1510
+ 'hash': sha1_keys[matching_key],
1511
}
1512
},
1513
'size': 4194304 })
1514
diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
1515
index XXXXXXX..XXXXXXX 100644
1516
--- a/tests/qemu-iotests/207.out
1517
+++ b/tests/qemu-iotests/207.out
1518
@@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes)
1519
1520
{"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}}
1521
{"return": {}}
1522
-Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31)
1523
+Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2)
1524
{"execute": "job-dismiss", "arguments": {"id": "job0"}}
1525
{"return": {}}
1526
1527
--
163
--
1528
2.21.0
164
2.40.1
1529
1530
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20230508051510.177850-5-faithilikerun@gmail.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
block/file-posix.c | 3 +++
10
block/trace-events | 2 ++
11
2 files changed, 5 insertions(+)
12
13
diff --git a/block/file-posix.c b/block/file-posix.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
16
+++ b/block/file-posix.c
17
@@ -XXX,XX +XXX,XX @@ out:
18
if (!BDRV_ZT_IS_CONV(*wp)) {
19
if (type & QEMU_AIO_ZONE_APPEND) {
20
*s->offset = *wp;
21
+ trace_zbd_zone_append_complete(bs, *s->offset
22
+ >> BDRV_SECTOR_BITS);
23
}
24
/* Advance the wp if needed */
25
if (offset + bytes > *wp) {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
27
len += iov_len;
28
}
29
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
32
}
33
#endif
34
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
37
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
39
file_flush_fdatasync_failed(int err) "errno %d"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
44
45
# ssh.c
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
47
--
48
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
This patch extends virtio-blk emulation to handle zoned device commands
4
by calling the new block layer APIs to perform zoned device I/O on
5
behalf of the guest. It supports Report Zone, four zone oparations (open,
6
close, finish, reset), and Append Zone.
7
8
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
9
support zoned block devices. Regular block devices(conventional zones)
10
will not be set.
11
12
The guest os can use blktests, fio to test those commands on zoned devices.
13
Furthermore, using zonefs to test zone append write is also supported.
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
hw/block/virtio-blk-common.c | 2 +
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
21
hw/virtio/virtio-qmp.c | 2 +
22
3 files changed, 393 insertions(+)
23
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/hw/block/virtio-blk-common.c
27
+++ b/hw/block/virtio-blk-common.c
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
33
+ .end = endof(struct virtio_blk_config, zoned)},
34
{}
35
};
36
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/hw/block/virtio-blk.c
40
+++ b/hw/block/virtio-blk.c
41
@@ -XXX,XX +XXX,XX @@
42
#include "qemu/module.h"
43
#include "qemu/error-report.h"
44
#include "qemu/main-loop.h"
45
+#include "block/block_int.h"
46
#include "trace.h"
47
#include "hw/block/block.h"
48
#include "hw/qdev-properties.h"
49
@@ -XXX,XX +XXX,XX @@ err:
50
return err_status;
51
}
52
53
+typedef struct ZoneCmdData {
54
+ VirtIOBlockReq *req;
55
+ struct iovec *in_iov;
56
+ unsigned in_num;
57
+ union {
58
+ struct {
59
+ unsigned int nr_zones;
60
+ BlockZoneDescriptor *zones;
61
+ } zone_report_data;
62
+ struct {
63
+ int64_t offset;
64
+ } zone_append_data;
65
+ };
66
+} ZoneCmdData;
67
+
68
+/*
69
+ * check zoned_request: error checking before issuing requests. If all checks
70
+ * passed, return true.
71
+ * append: true if only zone append requests issued.
72
+ */
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
74
+ bool append, uint8_t *status) {
75
+ BlockDriverState *bs = blk_bs(s->blk);
76
+ int index;
77
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
79
+ *status = VIRTIO_BLK_S_UNSUPP;
80
+ return false;
81
+ }
82
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
89
+ if (append) {
90
+ if (bs->bl.write_granularity) {
91
+ if ((offset % bs->bl.write_granularity) != 0) {
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size),
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer less than size of "
335
+ "append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
383
{
384
uint32_t type;
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Passing out_iov/out_num and in_iov/in_num is not safe
417
+ * to access req->elem.out_sg directly because it may be
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
462
}
463
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
465
return;
466
}
467
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
469
+ if (bs->bl.zoned != BLK_Z_NONE) {
470
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
471
+ if (bs->bl.zoned == BLK_Z_HM) {
472
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
473
+ }
474
+ }
475
+
476
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
477
(!conf->max_discard_sectors ||
478
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
479
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
480
index XXXXXXX..XXXXXXX 100644
481
--- a/hw/virtio/virtio-qmp.c
482
+++ b/hw/virtio/virtio-qmp.c
483
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
484
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
485
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
486
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
487
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
488
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
489
#ifndef VIRTIO_BLK_NO_LEGACY
490
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
491
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
492
--
493
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Taking account of the new zone append write operation for zoned devices,
4
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
5
write, flush).
6
7
Signed-off-by: Sam Li <faithilikerun@gmail.com>
8
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
12
qapi/block.json | 4 +++
13
include/block/accounting.h | 1 +
14
block/qapi-sysemu.c | 11 ++++++
15
block/qapi.c | 18 ++++++++++
16
hw/block/virtio-blk.c | 4 +++
17
tests/qemu-iotests/227.out | 18 ++++++++++
18
7 files changed, 113 insertions(+), 11 deletions(-)
19
20
diff --git a/qapi/block-core.json b/qapi/block-core.json
21
index XXXXXXX..XXXXXXX 100644
22
--- a/qapi/block-core.json
23
+++ b/qapi/block-core.json
24
@@ -XXX,XX +XXX,XX @@
25
# @min_wr_latency_ns: Minimum latency of write operations in the
26
# defined interval, in nanoseconds.
27
#
28
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
29
+# in the defined interval, in nanoseconds
30
+# (since 8.1)
31
+#
32
# @min_flush_latency_ns: Minimum latency of flush operations in the
33
# defined interval, in nanoseconds.
34
#
35
@@ -XXX,XX +XXX,XX @@
36
# @max_wr_latency_ns: Maximum latency of write operations in the
37
# defined interval, in nanoseconds.
38
#
39
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
40
+# in the defined interval, in nanoseconds
41
+# (since 8.1)
42
+#
43
# @max_flush_latency_ns: Maximum latency of flush operations in the
44
# defined interval, in nanoseconds.
45
#
46
@@ -XXX,XX +XXX,XX @@
47
# @avg_wr_latency_ns: Average latency of write operations in the
48
# defined interval, in nanoseconds.
49
#
50
+# @avg_zone_append_latency_ns: Average latency of zone append operations
51
+# in the defined interval, in nanoseconds
52
+# (since 8.1)
53
+#
54
# @avg_flush_latency_ns: Average latency of flush operations in the
55
# defined interval, in nanoseconds.
56
#
57
@@ -XXX,XX +XXX,XX @@
58
# @avg_wr_queue_depth: Average number of pending write operations in
59
# the defined interval.
60
#
61
+# @avg_zone_append_queue_depth: Average number of pending zone append
62
+# operations in the defined interval
63
+# (since 8.1).
64
+#
65
# Since: 2.5
66
##
67
{ 'struct': 'BlockDeviceTimedStats',
68
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
69
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
70
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
71
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
72
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
73
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
74
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
75
+ 'max_zone_append_latency_ns': 'int',
76
+ 'avg_zone_append_latency_ns': 'int',
77
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
78
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
79
+ 'avg_wr_queue_depth': 'number',
80
+ 'avg_zone_append_queue_depth': 'number' } }
81
82
##
83
# @BlockDeviceStats:
84
@@ -XXX,XX +XXX,XX @@
85
#
86
# @wr_bytes: The number of bytes written by the device.
87
#
88
+# @zone_append_bytes: The number of bytes appended by the zoned devices
89
+# (since 8.1)
90
+#
91
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
92
#
93
# @rd_operations: The number of read operations performed by the
94
@@ -XXX,XX +XXX,XX @@
95
# @wr_operations: The number of write operations performed by the
96
# device.
97
#
98
+# @zone_append_operations: The number of zone append operations performed
99
+# by the zoned devices (since 8.1)
100
+#
101
# @flush_operations: The number of cache flush operations performed by
102
# the device (since 0.15)
103
#
104
@@ -XXX,XX +XXX,XX @@
105
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since
106
# 0.15).
107
#
108
+# @zone_append_total_time_ns: Total time spent on zone append writes
109
+# in nanoseconds (since 8.1)
110
+#
111
# @flush_total_time_ns: Total time spent on cache flushes in
112
# nanoseconds (since 0.15).
113
#
114
@@ -XXX,XX +XXX,XX @@
115
# @wr_merged: Number of write requests that have been merged into
116
# another request (Since 2.3).
117
#
118
+# @zone_append_merged: Number of zone append requests that have been merged
119
+# into another request (since 8.1)
120
+#
121
# @unmap_merged: Number of unmap requests that have been merged into
122
# another request (Since 4.2)
123
#
124
@@ -XXX,XX +XXX,XX @@
125
# @failed_wr_operations: The number of failed write operations
126
# performed by the device (Since 2.5)
127
#
128
+# @failed_zone_append_operations: The number of failed zone append write
129
+# operations performed by the zoned devices
130
+# (since 8.1)
131
+#
132
# @failed_flush_operations: The number of failed flush operations
133
# performed by the device (Since 2.5)
134
#
135
@@ -XXX,XX +XXX,XX @@
136
# @invalid_wr_operations: The number of invalid write operations
137
# performed by the device (Since 2.5)
138
#
139
+# @invalid_zone_append_operations: The number of invalid zone append operations
140
+# performed by the zoned device (since 8.1)
141
+#
142
# @invalid_flush_operations: The number of invalid flush operations
143
# performed by the device (Since 2.5)
144
#
145
@@ -XXX,XX +XXX,XX @@
146
#
147
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
148
#
149
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
150
+#
151
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
152
#
153
# Since: 0.14
154
##
155
{ 'struct': 'BlockDeviceStats',
156
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
157
- 'rd_operations': 'int', 'wr_operations': 'int',
158
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
159
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
160
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
161
'flush_operations': 'int', 'unmap_operations': 'int',
162
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
163
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
164
- 'wr_highest_offset': 'int',
165
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
166
- '*idle_time_ns': 'int',
167
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
168
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
169
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
170
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
171
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
172
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
173
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
174
+ 'failed_zone_append_operations': 'int',
175
+ 'failed_flush_operations': 'int',
176
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
177
+ 'invalid_wr_operations': 'int',
178
+ 'invalid_zone_append_operations': 'int',
179
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
180
'account_invalid': 'bool', 'account_failed': 'bool',
181
'timed_stats': ['BlockDeviceTimedStats'],
182
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
183
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
184
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
185
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
186
187
##
188
diff --git a/qapi/block.json b/qapi/block.json
189
index XXXXXXX..XXXXXXX 100644
190
--- a/qapi/block.json
191
+++ b/qapi/block.json
192
@@ -XXX,XX +XXX,XX @@
193
# @boundaries-write: list of interval boundary values for write
194
# latency histogram.
195
#
196
+# @boundaries-zap: list of interval boundary values for zone append write
197
+# latency histogram.
198
+#
199
# @boundaries-flush: list of interval boundary values for flush
200
# latency histogram.
201
#
202
@@ -XXX,XX +XXX,XX @@
203
'*boundaries': ['uint64'],
204
'*boundaries-read': ['uint64'],
205
'*boundaries-write': ['uint64'],
206
+ '*boundaries-zap': ['uint64'],
207
'*boundaries-flush': ['uint64'] },
208
'allow-preconfig': true }
209
diff --git a/include/block/accounting.h b/include/block/accounting.h
210
index XXXXXXX..XXXXXXX 100644
211
--- a/include/block/accounting.h
212
+++ b/include/block/accounting.h
213
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
214
BLOCK_ACCT_READ,
215
BLOCK_ACCT_WRITE,
216
BLOCK_ACCT_FLUSH,
217
+ BLOCK_ACCT_ZONE_APPEND,
218
BLOCK_ACCT_UNMAP,
219
BLOCK_MAX_IOTYPE,
220
};
221
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
222
index XXXXXXX..XXXXXXX 100644
223
--- a/block/qapi-sysemu.c
224
+++ b/block/qapi-sysemu.c
225
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
226
bool has_boundaries, uint64List *boundaries,
227
bool has_boundaries_read, uint64List *boundaries_read,
228
bool has_boundaries_write, uint64List *boundaries_write,
229
+ bool has_boundaries_append, uint64List *boundaries_append,
230
bool has_boundaries_flush, uint64List *boundaries_flush,
231
Error **errp)
232
{
233
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
234
}
235
}
236
237
+ if (has_boundaries || has_boundaries_append) {
238
+ ret = block_latency_histogram_set(
239
+ stats, BLOCK_ACCT_ZONE_APPEND,
240
+ has_boundaries_append ? boundaries_append : boundaries);
241
+ if (ret) {
242
+ error_setg(errp, "Device '%s' set append write boundaries fail", id);
243
+ return;
244
+ }
245
+ }
246
+
247
if (has_boundaries || has_boundaries_flush) {
248
ret = block_latency_histogram_set(
249
stats, BLOCK_ACCT_FLUSH,
250
diff --git a/block/qapi.c b/block/qapi.c
251
index XXXXXXX..XXXXXXX 100644
252
--- a/block/qapi.c
253
+++ b/block/qapi.c
254
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
255
256
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
257
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
258
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
259
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
260
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
261
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
262
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
263
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
264
265
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
266
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
267
+ ds->failed_zone_append_operations =
268
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
269
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
270
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
271
272
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
273
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
274
+ ds->invalid_zone_append_operations =
275
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
276
ds->invalid_flush_operations =
277
stats->invalid_ops[BLOCK_ACCT_FLUSH];
278
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
279
280
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
281
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
282
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
283
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
284
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
285
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
286
+ ds->zone_append_total_time_ns =
287
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
288
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
289
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
290
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
291
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
292
293
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
294
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
295
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
296
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
297
298
dev_stats->interval_length = ts->interval_length;
299
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
300
dev_stats->max_wr_latency_ns = timed_average_max(wr);
301
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
302
303
+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
304
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
305
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
306
+
307
dev_stats->min_flush_latency_ns = timed_average_min(fl);
308
dev_stats->max_flush_latency_ns = timed_average_max(fl);
309
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
310
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
311
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
312
dev_stats->avg_wr_queue_depth =
313
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
314
+ dev_stats->avg_zone_append_queue_depth =
315
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);
316
317
QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
318
}
319
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
320
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
321
ds->wr_latency_histogram
322
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
323
+ ds->zone_append_latency_histogram
324
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
325
ds->flush_latency_histogram
326
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
327
}
328
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
329
index XXXXXXX..XXXXXXX 100644
330
--- a/hw/block/virtio-blk.c
331
+++ b/hw/block/virtio-blk.c
332
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
333
data->in_num = in_num;
334
data->zone_append_data.offset = offset;
335
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
336
+
337
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
338
+ BLOCK_ACCT_ZONE_APPEND);
339
+
340
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
341
virtio_blk_zone_append_complete, data);
342
return 0;
343
diff --git a/tests/qemu-iotests/227.out b/tests/qemu-iotests/227.out
344
index XXXXXXX..XXXXXXX 100644
345
--- a/tests/qemu-iotests/227.out
346
+++ b/tests/qemu-iotests/227.out
347
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
348
"stats": {
349
"unmap_operations": 0,
350
"unmap_merged": 0,
351
+ "failed_zone_append_operations": 0,
352
"flush_total_time_ns": 0,
353
"wr_highest_offset": 0,
354
"wr_total_time_ns": 0,
355
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
356
"timed_stats": [
357
],
358
"failed_unmap_operations": 0,
359
+ "zone_append_merged": 0,
360
"failed_flush_operations": 0,
361
"account_invalid": true,
362
"rd_total_time_ns": 0,
363
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
364
"unmap_total_time_ns": 0,
365
"invalid_flush_operations": 0,
366
"account_failed": true,
367
+ "zone_append_total_time_ns": 0,
368
+ "zone_append_operations": 0,
369
"rd_operations": 0,
370
+ "zone_append_bytes": 0,
371
+ "invalid_zone_append_operations": 0,
372
"invalid_wr_operations": 0,
373
"invalid_rd_operations": 0
374
},
375
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
376
"stats": {
377
"unmap_operations": 0,
378
"unmap_merged": 0,
379
+ "failed_zone_append_operations": 0,
380
"flush_total_time_ns": 0,
381
"wr_highest_offset": 0,
382
"wr_total_time_ns": 0,
383
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
384
"timed_stats": [
385
],
386
"failed_unmap_operations": 0,
387
+ "zone_append_merged": 0,
388
"failed_flush_operations": 0,
389
"account_invalid": true,
390
"rd_total_time_ns": 0,
391
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
392
"unmap_total_time_ns": 0,
393
"invalid_flush_operations": 0,
394
"account_failed": true,
395
+ "zone_append_total_time_ns": 0,
396
+ "zone_append_operations": 0,
397
"rd_operations": 0,
398
+ "zone_append_bytes": 0,
399
+ "invalid_zone_append_operations": 0,
400
"invalid_wr_operations": 0,
401
"invalid_rd_operations": 0
402
},
403
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
404
"stats": {
405
"unmap_operations": 0,
406
"unmap_merged": 0,
407
+ "failed_zone_append_operations": 0,
408
"flush_total_time_ns": 0,
409
"wr_highest_offset": 0,
410
"wr_total_time_ns": 0,
411
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
412
"timed_stats": [
413
],
414
"failed_unmap_operations": 0,
415
+ "zone_append_merged": 0,
416
"failed_flush_operations": 0,
417
"account_invalid": true,
418
"rd_total_time_ns": 0,
419
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
420
"unmap_total_time_ns": 0,
421
"invalid_flush_operations": 0,
422
"account_failed": true,
423
+ "zone_append_total_time_ns": 0,
424
+ "zone_append_operations": 0,
425
"rd_operations": 0,
426
+ "zone_append_bytes": 0,
427
+ "invalid_zone_append_operations": 0,
428
"invalid_wr_operations": 0,
429
"invalid_rd_operations": 0
430
},
431
--
432
2.40.1
diff view generated by jsdifflib
1
From: Anton Nefedov <anton.nefedov@virtuozzo.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
COW (even empty/zero) areas require encryption too
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
5
Message-id: 20230508051916.178322-4-faithilikerun@gmail.com
6
Reviewed-by: Eric Blake <eblake@redhat.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Max Reitz <mreitz@redhat.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
9
Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
---
7
---
12
tests/qemu-iotests/134 | 9 +++++++++
8
hw/block/virtio-blk.c | 12 ++++++++++++
13
tests/qemu-iotests/134.out | 10 ++++++++++
9
hw/block/trace-events | 7 +++++++
14
2 files changed, 19 insertions(+)
10
2 files changed, 19 insertions(+)
15
11
16
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
12
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/134
19
+++ b/tests/qemu-iotests/134
20
@@ -XXX,XX +XXX,XX @@ echo
21
echo "== reading whole image =="
22
$QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
23
24
+echo
25
+echo "== rewriting cluster part =="
26
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
27
+
28
+echo
29
+echo "== verify pattern =="
30
+$QEMU_IO --object $SECRET -c "read -P 0 0 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
31
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
32
+
33
echo
34
echo "== rewriting whole image =="
35
$QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
36
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
37
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
38
--- a/tests/qemu-iotests/134.out
14
--- a/hw/block/virtio-blk.c
39
+++ b/tests/qemu-iotests/134.out
15
+++ b/hw/block/virtio-blk.c
40
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
16
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
41
read 134217728/134217728 bytes at offset 0
17
int64_t nz = data->zone_report_data.nr_zones;
42
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
18
int8_t err_status = VIRTIO_BLK_S_OK;
43
19
44
+== rewriting cluster part ==
20
+ trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
45
+wrote 512/512 bytes at offset 512
21
if (ret) {
46
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
22
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
47
+
23
goto out;
48
+== verify pattern ==
24
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
49
+read 512/512 bytes at offset 0
25
nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
50
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
26
sizeof(struct virtio_blk_zone_report)) /
51
+read 512/512 bytes at offset 512
27
sizeof(struct virtio_blk_zone_descriptor);
52
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
28
+ trace_virtio_blk_handle_zone_report(vdev, req,
53
+
29
+ offset >> BDRV_SECTOR_BITS, nr_zones);
54
== rewriting whole image ==
30
55
wrote 134217728/134217728 bytes at offset 0
31
zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
56
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
32
data = g_malloc(sizeof(ZoneCmdData));
33
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
34
{
35
VirtIOBlockReq *req = opaque;
36
VirtIOBlock *s = req->dev;
37
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
38
int8_t err_status = VIRTIO_BLK_S_OK;
39
+ trace_virtio_blk_zone_mgmt_complete(vdev, req,ret);
40
41
if (ret) {
42
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
43
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
44
/* Entire drive capacity */
45
offset = 0;
46
len = capacity;
47
+ trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
48
+ bs->total_sectors);
49
} else {
50
if (bs->bl.zone_size > capacity - offset) {
51
/* The zoned device allows the last smaller zone. */
52
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
53
} else {
54
len = bs->bl.zone_size;
55
}
56
+ trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
57
+ offset >> BDRV_SECTOR_BITS,
58
+ len >> BDRV_SECTOR_BITS);
59
}
60
61
if (!check_zoned_request(s, offset, len, false, &err_status)) {
62
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
63
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
64
goto out;
65
}
66
+ trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
67
68
out:
69
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
70
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
71
int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
72
int64_t len = iov_size(out_iov, out_num);
73
74
+ trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
75
if (!check_zoned_request(s, offset, len, true, &err_status)) {
76
goto out;
77
}
78
diff --git a/hw/block/trace-events b/hw/block/trace-events
79
index XXXXXXX..XXXXXXX 100644
80
--- a/hw/block/trace-events
81
+++ b/hw/block/trace-events
82
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
83
# virtio-blk.c
84
virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
85
virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
86
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
87
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
88
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
89
virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
90
virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
91
virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
92
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
93
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
94
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
95
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""
96
97
# hd-geometry.c
98
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
57
--
99
--
58
2.21.0
100
2.40.1
59
60
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Add the documentation about the example of using virtio-blk driver
4
to pass the zoned block devices through to the guest.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Message-id: 20230508051916.178322-5-faithilikerun@gmail.com
8
[Fix pre-formatted code syntax
9
--Stefan]
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
docs/devel/zoned-storage.rst | 19 +++++++++++++++++++
13
1 file changed, 19 insertions(+)
14
15
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
16
index XXXXXXX..XXXXXXX 100644
17
--- a/docs/devel/zoned-storage.rst
18
+++ b/docs/devel/zoned-storage.rst
19
@@ -XXX,XX +XXX,XX @@ APIs for zoned storage emulation or testing.
20
For example, to test zone_report on a null_blk device using qemu-io is::
21
22
$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
23
+
24
+To expose the host's zoned block device through virtio-blk, the command line
25
+can be (includes the -device parameter)::
26
+
27
+ -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
28
+ -device virtio-blk-pci,drive=drive0
29
+
30
+Or only use the -drive parameter::
31
+
32
+ -driver driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
33
+
34
+Additionally, QEMU has several ways of supporting zoned storage, including:
35
+(1) Using virtio-scsi: --device scsi-block allows for the passing through of
36
+SCSI ZBC devices, enabling the attachment of ZBC or ZAC HDDs to QEMU.
37
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
38
+purposes, it cannot yet pass through a zoned device from the host. To pass on
39
+the NVMe ZNS device to the guest, use VFIO PCI pass the entire NVMe PCI adapter
40
+through to the guest. Likewise, an HDD HBA can be passed on to QEMU all HDDs
41
+attached to the HBA.
42
--
43
2.40.1
diff view generated by jsdifflib