The following changes since commit c1eb2ddf0f8075faddc5f7c3d39feae3e8e9d6b4:

  Update version for v8.0.0 release (2023-04-19 17:27:13 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 36e5e9b22abe56aa00ca067851555ad8127a7966:

  tracing: install trace events file only if necessary (2023-04-20 07:39:43 -0400)

----------------------------------------------------------------
Pull request

Sam Li's zoned storage work and fixes I collected during the 8.0 freeze.

----------------------------------------------------------------

Carlos Santos (1):
      tracing: install trace events file only if necessary

Philippe Mathieu-Daudé (1):
      block/dmg: Declare a type definition for DMG uncompress function

Sam Li (17):
      block/block-common: add zoned device structs
      block/file-posix: introduce helper functions for sysfs attributes
      block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ioctls
      block/raw-format: add zone operations to pass through requests
      block: add zoned BlockDriver check to block layer
      iotests: test new zone operations
      block: add some trace events for new block layer APIs
      docs/zoned-storage: add zoned device documentation
      file-posix: add tracking of the zone write pointers
      block: introduce zone append write for zoned devices
      qemu-iotests: test zone append operation
      block: add some trace events for zone append
      include: update virtio_blk headers to v6.3-rc1
      virtio-blk: add zoned storage emulation for zoned devices
      block: add accounting for zone append operation
      virtio-blk: add some trace events for zoned emulation
      docs/zoned-storage:add zoned emulation use case

Thomas De Schampheleire (1):
      tracetool: use relative paths for '#line' preprocessor directives

 docs/devel/index-api.rst | 1 +
 docs/devel/zoned-storage.rst | 62 ++
 qapi/block-core.json | 68 +-
 qapi/block.json | 4 +
 meson.build | 4 +
 block/dmg.h | 8 +-
 include/block/accounting.h | 1 +
 include/block/block-common.h | 57 ++
 include/block/block-io.h | 13 +
 include/block/block_int-common.h | 37 +
 include/block/raw-aio.h | 8 +-
 include/standard-headers/drm/drm_fourcc.h | 12 +
 include/standard-headers/linux/ethtool.h | 48 +-
 include/standard-headers/linux/fuse.h | 45 +-
 include/standard-headers/linux/pci_regs.h | 1 +
 include/standard-headers/linux/vhost_types.h | 2 +
 include/standard-headers/linux/virtio_blk.h | 105 +++
 include/sysemu/block-backend-io.h | 27 +
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/asm-x86/kvm.h | 34 +-
 linux-headers/linux/kvm.h | 9 +
 linux-headers/linux/vfio.h | 15 +-
 linux-headers/linux/vhost.h | 8 +
 block.c | 19 +
 block/block-backend.c | 193 ++++++
 block/dmg.c | 7 +-
 block/file-posix.c | 677 +++++++++++++++++--
 block/io.c | 68 ++
 block/io_uring.c | 4 +
 block/linux-aio.c | 3 +
 block/qapi-sysemu.c | 11 +
 block/qapi.c | 18 +
 block/raw-format.c | 26 +
 hw/block/virtio-blk-common.c | 2 +
 hw/block/virtio-blk.c | 405 +++++++++++
 hw/virtio/virtio-qmp.c | 2 +
 qemu-io-cmds.c | 224 ++++++
 block/trace-events | 4 +
 docs/system/qemu-block-drivers.rst.inc | 6 +
 hw/block/trace-events | 7 +
 scripts/tracetool/backend/ftrace.py | 4 +-
 scripts/tracetool/backend/log.py | 4 +-
 scripts/tracetool/backend/syslog.py | 4 +-
 tests/qemu-iotests/tests/zoned | 105 +++
 tests/qemu-iotests/tests/zoned.out | 69 ++
 trace/meson.build | 2 +-
 46 files changed, 2353 insertions(+), 81 deletions(-)
 create mode 100644 docs/devel/zoned-storage.rst
 create mode 100755 tests/qemu-iotests/tests/zoned
 create mode 100644 tests/qemu-iotests/tests/zoned.out

-- 
2.39.2


The following changes since commit ac793156f650ae2d77834932d72224175ee69086:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c:

  iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100)

----------------------------------------------------------------
Pull request

v2:
 * Fix format string issues on 32-bit hosts [Peter]
 * Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric]
 * Fix missing eventfd.h header on macOS [Peter]
 * Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter]

This pull request contains the vhost-user-blk server by Coiby Xu along with my
additions, block/nvme.c alignment and hardware error statistics by Philippe
Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir
Sementsov-Ogievskiy.

----------------------------------------------------------------

Coiby Xu (6):
      libvhost-user: Allow vu_message_read to be replaced
      libvhost-user: remove watch for kick_fd when de-initialize vu-dev
      util/vhost-user-server: generic vhost user server
      block: move logical block size check function to a common utility function
      block/export: vhost-user block device backend server
      MAINTAINERS: Add vhost-user block device backend server maintainer

Philippe Mathieu-Daudé (1):
      block/nvme: Add driver statistics for access alignment and hw errors

Stefan Hajnoczi (16):
      util/vhost-user-server: s/fileds/fields/ typo fix
      util/vhost-user-server: drop unnecessary QOM cast
      util/vhost-user-server: drop unnecessary watch deletion
      block/export: consolidate request structs into VuBlockReq
      util/vhost-user-server: drop unused DevicePanicNotifier
      util/vhost-user-server: fix memory leak in vu_message_read()
      util/vhost-user-server: check EOF when reading payload
      util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
      block/export: report flush errors
      block/export: convert vhost-user-blk server to block export API
      util/vhost-user-server: move header to include/
      util/vhost-user-server: use static library in meson.build
      qemu-storage-daemon: avoid compiling blockdev_ss twice
      block: move block exports to libblockdev
      block/export: add iothread and fixed-iothread options
      block/export: add vhost-user-blk multi-queue support

Vladimir Sementsov-Ogievskiy (5):
      block/io: fix bdrv_co_block_status_above
      block/io: bdrv_common_block_status_above: support include_base
      block/io: bdrv_common_block_status_above: support bs == base
      block/io: fix bdrv_is_allocated_above
      iotests: add commit top->base cases to 274

 MAINTAINERS | 9 +
 qapi/block-core.json | 24 +-
 qapi/block-export.json | 36 +-
 block/coroutines.h | 2 +
 block/export/vhost-user-blk-server.h | 19 +
 contrib/libvhost-user/libvhost-user.h | 21 +
 include/qemu/vhost-user-server.h | 65 +++
 util/block-helpers.h | 19 +
 block/export/export.c | 37 +-
 block/export/vhost-user-blk-server.c | 431 ++++++++++++++++++++
 block/io.c | 132 +++---
 block/nvme.c | 27 ++
 block/qcow2.c | 16 +-
 contrib/libvhost-user/libvhost-user-glib.c | 2 +-
 contrib/libvhost-user/libvhost-user.c | 15 +-
 hw/core/qdev-properties-system.c | 31 +-
 nbd/server.c | 2 -
 qemu-nbd.c | 21 +-
 softmmu/vl.c | 4 +
 stubs/blk-exp-close-all.c | 7 +
 tests/vhost-user-bridge.c | 2 +
 tools/virtiofsd/fuse_virtio.c | 4 +-
 util/block-helpers.c | 46 +++
 util/vhost-user-server.c | 446 +++++++++++++++++++++
 block/export/meson.build | 3 +-
 contrib/libvhost-user/meson.build | 1 +
 meson.build | 22 +-
 nbd/meson.build | 2 +
 storage-daemon/meson.build | 3 +-
 stubs/meson.build | 1 +
 tests/qemu-iotests/274 | 20 +
 tests/qemu-iotests/274.out | 68 ++++
 util/meson.build | 4 +
 33 files changed, 1420 insertions(+), 122 deletions(-)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 include/qemu/vhost-user-server.h
 create mode 100644 util/block-helpers.h
 create mode 100644 block/export/vhost-user-blk-server.c
 create mode 100644 stubs/blk-exp-close-all.c
 create mode 100644 util/block-helpers.c
 create mode 100644 util/vhost-user-server.c

-- 
2.26.2
From: Sam Li <faithilikerun@gmail.com>

Taking account of the new zone append write operation for zoned devices,
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
write, flush).

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230407082528.18841-4-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 68 ++++++++++++++++++++++++++++++------
 qapi/block.json | 4 +++
 include/block/accounting.h | 1 +
 block/qapi-sysemu.c | 11 ++++++
 block/qapi.c | 18 ++++++++++
 hw/block/virtio-blk.c | 4 +++
 6 files changed, 95 insertions(+), 11 deletions(-)

-- 
2.39.2


From: Philippe Mathieu-Daudé <philmd@redhat.com>

Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406,
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 24 +++++++++++++++++++++++-
 block/nvme.c | 27 +++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

-- 
2.26.2
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230407082528.18841-5-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/virtio-blk.c | 12 ++++++++++++
 hw/block/trace-events | 7 +++++++
 2 files changed, 19 insertions(+)

-- 
2.39.2


From: Coiby Xu <coiby.xu@gmail.com>

Allow vu_message_read to be replaced by one which will make use of the
QIOChannel functions. Thus reading vhost-user message won't stall the
guest. For slave channel, we still use the default vu_message_read.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h | 21 +++++++++++++++++++++
 contrib/libvhost-user/libvhost-user-glib.c | 2 +-
 contrib/libvhost-user/libvhost-user.c | 14 +++++++-------
 tests/vhost-user-bridge.c | 2 ++
 tools/virtiofsd/fuse_virtio.c | 4 ++--
 5 files changed, 33 insertions(+), 10 deletions(-)

-- 
2.26.2
From: Coiby Xu <coiby.xu@gmail.com>

When the client is running in gdb and quit command is run in gdb,
QEMU will still dispatch the event which will cause segment fault in
the callback function.

Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
         }
 
         if (vq->kick_fd != -1) {
+            dev->remove_watch(dev, vq->kick_fd);
             close(vq->kick_fd);
             vq->kick_fd = -1;
         }
-- 
2.26.2
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
Add the documentation about the zoned device support to virtio-blk
3
Sharing QEMU devices via vhost-user protocol.
4
emulation.
5
4
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
5
Only one vhost-user client can connect to the server one time.
6
7
Suggested-by: Kevin Wolf <kwolf@redhat.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
11
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
12
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
13
[Fixed size_t %lu -> %zu format string compiler error.
11
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
12
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
13
any toctree" error.
14
--Stefan]
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
16
---
17
docs/devel/index-api.rst | 1 +
17
util/vhost-user-server.h | 65 ++++++
18
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
18
util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++
19
docs/system/qemu-block-drivers.rst.inc | 6 ++++
19
util/meson.build | 1 +
20
3 files changed, 50 insertions(+)
20
3 files changed, 494 insertions(+)
21
create mode 100644 docs/devel/zoned-storage.rst
21
create mode 100644 util/vhost-user-server.h
22
create mode 100644 util/vhost-user-server.c
22
23
23
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
24
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
24
index XXXXXXX..XXXXXXX 100644
25
--- a/docs/devel/index-api.rst
26
+++ b/docs/devel/index-api.rst
27
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
28
memory
29
modules
30
ui
31
+ zoned-storage
32
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
33
new file mode 100644
25
new file mode 100644
34
index XXXXXXX..XXXXXXX
26
index XXXXXXX..XXXXXXX
35
--- /dev/null
27
--- /dev/null
36
+++ b/docs/devel/zoned-storage.rst
28
+++ b/util/vhost-user-server.h
37
@@ -XXX,XX +XXX,XX @@
29
@@ -XXX,XX +XXX,XX @@
38
+=============
30
+/*
39
+zoned-storage
31
+ * Sharing QEMU devices via vhost-user protocol
40
+=============
32
+ *
41
+
33
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
42
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
34
+ * Copyright (c) 2020 Red Hat, Inc.
43
+that are larger than the LBA size. They can only allow sequential writes, which
35
+ *
44
+can reduce write amplification in SSDs, and potentially lead to higher
36
+ * This work is licensed under the terms of the GNU GPL, version 2 or
45
+throughput and increased capacity. More details about ZBDs can be found at:
37
+ * later. See the COPYING file in the top-level directory.
46
+
38
+ */
47
+https://zonedstorage.io/docs/introduction/zoned-storage
39
+
48
+
40
+#ifndef VHOST_USER_SERVER_H
49
+1. Block layer APIs for zoned storage
41
+#define VHOST_USER_SERVER_H
50
+-------------------------------------
42
+
51
+QEMU block layer supports three zoned storage models:
43
+#include "contrib/libvhost-user/libvhost-user.h"
52
+- BLK_Z_HM: The host-managed zoned model only allows sequential writes access
44
+#include "io/channel-socket.h"
53
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
45
+#include "io/channel-file.h"
54
+manage the zones of a device.
46
+#include "io/net-listener.h"
55
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
47
+#include "qemu/error-report.h"
56
+zones, making it backward compatible with regular block devices.
48
+#include "qapi/error.h"
57
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
49
+#include "standard-headers/linux/virtio_blk.h"
58
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
50
+
59
+supported.
51
+typedef struct VuFdWatch {
60
+
52
+ VuDev *vu_dev;
61
+The block device information resides inside BlockDriverState. QEMU uses
53
+ int fd; /*kick fd*/
62
+BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
54
+ void *pvt;
63
+block layer while processing I/O requests. A BlockBackend has a root pointer to
55
+ vu_watch_cb cb;
64
+a BlockDriverState graph(for example, raw format on top of file-posix). The
56
+ bool processing;
65
+zoned storage information can be propagated from the leaf BlockDriverState all
57
+ QTAILQ_ENTRY(VuFdWatch) next;
66
+the way up to the BlockBackend. If the zoned storage model in file-posix is
58
+} VuFdWatch;
67
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
59
+
68
+
60
+typedef struct VuServer VuServer;
69
+The block layer APIs support commands needed for zoned storage devices,
61
+typedef void DevicePanicNotifierFn(VuServer *server);
70
+including report zones, four zone operations, and zone append.
62
+
71
+
63
+struct VuServer {
72
+2. Emulating zoned storage controllers
64
+ QIONetListener *listener;
73
+--------------------------------------
65
+ AioContext *ctx;
74
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
66
+ DevicePanicNotifierFn *device_panic_notifier;
75
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
67
+ int max_queues;
76
+APIs for zoned storage emulation or testing.
68
+ const VuDevIface *vu_iface;
77
+
69
+ VuDev vu_dev;
78
+For example, to test zone_report on a null_blk device using qemu-io is:
70
+ QIOChannel *ioc; /* The I/O channel with the client */
79
+$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
71
+ QIOChannelSocket *sioc; /* The underlying data channel with the client */
80
+-c "zrp offset nr_zones"
72
+ /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
81
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
73
+ QIOChannel *ioc_slave;
74
+ QIOChannelSocket *sioc_slave;
75
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
76
+ QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
77
+ /* restart coroutine co_trip if AIOContext is changed */
78
+ bool aio_context_changed;
79
+ bool processing_msg;
80
+};
81
+
82
+bool vhost_user_server_start(VuServer *server,
83
+ SocketAddress *unix_socket,
84
+ AioContext *ctx,
85
+ uint16_t max_queues,
86
+ DevicePanicNotifierFn *device_panic_notifier,
87
+ const VuDevIface *vu_iface,
88
+ Error **errp);
89
+
90
+void vhost_user_server_stop(VuServer *server);
91
+
92
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
93
+
94
+#endif /* VHOST_USER_SERVER_H */
95
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
96
new file mode 100644
97
index XXXXXXX..XXXXXXX
98
--- /dev/null
99
+++ b/util/vhost-user-server.c
100
@@ -XXX,XX +XXX,XX @@
101
+/*
102
+ * Sharing QEMU devices via vhost-user protocol
103
+ *
104
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
105
+ * Copyright (c) 2020 Red Hat, Inc.
106
+ *
107
+ * This work is licensed under the terms of the GNU GPL, version 2 or
108
+ * later. See the COPYING file in the top-level directory.
109
+ */
110
+#include "qemu/osdep.h"
111
+#include "qemu/main-loop.h"
112
+#include "vhost-user-server.h"
113
+
114
+static void vmsg_close_fds(VhostUserMsg *vmsg)
115
+{
116
+ int i;
117
+ for (i = 0; i < vmsg->fd_num; i++) {
118
+ close(vmsg->fds[i]);
119
+ }
120
+}
121
+
122
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
123
+{
124
+ int i;
125
+ for (i = 0; i < vmsg->fd_num; i++) {
126
+ qemu_set_nonblock(vmsg->fds[i]);
127
+ }
128
+}
129
+
130
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
131
+ gpointer opaque);
132
+
133
+static void close_client(VuServer *server)
134
+{
135
+ /*
136
+ * Before closing the client
137
+ *
138
+ * 1. Let vu_client_trip stop processing new vhost-user msg
139
+ *
140
+ * 2. remove kick_handler
141
+ *
142
+ * 3. wait for the kick handler to be finished
143
+ *
144
+ * 4. wait for the current vhost-user msg to be finished processing
145
+ */
146
+
147
+ QIOChannelSocket *sioc = server->sioc;
148
+ /* When this is set vu_client_trip will stop new processing vhost-user message */
149
+ server->sioc = NULL;
150
+
151
+ VuFdWatch *vu_fd_watch, *next;
152
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
153
+ aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
154
+ NULL, NULL, NULL);
155
+ }
156
+
157
+ while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
158
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
159
+ if (!vu_fd_watch->processing) {
160
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
161
+ g_free(vu_fd_watch);
162
+ }
163
+ }
164
+ }
165
+
166
+ while (server->processing_msg) {
167
+ if (server->ioc->read_coroutine) {
168
+ server->ioc->read_coroutine = NULL;
169
+ qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
170
+ NULL, server->ioc);
171
+ server->processing_msg = false;
172
+ }
173
+ }
174
+
175
+ vu_deinit(&server->vu_dev);
176
+ object_unref(OBJECT(sioc));
177
+ object_unref(OBJECT(server->ioc));
178
+}
179
+
180
+static void panic_cb(VuDev *vu_dev, const char *buf)
181
+{
182
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
183
+
184
+ /* avoid while loop in close_client */
185
+ server->processing_msg = false;
186
+
187
+ if (buf) {
188
+ error_report("vu_panic: %s", buf);
189
+ }
190
+
191
+ if (server->sioc) {
192
+ close_client(server);
193
+ }
194
+
195
+ if (server->device_panic_notifier) {
196
+ server->device_panic_notifier(server);
197
+ }
198
+
199
+ /*
200
+ * Set the callback function for network listener so another
201
+ * vhost-user client can connect to this server
202
+ */
203
+ qio_net_listener_set_client_func(server->listener,
204
+ vu_accept,
205
+ server,
206
+ NULL);
207
+}
208
+
209
+static bool coroutine_fn
210
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
211
+{
212
+ struct iovec iov = {
213
+ .iov_base = (char *)vmsg,
214
+ .iov_len = VHOST_USER_HDR_SIZE,
215
+ };
216
+ int rc, read_bytes = 0;
217
+ Error *local_err = NULL;
218
+ /*
219
+ * Store fds/nfds returned from qio_channel_readv_full into
220
+ * temporary variables.
221
+ *
222
+ * VhostUserMsg is a packed structure, gcc will complain about passing
223
+ * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
224
+ * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
225
+ * thus two temporary variables nfds and fds are used here.
226
+ */
227
+ size_t nfds = 0, nfds_t = 0;
228
+ const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
229
+ int *fds_t = NULL;
230
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
231
+ QIOChannel *ioc = server->ioc;
232
+
233
+ if (!ioc) {
234
+ error_report_err(local_err);
235
+ goto fail;
236
+ }
237
+
238
+ assert(qemu_in_coroutine());
239
+ do {
240
+ /*
241
+ * qio_channel_readv_full may have short reads, keeping calling it
242
+ * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
243
+ */
244
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
245
+ if (rc < 0) {
246
+ if (rc == QIO_CHANNEL_ERR_BLOCK) {
247
+ qio_channel_yield(ioc, G_IO_IN);
248
+ continue;
249
+ } else {
250
+ error_report_err(local_err);
251
+ return false;
252
+ }
253
+ }
254
+ read_bytes += rc;
255
+ if (nfds_t > 0) {
256
+ if (nfds + nfds_t > max_fds) {
257
+ error_report("A maximum of %zu fds are allowed, "
258
+ "however got %zu fds now",
259
+ max_fds, nfds + nfds_t);
260
+ goto fail;
261
+ }
262
+ memcpy(vmsg->fds + nfds, fds_t,
263
+ nfds_t *sizeof(vmsg->fds[0]));
264
+ nfds += nfds_t;
265
+ g_free(fds_t);
266
+ }
267
+ if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
268
+ break;
269
+ }
270
+ iov.iov_base = (char *)vmsg + read_bytes;
271
+ iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
272
+ } while (true);
273
+
274
+ vmsg->fd_num = nfds;
275
+ /* qio_channel_readv_full will make socket fds blocking, unblock them */
276
+ vmsg_unblock_fds(vmsg);
277
+ if (vmsg->size > sizeof(vmsg->payload)) {
278
+ error_report("Error: too big message request: %d, "
279
+ "size: vmsg->size: %u, "
280
+ "while sizeof(vmsg->payload) = %zu",
281
+ vmsg->request, vmsg->size, sizeof(vmsg->payload));
282
+ goto fail;
283
+ }
284
+
285
+ struct iovec iov_payload = {
286
+ .iov_base = (char *)&vmsg->payload,
287
+ .iov_len = vmsg->size,
288
+ };
289
+ if (vmsg->size) {
290
+ rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
291
+ if (rc == -1) {
292
+ error_report_err(local_err);
293
+ goto fail;
294
+ }
295
+ }
296
+
297
+ return true;
298
+
299
+fail:
300
+ vmsg_close_fds(vmsg);
301
+
302
+ return false;
303
+}
304
+
305
+
306
+static void vu_client_start(VuServer *server);
307
+static coroutine_fn void vu_client_trip(void *opaque)
308
+{
309
+ VuServer *server = opaque;
310
+
311
+ while (!server->aio_context_changed && server->sioc) {
312
+ server->processing_msg = true;
313
+ vu_dispatch(&server->vu_dev);
314
+ server->processing_msg = false;
315
+ }
316
+
317
+ if (server->aio_context_changed && server->sioc) {
318
+ server->aio_context_changed = false;
319
+ vu_client_start(server);
320
+ }
321
+}
322
+
323
+static void vu_client_start(VuServer *server)
324
+{
325
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
326
+ aio_co_enter(server->ctx, server->co_trip);
327
+}
328
+
329
+/*
330
+ * a wrapper for vu_kick_cb
331
+ *
332
+ * since aio_dispatch can only pass one user data pointer to the
333
+ * callback function, pack VuDev and pvt into a struct. Then unpack it
334
+ * and pass them to vu_kick_cb
335
+ */
336
+static void kick_handler(void *opaque)
337
+{
338
+ VuFdWatch *vu_fd_watch = opaque;
339
+ vu_fd_watch->processing = true;
340
+ vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
341
+ vu_fd_watch->processing = false;
342
+}
343
+
344
+
345
+static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
346
+{
347
+
348
+ VuFdWatch *vu_fd_watch, *next;
349
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
350
+ if (vu_fd_watch->fd == fd) {
351
+ return vu_fd_watch;
352
+ }
353
+ }
354
+ return NULL;
355
+}
356
+
357
+static void
358
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
359
+ vu_watch_cb cb, void *pvt)
360
+{
361
+
362
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
363
+ g_assert(vu_dev);
364
+ g_assert(fd >= 0);
365
+ g_assert(cb);
366
+
367
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
368
+
369
+ if (!vu_fd_watch) {
370
+ VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1);
371
+
372
+ QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next);
373
+
374
+ vu_fd_watch->fd = fd;
375
+ vu_fd_watch->cb = cb;
376
+ qemu_set_nonblock(fd);
377
+ aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
378
+ NULL, NULL, vu_fd_watch);
379
+ vu_fd_watch->vu_dev = vu_dev;
380
+ vu_fd_watch->pvt = pvt;
381
+ }
382
+}
383
+
384
+
385
+static void remove_watch(VuDev *vu_dev, int fd)
386
+{
387
+ VuServer *server;
388
+ g_assert(vu_dev);
389
+ g_assert(fd >= 0);
390
+
391
+ server = container_of(vu_dev, VuServer, vu_dev);
392
+
393
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
394
+
395
+ if (!vu_fd_watch) {
396
+ return;
397
+ }
398
+ aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL);
399
+
400
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
401
+ g_free(vu_fd_watch);
402
+}
403
+
404
+
405
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
406
+ gpointer opaque)
407
+{
408
+ VuServer *server = opaque;
409
+
410
+ if (server->sioc) {
411
+ warn_report("Only one vhost-user client is allowed to "
412
+ "connect the server one time");
413
+ return;
414
+ }
415
+
416
+ if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb,
417
+ vu_message_read, set_watch, remove_watch, server->vu_iface)) {
418
+ error_report("Failed to initialize libvhost-user");
419
+ return;
420
+ }
421
+
422
+ /*
423
+ * Unset the callback function for the network listener so that another
424
+ * vhost-user client keeps waiting until this client disconnects
425
+ */
426
+ qio_net_listener_set_client_func(server->listener,
427
+ NULL,
428
+ NULL,
429
+ NULL);
430
+ server->sioc = sioc;
431
+ /*
432
+ * Increase the object reference, so sioc will not be freed by
433
+ * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
434
+ */
435
+ object_ref(OBJECT(server->sioc));
436
+ qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
437
+ server->ioc = QIO_CHANNEL(sioc);
438
+ object_ref(OBJECT(server->ioc));
439
+ qio_channel_attach_aio_context(server->ioc, server->ctx);
440
+ qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
441
+ vu_client_start(server);
442
+}
443
+
444
+
445
+void vhost_user_server_stop(VuServer *server)
446
+{
447
+ if (server->sioc) {
448
+ close_client(server);
449
+ }
450
+
451
+ if (server->listener) {
452
+ qio_net_listener_disconnect(server->listener);
453
+ object_unref(OBJECT(server->listener));
454
+ }
455
+
456
+}
457
+
458
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
459
+{
460
+ VuFdWatch *vu_fd_watch, *next;
461
+ void *opaque = NULL;
462
+ IOHandler *io_read = NULL;
463
+ bool attach;
464
+
465
+ server->ctx = ctx ? ctx : qemu_get_aio_context();
466
+
467
+ if (!server->sioc) {
468
+ /* not yet serving any client*/
469
+ return;
470
+ }
471
+
472
+ if (ctx) {
473
+ qio_channel_attach_aio_context(server->ioc, ctx);
474
+ server->aio_context_changed = true;
475
+ io_read = kick_handler;
476
+ attach = true;
477
+ } else {
478
+ qio_channel_detach_aio_context(server->ioc);
479
+ /* server->ioc->ctx keeps the old AioContext */
480
+ ctx = server->ioc->ctx;
481
+ attach = false;
482
+ }
483
+
484
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
485
+ if (vu_fd_watch->cb) {
486
+ opaque = attach ? vu_fd_watch : NULL;
487
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
488
+ io_read, NULL, NULL,
489
+ opaque);
490
+ }
491
+ }
492
+}
493
+
494
+
495
+bool vhost_user_server_start(VuServer *server,
496
+ SocketAddress *socket_addr,
497
+ AioContext *ctx,
498
+ uint16_t max_queues,
499
+ DevicePanicNotifierFn *device_panic_notifier,
500
+ const VuDevIface *vu_iface,
501
+ Error **errp)
502
+{
503
+ QIONetListener *listener = qio_net_listener_new();
504
+ if (qio_net_listener_open_sync(listener, socket_addr, 1,
505
+ errp) < 0) {
506
+ object_unref(OBJECT(listener));
507
+ return false;
508
+ }
509
+
510
+ /* zero out unspecified fileds */
511
+ *server = (VuServer) {
512
+ .listener = listener,
513
+ .vu_iface = vu_iface,
514
+ .max_queues = max_queues,
515
+ .ctx = ctx,
516
+ .device_panic_notifier = device_panic_notifier,
517
+ };
518
+
519
+ qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
520
+
521
+ qio_net_listener_set_client_func(server->listener,
522
+ vu_accept,
523
+ server,
524
+ NULL);
525
+
526
+ QTAILQ_INIT(&server->vu_fd_watches);
527
+ return true;
528
+}
529
diff --git a/util/meson.build b/util/meson.build
82
index XXXXXXX..XXXXXXX 100644
530
index XXXXXXX..XXXXXXX 100644
83
--- a/docs/system/qemu-block-drivers.rst.inc
531
--- a/util/meson.build
84
+++ b/docs/system/qemu-block-drivers.rst.inc
532
+++ b/util/meson.build
85
@@ -XXX,XX +XXX,XX @@ Hard disks
533
@@ -XXX,XX +XXX,XX @@ if have_block
86
you may corrupt your host data (use the ``-snapshot`` command
534
util_ss.add(files('main-loop.c'))
87
line option or modify the device permissions accordingly).
535
util_ss.add(files('nvdimm-utils.c'))
88
536
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
89
+Zoned block devices
537
+ util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
90
+ Zoned block devices can be passed through to the guest if the emulated storage
538
util_ss.add(files('qemu-coroutine-sleep.c'))
91
+ controller supports zoned storage. Use ``--blockdev host_device,
539
util_ss.add(files('qemu-co-shared-resource.c'))
92
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
540
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
93
+ ``/dev/nullb0`` as ``drive0``.
94
+
95
Windows
96
^^^^^^^
97
98
--
541
--
99
2.39.2
542
2.26.2
543
diff view generated by jsdifflib
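The set_watch() and kick_handler() additions above use a common C idiom: an event loop that hands its fd callback a single opaque pointer forces the caller to bundle all required context into one struct and unpack it in a thin wrapper. A minimal standalone sketch of that idiom follows; the names (struct watch, kick_wrapper) are illustrative and are not the QEMU API:

    #include <stdio.h>

    /* The event loop passes exactly one opaque pointer to an fd callback. */
    typedef void (*loop_fd_cb)(void *opaque);

    /* Bundle everything the real handler needs into one struct... */
    struct watch {
        int fd;
        void *dev;                               /* device owning the fd */
        void *pvt;                               /* caller-private data */
        void (*cb)(void *dev, int cond, void *pvt);
    };

    /* ...and unpack it in a wrapper matching the loop's callback signature. */
    static void kick_wrapper(void *opaque)
    {
        struct watch *w = opaque;
        w->cb(w->dev, 0, w->pvt);
    }

    static void real_handler(void *dev, int cond, void *pvt)
    {
        printf("kick: dev=%p cond=%d pvt=%p\n", dev, cond, pvt);
    }

    int main(void)
    {
        struct watch w = { .fd = 3, .dev = &w, .pvt = NULL, .cb = real_handler };
        loop_fd_cb cb = kick_wrapper;
        cb(&w);                      /* what the event loop would invoke for w.fd */
        return 0;
    }
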
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
3
Move the constants from hw/core/qdev-properties.c to
4
allowed.
4
util/block-helpers.h so that knowledge of the min/max values is
5
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
9
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Message-id: 20200918080912.321299-5-coiby.xu@gmail.com
11
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
13
<philmd@linaro.org> and clarify that the check is about zoned
14
BlockDrivers.
15
--Stefan]
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
13
---
18
include/block/block_int-common.h | 5 +++++
14
util/block-helpers.h | 19 +++++++++++++
19
block.c | 19 +++++++++++++++++++
15
hw/core/qdev-properties-system.c | 31 ++++-----------------
20
block/file-posix.c | 12 ++++++++++++
16
util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++
21
block/raw-format.c | 1 +
17
util/meson.build | 1 +
22
4 files changed, 37 insertions(+)
18
4 files changed, 71 insertions(+), 26 deletions(-)
19
create mode 100644 util/block-helpers.h
20
create mode 100644 util/block-helpers.c
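The check_block_size() helper added below enforces three conditions on a logical block size: at least 512 bytes, at most 2 MiB, and a power of two. For reference, a small standalone sketch of the same validation, with QEMU's Error reporting replaced by fprintf() and all names illustrative only:

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MIN_BLOCK_SIZE INT64_C(512)                /* lower limit: sector size */
    #define MAX_BLOCK_SIZE (INT64_C(2) * 1024 * 1024)  /* matches qcow2 cluster size limit */

    /* Return true if value is an acceptable logical block size; 0 means "unset". */
    static bool block_size_is_valid(int64_t value)
    {
        if (value == 0) {
            return true;                        /* unset, caller applies a default */
        }
        if (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE) {
            fprintf(stderr, "block size %" PRId64 " out of range\n", value);
            return false;
        }
        if ((value & (value - 1)) != 0) {       /* bitmask code relies on powers of 2 */
            fprintf(stderr, "block size %" PRId64 " is not a power of 2\n", value);
            return false;
        }
        return true;
    }

    int main(void)
    {
        printf("%d %d %d\n",
               block_size_is_valid(512),        /* 1 */
               block_size_is_valid(4096),       /* 1 */
               block_size_is_valid(4097));      /* 0 */
        return 0;
    }
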
23
21
24
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
22
diff --git a/util/block-helpers.h b/util/block-helpers.h
23
new file mode 100644
24
index XXXXXXX..XXXXXXX
25
--- /dev/null
26
+++ b/util/block-helpers.h
27
@@ -XXX,XX +XXX,XX @@
28
+#ifndef BLOCK_HELPERS_H
29
+#define BLOCK_HELPERS_H
30
+
31
+#include "qemu/units.h"
32
+
33
+/* lower limit is sector size */
34
+#define MIN_BLOCK_SIZE INT64_C(512)
35
+#define MIN_BLOCK_SIZE_STR "512 B"
36
+/*
37
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
38
+ * matches qcow2 cluster size limit
39
+ */
40
+#define MAX_BLOCK_SIZE (2 * MiB)
41
+#define MAX_BLOCK_SIZE_STR "2 MiB"
42
+
43
+void check_block_size(const char *id, const char *name, int64_t value,
44
+ Error **errp);
45
+
46
+#endif /* BLOCK_HELPERS_H */
47
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
25
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
26
--- a/include/block/block_int-common.h
49
--- a/hw/core/qdev-properties-system.c
27
+++ b/include/block/block_int-common.h
50
+++ b/hw/core/qdev-properties-system.c
28
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
51
@@ -XXX,XX +XXX,XX @@
29
*/
52
#include "sysemu/blockdev.h"
30
bool is_format;
53
#include "net/net.h"
31
54
#include "hw/pci/pci.h"
32
+ /*
55
+#include "util/block-helpers.h"
33
+ * Set to true if the BlockDriver supports zoned children.
56
34
+ */
57
static bool check_prop_still_unset(DeviceState *dev, const char *name,
35
+ bool supports_zoned_children;
58
const void *old_val, const char *new_val,
36
+
59
@@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = {
37
/*
60
38
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
61
/* --- blocksize --- */
39
* this field set to true, except ones that are defined only by their
62
40
diff --git a/block.c b/block.c
63
-/* lower limit is sector size */
41
index XXXXXXX..XXXXXXX 100644
64
-#define MIN_BLOCK_SIZE 512
42
--- a/block.c
65
-#define MIN_BLOCK_SIZE_STR "512 B"
43
+++ b/block.c
66
-/*
44
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
67
- * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
68
- * matches qcow2 cluster size limit
69
- */
70
-#define MAX_BLOCK_SIZE (2 * MiB)
71
-#define MAX_BLOCK_SIZE_STR "2 MiB"
72
-
73
static void set_blocksize(Object *obj, Visitor *v, const char *name,
74
void *opaque, Error **errp)
75
{
76
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
77
Property *prop = opaque;
78
uint32_t *ptr = qdev_get_prop_ptr(dev, prop);
79
uint64_t value;
80
+ Error *local_err = NULL;
81
82
if (dev->realized) {
83
qdev_prop_set_after_realize(dev, name, errp);
84
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
85
if (!visit_type_size(v, name, &value, errp)) {
45
return;
86
return;
46
}
87
}
47
88
- /* value of 0 means "unset" */
48
+ /*
89
- if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
49
+ * Non-zoned block drivers do not follow zoned storage constraints
90
- error_setg(errp,
50
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
91
- "Property %s.%s doesn't take value %" PRIu64
51
+ * drivers in a graph.
92
- " (minimum: " MIN_BLOCK_SIZE_STR
52
+ */
93
- ", maximum: " MAX_BLOCK_SIZE_STR ")",
53
+ if (!parent_bs->drv->supports_zoned_children &&
94
- dev->id ? : "", name, value);
54
+ child_bs->bl.zoned == BLK_Z_HM) {
95
+ check_block_size(dev->id ? : "", name, value, &local_err);
55
+ /*
96
+ if (local_err) {
56
+ * The host-aware model allows zoned storage constraints and random
97
+ error_propagate(errp, local_err);
57
+ * write. Allow mixing host-aware and non-zoned drivers. Using
98
return;
58
+ * host-aware device as a regular device.
99
}
59
+ */
100
-
60
+ error_setg(errp, "Cannot add a %s child to a %s parent",
101
- /* We rely on power-of-2 blocksizes for bitmasks */
61
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
102
- if ((value & (value - 1)) != 0) {
62
+ parent_bs->drv->supports_zoned_children ?
103
- error_setg(errp,
63
+ "support zoned children" : "not support zoned children");
104
- "Property %s.%s doesn't take value '%" PRId64 "', "
105
- "it's not a power of 2", dev->id ?: "", name, (int64_t)value);
106
- return;
107
- }
108
-
109
*ptr = value;
110
}
111
112
diff --git a/util/block-helpers.c b/util/block-helpers.c
113
new file mode 100644
114
index XXXXXXX..XXXXXXX
115
--- /dev/null
116
+++ b/util/block-helpers.c
117
@@ -XXX,XX +XXX,XX @@
118
+/*
119
+ * Block utility functions
120
+ *
121
+ * Copyright IBM, Corp. 2011
122
+ * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com>
123
+ *
124
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
125
+ * See the COPYING file in the top-level directory.
126
+ */
127
+
128
+#include "qemu/osdep.h"
129
+#include "qapi/error.h"
130
+#include "qapi/qmp/qerror.h"
131
+#include "block-helpers.h"
132
+
133
+/**
134
+ * check_block_size:
135
+ * @id: The unique ID of the object
136
+ * @name: The name of the property being validated
137
+ * @value: The block size in bytes
138
+ * @errp: A pointer to an area to store an error
139
+ *
140
+ * This function checks that the block size meets the following conditions:
141
+ * 1. At least MIN_BLOCK_SIZE
142
+ * 2. No larger than MAX_BLOCK_SIZE
143
+ * 3. A power of 2
144
+ */
145
+void check_block_size(const char *id, const char *name, int64_t value,
146
+ Error **errp)
147
+{
148
+ /* value of 0 means "unset" */
149
+ if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
150
+ error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
151
+ id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE);
64
+ return;
152
+ return;
65
+ }
153
+ }
66
+
154
+
67
if (!QLIST_EMPTY(&child_bs->parents)) {
155
+ /* We rely on power-of-2 blocksizes for bitmasks */
68
error_setg(errp, "The node %s already has a parent",
156
+ if ((value & (value - 1)) != 0) {
69
child_bs->node_name);
157
+ error_setg(errp,
70
diff --git a/block/file-posix.c b/block/file-posix.c
158
+ "Property %s.%s doesn't take value '%" PRId64
159
+ "', it's not a power of 2",
160
+ id, name, value);
161
+ return;
162
+ }
163
+}
164
diff --git a/util/meson.build b/util/meson.build
71
index XXXXXXX..XXXXXXX 100644
165
index XXXXXXX..XXXXXXX 100644
72
--- a/block/file-posix.c
166
--- a/util/meson.build
73
+++ b/block/file-posix.c
167
+++ b/util/meson.build
74
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
168
@@ -XXX,XX +XXX,XX @@ if have_block
75
goto fail;
169
util_ss.add(files('nvdimm-utils.c'))
76
}
170
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
77
}
171
util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
78
+#ifdef CONFIG_BLKZONED
172
+ util_ss.add(files('block-helpers.c'))
79
+ /*
173
util_ss.add(files('qemu-coroutine-sleep.c'))
80
+ * The kernel page cache does not reliably work for writes to SWR zones
174
util_ss.add(files('qemu-co-shared-resource.c'))
81
+ * of zoned block device because it can not guarantee the order of writes.
175
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
82
+ */
83
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
84
+ (!(s->open_flags & O_DIRECT))) {
85
+ error_setg(errp, "The driver supports zoned devices, and it requires "
86
+ "cache.direct=on, which was not specified.");
87
+ return -EINVAL; /* No host kernel page cache */
88
+ }
89
+#endif
90
91
if (S_ISBLK(st.st_mode)) {
92
#ifdef __linux__
93
diff --git a/block/raw-format.c b/block/raw-format.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/block/raw-format.c
96
+++ b/block/raw-format.c
97
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
98
BlockDriver bdrv_raw = {
99
.format_name = "raw",
100
.instance_size = sizeof(BDRVRawState),
101
+ .supports_zoned_children = true,
102
.bdrv_probe = &raw_probe,
103
.bdrv_reopen_prepare = &raw_reopen_prepare,
104
.bdrv_reopen_commit = &raw_reopen_commit,
105
--
176
--
106
2.39.2
177
2.26.2
107
178
108
diff view generated by jsdifflib
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
The new block layer APIs of zoned block devices can be tested by:
3
By making use of libvhost-user, a QEMU block device can be shared with
4
$ tests/qemu-iotests/check zoned
4
the connected vhost-user client. Only one client can connect to the
5
Run each zone operation on a newly created null_blk device
5
server at a time.
6
and see whether it outputs the same zone information.
7
6
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Since vhost-user-server needs a block drive to be created first, delay
8
the creation of this object.
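Pieced together from the object properties visible in the diff below (node-name, unix-socket, writable, logical-block-size), a hypothetical invocation might look like the following; the binary name, image file, node name, object id and socket path are placeholders rather than values taken from this series:

    qemu-system-x86_64 \
        -blockdev node-name=drive0,driver=file,filename=disk.img \
        -object vhost-user-blk-server,id=vubs0,node-name=drive0,unix-socket=/tmp/vhost-user-blk.sock,writable=on
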
9
10
Suggested-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
14
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
11
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
15
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
13
<philmd@linaro.org>.
17
following compiler warning:
18
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
19
and fix "Invalid size %ld ..." ssize_t format string arguments for
20
32-bit hosts.
14
--Stefan]
21
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
23
---
17
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
24
block/export/vhost-user-blk-server.h | 36 ++
18
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
25
block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++
19
2 files changed, 142 insertions(+)
26
softmmu/vl.c | 4 +
20
create mode 100755 tests/qemu-iotests/tests/zoned
27
block/meson.build | 1 +
21
create mode 100644 tests/qemu-iotests/tests/zoned.out
28
4 files changed, 702 insertions(+)
29
create mode 100644 block/export/vhost-user-blk-server.h
30
create mode 100644 block/export/vhost-user-blk-server.c
22
31
23
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
32
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
24
new file mode 100755
25
index XXXXXXX..XXXXXXX
26
--- /dev/null
27
+++ b/tests/qemu-iotests/tests/zoned
28
@@ -XXX,XX +XXX,XX @@
29
+#!/usr/bin/env bash
30
+#
31
+# Test zone management operations.
32
+#
33
+
34
+seq="$(basename $0)"
35
+echo "QA output created by $seq"
36
+status=1 # failure is the default!
37
+
38
+_cleanup()
39
+{
40
+ _cleanup_test_img
41
+ sudo -n rmmod null_blk
42
+}
43
+trap "_cleanup; exit \$status" 0 1 2 3 15
44
+
45
+# get standard environment, filters and checks
46
+. ../common.rc
47
+. ../common.filter
48
+. ../common.qemu
49
+
50
+# This test only runs on Linux hosts with raw image files.
51
+_supported_fmt raw
52
+_supported_proto file
53
+_supported_os Linux
54
+
55
+sudo -n true || \
56
+ _notrun 'Password-less sudo required'
57
+
58
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
59
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
60
+
61
+echo "Testing a null_blk device:"
62
+echo "case 1: if the operations work"
63
+sudo -n modprobe null_blk nr_devices=1 zoned=1
64
+sudo -n chmod 0666 /dev/nullb0
65
+
66
+echo "(1) report the first zone:"
67
+$QEMU_IO $IMG -c "zrp 0 1"
68
+echo
69
+echo "report the first 10 zones"
70
+$QEMU_IO $IMG -c "zrp 0 10"
71
+echo
72
+echo "report the last zone:"
73
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
74
+echo
75
+echo
76
+echo "(2) opening the first zone"
77
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
78
+echo "report after:"
79
+$QEMU_IO $IMG -c "zrp 0 1"
80
+echo
81
+echo "opening the second zone"
82
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
83
+echo "report after:"
84
+$QEMU_IO $IMG -c "zrp 268435456 1"
85
+echo
86
+echo "opening the last zone"
87
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
88
+echo "report after:"
89
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
90
+echo
91
+echo
92
+echo "(3) closing the first zone"
93
+$QEMU_IO $IMG -c "zc 0 268435456"
94
+echo "report after:"
95
+$QEMU_IO $IMG -c "zrp 0 1"
96
+echo
97
+echo "closing the last zone"
98
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
99
+echo "report after:"
100
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
101
+echo
102
+echo
103
+echo "(4) finishing the second zone"
104
+$QEMU_IO $IMG -c "zf 268435456 268435456"
105
+echo "After finishing a zone:"
106
+$QEMU_IO $IMG -c "zrp 268435456 1"
107
+echo
108
+echo
109
+echo "(5) resetting the second zone"
110
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
111
+echo "After resetting a zone:"
112
+$QEMU_IO $IMG -c "zrp 268435456 1"
113
+
114
+# success, all done
115
+echo "*** done"
116
+rm -f $seq.full
117
+status=0
118
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
119
new file mode 100644
33
new file mode 100644
120
index XXXXXXX..XXXXXXX
34
index XXXXXXX..XXXXXXX
121
--- /dev/null
35
--- /dev/null
122
+++ b/tests/qemu-iotests/tests/zoned.out
36
+++ b/block/export/vhost-user-blk-server.h
123
@@ -XXX,XX +XXX,XX @@
37
@@ -XXX,XX +XXX,XX @@
124
+QA output created by zoned
38
+/*
125
+Testing a null_blk device:
39
+ * Sharing QEMU block devices via vhost-user protocol
126
+case 1: if the operations work
40
+ *
127
+(1) report the first zone:
41
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
128
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
42
+ * Copyright (c) 2020 Red Hat, Inc.
129
+
43
+ *
130
+report the first 10 zones
44
+ * This work is licensed under the terms of the GNU GPL, version 2 or
131
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
45
+ * later. See the COPYING file in the top-level directory.
132
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
46
+ */
133
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
47
+
134
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
48
+#ifndef VHOST_USER_BLK_SERVER_H
135
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
49
+#define VHOST_USER_BLK_SERVER_H
136
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
50
+#include "util/vhost-user-server.h"
137
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
51
+
138
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
52
+typedef struct VuBlockDev VuBlockDev;
139
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
53
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
140
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
54
+#define VHOST_USER_BLK_SERVER(obj) \
141
+
55
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
142
+report the last zone:
56
+
143
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
57
+/* vhost user block device */
144
+
58
+struct VuBlockDev {
145
+
59
+ Object parent_obj;
146
+(2) opening the first zone
60
+ char *node_name;
147
+report after:
61
+ SocketAddress *addr;
148
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
62
+ AioContext *ctx;
149
+
63
+ VuServer vu_server;
150
+opening the second zone
64
+ bool running;
151
+report after:
65
+ uint32_t blk_size;
152
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
66
+ BlockBackend *backend;
153
+
67
+ QIOChannelSocket *sioc;
154
+opening the last zone
68
+ QTAILQ_ENTRY(VuBlockDev) next;
155
+report after:
69
+ struct virtio_blk_config blkcfg;
156
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
70
+ bool writable;
157
+
71
+};
158
+
72
+
159
+(3) closing the first zone
73
+#endif /* VHOST_USER_BLK_SERVER_H */
160
+report after:
74
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
161
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
75
new file mode 100644
162
+
76
index XXXXXXX..XXXXXXX
163
+closing the last zone
77
--- /dev/null
164
+report after:
78
+++ b/block/export/vhost-user-blk-server.c
165
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
79
@@ -XXX,XX +XXX,XX @@
166
+
80
+/*
167
+
81
+ * Sharing QEMU block devices via vhost-user protocol
168
+(4) finishing the second zone
82
+ *
169
+After finishing a zone:
83
+ * Parts of the code based on nbd/server.c.
170
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
84
+ *
171
+
85
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
172
+
86
+ * Copyright (c) 2020 Red Hat, Inc.
173
+(5) resetting the second zone
87
+ *
174
+After resetting a zone:
88
+ * This work is licensed under the terms of the GNU GPL, version 2 or
175
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
89
+ * later. See the COPYING file in the top-level directory.
176
+*** done
90
+ */
91
+#include "qemu/osdep.h"
92
+#include "block/block.h"
93
+#include "vhost-user-blk-server.h"
94
+#include "qapi/error.h"
95
+#include "qom/object_interfaces.h"
96
+#include "sysemu/block-backend.h"
97
+#include "util/block-helpers.h"
98
+
99
+enum {
100
+ VHOST_USER_BLK_MAX_QUEUES = 1,
101
+};
102
+struct virtio_blk_inhdr {
103
+ unsigned char status;
104
+};
105
+
106
+typedef struct VuBlockReq {
107
+ VuVirtqElement *elem;
108
+ int64_t sector_num;
109
+ size_t size;
110
+ struct virtio_blk_inhdr *in;
111
+ struct virtio_blk_outhdr out;
112
+ VuServer *server;
113
+ struct VuVirtq *vq;
114
+} VuBlockReq;
115
+
116
+static void vu_block_req_complete(VuBlockReq *req)
117
+{
118
+ VuDev *vu_dev = &req->server->vu_dev;
119
+
120
+ /* IO size with 1 extra status byte */
121
+ vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
122
+ vu_queue_notify(vu_dev, req->vq);
123
+
124
+ if (req->elem) {
125
+ free(req->elem);
126
+ }
127
+
128
+ g_free(req);
129
+}
130
+
131
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
132
+{
133
+ return container_of(server, VuBlockDev, vu_server);
134
+}
135
+
136
+static int coroutine_fn
137
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
138
+ uint32_t iovcnt, uint32_t type)
139
+{
140
+ struct virtio_blk_discard_write_zeroes desc;
141
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
142
+ if (unlikely(size != sizeof(desc))) {
143
+ error_report("Invalid size %zd, expect %zu", size, sizeof(desc));
144
+ return -EINVAL;
145
+ }
146
+
147
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
148
+ uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
149
+ le32_to_cpu(desc.num_sectors) << 9 };
150
+ if (type == VIRTIO_BLK_T_DISCARD) {
151
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
152
+ return 0;
153
+ }
154
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
155
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
156
+ range[0], range[1], 0) == 0) {
157
+ return 0;
158
+ }
159
+ }
160
+
161
+ return -EINVAL;
162
+}
163
+
164
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
165
+{
166
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
167
+ BlockBackend *backend = vdev_blk->backend;
168
+ blk_co_flush(backend);
169
+}
170
+
171
+struct req_data {
172
+ VuServer *server;
173
+ VuVirtq *vq;
174
+ VuVirtqElement *elem;
175
+};
176
+
177
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
178
+{
179
+ struct req_data *data = opaque;
180
+ VuServer *server = data->server;
181
+ VuVirtq *vq = data->vq;
182
+ VuVirtqElement *elem = data->elem;
183
+ uint32_t type;
184
+ VuBlockReq *req;
185
+
186
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
187
+ BlockBackend *backend = vdev_blk->backend;
188
+
189
+ struct iovec *in_iov = elem->in_sg;
190
+ struct iovec *out_iov = elem->out_sg;
191
+ unsigned in_num = elem->in_num;
192
+ unsigned out_num = elem->out_num;
193
+ /* refer to hw/block/virtio_blk.c */
194
+ if (elem->out_num < 1 || elem->in_num < 1) {
195
+ error_report("virtio-blk request missing headers");
196
+ free(elem);
197
+ return;
198
+ }
199
+
200
+ req = g_new0(VuBlockReq, 1);
201
+ req->server = server;
202
+ req->vq = vq;
203
+ req->elem = elem;
204
+
205
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
206
+ sizeof(req->out)) != sizeof(req->out))) {
207
+ error_report("virtio-blk request outhdr too short");
208
+ goto err;
209
+ }
210
+
211
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
212
+
213
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
214
+ error_report("virtio-blk request inhdr too short");
215
+ goto err;
216
+ }
217
+
218
+ /* We always touch the last byte, so just see how big in_iov is. */
219
+ req->in = (void *)in_iov[in_num - 1].iov_base
220
+ + in_iov[in_num - 1].iov_len
221
+ - sizeof(struct virtio_blk_inhdr);
222
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
223
+
224
+ type = le32_to_cpu(req->out.type);
225
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
226
+ case VIRTIO_BLK_T_IN:
227
+ case VIRTIO_BLK_T_OUT: {
228
+ ssize_t ret = 0;
229
+ bool is_write = type & VIRTIO_BLK_T_OUT;
230
+ req->sector_num = le64_to_cpu(req->out.sector);
231
+
232
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
233
+ QEMUIOVector qiov;
234
+ if (is_write) {
235
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
236
+ ret = blk_co_pwritev(backend, offset, qiov.size,
237
+ &qiov, 0);
238
+ } else {
239
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
240
+ ret = blk_co_preadv(backend, offset, qiov.size,
241
+ &qiov, 0);
242
+ }
243
+ if (ret >= 0) {
244
+ req->in->status = VIRTIO_BLK_S_OK;
245
+ } else {
246
+ req->in->status = VIRTIO_BLK_S_IOERR;
247
+ }
248
+ break;
249
+ }
250
+ case VIRTIO_BLK_T_FLUSH:
251
+ vu_block_flush(req);
252
+ req->in->status = VIRTIO_BLK_S_OK;
253
+ break;
254
+ case VIRTIO_BLK_T_GET_ID: {
255
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
256
+ VIRTIO_BLK_ID_BYTES);
257
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
258
+ req->in->status = VIRTIO_BLK_S_OK;
259
+ req->size = elem->in_sg[0].iov_len;
260
+ break;
261
+ }
262
+ case VIRTIO_BLK_T_DISCARD:
263
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
264
+ int rc;
265
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
266
+ out_num, type);
267
+ if (rc == 0) {
268
+ req->in->status = VIRTIO_BLK_S_OK;
269
+ } else {
270
+ req->in->status = VIRTIO_BLK_S_IOERR;
271
+ }
272
+ break;
273
+ }
274
+ default:
275
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
276
+ break;
277
+ }
278
+
279
+ vu_block_req_complete(req);
280
+ return;
281
+
282
+err:
283
+ free(elem);
284
+ g_free(req);
285
+ return;
286
+}
287
+
288
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
289
+{
290
+ VuServer *server;
291
+ VuVirtq *vq;
292
+ struct req_data *req_data;
293
+
294
+ server = container_of(vu_dev, VuServer, vu_dev);
295
+ assert(server);
296
+
297
+ vq = vu_get_queue(vu_dev, idx);
298
+ assert(vq);
299
+ VuVirtqElement *elem;
300
+ while (1) {
301
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
302
+ sizeof(VuBlockReq));
303
+ if (elem) {
304
+ req_data = g_new0(struct req_data, 1);
305
+ req_data->server = server;
306
+ req_data->vq = vq;
307
+ req_data->elem = elem;
308
+ Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
309
+ req_data);
310
+ aio_co_enter(server->ioc->ctx, co);
311
+ } else {
312
+ break;
313
+ }
314
+ }
315
+}
316
+
317
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
318
+{
319
+ VuVirtq *vq;
320
+
321
+ assert(vu_dev);
322
+
323
+ vq = vu_get_queue(vu_dev, idx);
324
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
325
+}
326
+
327
+static uint64_t vu_block_get_features(VuDev *dev)
328
+{
329
+ uint64_t features;
330
+ VuServer *server = container_of(dev, VuServer, vu_dev);
331
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
332
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
333
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
334
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
335
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
336
+ 1ull << VIRTIO_BLK_F_FLUSH |
337
+ 1ull << VIRTIO_BLK_F_DISCARD |
338
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
339
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
340
+ 1ull << VIRTIO_F_VERSION_1 |
341
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
342
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
343
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
344
+
345
+ if (!vdev_blk->writable) {
346
+ features |= 1ull << VIRTIO_BLK_F_RO;
347
+ }
348
+
349
+ return features;
350
+}
351
+
352
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
353
+{
354
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
355
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
356
+}
357
+
358
+static int
359
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
360
+{
361
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
362
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
363
+ memcpy(config, &vdev_blk->blkcfg, len);
364
+
365
+ return 0;
366
+}
367
+
368
+static int
369
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
370
+ uint32_t offset, uint32_t size, uint32_t flags)
371
+{
372
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
373
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
374
+ uint8_t wce;
375
+
376
+ /* don't support live migration */
377
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
378
+ return -EINVAL;
379
+ }
380
+
381
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
382
+ size != 1) {
383
+ return -EINVAL;
384
+ }
385
+
386
+ wce = *data;
387
+ vdev_blk->blkcfg.wce = wce;
388
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
389
+ return 0;
390
+}
391
+
392
+/*
393
+ * When the client disconnects, it sends a VHOST_USER_NONE request
394
+ * and vu_process_message will simple call exit which cause the VM
395
+ * to exit abruptly.
396
+ * To avoid this issue, process VHOST_USER_NONE request ahead
397
+ * of vu_process_message.
398
+ *
399
+ */
400
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
401
+{
402
+ if (vmsg->request == VHOST_USER_NONE) {
403
+ dev->panic(dev, "disconnect");
404
+ return true;
405
+ }
406
+ return false;
407
+}
408
+
409
+static const VuDevIface vu_block_iface = {
410
+ .get_features = vu_block_get_features,
411
+ .queue_set_started = vu_block_queue_set_started,
412
+ .get_protocol_features = vu_block_get_protocol_features,
413
+ .get_config = vu_block_get_config,
414
+ .set_config = vu_block_set_config,
415
+ .process_msg = vu_block_process_msg,
416
+};
417
+
418
+static void blk_aio_attached(AioContext *ctx, void *opaque)
419
+{
420
+ VuBlockDev *vub_dev = opaque;
421
+ aio_context_acquire(ctx);
422
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
423
+ aio_context_release(ctx);
424
+}
425
+
426
+static void blk_aio_detach(void *opaque)
427
+{
428
+ VuBlockDev *vub_dev = opaque;
429
+ AioContext *ctx = vub_dev->vu_server.ctx;
430
+ aio_context_acquire(ctx);
431
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
432
+ aio_context_release(ctx);
433
+}
434
+
435
+static void
436
+vu_block_initialize_config(BlockDriverState *bs,
437
+ struct virtio_blk_config *config, uint32_t blk_size)
438
+{
439
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
440
+ config->blk_size = blk_size;
441
+ config->size_max = 0;
442
+ config->seg_max = 128 - 2;
443
+ config->min_io_size = 1;
444
+ config->opt_io_size = 1;
445
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
446
+ config->max_discard_sectors = 32768;
447
+ config->max_discard_seg = 1;
448
+ config->discard_sector_alignment = config->blk_size >> 9;
449
+ config->max_write_zeroes_sectors = 32768;
450
+ config->max_write_zeroes_seg = 1;
451
+}
452
+
453
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
454
+{
455
+
456
+ BlockBackend *blk;
457
+ Error *local_error = NULL;
458
+ const char *node_name = vu_block_device->node_name;
459
+ bool writable = vu_block_device->writable;
460
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
461
+ int ret;
462
+
463
+ AioContext *ctx;
464
+
465
+ BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
466
+
467
+ if (!bs) {
468
+ error_propagate(errp, local_error);
469
+ return NULL;
470
+ }
471
+
472
+ if (bdrv_is_read_only(bs)) {
473
+ writable = false;
474
+ }
475
+
476
+ if (writable) {
477
+ perm |= BLK_PERM_WRITE;
478
+ }
479
+
480
+ ctx = bdrv_get_aio_context(bs);
481
+ aio_context_acquire(ctx);
482
+ bdrv_invalidate_cache(bs, NULL);
483
+ aio_context_release(ctx);
484
+
485
+ /*
486
+ * Don't allow resize while the vhost user server is running,
487
+ * otherwise we don't care what happens with the node.
488
+ */
489
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
490
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
491
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
492
+ ret = blk_insert_bs(blk, bs, errp);
493
+
494
+ if (ret < 0) {
495
+ goto fail;
496
+ }
497
+
498
+ blk_set_enable_write_cache(blk, false);
499
+
500
+ blk_set_allow_aio_context_change(blk, true);
501
+
502
+ vu_block_device->blkcfg.wce = 0;
503
+ vu_block_device->backend = blk;
504
+ if (!vu_block_device->blk_size) {
505
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
506
+ }
507
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
508
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
509
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
510
+ vu_block_device->blk_size);
511
+ return vu_block_device;
512
+
513
+fail:
514
+ blk_unref(blk);
515
+ return NULL;
516
+}
517
+
518
+static void vu_block_deinit(VuBlockDev *vu_block_device)
519
+{
520
+ if (vu_block_device->backend) {
521
+ blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
522
+ blk_aio_detach, vu_block_device);
523
+ }
524
+
525
+ blk_unref(vu_block_device->backend);
526
+}
527
+
528
+static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
529
+{
530
+ vhost_user_server_stop(&vu_block_device->vu_server);
531
+ vu_block_deinit(vu_block_device);
532
+}
533
+
534
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
535
+ Error **errp)
536
+{
537
+ AioContext *ctx;
538
+ SocketAddress *addr = vu_block_device->addr;
539
+
540
+ if (!vu_block_init(vu_block_device, errp)) {
541
+ return;
542
+ }
543
+
544
+ ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
545
+
546
+ if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
547
+ VHOST_USER_BLK_MAX_QUEUES,
548
+ NULL, &vu_block_iface,
549
+ errp)) {
550
+ goto error;
551
+ }
552
+
553
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
554
+ blk_aio_detach, vu_block_device);
555
+ vu_block_device->running = true;
556
+ return;
557
+
558
+ error:
559
+ vu_block_deinit(vu_block_device);
560
+}
561
+
562
+static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
563
+{
564
+ if (vus->running) {
565
+ error_setg(errp, "The property can't be modified "
566
+ "while the server is running");
567
+ return false;
568
+ }
569
+ return true;
570
+}
571
+
572
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
573
+{
574
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
575
+
576
+ if (!vu_prop_modifiable(vus, errp)) {
577
+ return;
578
+ }
579
+
580
+ if (vus->node_name) {
581
+ g_free(vus->node_name);
582
+ }
583
+
584
+ vus->node_name = g_strdup(value);
585
+}
586
+
587
+static char *vu_get_node_name(Object *obj, Error **errp)
588
+{
589
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
590
+ return g_strdup(vus->node_name);
591
+}
592
+
593
+static void free_socket_addr(SocketAddress *addr)
594
+{
595
+ g_free(addr->u.q_unix.path);
596
+ g_free(addr);
597
+}
598
+
599
+static void vu_set_unix_socket(Object *obj, const char *value,
600
+ Error **errp)
601
+{
602
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
603
+
604
+ if (!vu_prop_modifiable(vus, errp)) {
605
+ return;
606
+ }
607
+
608
+ if (vus->addr) {
609
+ free_socket_addr(vus->addr);
610
+ }
611
+
612
+ SocketAddress *addr = g_new0(SocketAddress, 1);
613
+ addr->type = SOCKET_ADDRESS_TYPE_UNIX;
614
+ addr->u.q_unix.path = g_strdup(value);
615
+ vus->addr = addr;
616
+}
617
+
618
+static char *vu_get_unix_socket(Object *obj, Error **errp)
619
+{
620
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
621
+ return g_strdup(vus->addr->u.q_unix.path);
622
+}
623
+
624
+static bool vu_get_block_writable(Object *obj, Error **errp)
625
+{
626
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
627
+ return vus->writable;
628
+}
629
+
630
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
631
+{
632
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
633
+
634
+ if (!vu_prop_modifiable(vus, errp)) {
635
+ return;
636
+ }
637
+
638
+ vus->writable = value;
639
+}
640
+
641
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
642
+ void *opaque, Error **errp)
643
+{
644
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
645
+ uint32_t value = vus->blk_size;
646
+
647
+ visit_type_uint32(v, name, &value, errp);
648
+}
649
+
650
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
651
+ void *opaque, Error **errp)
652
+{
653
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
654
+
655
+ Error *local_err = NULL;
656
+ uint32_t value;
657
+
658
+ if (!vu_prop_modifiable(vus, errp)) {
659
+ return;
660
+ }
661
+
662
+ visit_type_uint32(v, name, &value, &local_err);
663
+ if (local_err) {
664
+ goto out;
665
+ }
666
+
667
+ check_block_size(object_get_typename(obj), name, value, &local_err);
668
+ if (local_err) {
669
+ goto out;
670
+ }
671
+
672
+ vus->blk_size = value;
673
+
674
+out:
675
+ error_propagate(errp, local_err);
676
+}
677
+
678
+static void vhost_user_blk_server_instance_finalize(Object *obj)
679
+{
680
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
681
+
682
+ vhost_user_blk_server_stop(vub);
683
+
684
+ /*
685
+ * Unlike object_property_add_str, object_class_property_add_str
686
+ * doesn't have a release method. Thus manual memory freeing is
687
+ * needed.
688
+ */
689
+ free_socket_addr(vub->addr);
690
+ g_free(vub->node_name);
691
+}
692
+
693
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
694
+{
695
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
696
+
697
+ vhost_user_blk_server_start(vub, errp);
698
+}
699
+
700
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
701
+ void *class_data)
702
+{
703
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
704
+ ucc->complete = vhost_user_blk_server_complete;
705
+
706
+ object_class_property_add_bool(klass, "writable",
707
+ vu_get_block_writable,
708
+ vu_set_block_writable);
709
+
710
+ object_class_property_add_str(klass, "node-name",
711
+ vu_get_node_name,
712
+ vu_set_node_name);
713
+
714
+ object_class_property_add_str(klass, "unix-socket",
715
+ vu_get_unix_socket,
716
+ vu_set_unix_socket);
717
+
718
+ object_class_property_add(klass, "logical-block-size", "uint32",
719
+ vu_get_blk_size, vu_set_blk_size,
720
+ NULL, NULL);
721
+}
722
+
723
+static const TypeInfo vhost_user_blk_server_info = {
724
+ .name = TYPE_VHOST_USER_BLK_SERVER,
725
+ .parent = TYPE_OBJECT,
726
+ .instance_size = sizeof(VuBlockDev),
727
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
728
+ .class_init = vhost_user_blk_server_class_init,
729
+ .interfaces = (InterfaceInfo[]) {
730
+ {TYPE_USER_CREATABLE},
731
+ {}
732
+ },
733
+};
734
+
735
+static void vhost_user_blk_server_register_types(void)
736
+{
737
+ type_register_static(&vhost_user_blk_server_info);
738
+}
739
+
740
+type_init(vhost_user_blk_server_register_types)
741
diff --git a/softmmu/vl.c b/softmmu/vl.c
742
index XXXXXXX..XXXXXXX 100644
743
--- a/softmmu/vl.c
744
+++ b/softmmu/vl.c
745
@@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts)
746
}
747
#endif
748
749
+ /* Reason: vhost-user-blk-server property "node-name" */
750
+ if (g_str_equal(type, "vhost-user-blk-server")) {
751
+ return false;
752
+ }
753
/*
754
* Reason: filter-* property "netdev" etc.
755
*/
756
diff --git a/block/meson.build b/block/meson.build
757
index XXXXXXX..XXXXXXX 100644
758
--- a/block/meson.build
759
+++ b/block/meson.build
760
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
761
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
762
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
763
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
764
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
765
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
766
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
767
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
177
--
768
--
178
2.39.2
769
2.26.2
179
770
180
diff view generated by jsdifflib
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
Add documentation with an example of using the virtio-blk driver
3
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
4
to pass zoned block devices through to the guest.
4
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
5
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
6
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
7
Message-id: 20230407082528.18841-6-faithilikerun@gmail.com
7
Message-id: 20200918080912.321299-8-coiby.xu@gmail.com
8
[Fix Sphinx indentation error by turning command-lines into
8
[Removed reference to vhost-user-blk-test.c, it will be sent in a
9
pre-formatted text.
9
separate pull request.
10
--Stefan]
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
12
---
13
docs/devel/zoned-storage.rst | 25 ++++++++++++++++++++++---
13
MAINTAINERS | 7 +++++++
14
1 file changed, 22 insertions(+), 3 deletions(-)
14
1 file changed, 7 insertions(+)
15
15
16
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
16
diff --git a/MAINTAINERS b/MAINTAINERS
17
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
18
--- a/docs/devel/zoned-storage.rst
18
--- a/MAINTAINERS
19
+++ b/docs/devel/zoned-storage.rst
19
+++ b/MAINTAINERS
20
@@ -XXX,XX +XXX,XX @@ When the BlockBackend's BlockLimits model reports a zoned storage device, users
20
@@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org
21
like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
21
S: Supported
22
APIs for zoned storage emulation or testing.
22
F: tests/image-fuzzer/
23
23
24
-For example, to test zone_report on a null_blk device using qemu-io is:
24
+Vhost-user block device backend server
25
-$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
25
+M: Coiby Xu <Coiby.Xu@gmail.com>
26
--c "zrp offset nr_zones"
26
+S: Maintained
27
+For example, to test zone_report on a null_blk device using qemu-io is::
27
+F: block/export/vhost-user-blk-server.c
28
+F: util/vhost-user-server.c
29
+F: tests/qtest/libqos/vhost-user-blk.c
28
+
30
+
29
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
31
Replication
30
+
32
M: Wen Congyang <wencongyang2@huawei.com>
31
+To expose the host's zoned block device through virtio-blk, the command line
33
M: Xie Changlong <xiechanglong.d@gmail.com>
32
+can be (includes the -device parameter)::
33
+
34
+ -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
35
+ -device virtio-blk-pci,drive=drive0
36
+
37
+Or only use the -drive parameter::
38
+
39
+ -drive driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
40
+
41
+Additionally, QEMU has several ways of supporting zoned storage, including:
42
+(1) Using virtio-scsi: --device scsi-block allows for the passing through of
43
+SCSI ZBC devices, enabling the attachment of ZBC or ZAC HDDs to QEMU.
44
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
45
+purposes, it cannot yet pass through a zoned device from the host. To pass on
46
+the NVMe ZNS device to the guest, use VFIO PCI to pass the entire NVMe PCI adapter
47
+through to the guest. Likewise, an HDD HBA can be passed on to QEMU with all HDDs
48
+attached to the HBA.
49
--
34
--
50
2.39.2
35
2.26.2
36
diff view generated by jsdifflib
New patch
1
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2
Message-id: 20200924151549.913737-3-stefanha@redhat.com
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
---
5
util/vhost-user-server.c | 2 +-
6
1 file changed, 1 insertion(+), 1 deletion(-)
1
7
8
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
9
index XXXXXXX..XXXXXXX 100644
10
--- a/util/vhost-user-server.c
11
+++ b/util/vhost-user-server.c
12
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
13
return false;
14
}
15
16
- /* zero out unspecified fileds */
17
+ /* zero out unspecified fields */
18
*server = (VuServer) {
19
.listener = listener,
20
.vu_iface = vu_iface,
21
--
22
2.26.2
23
diff view generated by jsdifflib
1
From: Carlos Santos <casantos@redhat.com>
1
We already have access to the value with the correct type (ioc and sioc
2
are the same QIOChannel).
2
3
3
It is not useful when configuring with --enable-trace-backends=nop.
4
5
Signed-off-by: Carlos Santos <casantos@redhat.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Message-Id: <20230408010410.281263-1-casantos@redhat.com>
5
Message-id: 20200924151549.913737-4-stefanha@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
7
---
9
trace/meson.build | 2 +-
8
util/vhost-user-server.c | 2 +-
10
1 file changed, 1 insertion(+), 1 deletion(-)
9
1 file changed, 1 insertion(+), 1 deletion(-)
11
10
12
diff --git a/trace/meson.build b/trace/meson.build
11
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
13
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
14
--- a/trace/meson.build
13
--- a/util/vhost-user-server.c
15
+++ b/trace/meson.build
14
+++ b/util/vhost-user-server.c
16
@@ -XXX,XX +XXX,XX @@ trace_events_all = custom_target('trace-events-all',
15
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
17
input: trace_events_files,
16
server->ioc = QIO_CHANNEL(sioc);
18
command: [ 'cat', '@INPUT@' ],
17
object_ref(OBJECT(server->ioc));
19
capture: true,
18
qio_channel_attach_aio_context(server->ioc, server->ctx);
20
- install: true,
19
- qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
21
+ install: get_option('trace_backends') != [ 'nop' ],
20
+ qio_channel_set_blocking(server->ioc, false, NULL);
22
install_dir: qemu_datadir)
21
vu_client_start(server);
23
22
}
24
if 'ust' in get_option('trace_backends')
23
25
--
24
--
26
2.39.2
25
2.26.2
26
diff view generated by jsdifflib
New patch
1
Explicitly deleting watches is not necessary since libvhost-user calls
2
remove_watch() during vu_deinit(). Add an assertion to check this
3
though.
1
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200924151549.913737-5-stefanha@redhat.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
util/vhost-user-server.c | 19 ++++---------------
10
1 file changed, 4 insertions(+), 15 deletions(-)
11
12
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.c
15
+++ b/util/vhost-user-server.c
16
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
17
/* When this is set vu_client_trip will stop processing new vhost-user messages */
18
server->sioc = NULL;
19
20
- VuFdWatch *vu_fd_watch, *next;
21
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
22
- aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
23
- NULL, NULL, NULL);
24
- }
25
-
26
- while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
27
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
28
- if (!vu_fd_watch->processing) {
29
- QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
30
- g_free(vu_fd_watch);
31
- }
32
- }
33
- }
34
-
35
while (server->processing_msg) {
36
if (server->ioc->read_coroutine) {
37
server->ioc->read_coroutine = NULL;
38
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
39
}
40
41
vu_deinit(&server->vu_dev);
42
+
43
+ /* vu_deinit() should have called remove_watch() */
44
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
45
+
46
object_unref(OBJECT(sioc));
47
object_unref(OBJECT(server->ioc));
48
}
49
--
50
2.26.2
51
diff view generated by jsdifflib
1
From: Sam Li <faithilikerun@gmail.com>
1
Only one struct is needed per request. Drop req_data and the separate
2
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
3
once.
2
4
3
This patch extends virtio-blk emulation to handle zoned device commands
5
This fixes the req_data memory leak in vu_block_virtio_process_req().
4
by calling the new block layer APIs to perform zoned device I/O on
5
behalf of the guest. It supports Report Zone, four zone operations (open,
6
close, finish, reset), and Append Zone.
7
6
8
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
support zoned block devices. Regular block devices (conventional zones)
8
Message-id: 20200924151549.913737-6-stefanha@redhat.com
10
will not have it set.
11
12
The guest OS can use blktests and fio to test those commands on zoned devices.
13
Furthermore, using zonefs to test zone append write is also supported.
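A standalone sketch of the single-allocation layout that the VuBlockReq consolidation described above relies on: the request struct embeds the virtqueue element as its first member, so one over-sized allocation (stood in for here by calloc() rather than vu_queue_pop()) covers both, and a single free() releases the whole request. All names are placeholders, not the QEMU types:

    #include <stdio.h>
    #include <stdlib.h>

    struct virtq_element {              /* stand-in for VuVirtqElement */
        unsigned int index;
    };

    struct blk_req {                    /* stand-in for VuBlockReq */
        struct virtq_element elem;      /* must be the first member */
        long sector_num;
        size_t size;
    };

    /*
     * Stand-in for vu_queue_pop(): allocate 'sz' bytes (at least the element)
     * so the caller can reserve room for its own wrapping struct in one go.
     */
    static void *queue_pop(unsigned int index, size_t sz)
    {
        struct virtq_element *elem = calloc(1, sz);
        if (elem) {
            elem->index = index;
        }
        return elem;
    }

    int main(void)
    {
        /* One allocation per request: no separate req/elem pair left to leak. */
        struct blk_req *req = queue_pop(7, sizeof(*req));
        if (!req) {
            return 1;
        }
        req->sector_num = 2048;
        printf("element %u, sector %ld\n", req->elem.index, req->sector_num);
        free(req);                      /* frees element and request together */
        return 0;
    }
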
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Message-id: 20230407082528.18841-3-faithilikerun@gmail.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
10
---
19
hw/block/virtio-blk-common.c | 2 +
11
block/export/vhost-user-blk-server.c | 68 +++++++++-------------------
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
12
1 file changed, 21 insertions(+), 47 deletions(-)
21
hw/virtio/virtio-qmp.c | 2 +
22
3 files changed, 393 insertions(+)
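Before the diff, a simplified standalone sketch of the kind of checks check_zoned_request() performs for a zone append request: offset and length within the device, offset aligned to the write granularity, and length within the advertised maximum. The struct fields and status codes below are placeholders rather than the virtio-blk constants, and the conventional-zone and write-pointer checks from the patch are omitted:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    enum { STATUS_OK, STATUS_INVALID, STATUS_UNALIGNED, STATUS_UNSUPP };

    struct zdev {
        int64_t capacity;              /* device size in bytes */
        int64_t write_granularity;     /* required append alignment in bytes */
        int64_t max_append_bytes;      /* largest allowed append, 0 = unsupported */
    };

    /* Validate an append of 'len' bytes at 'offset' on device 'd'. */
    static bool zone_append_ok(const struct zdev *d, int64_t offset, int64_t len,
                               int *status)
    {
        if (offset < 0 || len < 0 || len > d->capacity ||
            offset > d->capacity - len) {
            *status = STATUS_INVALID;              /* outside the device */
            return false;
        }
        if (d->write_granularity && (offset % d->write_granularity) != 0) {
            *status = STATUS_UNALIGNED;            /* misaligned append */
            return false;
        }
        if (d->max_append_bytes == 0) {
            *status = STATUS_UNSUPP;               /* appends not supported */
            return false;
        }
        if (len > d->max_append_bytes) {
            *status = STATUS_INVALID;              /* append too large */
            return false;
        }
        *status = STATUS_OK;
        return true;
    }

    int main(void)
    {
        struct zdev d = { .capacity = INT64_C(1) << 30,
                          .write_granularity = 4096,
                          .max_append_bytes = 512 * 1024 };
        int st;
        printf("%d\n", zone_append_ok(&d, 0, 4096, &st));        /* 1 */
        printf("%d\n", zone_append_ok(&d, 4096, 1 << 20, &st));  /* 0: too large */
        printf("%d\n", zone_append_ok(&d, 100, 4096, &st));      /* 0: unaligned */
        return 0;
    }
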
23
13
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
14
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
25
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
26
--- a/hw/block/virtio-blk-common.c
16
--- a/block/export/vhost-user-blk-server.c
27
+++ b/hw/block/virtio-blk-common.c
17
+++ b/block/export/vhost-user-blk-server.c
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
18
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
33
+ .end = endof(struct virtio_blk_config, zoned)},
34
{}
35
};
19
};
36
20
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
21
typedef struct VuBlockReq {
38
index XXXXXXX..XXXXXXX 100644
22
- VuVirtqElement *elem;
39
--- a/hw/block/virtio-blk.c
23
+ VuVirtqElement elem;
40
+++ b/hw/block/virtio-blk.c
24
int64_t sector_num;
41
@@ -XXX,XX +XXX,XX @@
25
size_t size;
42
#include "qemu/module.h"
26
struct virtio_blk_inhdr *in;
43
#include "qemu/error-report.h"
27
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
44
#include "qemu/main-loop.h"
28
VuDev *vu_dev = &req->server->vu_dev;
45
+#include "block/block_int.h"
29
46
#include "trace.h"
30
/* IO size with 1 extra status byte */
47
#include "hw/block/block.h"
31
- vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
48
#include "hw/qdev-properties.h"
32
+ vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
49
@@ -XXX,XX +XXX,XX @@ err:
33
vu_queue_notify(vu_dev, req->vq);
50
return err_status;
34
35
- if (req->elem) {
36
- free(req->elem);
37
- }
38
-
39
- g_free(req);
40
+ free(req);
51
}
41
}
52
42
53
+typedef struct ZoneCmdData {
43
static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
54
+ VirtIOBlockReq *req;
44
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
55
+ struct iovec *in_iov;
45
blk_co_flush(backend);
56
+ unsigned in_num;
46
}
57
+ union {
47
58
+ struct {
48
-struct req_data {
59
+ unsigned int nr_zones;
49
- VuServer *server;
60
+ BlockZoneDescriptor *zones;
50
- VuVirtq *vq;
61
+ } zone_report_data;
51
- VuVirtqElement *elem;
62
+ struct {
52
-};
63
+ int64_t offset;
53
-
64
+ } zone_append_data;
54
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
65
+ };
55
{
66
+} ZoneCmdData;
56
- struct req_data *data = opaque;
57
- VuServer *server = data->server;
58
- VuVirtq *vq = data->vq;
59
- VuVirtqElement *elem = data->elem;
60
+ VuBlockReq *req = opaque;
61
+ VuServer *server = req->server;
62
+ VuVirtqElement *elem = &req->elem;
63
uint32_t type;
64
- VuBlockReq *req;
65
66
VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
67
BlockBackend *backend = vdev_blk->backend;
68
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
69
struct iovec *out_iov = elem->out_sg;
70
unsigned in_num = elem->in_num;
71
unsigned out_num = elem->out_num;
67
+
72
+
68
+/*
73
/* refer to hw/block/virtio_blk.c */
69
+ * check zoned_request: error checking before issuing requests. If all checks
74
if (elem->out_num < 1 || elem->in_num < 1) {
70
+ * passed, return true.
75
error_report("virtio-blk request missing headers");
71
+ * append: true if only zone append requests issued.
76
- free(elem);
72
+ */
77
- return;
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
78
+ goto err;
74
+ bool append, uint8_t *status) {
79
}
75
+ BlockDriverState *bs = blk_bs(s->blk);
80
76
+ int index;
81
- req = g_new0(VuBlockReq, 1);
82
- req->server = server;
83
- req->vq = vq;
84
- req->elem = elem;
85
-
86
if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
87
sizeof(req->out)) != sizeof(req->out))) {
88
error_report("virtio-blk request outhdr too short");
89
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
90
91
err:
92
free(elem);
93
- g_free(req);
94
- return;
95
}
96
97
static void vu_block_process_vq(VuDev *vu_dev, int idx)
98
{
99
- VuServer *server;
100
- VuVirtq *vq;
101
- struct req_data *req_data;
102
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
103
+ VuVirtq *vq = vu_get_queue(vu_dev, idx);
104
105
- server = container_of(vu_dev, VuServer, vu_dev);
106
- assert(server);
107
-
108
- vq = vu_get_queue(vu_dev, idx);
109
- assert(vq);
110
- VuVirtqElement *elem;
111
while (1) {
112
- elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
113
- sizeof(VuBlockReq));
114
- if (elem) {
115
- req_data = g_new0(struct req_data, 1);
116
- req_data->server = server;
117
- req_data->vq = vq;
118
- req_data->elem = elem;
119
- Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
120
- req_data);
121
- aio_co_enter(server->ioc->ctx, co);
122
- } else {
123
+ VuBlockReq *req;
77
+
124
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
125
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
79
+ *status = VIRTIO_BLK_S_UNSUPP;
126
+ if (!req) {
80
+ return false;
127
break;
81
+ }
128
}
82
+
129
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
130
+ req->server = server;
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
131
+ req->vq = vq;
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
132
+
89
+ if (append) {
133
+ Coroutine *co =
90
+ if (bs->bl.write_granularity) {
134
+ qemu_coroutine_create(vu_block_virtio_process_req, req);
91
+ if ((offset % bs->bl.write_granularity) != 0) {
135
+ qemu_coroutine_enter(co);
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size),
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer less than size of "
335
+ "append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
383
{
384
uint32_t type;
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
136
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Passing out_iov/out_num and in_iov/in_num is not safe
417
+ * to access req->elem.out_sg directly because it may be
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
462
}
137
}
463
138
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
465
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
466
VirtIOBlock *s = VIRTIO_BLK(dev);
467
VirtIOBlkConf *conf = &s->conf;
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
469
Error *err = NULL;
470
unsigned i;
471
472
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
473
return;
474
}
475
476
+ if (bs->bl.zoned != BLK_Z_NONE) {
477
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
478
+ if (bs->bl.zoned == BLK_Z_HM) {
479
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
480
+ }
481
+ }
482
+
483
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
484
(!conf->max_discard_sectors ||
485
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
486
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
487
index XXXXXXX..XXXXXXX 100644
488
--- a/hw/virtio/virtio-qmp.c
489
+++ b/hw/virtio/virtio-qmp.c
490
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
491
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
492
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
493
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
494
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
495
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
496
#ifndef VIRTIO_BLK_NO_LEGACY
497
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
498
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
499
--
139
--
500
2.39.2
140
2.26.2
141
1
From: Sam Li <faithilikerun@gmail.com>
1
The device panic notifier callback is not used. Drop it.
2
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-7-stefanha@redhat.com
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
6
---
10
block/file-posix.c | 3 +++
7
util/vhost-user-server.h | 3 ---
11
block/trace-events | 2 ++
8
block/export/vhost-user-blk-server.c | 3 +--
12
2 files changed, 5 insertions(+)
9
util/vhost-user-server.c | 6 ------
10
3 files changed, 1 insertion(+), 11 deletions(-)
13
11
14
diff --git a/block/file-posix.c b/block/file-posix.c
12
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
15
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
16
--- a/block/file-posix.c
14
--- a/util/vhost-user-server.h
17
+++ b/block/file-posix.c
15
+++ b/util/vhost-user-server.h
18
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
16
@@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch {
19
},
17
} VuFdWatch;
18
19
typedef struct VuServer VuServer;
20
-typedef void DevicePanicNotifierFn(VuServer *server);
21
22
struct VuServer {
23
QIONetListener *listener;
24
AioContext *ctx;
25
- DevicePanicNotifierFn *device_panic_notifier;
26
int max_queues;
27
const VuDevIface *vu_iface;
28
VuDev vu_dev;
29
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
30
SocketAddress *unix_socket,
31
AioContext *ctx,
32
uint16_t max_queues,
33
- DevicePanicNotifierFn *device_panic_notifier,
34
const VuDevIface *vu_iface,
35
Error **errp);
36
37
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/export/vhost-user-blk-server.c
40
+++ b/block/export/vhost-user-blk-server.c
41
@@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
42
ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
43
44
if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
45
- VHOST_USER_BLK_MAX_QUEUES,
46
- NULL, &vu_block_iface,
47
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
48
errp)) {
49
goto error;
50
}
51
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/util/vhost-user-server.c
54
+++ b/util/vhost-user-server.c
55
@@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf)
56
close_client(server);
57
}
58
59
- if (server->device_panic_notifier) {
60
- server->device_panic_notifier(server);
61
- }
62
-
63
/*
64
* Set the callback function for network listener so another
65
* vhost-user client can connect to this server
66
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
67
SocketAddress *socket_addr,
68
AioContext *ctx,
69
uint16_t max_queues,
70
- DevicePanicNotifierFn *device_panic_notifier,
71
const VuDevIface *vu_iface,
72
Error **errp)
73
{
74
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
75
.vu_iface = vu_iface,
76
.max_queues = max_queues,
77
.ctx = ctx,
78
- .device_panic_notifier = device_panic_notifier,
20
};
79
};
21
80
22
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
81
qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
23
return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
24
}
25
#endif
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
27
},
28
};
29
30
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
31
+ len >> BDRV_SECTOR_BITS);
32
ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
33
if (ret != 0) {
34
error_report("ioctl %s failed %d", op_name, ret);
35
diff --git a/block/trace-events b/block/trace-events
36
index XXXXXXX..XXXXXXX 100644
37
--- a/block/trace-events
38
+++ b/block/trace-events
39
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
40
file_setup_cdrom(const char *partition) "Using %s as optical disc"
41
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
42
file_flush_fdatasync_failed(int err) "errno %d"
43
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
44
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
45
46
# ssh.c
47
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
48
--
82
--
49
2.39.2
83
2.26.2
84
New patch
1
fds[] is leaked when qio_channel_readv_full() fails.
1
2
3
Use vmsg->fds[] instead of keeping a local fds[] array. Then we can
4
reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the
5
loop to make this safe.
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-8-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
util/vhost-user-server.c | 50 ++++++++++++++++++----------------------
12
1 file changed, 23 insertions(+), 27 deletions(-)
13
14
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/util/vhost-user-server.c
17
+++ b/util/vhost-user-server.c
18
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
19
};
20
int rc, read_bytes = 0;
21
Error *local_err = NULL;
22
- /*
23
- * Store fds/nfds returned from qio_channel_readv_full into
24
- * temporary variables.
25
- *
26
- * VhostUserMsg is a packed structure, gcc will complain about passing
27
- * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
28
- * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
29
- * thus two temporary variables nfds and fds are used here.
30
- */
31
- size_t nfds = 0, nfds_t = 0;
32
const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
33
- int *fds_t = NULL;
34
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
35
QIOChannel *ioc = server->ioc;
36
37
+ vmsg->fd_num = 0;
38
if (!ioc) {
39
error_report_err(local_err);
40
goto fail;
41
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
42
43
assert(qemu_in_coroutine());
44
do {
45
+ size_t nfds = 0;
46
+ int *fds = NULL;
47
+
48
/*
49
* qio_channel_readv_full may have short reads, keeping calling it
50
* until getting VHOST_USER_HDR_SIZE or 0 bytes in total
51
*/
52
- rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
53
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
54
if (rc < 0) {
55
if (rc == QIO_CHANNEL_ERR_BLOCK) {
56
+ assert(local_err == NULL);
57
qio_channel_yield(ioc, G_IO_IN);
58
continue;
59
} else {
60
error_report_err(local_err);
61
- return false;
62
+ goto fail;
63
}
64
}
65
- read_bytes += rc;
66
- if (nfds_t > 0) {
67
- if (nfds + nfds_t > max_fds) {
68
+
69
+ if (nfds > 0) {
70
+ if (vmsg->fd_num + nfds > max_fds) {
71
error_report("A maximum of %zu fds are allowed, "
72
"however got %zu fds now",
73
- max_fds, nfds + nfds_t);
74
+ max_fds, vmsg->fd_num + nfds);
75
+ g_free(fds);
76
goto fail;
77
}
78
- memcpy(vmsg->fds + nfds, fds_t,
79
- nfds_t *sizeof(vmsg->fds[0]));
80
- nfds += nfds_t;
81
- g_free(fds_t);
82
+ memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
83
+ vmsg->fd_num += nfds;
84
+ g_free(fds);
85
}
86
- if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
87
- break;
88
+
89
+ if (rc == 0) { /* socket closed */
90
+ goto fail;
91
}
92
- iov.iov_base = (char *)vmsg + read_bytes;
93
- iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
94
- } while (true);
95
96
- vmsg->fd_num = nfds;
97
+ iov.iov_base += rc;
98
+ iov.iov_len -= rc;
99
+ read_bytes += rc;
100
+ } while (read_bytes != VHOST_USER_HDR_SIZE);
101
+
102
/* qio_channel_readv_full will make socket fds blocking, unblock them */
103
vmsg_unblock_fds(vmsg);
104
if (vmsg->size > sizeof(vmsg->payload)) {
105
--
106
2.26.2
107
diff view generated by jsdifflib
New patch
1
Unexpected EOF is an error that must be reported.
1
2
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-9-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.c | 6 ++++--
8
1 file changed, 4 insertions(+), 2 deletions(-)
9
10
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/util/vhost-user-server.c
13
+++ b/util/vhost-user-server.c
14
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
15
};
16
if (vmsg->size) {
17
rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
18
- if (rc == -1) {
19
- error_report_err(local_err);
20
+ if (rc != 1) {
21
+ if (local_err) {
22
+ error_report_err(local_err);
23
+ }
24
goto fail;
25
}
26
}
27
--
28
2.26.2
29
1
From: Sam Li <faithilikerun@gmail.com>
1
The vu_client_trip() coroutine is leaked during AioContext switching. It
2
is also unsafe to destroy the vu_dev in panic_cb() since its callers
3
still access it in some cases.
2
4
3
The patch tests zone append writes by reporting the zone wp after
5
Rework the lifecycle to solve these safety issues.
4
the completion of the call. The "zap -p" option prints the sector
5
offset value after completion, which should be the start sector
6
where the append write begins.
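
As a quick illustration (mirroring the iotest added below; the image
argument is a placeholder for the zoned device options the test passes
to qemu-io):

  $ qemu-io -c "zap -p 0 0x1000 0x2000" <zoned-device-image>
  After zap done, the append sector is 0x0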
7
6
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-10-stefanha@redhat.com
10
Message-id: 20230407081657.17947-4-faithilikerun@gmail.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
10
---
13
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
11
util/vhost-user-server.h | 29 ++--
14
tests/qemu-iotests/tests/zoned | 16 +++++++
12
block/export/vhost-user-blk-server.c | 9 +-
15
tests/qemu-iotests/tests/zoned.out | 16 +++++++
13
util/vhost-user-server.c | 245 +++++++++++++++------------
16
3 files changed, 107 insertions(+)
14
3 files changed, 155 insertions(+), 128 deletions(-)
17
15
18
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
16
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
19
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
20
--- a/qemu-io-cmds.c
18
--- a/util/vhost-user-server.h
21
+++ b/qemu-io-cmds.c
19
+++ b/util/vhost-user-server.h
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
20
@@ -XXX,XX +XXX,XX @@
23
.oneline = "reset a zone write pointer in zone block device",
21
#include "qapi/error.h"
24
};
22
#include "standard-headers/linux/virtio_blk.h"
25
23
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
24
+/* A kick fd that we monitor on behalf of libvhost-user */
27
+ int64_t *offset, int flags, int *total)
25
typedef struct VuFdWatch {
26
VuDev *vu_dev;
27
int fd; /*kick fd*/
28
void *pvt;
29
vu_watch_cb cb;
30
- bool processing;
31
QTAILQ_ENTRY(VuFdWatch) next;
32
} VuFdWatch;
33
34
-typedef struct VuServer VuServer;
35
-
36
-struct VuServer {
37
+/**
38
+ * VuServer:
39
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
40
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
41
+ * callbacks and virtqueue kicks run in the given AioContext.
42
+ */
43
+typedef struct {
44
QIONetListener *listener;
45
+ QEMUBH *restart_listener_bh;
46
AioContext *ctx;
47
int max_queues;
48
const VuDevIface *vu_iface;
49
+
50
+ /* Protected by ctx lock */
51
VuDev vu_dev;
52
QIOChannel *ioc; /* The I/O channel with the client */
53
QIOChannelSocket *sioc; /* The underlying data channel with the client */
54
- /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
55
- QIOChannel *ioc_slave;
56
- QIOChannelSocket *sioc_slave;
57
- Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
58
QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
59
- /* restart coroutine co_trip if AIOContext is changed */
60
- bool aio_context_changed;
61
- bool processing_msg;
62
-};
63
+
64
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
65
+} VuServer;
66
67
bool vhost_user_server_start(VuServer *server,
68
SocketAddress *unix_socket,
69
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
70
71
void vhost_user_server_stop(VuServer *server);
72
73
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
74
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
75
+void vhost_user_server_detach_aio_context(VuServer *server);
76
77
#endif /* VHOST_USER_SERVER_H */
78
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
79
index XXXXXXX..XXXXXXX 100644
80
--- a/block/export/vhost-user-blk-server.c
81
+++ b/block/export/vhost-user-blk-server.c
82
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = {
83
static void blk_aio_attached(AioContext *ctx, void *opaque)
84
{
85
VuBlockDev *vub_dev = opaque;
86
- aio_context_acquire(ctx);
87
- vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
88
- aio_context_release(ctx);
89
+ vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
90
}
91
92
static void blk_aio_detach(void *opaque)
93
{
94
VuBlockDev *vub_dev = opaque;
95
- AioContext *ctx = vub_dev->vu_server.ctx;
96
- aio_context_acquire(ctx);
97
- vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
98
- aio_context_release(ctx);
99
+ vhost_user_server_detach_aio_context(&vub_dev->vu_server);
100
}
101
102
static void
103
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/util/vhost-user-server.c
106
+++ b/util/vhost-user-server.c
107
@@ -XXX,XX +XXX,XX @@
108
*/
109
#include "qemu/osdep.h"
110
#include "qemu/main-loop.h"
111
+#include "block/aio-wait.h"
112
#include "vhost-user-server.h"
113
114
+/*
115
+ * Theory of operation:
116
+ *
117
+ * VuServer is started and stopped by vhost_user_server_start() and
118
+ * vhost_user_server_stop() from the main loop thread. Starting the server
119
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
120
+ * Only one connection is allowed at a time.
121
+ *
122
+ * The connection is handled by the vu_client_trip() coroutine in the
123
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
124
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
125
+ * protocol messages over the UNIX domain socket.
126
+ *
127
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
128
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
129
+ *
130
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
131
+ * the socket connection. Shutting down the socket connection causes
132
+ * vu_message_read() to fail since no more data can be received from the socket.
133
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
134
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
135
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
136
+ * processing.
137
+ *
138
+ * When vu_client_trip() has finished cleaning up it schedules a BH in the main
139
+ * loop thread to accept the next client connection.
140
+ *
141
+ * When libvhost-user detects an error it calls panic_cb() and sets the
142
+ * dev->broken flag. Both vu_client_trip() and kick fd processing stop when
143
+ * the dev->broken flag is set.
144
+ *
145
+ * It is possible to switch AioContexts using
146
+ * vhost_user_server_detach_aio_context() and
147
+ * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old
148
+ * AioContext and resume monitoring in the new AioContext. The vu_client_trip()
149
+ * coroutine remains in a yielded state during the switch. This is made
150
+ * possible by QIOChannel's support for spurious coroutine re-entry in
151
+ * qio_channel_yield(). The coroutine will restart I/O when re-entered from the
152
+ * new AioContext.
153
+ */
154
+
155
static void vmsg_close_fds(VhostUserMsg *vmsg)
156
{
157
int i;
158
@@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg)
159
}
160
}
161
162
-static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
163
- gpointer opaque);
164
-
165
-static void close_client(VuServer *server)
166
-{
167
- /*
168
- * Before closing the client
169
- *
170
- * 1. Let vu_client_trip stop processing new vhost-user msg
171
- *
172
- * 2. remove kick_handler
173
- *
174
- * 3. wait for the kick handler to be finished
175
- *
176
- * 4. wait for the current vhost-user msg to be finished processing
177
- */
178
-
179
- QIOChannelSocket *sioc = server->sioc;
180
- /* When this is set vu_client_trip will stop new processing vhost-user message */
181
- server->sioc = NULL;
182
-
183
- while (server->processing_msg) {
184
- if (server->ioc->read_coroutine) {
185
- server->ioc->read_coroutine = NULL;
186
- qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
187
- NULL, server->ioc);
188
- server->processing_msg = false;
189
- }
190
- }
191
-
192
- vu_deinit(&server->vu_dev);
193
-
194
- /* vu_deinit() should have called remove_watch() */
195
- assert(QTAILQ_EMPTY(&server->vu_fd_watches));
196
-
197
- object_unref(OBJECT(sioc));
198
- object_unref(OBJECT(server->ioc));
199
-}
200
-
201
static void panic_cb(VuDev *vu_dev, const char *buf)
202
{
203
- VuServer *server = container_of(vu_dev, VuServer, vu_dev);
204
-
205
- /* avoid while loop in close_client */
206
- server->processing_msg = false;
207
-
208
- if (buf) {
209
- error_report("vu_panic: %s", buf);
210
- }
211
-
212
- if (server->sioc) {
213
- close_client(server);
214
- }
215
-
216
- /*
217
- * Set the callback function for network listener so another
218
- * vhost-user client can connect to this server
219
- */
220
- qio_net_listener_set_client_func(server->listener,
221
- vu_accept,
222
- server,
223
- NULL);
224
+ error_report("vu_panic: %s", buf);
225
}
226
227
static bool coroutine_fn
228
@@ -XXX,XX +XXX,XX @@ fail:
229
return false;
230
}
231
232
-
233
-static void vu_client_start(VuServer *server);
234
static coroutine_fn void vu_client_trip(void *opaque)
235
{
236
VuServer *server = opaque;
237
+ VuDev *vu_dev = &server->vu_dev;
238
239
- while (!server->aio_context_changed && server->sioc) {
240
- server->processing_msg = true;
241
- vu_dispatch(&server->vu_dev);
242
- server->processing_msg = false;
243
+ while (!vu_dev->broken && vu_dispatch(vu_dev)) {
244
+ /* Keep running */
245
}
246
247
- if (server->aio_context_changed && server->sioc) {
248
- server->aio_context_changed = false;
249
- vu_client_start(server);
250
- }
251
-}
252
+ vu_deinit(vu_dev);
253
+
254
+ /* vu_deinit() should have called remove_watch() */
255
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
256
+
257
+ object_unref(OBJECT(server->sioc));
258
+ server->sioc = NULL;
259
260
-static void vu_client_start(VuServer *server)
261
-{
262
- server->co_trip = qemu_coroutine_create(vu_client_trip, server);
263
- aio_co_enter(server->ctx, server->co_trip);
264
+ object_unref(OBJECT(server->ioc));
265
+ server->ioc = NULL;
266
+
267
+ server->co_trip = NULL;
268
+ if (server->restart_listener_bh) {
269
+ qemu_bh_schedule(server->restart_listener_bh);
270
+ }
271
+ aio_wait_kick();
272
}
273
274
/*
275
@@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server)
276
static void kick_handler(void *opaque)
277
{
278
VuFdWatch *vu_fd_watch = opaque;
279
- vu_fd_watch->processing = true;
280
- vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
281
- vu_fd_watch->processing = false;
282
+ VuDev *vu_dev = vu_fd_watch->vu_dev;
283
+
284
+ vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt);
285
+
286
+ /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */
287
+ if (vu_dev->broken) {
288
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
289
+
290
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
291
+ }
292
}
293
294
-
295
static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
296
{
297
298
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
299
qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
300
server->ioc = QIO_CHANNEL(sioc);
301
object_ref(OBJECT(server->ioc));
302
- qio_channel_attach_aio_context(server->ioc, server->ctx);
303
+
304
+ /* TODO vu_message_write() spins if non-blocking! */
305
qio_channel_set_blocking(server->ioc, false, NULL);
306
- vu_client_start(server);
307
+
308
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
309
+
310
+ aio_context_acquire(server->ctx);
311
+ vhost_user_server_attach_aio_context(server, server->ctx);
312
+ aio_context_release(server->ctx);
313
}
314
315
-
316
void vhost_user_server_stop(VuServer *server)
317
{
318
+ aio_context_acquire(server->ctx);
319
+
320
+ qemu_bh_delete(server->restart_listener_bh);
321
+ server->restart_listener_bh = NULL;
322
+
323
if (server->sioc) {
324
- close_client(server);
325
+ VuFdWatch *vu_fd_watch;
326
+
327
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
328
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
329
+ NULL, NULL, NULL, vu_fd_watch);
330
+ }
331
+
332
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
333
+
334
+ AIO_WAIT_WHILE(server->ctx, server->co_trip);
335
}
336
337
+ aio_context_release(server->ctx);
338
+
339
if (server->listener) {
340
qio_net_listener_disconnect(server->listener);
341
object_unref(OBJECT(server->listener));
342
}
343
+}
344
+
345
+/*
346
+ * Allow the next client to connect to the server. Called from a BH in the main
347
+ * loop.
348
+ */
349
+static void restart_listener_bh(void *opaque)
28
+{
350
+{
29
+ int async_ret = NOT_DONE;
351
+ VuServer *server = opaque;
30
+
352
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
353
+ qio_net_listener_set_client_func(server->listener, vu_accept, server,
32
+ while (async_ret == NOT_DONE) {
354
+ NULL);
33
+ main_loop_wait(false);
355
}
356
357
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
358
+/* Called with ctx acquired */
359
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
360
{
361
- VuFdWatch *vu_fd_watch, *next;
362
- void *opaque = NULL;
363
- IOHandler *io_read = NULL;
364
- bool attach;
365
+ VuFdWatch *vu_fd_watch;
366
367
- server->ctx = ctx ? ctx : qemu_get_aio_context();
368
+ server->ctx = ctx;
369
370
if (!server->sioc) {
371
- /* not yet serving any client*/
372
return;
373
}
374
375
- if (ctx) {
376
- qio_channel_attach_aio_context(server->ioc, ctx);
377
- server->aio_context_changed = true;
378
- io_read = kick_handler;
379
- attach = true;
380
- } else {
381
+ qio_channel_attach_aio_context(server->ioc, ctx);
382
+
383
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
384
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
385
+ NULL, vu_fd_watch);
34
+ }
386
+ }
35
+
387
+
36
+ *total = qiov->size;
388
+ aio_co_schedule(ctx, server->co_trip);
37
+ return async_ret < 0 ? async_ret : 1;
38
+}
389
+}
39
+
390
+
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
391
+/* Called with server->ctx acquired */
392
+void vhost_user_server_detach_aio_context(VuServer *server)
41
+{
393
+{
42
+ int ret;
394
+ if (server->sioc) {
43
+ bool pflag = false;
395
+ VuFdWatch *vu_fd_watch;
44
+ int flags = 0;
396
+
45
+ int total = 0;
397
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
46
+ int64_t offset;
398
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
47
+ char *buf;
399
+ NULL, NULL, NULL, vu_fd_watch);
48
+ int c, nr_iov;
400
+ }
49
+ int pattern = 0xcd;
401
+
50
+ QEMUIOVector qiov;
402
qio_channel_detach_aio_context(server->ioc);
51
+
403
- /* server->ioc->ctx keeps the old AioConext */
52
+ if (optind > argc - 3) {
404
- ctx = server->ioc->ctx;
53
+ return -EINVAL;
405
- attach = false;
54
+ }
406
}
55
+
407
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
408
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
57
+ pflag = true;
409
- if (vu_fd_watch->cb) {
58
+ }
410
- opaque = attach ? vu_fd_watch : NULL;
59
+
411
- aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
60
+ offset = cvtnum(argv[optind]);
412
- io_read, NULL, NULL,
61
+ if (offset < 0) {
413
- opaque);
62
+ print_cvtnum_err(offset, argv[optind]);
414
- }
63
+ return offset;
415
- }
64
+ }
416
+ server->ctx = NULL;
65
+ optind++;
417
}
66
+ nr_iov = argc - optind;
418
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
419
-
68
+ flags & BDRV_REQ_REGISTERED_BUF);
420
bool vhost_user_server_start(VuServer *server,
69
+ if (buf == NULL) {
421
SocketAddress *socket_addr,
70
+ return -EINVAL;
422
AioContext *ctx,
71
+ }
423
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
424
const VuDevIface *vu_iface,
73
+ if (ret < 0) {
425
Error **errp)
74
+ printf("zone append failed: %s\n", strerror(-ret));
426
{
75
+ goto out;
427
+ QEMUBH *bh;
76
+ }
428
QIONetListener *listener = qio_net_listener_new();
77
+
429
if (qio_net_listener_open_sync(listener, socket_addr, 1,
78
+ if (pflag) {
430
errp) < 0) {
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
431
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
80
+ tosector(offset));
432
return false;
81
+ }
433
}
82
+
434
83
+out:
435
+ bh = qemu_bh_new(restart_listener_bh, server);
84
+ qemu_io_free(blk, buf, qiov.size,
436
+
85
+ flags & BDRV_REQ_REGISTERED_BUF);
437
/* zero out unspecified fields */
86
+ qemu_iovec_destroy(&qiov);
438
*server = (VuServer) {
87
+ return ret;
439
.listener = listener,
88
+}
440
+ .restart_listener_bh = bh,
89
+
441
.vu_iface = vu_iface,
90
+static const cmdinfo_t zone_append_cmd = {
442
.max_queues = max_queues,
91
+ .name = "zone_append",
443
.ctx = ctx,
92
+ .altname = "zap",
93
+ .cfunc = zone_append_f,
94
+ .argmin = 3,
95
+ .argmax = 4,
96
+ .args = "offset len [len..]",
97
+ .oneline = "append write a number of bytes at a specified offset",
98
+};
99
+
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
101
static const cmdinfo_t truncate_cmd = {
102
.name = "truncate",
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
104
qemuio_add_command(&zone_close_cmd);
105
qemuio_add_command(&zone_finish_cmd);
106
qemuio_add_command(&zone_reset_cmd);
107
+ qemuio_add_command(&zone_append_cmd);
108
qemuio_add_command(&truncate_cmd);
109
qemuio_add_command(&length_cmd);
110
qemuio_add_command(&info_cmd);
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
112
index XXXXXXX..XXXXXXX 100755
113
--- a/tests/qemu-iotests/tests/zoned
114
+++ b/tests/qemu-iotests/tests/zoned
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
146
+
147
+
148
+(6) append write
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
150
+After zap done, the append sector is 0x0
151
+After appending the first zone firstly:
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
153
+After zap done, the append sector is 0x18
154
+After appending the first zone secondly:
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
156
+After zap done, the append sector is 0x80000
157
+After appending the second zone firstly:
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
159
+After zap done, the append sector is 0x80018
160
+After appending the second zone secondly:
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
162
*** done
163
--
444
--
164
2.39.2
445
2.26.2
446
New patch
1
Propagate the flush return value since errors are possible.
1
2
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-11-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
block/export/vhost-user-blk-server.c | 11 +++++++----
8
1 file changed, 7 insertions(+), 4 deletions(-)
9
10
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/block/export/vhost-user-blk-server.c
13
+++ b/block/export/vhost-user-blk-server.c
14
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
15
return -EINVAL;
16
}
17
18
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
19
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
20
{
21
VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
22
BlockBackend *backend = vdev_blk->backend;
23
- blk_co_flush(backend);
24
+ return blk_co_flush(backend);
25
}
26
27
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
28
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
29
break;
30
}
31
case VIRTIO_BLK_T_FLUSH:
32
- vu_block_flush(req);
33
- req->in->status = VIRTIO_BLK_S_OK;
34
+ if (vu_block_flush(req) == 0) {
35
+ req->in->status = VIRTIO_BLK_S_OK;
36
+ } else {
37
+ req->in->status = VIRTIO_BLK_S_IOERR;
38
+ }
39
break;
40
case VIRTIO_BLK_T_GET_ID: {
41
size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
42
--
43
2.26.2
44
1
From: Sam Li <faithilikerun@gmail.com>
1
Use the new QAPI block exports API instead of defining our own QOM
2
2
objects.
3
Use scripts/update-linux-headers.sh to update headers to 6.3-rc1.
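
For reference, the sync is typically run from the QEMU source tree against
a kernel checkout (the kernel path below is only a placeholder):

  $ ./scripts/update-linux-headers.sh ~/src/linux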
3
4
4
This is a large change because the lifecycle of VuBlockDev needs to
5
Signed-off-by: Sam Li <faithilikerun@gmail.com>
5
follow BlockExportDriver. QOM properties are replaced by QAPI options
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
objects.
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
7
8
Message-id: 20230407082528.18841-2-faithilikerun@gmail.com
8
VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
9
Several fields can be dropped since BlockExport already has equivalents.
10
11
The file names and meson build integration will be adjusted in a future
12
patch. libvhost-user should probably be built as a static library that
13
is linked into QEMU instead of as a .c file that results in duplicate
14
compilation.
15
16
The new command-line syntax is:
17
18
$ qemu-storage-daemon \
19
--blockdev file,node-name=drive0,filename=test.img \
20
--export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock
21
22
Note that unix-socket is optional because we may wish to accept chardevs
23
too in the future.
24
25
Markus noted that supported address families are not explicit in the
26
QAPI schema. It is unlikely that support for more address families will
27
be added since file descriptor passing is required and few address
28
families support it. If a new address family needs to be added, then the
29
QAPI 'features' syntax can be used to advertize them.
30
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
Acked-by: Markus Armbruster <armbru@redhat.com>
33
Message-id: 20200924151549.913737-12-stefanha@redhat.com
34
[Skip test on big-endian host architectures because this device doesn't
35
support them yet (as already mentioned in a code comment).
36
--Stefan]
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
37
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
38
---
11
include/standard-headers/drm/drm_fourcc.h | 12 +++
39
qapi/block-export.json | 21 +-
12
include/standard-headers/linux/ethtool.h | 48 ++++++++-
40
block/export/vhost-user-blk-server.h | 23 +-
13
include/standard-headers/linux/fuse.h | 45 +++++++-
41
block/export/export.c | 6 +
14
include/standard-headers/linux/pci_regs.h | 1 +
42
block/export/vhost-user-blk-server.c | 452 +++++++--------------------
15
include/standard-headers/linux/vhost_types.h | 2 +
43
util/vhost-user-server.c | 10 +-
16
include/standard-headers/linux/virtio_blk.h | 105 +++++++++++++++++++
44
block/export/meson.build | 1 +
17
linux-headers/asm-arm64/kvm.h | 1 +
45
block/meson.build | 1 -
18
linux-headers/asm-x86/kvm.h | 34 +++++-
46
7 files changed, 156 insertions(+), 358 deletions(-)
19
linux-headers/linux/kvm.h | 9 ++
47
20
linux-headers/linux/vfio.h | 15 +--
48
diff --git a/qapi/block-export.json b/qapi/block-export.json
21
linux-headers/linux/vhost.h | 8 ++
22
11 files changed, 270 insertions(+), 10 deletions(-)
23
24
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
25
index XXXXXXX..XXXXXXX 100644
49
index XXXXXXX..XXXXXXX 100644
26
--- a/include/standard-headers/drm/drm_fourcc.h
50
--- a/qapi/block-export.json
27
+++ b/include/standard-headers/drm/drm_fourcc.h
51
+++ b/qapi/block-export.json
28
@@ -XXX,XX +XXX,XX @@ extern "C" {
52
@@ -XXX,XX +XXX,XX @@
29
*
53
'data': { '*name': 'str', '*description': 'str',
30
* The authoritative list of format modifier codes is found in
54
'*bitmap': 'str' } }
31
* `include/uapi/drm/drm_fourcc.h`
55
32
+ *
56
+##
33
+ * Open Source User Waiver
57
+# @BlockExportOptionsVhostUserBlk:
34
+ * -----------------------
58
+#
35
+ *
59
+# A vhost-user-blk block export.
36
+ * Because this is the authoritative source for pixel formats and modifiers
60
+#
37
+ * referenced by GL, Vulkan extensions and other standards and hence used both
61
+# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
38
+ * by open source and closed source driver stacks, the usual requirement for an
62
+# SocketAddress types are supported. Passed fds must be UNIX domain
39
+ * upstream in-kernel or open source userspace user does not apply.
63
+# sockets.
40
+ *
64
+# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
41
+ * To ensure, as much as feasible, compatibility across stacks and avoid
65
+#
42
+ * confusion with incompatible enumerations stakeholders for all relevant driver
66
+# Since: 5.2
43
+ * stacks should approve additions.
67
+##
44
*/
68
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
45
69
+ 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
46
#define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
70
+
47
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
71
##
72
# @NbdServerAddOptions:
73
#
74
@@ -XXX,XX +XXX,XX @@
75
# An enumeration of block export types
76
#
77
# @nbd: NBD export
78
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
79
#
80
# Since: 4.2
81
##
82
{ 'enum': 'BlockExportType',
83
- 'data': [ 'nbd' ] }
84
+ 'data': [ 'nbd', 'vhost-user-blk' ] }
85
86
##
87
# @BlockExportOptions:
88
@@ -XXX,XX +XXX,XX @@
89
'*writethrough': 'bool' },
90
'discriminator': 'type',
91
'data': {
92
- 'nbd': 'BlockExportOptionsNbd'
93
+ 'nbd': 'BlockExportOptionsNbd',
94
+ 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
95
} }
96
97
##
98
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
48
index XXXXXXX..XXXXXXX 100644
99
index XXXXXXX..XXXXXXX 100644
49
--- a/include/standard-headers/linux/ethtool.h
100
--- a/block/export/vhost-user-blk-server.h
50
+++ b/include/standard-headers/linux/ethtool.h
101
+++ b/block/export/vhost-user-blk-server.h
51
@@ -XXX,XX +XXX,XX @@ enum ethtool_stringset {
102
@@ -XXX,XX +XXX,XX @@
52
    ETH_SS_COUNT
103
104
#ifndef VHOST_USER_BLK_SERVER_H
105
#define VHOST_USER_BLK_SERVER_H
106
-#include "util/vhost-user-server.h"
107
108
-typedef struct VuBlockDev VuBlockDev;
109
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
110
-#define VHOST_USER_BLK_SERVER(obj) \
111
- OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
112
+#include "block/export.h"
113
114
-/* vhost user block device */
115
-struct VuBlockDev {
116
- Object parent_obj;
117
- char *node_name;
118
- SocketAddress *addr;
119
- AioContext *ctx;
120
- VuServer vu_server;
121
- bool running;
122
- uint32_t blk_size;
123
- BlockBackend *backend;
124
- QIOChannelSocket *sioc;
125
- QTAILQ_ENTRY(VuBlockDev) next;
126
- struct virtio_blk_config blkcfg;
127
- bool writable;
128
-};
129
+/* For block/export/export.c */
130
+extern const BlockExportDriver blk_exp_vhost_user_blk;
131
132
#endif /* VHOST_USER_BLK_SERVER_H */
133
diff --git a/block/export/export.c b/block/export/export.c
134
index XXXXXXX..XXXXXXX 100644
135
--- a/block/export/export.c
136
+++ b/block/export/export.c
137
@@ -XXX,XX +XXX,XX @@
138
#include "sysemu/block-backend.h"
139
#include "block/export.h"
140
#include "block/nbd.h"
141
+#if CONFIG_LINUX
142
+#include "block/export/vhost-user-blk-server.h"
143
+#endif
144
#include "qapi/error.h"
145
#include "qapi/qapi-commands-block-export.h"
146
#include "qapi/qapi-events-block-export.h"
147
@@ -XXX,XX +XXX,XX @@
148
149
static const BlockExportDriver *blk_exp_drivers[] = {
150
&blk_exp_nbd,
151
+#if CONFIG_LINUX
152
+ &blk_exp_vhost_user_blk,
153
+#endif
53
};
154
};
54
155
55
+/**
156
/* Only accessed from the main thread */
56
+ * enum ethtool_mac_stats_src - source of ethtool MAC statistics
157
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
57
+ * @ETHTOOL_MAC_STATS_SRC_AGGREGATE:
58
+ *    if device supports a MAC merge layer, this retrieves the aggregate
59
+ *    statistics of the eMAC and pMAC. Otherwise, it retrieves just the
60
+ *    statistics of the single (express) MAC.
61
+ * @ETHTOOL_MAC_STATS_SRC_EMAC:
62
+ *    if device supports a MM layer, this retrieves the eMAC statistics.
63
+ *    Otherwise, it retrieves the statistics of the single (express) MAC.
64
+ * @ETHTOOL_MAC_STATS_SRC_PMAC:
65
+ *    if device supports a MM layer, this retrieves the pMAC statistics.
66
+ */
67
+enum ethtool_mac_stats_src {
68
+    ETHTOOL_MAC_STATS_SRC_AGGREGATE,
69
+    ETHTOOL_MAC_STATS_SRC_EMAC,
70
+    ETHTOOL_MAC_STATS_SRC_PMAC,
71
+};
72
+
73
/**
74
* enum ethtool_module_power_mode_policy - plug-in module power mode policy
75
* @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
76
@@ -XXX,XX +XXX,XX @@ enum ethtool_podl_pse_pw_d_status {
77
    ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR,
78
};
79
80
+/**
81
+ * enum ethtool_mm_verify_status - status of MAC Merge Verify function
82
+ * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN:
83
+ *    verification status is unknown
84
+ * @ETHTOOL_MM_VERIFY_STATUS_INITIAL:
85
+ *    the 802.3 Verify State diagram is in the state INIT_VERIFICATION
86
+ * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING:
87
+ *    the Verify State diagram is in the state VERIFICATION_IDLE,
88
+ *    SEND_VERIFY or WAIT_FOR_RESPONSE
89
+ * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED:
90
+ *    indicates that the Verify State diagram is in the state VERIFIED
91
+ * @ETHTOOL_MM_VERIFY_STATUS_FAILED:
92
+ *    the Verify State diagram is in the state VERIFY_FAIL
93
+ * @ETHTOOL_MM_VERIFY_STATUS_DISABLED:
94
+ *    verification of preemption operation is disabled
95
+ */
96
+enum ethtool_mm_verify_status {
97
+    ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
98
+    ETHTOOL_MM_VERIFY_STATUS_INITIAL,
99
+    ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
100
+    ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
101
+    ETHTOOL_MM_VERIFY_STATUS_FAILED,
102
+    ETHTOOL_MM_VERIFY_STATUS_DISABLED,
103
+};
104
+
105
/**
106
* struct ethtool_gstrings - string set for data tagging
107
* @cmd: Command number = %ETHTOOL_GSTRINGS
108
@@ -XXX,XX +XXX,XX @@ struct ethtool_rxnfc {
109
        uint32_t            rule_cnt;
110
        uint32_t            rss_context;
111
    };
112
-    uint32_t                rule_locs[0];
113
+    uint32_t                rule_locs[];
114
};
115
116
117
@@ -XXX,XX +XXX,XX @@ enum ethtool_link_mode_bit_indices {
118
    ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT     = 96,
119
    ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT     = 97,
120
    ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT     = 98,
121
+    ETHTOOL_LINK_MODE_10baseT1S_Full_BIT         = 99,
122
+    ETHTOOL_LINK_MODE_10baseT1S_Half_BIT         = 100,
123
+    ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT     = 101,
124
125
    /* must be last entry */
126
    __ETHTOOL_LINK_MODE_MASK_NBITS
127
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
128
index XXXXXXX..XXXXXXX 100644
158
index XXXXXXX..XXXXXXX 100644
129
--- a/include/standard-headers/linux/fuse.h
159
--- a/block/export/vhost-user-blk-server.c
130
+++ b/include/standard-headers/linux/fuse.h
160
+++ b/block/export/vhost-user-blk-server.c
131
@@ -XXX,XX +XXX,XX @@
132
* 7.38
133
* - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry
134
* - add FOPEN_PARALLEL_DIRECT_WRITES
135
+ * - add total_extlen to fuse_in_header
136
+ * - add FUSE_MAX_NR_SECCTX
137
+ * - add extension header
138
+ * - add FUSE_EXT_GROUPS
139
+ * - add FUSE_CREATE_SUPP_GROUP
140
*/
141
142
#ifndef _LINUX_FUSE_H
143
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
144
* FUSE_SECURITY_CTX:    add security context to create, mkdir, symlink, and
145
*            mknod
146
* FUSE_HAS_INODE_DAX: use per inode DAX
147
+ * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
148
+ *            symlink and mknod (single group that matches parent)
149
*/
150
#define FUSE_ASYNC_READ        (1 << 0)
151
#define FUSE_POSIX_LOCKS    (1 << 1)
152
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
153
/* bits 32..63 get shifted down 32 bits into the flags2 field */
154
#define FUSE_SECURITY_CTX    (1ULL << 32)
155
#define FUSE_HAS_INODE_DAX    (1ULL << 33)
156
+#define FUSE_CREATE_SUPP_GROUP    (1ULL << 34)
157
158
/**
159
* CUSE INIT request/reply flags
160
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
161
*/
162
#define FUSE_EXPIRE_ONLY        (1 << 0)
163
164
+/**
165
+ * extension type
166
+ * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx
167
+ * FUSE_EXT_GROUPS: &fuse_supp_groups extension
168
+ */
169
+enum fuse_ext_type {
170
+    /* Types 0..31 are reserved for fuse_secctx_header */
171
+    FUSE_MAX_NR_SECCTX    = 31,
172
+    FUSE_EXT_GROUPS        = 32,
173
+};
174
+
175
enum fuse_opcode {
176
    FUSE_LOOKUP        = 1,
177
    FUSE_FORGET        = 2, /* no reply */
178
@@ -XXX,XX +XXX,XX @@ struct fuse_in_header {
179
    uint32_t    uid;
180
    uint32_t    gid;
181
    uint32_t    pid;
182
-    uint32_t    padding;
183
+    uint16_t    total_extlen; /* length of extensions in 8byte units */
184
+    uint16_t    padding;
185
};
186
187
struct fuse_out_header {
188
@@ -XXX,XX +XXX,XX @@ struct fuse_secctx_header {
189
    uint32_t    nr_secctx;
190
};
191
192
+/**
193
+ * struct fuse_ext_header - extension header
194
+ * @size: total size of this extension including this header
195
+ * @type: type of extension
196
+ *
197
+ * This is made compatible with fuse_secctx_header by using type values >
198
+ * FUSE_MAX_NR_SECCTX
199
+ */
200
+struct fuse_ext_header {
201
+    uint32_t    size;
202
+    uint32_t    type;
203
+};
204
+
205
+/**
206
+ * struct fuse_supp_groups - Supplementary group extension
207
+ * @nr_groups: number of supplementary groups
208
+ * @groups: flexible array of group IDs
209
+ */
210
+struct fuse_supp_groups {
211
+    uint32_t    nr_groups;
212
+    uint32_t    groups[];
213
+};
214
+
215
#endif /* _LINUX_FUSE_H */
216
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
217
index XXXXXXX..XXXXXXX 100644
218
--- a/include/standard-headers/linux/pci_regs.h
219
+++ b/include/standard-headers/linux/pci_regs.h
220
@@ -XXX,XX +XXX,XX @@
221
#define PCI_EXP_LNKCTL2_TX_MARGIN    0x0380 /* Transmit Margin */
222
#define PCI_EXP_LNKCTL2_HASD        0x0020 /* HW Autonomous Speed Disable */
223
#define PCI_EXP_LNKSTA2        0x32    /* Link Status 2 */
224
+#define PCI_EXP_LNKSTA2_FLIT        0x0400 /* Flit Mode Status */
225
#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2    0x32    /* end of v2 EPs w/ link */
226
#define PCI_EXP_SLTCAP2        0x34    /* Slot Capabilities 2 */
227
#define PCI_EXP_SLTCAP2_IBPD    0x00000001 /* In-band PD Disable Supported */
228
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
229
index XXXXXXX..XXXXXXX 100644
230
--- a/include/standard-headers/linux/vhost_types.h
231
+++ b/include/standard-headers/linux/vhost_types.h
232
@@ -XXX,XX +XXX,XX @@ struct vhost_vdpa_iova_range {
233
#define VHOST_BACKEND_F_IOTLB_ASID 0x3
234
/* Device can be suspended */
235
#define VHOST_BACKEND_F_SUSPEND 0x4
236
+/* Device can be resumed */
237
+#define VHOST_BACKEND_F_RESUME 0x5
238
239
#endif
240
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
241
index XXXXXXX..XXXXXXX 100644
242
--- a/include/standard-headers/linux/virtio_blk.h
243
+++ b/include/standard-headers/linux/virtio_blk.h
244
@@ -XXX,XX +XXX,XX @@
245
#define VIRTIO_BLK_F_DISCARD    13    /* DISCARD is supported */
246
#define VIRTIO_BLK_F_WRITE_ZEROES    14    /* WRITE ZEROES is supported */
247
#define VIRTIO_BLK_F_SECURE_ERASE    16 /* Secure Erase is supported */
248
+#define VIRTIO_BLK_F_ZONED        17    /* Zoned block device */
249
250
/* Legacy feature bits */
251
#ifndef VIRTIO_BLK_NO_LEGACY
252
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
253
    /* Secure erase commands must be aligned to this number of sectors. */
254
    __virtio32 secure_erase_sector_alignment;
255
256
+    /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
257
+    struct virtio_blk_zoned_characteristics {
258
+        uint32_t zone_sectors;
259
+        uint32_t max_open_zones;
260
+        uint32_t max_active_zones;
261
+        uint32_t max_append_sectors;
262
+        uint32_t write_granularity;
263
+        uint8_t model;
264
+        uint8_t unused2[3];
265
+    } zoned;
266
} QEMU_PACKED;
267
268
/*
269
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
270
/* Secure erase command */
271
#define VIRTIO_BLK_T_SECURE_ERASE    14
272
273
+/* Zone append command */
274
+#define VIRTIO_BLK_T_ZONE_APPEND 15
275
+
276
+/* Report zones command */
277
+#define VIRTIO_BLK_T_ZONE_REPORT 16
278
+
279
+/* Open zone command */
280
+#define VIRTIO_BLK_T_ZONE_OPEN 18
281
+
282
+/* Close zone command */
283
+#define VIRTIO_BLK_T_ZONE_CLOSE 20
284
+
285
+/* Finish zone command */
286
+#define VIRTIO_BLK_T_ZONE_FINISH 22
287
+
288
+/* Reset zone command */
289
+#define VIRTIO_BLK_T_ZONE_RESET 24
290
+
291
+/* Reset All zones command */
292
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
293
+
294
#ifndef VIRTIO_BLK_NO_LEGACY
295
/* Barrier before this op. */
296
#define VIRTIO_BLK_T_BARRIER    0x80000000
297
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_outhdr {
298
    __virtio64 sector;
299
};
300
301
+/*
302
+ * Supported zoned device models.
303
+ */
304
+
305
+/* Regular block device */
306
+#define VIRTIO_BLK_Z_NONE 0
307
+/* Host-managed zoned device */
308
+#define VIRTIO_BLK_Z_HM 1
309
+/* Host-aware zoned device */
310
+#define VIRTIO_BLK_Z_HA 2
311
+
312
+/*
313
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
314
+ */
315
+struct virtio_blk_zone_descriptor {
316
+    /* Zone capacity */
317
+    uint64_t z_cap;
318
+    /* The starting sector of the zone */
319
+    uint64_t z_start;
320
+    /* Zone write pointer position in sectors */
321
+    uint64_t z_wp;
322
+    /* Zone type */
323
+    uint8_t z_type;
324
+    /* Zone state */
325
+    uint8_t z_state;
326
+    uint8_t reserved[38];
327
+};
328
+
329
+struct virtio_blk_zone_report {
330
+    uint64_t nr_zones;
331
+    uint8_t reserved[56];
332
+    struct virtio_blk_zone_descriptor zones[];
333
+};
334
+
335
+/*
336
+ * Supported zone types.
337
+ */
338
+
339
+/* Conventional zone */
340
+#define VIRTIO_BLK_ZT_CONV 1
341
+/* Sequential Write Required zone */
342
+#define VIRTIO_BLK_ZT_SWR 2
343
+/* Sequential Write Preferred zone */
344
+#define VIRTIO_BLK_ZT_SWP 3
345
+
346
+/*
347
+ * Zone states that are available for zones of all types.
348
+ */
349
+
350
+/* Not a write pointer (conventional zones only) */
351
+#define VIRTIO_BLK_ZS_NOT_WP 0
352
+/* Empty */
353
+#define VIRTIO_BLK_ZS_EMPTY 1
354
+/* Implicitly Open */
355
+#define VIRTIO_BLK_ZS_IOPEN 2
356
+/* Explicitly Open */
357
+#define VIRTIO_BLK_ZS_EOPEN 3
358
+/* Closed */
359
+#define VIRTIO_BLK_ZS_CLOSED 4
360
+/* Read-Only */
361
+#define VIRTIO_BLK_ZS_RDONLY 13
362
+/* Full */
363
+#define VIRTIO_BLK_ZS_FULL 14
364
+/* Offline */
365
+#define VIRTIO_BLK_ZS_OFFLINE 15
366
+
367
/* Unmap this range (only valid for write zeroes command) */
368
#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP    0x00000001
369
370
@@ -XXX,XX +XXX,XX @@ struct virtio_scsi_inhdr {
371
#define VIRTIO_BLK_S_OK        0
372
#define VIRTIO_BLK_S_IOERR    1
373
#define VIRTIO_BLK_S_UNSUPP    2
374
+
375
+/* Error codes that are specific to zoned block devices */
376
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
377
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4
378
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5
379
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
380
+
381
#endif /* _LINUX_VIRTIO_BLK_H */
382
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
383
index XXXXXXX..XXXXXXX 100644
384
--- a/linux-headers/asm-arm64/kvm.h
385
+++ b/linux-headers/asm-arm64/kvm.h
386
@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
387
#define KVM_ARM_VCPU_SVE        4 /* enable SVE for this CPU */
388
#define KVM_ARM_VCPU_PTRAUTH_ADDRESS    5 /* VCPU uses address authentication */
389
#define KVM_ARM_VCPU_PTRAUTH_GENERIC    6 /* VCPU uses generic authentication */
390
+#define KVM_ARM_VCPU_HAS_EL2        7 /* Support nested virtualization */
391
392
struct kvm_vcpu_init {
393
    __u32 target;
394
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
395
index XXXXXXX..XXXXXXX 100644
396
--- a/linux-headers/asm-x86/kvm.h
397
+++ b/linux-headers/asm-x86/kvm.h
398
@@ -XXX,XX +XXX,XX @@
399
400
#include <linux/types.h>
401
#include <linux/ioctl.h>
402
+#include <linux/stddef.h>
403
404
#define KVM_PIO_PAGE_OFFSET 1
405
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2
406
@@ -XXX,XX +XXX,XX @@ struct kvm_nested_state {
407
     * KVM_{GET,PUT}_NESTED_STATE ioctl values.
408
     */
409
    union {
410
-        struct kvm_vmx_nested_state_data vmx[0];
411
-        struct kvm_svm_nested_state_data svm[0];
412
+        __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx);
413
+        __DECLARE_FLEX_ARRAY(struct kvm_svm_nested_state_data, svm);
414
    } data;
415
};
416
417
@@ -XXX,XX +XXX,XX @@ struct kvm_pmu_event_filter {
418
#define KVM_PMU_EVENT_ALLOW 0
419
#define KVM_PMU_EVENT_DENY 1
420
421
+#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
422
+#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
423
+
424
+/*
425
+ * Masked event layout.
426
+ * Bits Description
427
+ * ---- -----------
428
+ * 7:0 event select (low bits)
429
+ * 15:8 umask match
430
+ * 31:16 unused
431
+ * 35:32 event select (high bits)
432
+ * 36:54 unused
433
+ * 55 exclude bit
434
+ * 63:56 umask mask
435
+ */
436
+
437
+#define KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
438
+    (((event_select) & 0xFFULL) | (((event_select) & 0XF00ULL) << 24) | \
439
+    (((mask) & 0xFFULL) << 56) | \
440
+    (((match) & 0xFFULL) << 8) | \
441
+    ((__u64)(!!(exclude)) << 55))
442
+
443
+#define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \
444
+    (GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32))
445
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK        (GENMASK_ULL(63, 56))
446
+#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH    (GENMASK_ULL(15, 8))
447
+#define KVM_PMU_MASKED_ENTRY_EXCLUDE        (BIT_ULL(55))
448
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT    (56)
449
+
450
/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
451
#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
452
#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
453
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
454
index XXXXXXX..XXXXXXX 100644
455
--- a/linux-headers/linux/kvm.h
456
+++ b/linux-headers/linux/kvm.h
457
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
458
        struct {
459
            __u8 ar;    /* the access register number */
460
            __u8 key;    /* access key, ignored if flag unset */
461
+            __u8 pad1[6];    /* ignored */
462
+            __u64 old_addr;    /* ignored if cmpxchg flag unset */
463
        };
464
        __u32 sida_offset; /* offset into the sida */
465
        __u8 reserved[32]; /* ignored */
466
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
467
#define KVM_S390_MEMOP_SIDA_WRITE    3
468
#define KVM_S390_MEMOP_ABSOLUTE_READ    4
469
#define KVM_S390_MEMOP_ABSOLUTE_WRITE    5
470
+#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG    6
471
+
472
/* flags for kvm_s390_mem_op->flags */
473
#define KVM_S390_MEMOP_F_CHECK_ONLY        (1ULL << 0)
474
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION    (1ULL << 1)
475
#define KVM_S390_MEMOP_F_SKEY_PROTECTION    (1ULL << 2)
476
477
+/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
478
+#define KVM_S390_MEMOP_EXTENSION_CAP_BASE    (1 << 0)
479
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG    (1 << 1)
480
+
481
/* for KVM_INTERRUPT */
482
struct kvm_interrupt {
483
    /* in */
484
@@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt {
485
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
486
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
487
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
488
+#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
489
490
#ifdef KVM_CAP_IRQ_ROUTING
491
492
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
493
index XXXXXXX..XXXXXXX 100644
494
--- a/linux-headers/linux/vfio.h
495
+++ b/linux-headers/linux/vfio.h
496
@@ -XXX,XX +XXX,XX @@
497
/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
498
#define VFIO_UNMAP_ALL            9
499
500
-/* Supports the vaddr flag for DMA map and unmap */
501
+/*
502
+ * Supports the vaddr flag for DMA map and unmap. Not supported for mediated
503
+ * devices, so this capability is subject to change as groups are added or
504
+ * removed.
505
+ */
506
#define VFIO_UPDATE_VADDR        10
507
508
/*
509
@@ -XXX,XX +XXX,XX @@ struct vfio_iommu_type1_info_dma_avail {
510
* Map process virtual addresses to IO virtual addresses using the
511
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
512
*
513
- * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and
514
- * unblock translation of host virtual addresses in the iova range. The vaddr
515
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr
516
* must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR. To
517
* maintain memory consistency within the user application, the updated vaddr
518
* must address the same memory object as originally mapped. Failure to do so
519
@@ -XXX,XX +XXX,XX @@ struct vfio_bitmap {
520
* must be 0. This cannot be combined with the get-dirty-bitmap flag.
521
*
522
* If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
523
- * virtual addresses in the iova range. Tasks that attempt to translate an
524
- * iova's vaddr will block. DMA to already-mapped pages continues. This
525
- * cannot be combined with the get-dirty-bitmap flag.
526
+ * virtual addresses in the iova range. DMA to already-mapped pages continues.
527
+ * Groups may not be added to the container while any addresses are invalid.
528
+ * This cannot be combined with the get-dirty-bitmap flag.
529
*/
530
struct vfio_iommu_type1_dma_unmap {
531
    __u32    argsz;
532
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
533
index XXXXXXX..XXXXXXX 100644
534
--- a/linux-headers/linux/vhost.h
535
+++ b/linux-headers/linux/vhost.h
536
@@ -XXX,XX +XXX,XX @@
161
@@ -XXX,XX +XXX,XX @@
537
*/
162
*/
538
#define VHOST_VDPA_SUSPEND        _IO(VHOST_VIRTIO, 0x7D)
163
#include "qemu/osdep.h"
539
164
#include "block/block.h"
540
+/* Resume a device so it can resume processing virtqueue requests
165
+#include "contrib/libvhost-user/libvhost-user.h"
541
+ *
166
+#include "standard-headers/linux/virtio_blk.h"
542
+ * After the return of this ioctl the device will have restored all the
167
+#include "util/vhost-user-server.h"
543
+ * necessary states and it is fully operational to continue processing the
168
#include "vhost-user-blk-server.h"
544
+ * virtqueue descriptors.
169
#include "qapi/error.h"
545
+ */
170
#include "qom/object_interfaces.h"
546
+#define VHOST_VDPA_RESUME        _IO(VHOST_VIRTIO, 0x7E)
171
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
172
unsigned char status;
173
};
174
175
-typedef struct VuBlockReq {
176
+typedef struct VuBlkReq {
177
VuVirtqElement elem;
178
int64_t sector_num;
179
size_t size;
180
@@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq {
181
struct virtio_blk_outhdr out;
182
VuServer *server;
183
struct VuVirtq *vq;
184
-} VuBlockReq;
185
+} VuBlkReq;
186
187
-static void vu_block_req_complete(VuBlockReq *req)
188
+/* vhost user block device */
189
+typedef struct {
190
+ BlockExport export;
191
+ VuServer vu_server;
192
+ uint32_t blk_size;
193
+ QIOChannelSocket *sioc;
194
+ struct virtio_blk_config blkcfg;
195
+ bool writable;
196
+} VuBlkExport;
547
+
197
+
548
#endif
198
+static void vu_blk_req_complete(VuBlkReq *req)
199
{
200
VuDev *vu_dev = &req->server->vu_dev;
201
202
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
203
free(req);
204
}
205
206
-static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
207
-{
208
- return container_of(server, VuBlockDev, vu_server);
209
-}
210
-
211
static int coroutine_fn
212
-vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
213
- uint32_t iovcnt, uint32_t type)
214
+vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov,
215
+ uint32_t iovcnt, uint32_t type)
216
{
217
struct virtio_blk_discard_write_zeroes desc;
218
ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
219
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
220
return -EINVAL;
221
}
222
223
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
224
uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
225
le32_to_cpu(desc.num_sectors) << 9 };
226
if (type == VIRTIO_BLK_T_DISCARD) {
227
- if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
228
+ if (blk_co_pdiscard(blk, range[0], range[1]) == 0) {
229
return 0;
230
}
231
} else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
232
- if (blk_co_pwrite_zeroes(vdev_blk->backend,
233
- range[0], range[1], 0) == 0) {
234
+ if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) {
235
return 0;
236
}
237
}
238
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
239
return -EINVAL;
240
}
241
242
-static int coroutine_fn vu_block_flush(VuBlockReq *req)
243
+static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
244
{
245
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
246
- BlockBackend *backend = vdev_blk->backend;
247
- return blk_co_flush(backend);
248
-}
249
-
250
-static void coroutine_fn vu_block_virtio_process_req(void *opaque)
251
-{
252
- VuBlockReq *req = opaque;
253
+ VuBlkReq *req = opaque;
254
VuServer *server = req->server;
255
VuVirtqElement *elem = &req->elem;
256
uint32_t type;
257
258
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
259
- BlockBackend *backend = vdev_blk->backend;
260
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
261
+ BlockBackend *blk = vexp->export.blk;
262
263
struct iovec *in_iov = elem->in_sg;
264
struct iovec *out_iov = elem->out_sg;
265
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
266
bool is_write = type & VIRTIO_BLK_T_OUT;
267
req->sector_num = le64_to_cpu(req->out.sector);
268
269
- int64_t offset = req->sector_num * vdev_blk->blk_size;
270
+ if (is_write && !vexp->writable) {
271
+ req->in->status = VIRTIO_BLK_S_IOERR;
272
+ break;
273
+ }
274
+
275
+ int64_t offset = req->sector_num * vexp->blk_size;
276
QEMUIOVector qiov;
277
if (is_write) {
278
qemu_iovec_init_external(&qiov, out_iov, out_num);
279
- ret = blk_co_pwritev(backend, offset, qiov.size,
280
- &qiov, 0);
281
+ ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
282
} else {
283
qemu_iovec_init_external(&qiov, in_iov, in_num);
284
- ret = blk_co_preadv(backend, offset, qiov.size,
285
- &qiov, 0);
286
+ ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
287
}
288
if (ret >= 0) {
289
req->in->status = VIRTIO_BLK_S_OK;
290
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
291
break;
292
}
293
case VIRTIO_BLK_T_FLUSH:
294
- if (vu_block_flush(req) == 0) {
295
+ if (blk_co_flush(blk) == 0) {
296
req->in->status = VIRTIO_BLK_S_OK;
297
} else {
298
req->in->status = VIRTIO_BLK_S_IOERR;
299
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
300
case VIRTIO_BLK_T_DISCARD:
301
case VIRTIO_BLK_T_WRITE_ZEROES: {
302
int rc;
303
- rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
304
- out_num, type);
305
+
306
+ if (!vexp->writable) {
307
+ req->in->status = VIRTIO_BLK_S_IOERR;
308
+ break;
309
+ }
310
+
311
+ rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type);
312
if (rc == 0) {
313
req->in->status = VIRTIO_BLK_S_OK;
314
} else {
315
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
316
break;
317
}
318
319
- vu_block_req_complete(req);
320
+ vu_blk_req_complete(req);
321
return;
322
323
err:
324
- free(elem);
325
+ free(req);
326
}
327
328
-static void vu_block_process_vq(VuDev *vu_dev, int idx)
329
+static void vu_blk_process_vq(VuDev *vu_dev, int idx)
330
{
331
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
332
VuVirtq *vq = vu_get_queue(vu_dev, idx);
333
334
while (1) {
335
- VuBlockReq *req;
336
+ VuBlkReq *req;
337
338
- req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
339
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
340
if (!req) {
341
break;
342
}
343
@@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx)
344
req->vq = vq;
345
346
Coroutine *co =
347
- qemu_coroutine_create(vu_block_virtio_process_req, req);
348
+ qemu_coroutine_create(vu_blk_virtio_process_req, req);
349
qemu_coroutine_enter(co);
350
}
351
}
352
353
-static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
354
+static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started)
355
{
356
VuVirtq *vq;
357
358
assert(vu_dev);
359
360
vq = vu_get_queue(vu_dev, idx);
361
- vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
362
+ vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL);
363
}
364
365
-static uint64_t vu_block_get_features(VuDev *dev)
366
+static uint64_t vu_blk_get_features(VuDev *dev)
367
{
368
uint64_t features;
369
VuServer *server = container_of(dev, VuServer, vu_dev);
370
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
371
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
372
features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
373
1ull << VIRTIO_BLK_F_SEG_MAX |
374
1ull << VIRTIO_BLK_F_TOPOLOGY |
375
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev)
376
1ull << VIRTIO_RING_F_EVENT_IDX |
377
1ull << VHOST_USER_F_PROTOCOL_FEATURES;
378
379
- if (!vdev_blk->writable) {
380
+ if (!vexp->writable) {
381
features |= 1ull << VIRTIO_BLK_F_RO;
382
}
383
384
return features;
385
}
386
387
-static uint64_t vu_block_get_protocol_features(VuDev *dev)
388
+static uint64_t vu_blk_get_protocol_features(VuDev *dev)
389
{
390
return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
391
1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
392
}
393
394
static int
395
-vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
396
+vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
397
{
398
+ /* TODO blkcfg must be little-endian for VIRTIO 1.0 */
399
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
400
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
401
- memcpy(config, &vdev_blk->blkcfg, len);
402
-
403
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
404
+ memcpy(config, &vexp->blkcfg, len);
405
return 0;
406
}
407
408
static int
409
-vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
410
+vu_blk_set_config(VuDev *vu_dev, const uint8_t *data,
411
uint32_t offset, uint32_t size, uint32_t flags)
412
{
413
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
414
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
415
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
416
uint8_t wce;
417
418
/* don't support live migration */
419
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
420
}
421
422
wce = *data;
423
- vdev_blk->blkcfg.wce = wce;
424
- blk_set_enable_write_cache(vdev_blk->backend, wce);
425
+ vexp->blkcfg.wce = wce;
426
+ blk_set_enable_write_cache(vexp->export.blk, wce);
427
return 0;
428
}
429
430
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
431
* of vu_process_message.
432
*
433
*/
434
-static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
435
+static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
436
{
437
if (vmsg->request == VHOST_USER_NONE) {
438
dev->panic(dev, "disconnect");
439
@@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
440
return false;
441
}
442
443
-static const VuDevIface vu_block_iface = {
444
- .get_features = vu_block_get_features,
445
- .queue_set_started = vu_block_queue_set_started,
446
- .get_protocol_features = vu_block_get_protocol_features,
447
- .get_config = vu_block_get_config,
448
- .set_config = vu_block_set_config,
449
- .process_msg = vu_block_process_msg,
450
+static const VuDevIface vu_blk_iface = {
451
+ .get_features = vu_blk_get_features,
452
+ .queue_set_started = vu_blk_queue_set_started,
453
+ .get_protocol_features = vu_blk_get_protocol_features,
454
+ .get_config = vu_blk_get_config,
455
+ .set_config = vu_blk_set_config,
456
+ .process_msg = vu_blk_process_msg,
457
};
458
459
static void blk_aio_attached(AioContext *ctx, void *opaque)
460
{
461
- VuBlockDev *vub_dev = opaque;
462
- vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
463
+ VuBlkExport *vexp = opaque;
464
+ vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
465
}
466
467
static void blk_aio_detach(void *opaque)
468
{
469
- VuBlockDev *vub_dev = opaque;
470
- vhost_user_server_detach_aio_context(&vub_dev->vu_server);
471
+ VuBlkExport *vexp = opaque;
472
+ vhost_user_server_detach_aio_context(&vexp->vu_server);
473
}
474
475
static void
476
-vu_block_initialize_config(BlockDriverState *bs,
477
+vu_blk_initialize_config(BlockDriverState *bs,
478
struct virtio_blk_config *config, uint32_t blk_size)
479
{
480
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
481
@@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs,
482
config->max_write_zeroes_seg = 1;
483
}
484
485
-static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
486
+static void vu_blk_exp_request_shutdown(BlockExport *exp)
487
{
488
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
489
490
- BlockBackend *blk;
491
- Error *local_error = NULL;
492
- const char *node_name = vu_block_device->node_name;
493
- bool writable = vu_block_device->writable;
494
- uint64_t perm = BLK_PERM_CONSISTENT_READ;
495
- int ret;
496
-
497
- AioContext *ctx;
498
-
499
- BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
500
-
501
- if (!bs) {
502
- error_propagate(errp, local_error);
503
- return NULL;
504
- }
505
-
506
- if (bdrv_is_read_only(bs)) {
507
- writable = false;
508
- }
509
-
510
- if (writable) {
511
- perm |= BLK_PERM_WRITE;
512
- }
513
-
514
- ctx = bdrv_get_aio_context(bs);
515
- aio_context_acquire(ctx);
516
- bdrv_invalidate_cache(bs, NULL);
517
- aio_context_release(ctx);
518
-
519
- /*
520
- * Don't allow resize while the vhost user server is running,
521
- * otherwise we don't care what happens with the node.
522
- */
523
- blk = blk_new(bdrv_get_aio_context(bs), perm,
524
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
525
- BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
526
- ret = blk_insert_bs(blk, bs, errp);
527
-
528
- if (ret < 0) {
529
- goto fail;
530
- }
531
-
532
- blk_set_enable_write_cache(blk, false);
533
-
534
- blk_set_allow_aio_context_change(blk, true);
535
-
536
- vu_block_device->blkcfg.wce = 0;
537
- vu_block_device->backend = blk;
538
- if (!vu_block_device->blk_size) {
539
- vu_block_device->blk_size = BDRV_SECTOR_SIZE;
540
- }
541
- vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
542
- blk_set_guest_block_size(blk, vu_block_device->blk_size);
543
- vu_block_initialize_config(bs, &vu_block_device->blkcfg,
544
- vu_block_device->blk_size);
545
- return vu_block_device;
546
-
547
-fail:
548
- blk_unref(blk);
549
- return NULL;
550
-}
551
-
552
-static void vu_block_deinit(VuBlockDev *vu_block_device)
553
-{
554
- if (vu_block_device->backend) {
555
- blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
556
- blk_aio_detach, vu_block_device);
557
- }
558
-
559
- blk_unref(vu_block_device->backend);
560
-}
561
-
562
-static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
563
-{
564
- vhost_user_server_stop(&vu_block_device->vu_server);
565
- vu_block_deinit(vu_block_device);
566
-}
567
-
568
-static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
569
- Error **errp)
570
-{
571
- AioContext *ctx;
572
- SocketAddress *addr = vu_block_device->addr;
573
-
574
- if (!vu_block_init(vu_block_device, errp)) {
575
- return;
576
- }
577
-
578
- ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
579
-
580
- if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
581
- VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
582
- errp)) {
583
- goto error;
584
- }
585
-
586
- blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
587
- blk_aio_detach, vu_block_device);
588
- vu_block_device->running = true;
589
- return;
590
-
591
- error:
592
- vu_block_deinit(vu_block_device);
593
-}
594
-
595
-static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
596
-{
597
- if (vus->running) {
598
- error_setg(errp, "The property can't be modified "
599
- "while the server is running");
600
- return false;
601
- }
602
- return true;
603
-}
604
-
605
-static void vu_set_node_name(Object *obj, const char *value, Error **errp)
606
-{
607
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
608
-
609
- if (!vu_prop_modifiable(vus, errp)) {
610
- return;
611
- }
612
-
613
- if (vus->node_name) {
614
- g_free(vus->node_name);
615
- }
616
-
617
- vus->node_name = g_strdup(value);
618
-}
619
-
620
-static char *vu_get_node_name(Object *obj, Error **errp)
621
-{
622
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
623
- return g_strdup(vus->node_name);
624
-}
625
-
626
-static void free_socket_addr(SocketAddress *addr)
627
-{
628
- g_free(addr->u.q_unix.path);
629
- g_free(addr);
630
-}
631
-
632
-static void vu_set_unix_socket(Object *obj, const char *value,
633
- Error **errp)
634
-{
635
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
636
-
637
- if (!vu_prop_modifiable(vus, errp)) {
638
- return;
639
- }
640
-
641
- if (vus->addr) {
642
- free_socket_addr(vus->addr);
643
- }
644
-
645
- SocketAddress *addr = g_new0(SocketAddress, 1);
646
- addr->type = SOCKET_ADDRESS_TYPE_UNIX;
647
- addr->u.q_unix.path = g_strdup(value);
648
- vus->addr = addr;
649
+ vhost_user_server_stop(&vexp->vu_server);
650
}
651
652
-static char *vu_get_unix_socket(Object *obj, Error **errp)
653
+static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
654
+ Error **errp)
655
{
656
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
657
- return g_strdup(vus->addr->u.q_unix.path);
658
-}
659
-
660
-static bool vu_get_block_writable(Object *obj, Error **errp)
661
-{
662
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
663
- return vus->writable;
664
-}
665
-
666
-static void vu_set_block_writable(Object *obj, bool value, Error **errp)
667
-{
668
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
669
-
670
- if (!vu_prop_modifiable(vus, errp)) {
671
- return;
672
- }
673
-
674
- vus->writable = value;
675
-}
676
-
677
-static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
678
- void *opaque, Error **errp)
679
-{
680
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
681
- uint32_t value = vus->blk_size;
682
-
683
- visit_type_uint32(v, name, &value, errp);
684
-}
685
-
686
-static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
687
- void *opaque, Error **errp)
688
-{
689
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
690
-
691
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
692
+ BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
693
Error *local_err = NULL;
694
- uint32_t value;
695
+ uint64_t logical_block_size;
696
697
- if (!vu_prop_modifiable(vus, errp)) {
698
- return;
699
- }
700
+ vexp->writable = opts->writable;
701
+ vexp->blkcfg.wce = 0;
702
703
- visit_type_uint32(v, name, &value, &local_err);
704
- if (local_err) {
705
- goto out;
706
+ if (vu_opts->has_logical_block_size) {
707
+ logical_block_size = vu_opts->logical_block_size;
708
+ } else {
709
+ logical_block_size = BDRV_SECTOR_SIZE;
710
}
711
-
712
- check_block_size(object_get_typename(obj), name, value, &local_err);
713
+ check_block_size(exp->id, "logical-block-size", logical_block_size,
714
+ &local_err);
715
if (local_err) {
716
- goto out;
717
+ error_propagate(errp, local_err);
718
+ return -EINVAL;
719
+ }
720
+ vexp->blk_size = logical_block_size;
721
+ blk_set_guest_block_size(exp->blk, logical_block_size);
722
+ vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
723
+ logical_block_size);
724
+
725
+ blk_set_allow_aio_context_change(exp->blk, true);
726
+ blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
727
+ vexp);
728
+
729
+ if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
730
+ VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
731
+ errp)) {
732
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
733
+ blk_aio_detach, vexp);
734
+ return -EADDRNOTAVAIL;
735
}
736
737
- vus->blk_size = value;
738
-
739
-out:
740
- error_propagate(errp, local_err);
741
-}
742
-
743
-static void vhost_user_blk_server_instance_finalize(Object *obj)
744
-{
745
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
746
-
747
- vhost_user_blk_server_stop(vub);
748
-
749
- /*
750
- * Unlike object_property_add_str, object_class_property_add_str
751
- * doesn't have a release method. Thus manual memory freeing is
752
- * needed.
753
- */
754
- free_socket_addr(vub->addr);
755
- g_free(vub->node_name);
756
-}
757
-
758
-static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
759
-{
760
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
761
-
762
- vhost_user_blk_server_start(vub, errp);
763
+ return 0;
764
}
765
766
-static void vhost_user_blk_server_class_init(ObjectClass *klass,
767
- void *class_data)
768
+static void vu_blk_exp_delete(BlockExport *exp)
769
{
770
- UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
771
- ucc->complete = vhost_user_blk_server_complete;
772
-
773
- object_class_property_add_bool(klass, "writable",
774
- vu_get_block_writable,
775
- vu_set_block_writable);
776
-
777
- object_class_property_add_str(klass, "node-name",
778
- vu_get_node_name,
779
- vu_set_node_name);
780
-
781
- object_class_property_add_str(klass, "unix-socket",
782
- vu_get_unix_socket,
783
- vu_set_unix_socket);
784
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
785
786
- object_class_property_add(klass, "logical-block-size", "uint32",
787
- vu_get_blk_size, vu_set_blk_size,
788
- NULL, NULL);
789
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
790
+ vexp);
791
}
792
793
-static const TypeInfo vhost_user_blk_server_info = {
794
- .name = TYPE_VHOST_USER_BLK_SERVER,
795
- .parent = TYPE_OBJECT,
796
- .instance_size = sizeof(VuBlockDev),
797
- .instance_finalize = vhost_user_blk_server_instance_finalize,
798
- .class_init = vhost_user_blk_server_class_init,
799
- .interfaces = (InterfaceInfo[]) {
800
- {TYPE_USER_CREATABLE},
801
- {}
802
- },
803
+const BlockExportDriver blk_exp_vhost_user_blk = {
804
+ .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK,
805
+ .instance_size = sizeof(VuBlkExport),
806
+ .create = vu_blk_exp_create,
807
+ .delete = vu_blk_exp_delete,
808
+ .request_shutdown = vu_blk_exp_request_shutdown,
809
};
810
-
811
-static void vhost_user_blk_server_register_types(void)
812
-{
813
- type_register_static(&vhost_user_blk_server_info);
814
-}
815
-
816
-type_init(vhost_user_blk_server_register_types)
817
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
818
index XXXXXXX..XXXXXXX 100644
819
--- a/util/vhost-user-server.c
820
+++ b/util/vhost-user-server.c
821
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
822
Error **errp)
823
{
824
QEMUBH *bh;
825
- QIONetListener *listener = qio_net_listener_new();
826
+ QIONetListener *listener;
827
+
828
+ if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX &&
829
+ socket_addr->type != SOCKET_ADDRESS_TYPE_FD) {
830
+ error_setg(errp, "Only socket address types 'unix' and 'fd' are supported");
831
+ return false;
832
+ }
833
+
834
+ listener = qio_net_listener_new();
835
if (qio_net_listener_open_sync(listener, socket_addr, 1,
836
errp) < 0) {
837
object_unref(OBJECT(listener));
838
diff --git a/block/export/meson.build b/block/export/meson.build
839
index XXXXXXX..XXXXXXX 100644
840
--- a/block/export/meson.build
841
+++ b/block/export/meson.build
842
@@ -1 +1,2 @@
843
block_ss.add(files('export.c'))
844
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
845
diff --git a/block/meson.build b/block/meson.build
846
index XXXXXXX..XXXXXXX 100644
847
--- a/block/meson.build
848
+++ b/block/meson.build
849
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
850
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
851
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
852
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
853
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
854
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
855
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
856
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
--
2.39.2
--
2.26.2
New patch
Headers used by other subsystems are located in include/. Also add the
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-13-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
MAINTAINERS | 4 +++-
{util => include/qemu}/vhost-user-server.h | 0
block/export/vhost-user-blk-server.c | 2 +-
util/vhost-user-server.c | 2 +-
4 files changed, 5 insertions(+), 3 deletions(-)
rename {util => include/qemu}/vhost-user-server.h (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server
M: Coiby Xu <Coiby.Xu@gmail.com>
S: Maintained
F: block/export/vhost-user-blk-server.c
-F: util/vhost-user-server.c
+F: block/export/vhost-user-blk-server.h
+F: include/qemu/vhost-user-server.h
F: tests/qtest/libqos/vhost-user-blk.c
+F: util/vhost-user-server.c

Replication
M: Wen Congyang <wencongyang2@huawei.com>
diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h
similarity index 100%
rename from util/vhost-user-server.h
rename to include/qemu/vhost-user-server.h
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -XXX,XX +XXX,XX @@
#include "block/block.h"
#include "contrib/libvhost-user/libvhost-user.h"
#include "standard-headers/linux/virtio_blk.h"
-#include "util/vhost-user-server.h"
+#include "qemu/vhost-user-server.h"
#include "vhost-user-blk-server.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index XXXXXXX..XXXXXXX 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -XXX,XX +XXX,XX @@
*/
#include "qemu/osdep.h"
#include "qemu/main-loop.h"
+#include "qemu/vhost-user-server.h"
#include "block/aio-wait.h"
-#include "vhost-user-server.h"

/*
 * Theory of operation:
--
2.26.2
From: Sam Li <faithilikerun@gmail.com>
1
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
2
the static library once and then reuse it throughout QEMU.
2
3
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
vhost-user tools (vhost-user-gpu, etc) do.
5
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
6
6
Reviewed-by: Hannes Reinecke <hare@suse.de>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
8
Message-id: 20200924151549.913737-14-stefanha@redhat.com
8
Acked-by: Kevin Wolf <kwolf@redhat.com>
9
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
9
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
10
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
11
<philmd@linaro.org>.
12
--Stefan]
10
--Stefan]
13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
---
12
---
15
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
13
block/export/export.c | 8 ++++----
16
1 file changed, 43 insertions(+)
14
block/export/meson.build | 2 +-
15
contrib/libvhost-user/meson.build | 1 +
16
meson.build | 6 +++++-
17
util/meson.build | 4 +++-
18
5 files changed, 14 insertions(+), 7 deletions(-)
17
19
18
diff --git a/include/block/block-common.h b/include/block/block-common.h
20
diff --git a/block/export/export.c b/block/export/export.c
19
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
20
--- a/include/block/block-common.h
22
--- a/block/export/export.c
21
+++ b/include/block/block-common.h
23
+++ b/block/export/export.c
22
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
24
@@ -XXX,XX +XXX,XX @@
23
typedef struct BdrvChild BdrvChild;
25
#include "sysemu/block-backend.h"
24
typedef struct BdrvChildClass BdrvChildClass;
26
#include "block/export.h"
25
27
#include "block/nbd.h"
26
+typedef enum BlockZoneOp {
28
-#if CONFIG_LINUX
27
+ BLK_ZO_OPEN,
29
-#include "block/export/vhost-user-blk-server.h"
28
+ BLK_ZO_CLOSE,
30
-#endif
29
+ BLK_ZO_FINISH,
31
#include "qapi/error.h"
30
+ BLK_ZO_RESET,
32
#include "qapi/qapi-commands-block-export.h"
31
+} BlockZoneOp;
33
#include "qapi/qapi-events-block-export.h"
34
#include "qemu/id.h"
35
+#ifdef CONFIG_VHOST_USER
36
+#include "vhost-user-blk-server.h"
37
+#endif
38
39
static const BlockExportDriver *blk_exp_drivers[] = {
40
&blk_exp_nbd,
41
-#if CONFIG_LINUX
42
+#ifdef CONFIG_VHOST_USER
43
&blk_exp_vhost_user_blk,
44
#endif
45
};
46
diff --git a/block/export/meson.build b/block/export/meson.build
47
index XXXXXXX..XXXXXXX 100644
48
--- a/block/export/meson.build
49
+++ b/block/export/meson.build
50
@@ -XXX,XX +XXX,XX @@
51
block_ss.add(files('export.c'))
52
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
53
+block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
54
diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build
55
index XXXXXXX..XXXXXXX 100644
56
--- a/contrib/libvhost-user/meson.build
57
+++ b/contrib/libvhost-user/meson.build
58
@@ -XXX,XX +XXX,XX @@
59
libvhost_user = static_library('vhost-user',
60
files('libvhost-user.c', 'libvhost-user-glib.c'),
61
build_by_default: false)
62
+vhost_user = declare_dependency(link_with: libvhost_user)
63
diff --git a/meson.build b/meson.build
64
index XXXXXXX..XXXXXXX 100644
65
--- a/meson.build
66
+++ b/meson.build
67
@@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [
68
'util',
69
]
70
71
+vhost_user = not_found
72
+if 'CONFIG_VHOST_USER' in config_host
73
+ subdir('contrib/libvhost-user')
74
+endif
32
+
75
+
33
+typedef enum BlockZoneModel {
76
subdir('qapi')
34
+ BLK_Z_NONE = 0x0, /* Regular block device */
77
subdir('qobject')
35
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
78
subdir('stubs')
36
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
79
@@ -XXX,XX +XXX,XX @@ if have_tools
37
+} BlockZoneModel;
80
install: true)
38
+
81
39
+typedef enum BlockZoneState {
82
if 'CONFIG_VHOST_USER' in config_host
40
+ BLK_ZS_NOT_WP = 0x0,
83
- subdir('contrib/libvhost-user')
41
+ BLK_ZS_EMPTY = 0x1,
84
subdir('contrib/vhost-user-blk')
42
+ BLK_ZS_IOPEN = 0x2,
85
subdir('contrib/vhost-user-gpu')
43
+ BLK_ZS_EOPEN = 0x3,
86
subdir('contrib/vhost-user-input')
44
+ BLK_ZS_CLOSED = 0x4,
87
diff --git a/util/meson.build b/util/meson.build
45
+ BLK_ZS_RDONLY = 0xD,
88
index XXXXXXX..XXXXXXX 100644
46
+ BLK_ZS_FULL = 0xE,
89
--- a/util/meson.build
47
+ BLK_ZS_OFFLINE = 0xF,
90
+++ b/util/meson.build
48
+} BlockZoneState;
91
@@ -XXX,XX +XXX,XX @@ if have_block
49
+
92
util_ss.add(files('main-loop.c'))
50
+typedef enum BlockZoneType {
93
util_ss.add(files('nvdimm-utils.c'))
51
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
94
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
52
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
95
- util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
53
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
96
+ util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
54
+} BlockZoneType;
97
+ files('vhost-user-server.c'), vhost_user
55
+
98
+ ])
56
+/*
99
util_ss.add(files('block-helpers.c'))
57
+ * Zone descriptor data structure.
100
util_ss.add(files('qemu-coroutine-sleep.c'))
58
+ * Provides information on a zone with all position and size values in bytes.
101
util_ss.add(files('qemu-co-shared-resource.c'))
59
+ */
60
+typedef struct BlockZoneDescriptor {
61
+ uint64_t start;
62
+ uint64_t length;
63
+ uint64_t cap;
64
+ uint64_t wp;
65
+ BlockZoneType type;
66
+ BlockZoneState state;
67
+} BlockZoneDescriptor;
68
+
69
typedef struct BlockDriverInfo {
70
/* in bytes, 0 if irrelevant */
71
int cluster_size;
72
--
2.39.2
--
2.26.2
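As a reading aid (not part of the series itself): all fields of the
BlockZoneDescriptor added above are byte values, so a caller that receives a
filled-in descriptor array could walk it as in the minimal sketch below. The
function name print_zone_fill and the plain printf reporting are illustrative
only; the declarations are assumed to come from the new block-common.h.

#include <inttypes.h>
#include <stdio.h>
#include "block/block-common.h" /* BlockZoneDescriptor, BLK_ZT_*, BLK_ZS_* */

/* Walk zone descriptors (e.g. filled in by a zone report) and show how far
 * the write pointer has advanced in each sequential-write-required zone. */
static void print_zone_fill(const BlockZoneDescriptor *zones, unsigned nr_zones)
{
    for (unsigned i = 0; i < nr_zones; i++) {
        const BlockZoneDescriptor *z = &zones[i];

        if (z->type != BLK_ZT_SWR) {
            continue; /* conventional zones have no meaningful write pointer */
        }
        if (z->state == BLK_ZS_FULL) {
            printf("zone @%" PRIu64 ": full (capacity %" PRIu64 " bytes)\n",
                   z->start, z->cap);
        } else {
            printf("zone @%" PRIu64 ": %" PRIu64 " of %" PRIu64 " bytes written\n",
                   z->start, z->wp - z->start, z->cap);
        }
    }
}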
New patch
Introduce libblkdev.fa to avoid recompiling blockdev_ss twice.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
meson.build | 12 ++++++++++--
storage-daemon/meson.build | 3 +--
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files(
# os-win32.c does not
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
-softmmu_ss.add_all(blockdev_ss)

common_ss.add(files('cpus-common.c'))

@@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock],
link_args: '@block.syms',
dependencies: [crypto, io])

+blockdev_ss = blockdev_ss.apply(config_host, strict: false)
+libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
+ dependencies: blockdev_ss.dependencies(),
+ name_suffix: 'fa',
+ build_by_default: false)
+
+blockdev = declare_dependency(link_whole: [libblockdev],
+ dependencies: [block])
+
qmp_ss = qmp_ss.apply(config_host, strict: false)
libqmp = static_library('qmp', qmp_ss.sources() + genh,
dependencies: qmp_ss.dependencies(),
@@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods
install_dir: config_host['qemu_moddir'])
endforeach

-softmmu_ss.add(authz, block, chardev, crypto, io, qmp)
+softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp)
common_ss.add(qom, qemuutil)

common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss])
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/storage-daemon/meson.build
+++ b/storage-daemon/meson.build
@@ -XXX,XX +XXX,XX @@
qsd_ss = ss.source_set()
qsd_ss.add(files('qemu-storage-daemon.c'))
-qsd_ss.add(block, chardev, qmp, qom, qemuutil)
-qsd_ss.add_all(blockdev_ss)
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)

subdir('qapi')

--
2.26.2
From: Sam Li <faithilikerun@gmail.com>
1
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
2
They are not used by other programs and are not otherwise needed in
3
libblock.
2
4
3
The raw-format driver usually sits on top of the file-posix driver. It needs to
5
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
4
pass through requests of zone commands.
6
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
7
(libblockdev), a stub function is required.
5
8
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Make qemu-nbd.c use signal handling utility functions instead of
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
duplicating the code. This helps because os-posix.c is in libblockdev
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
11
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
12
Once we use the signal handling utility functions we also end up
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
13
providing the necessary symbol.
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
14
12
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
14
<philmd@linaro.org>.
17
Reviewed-by: Eric Blake <eblake@redhat.com>
18
Message-id: 20200929125516.186715-4-stefanha@redhat.com
19
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
15
--Stefan]
20
--Stefan]
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
22
---
18
block/raw-format.c | 17 +++++++++++++++++
23
qemu-nbd.c | 21 ++++++++-------------
19
1 file changed, 17 insertions(+)
24
stubs/blk-exp-close-all.c | 7 +++++++
25
block/export/meson.build | 4 ++--
26
meson.build | 4 ++--
27
nbd/meson.build | 2 ++
28
stubs/meson.build | 1 +
29
6 files changed, 22 insertions(+), 17 deletions(-)
30
create mode 100644 stubs/blk-exp-close-all.c
20
31
21
diff --git a/block/raw-format.c b/block/raw-format.c
32
diff --git a/qemu-nbd.c b/qemu-nbd.c
22
index XXXXXXX..XXXXXXX 100644
33
index XXXXXXX..XXXXXXX 100644
23
--- a/block/raw-format.c
34
--- a/qemu-nbd.c
24
+++ b/block/raw-format.c
35
+++ b/qemu-nbd.c
25
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
36
@@ -XXX,XX +XXX,XX @@
26
return bdrv_co_pdiscard(bs->file, offset, bytes);
37
#include "qapi/error.h"
38
#include "qemu/cutils.h"
39
#include "sysemu/block-backend.h"
40
+#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */
41
#include "block/block_int.h"
42
#include "block/nbd.h"
43
#include "qemu/main-loop.h"
44
@@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n"
27
}
45
}
28
46
29
+static int coroutine_fn GRAPH_RDLOCK
47
#ifdef CONFIG_POSIX
30
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
48
-static void termsig_handler(int signum)
31
+ unsigned int *nr_zones,
49
+/*
32
+ BlockZoneDescriptor *zones)
50
+ * The client thread uses SIGTERM to interrupt the server. A signal
51
+ * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
52
+ */
53
+void qemu_system_killed(int signum, pid_t pid)
54
{
55
qatomic_cmpxchg(&state, RUNNING, TERMINATE);
56
qemu_notify_event();
57
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
58
BlockExportOptions *export_opts;
59
60
#ifdef CONFIG_POSIX
61
- /*
62
- * Exit gracefully on various signals, which includes SIGTERM used
63
- * by 'qemu-nbd -v -c'.
64
- */
65
- struct sigaction sa_sigterm;
66
- memset(&sa_sigterm, 0, sizeof(sa_sigterm));
67
- sa_sigterm.sa_handler = termsig_handler;
68
- sigaction(SIGTERM, &sa_sigterm, NULL);
69
- sigaction(SIGINT, &sa_sigterm, NULL);
70
- sigaction(SIGHUP, &sa_sigterm, NULL);
71
-
72
- signal(SIGPIPE, SIG_IGN);
73
+ os_setup_early_signal_handling();
74
+ os_setup_signal_handling();
75
#endif
76
77
socket_init();
78
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
79
new file mode 100644
80
index XXXXXXX..XXXXXXX
81
--- /dev/null
82
+++ b/stubs/blk-exp-close-all.c
83
@@ -XXX,XX +XXX,XX @@
84
+#include "qemu/osdep.h"
85
+#include "block/export.h"
86
+
87
+/* Only used in programs that support block exports (libblockdev.fa) */
88
+void blk_exp_close_all(void)
33
+{
89
+{
34
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
35
+}
90
+}
36
+
91
diff --git a/block/export/meson.build b/block/export/meson.build
37
+static int coroutine_fn GRAPH_RDLOCK
92
index XXXXXXX..XXXXXXX 100644
38
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
93
--- a/block/export/meson.build
39
+ int64_t offset, int64_t len)
94
+++ b/block/export/meson.build
40
+{
95
@@ -XXX,XX +XXX,XX @@
41
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
96
-block_ss.add(files('export.c'))
42
+}
97
-block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
43
+
98
+blockdev_ss.add(files('export.c'))
44
static int64_t coroutine_fn GRAPH_RDLOCK
99
+blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
45
raw_co_getlength(BlockDriverState *bs)
100
diff --git a/meson.build b/meson.build
46
{
101
index XXXXXXX..XXXXXXX 100644
47
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
102
--- a/meson.build
48
.bdrv_co_pwritev = &raw_co_pwritev,
103
+++ b/meson.build
49
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
104
@@ -XXX,XX +XXX,XX @@ subdir('dump')
50
.bdrv_co_pdiscard = &raw_co_pdiscard,
105
51
+ .bdrv_co_zone_report = &raw_co_zone_report,
106
block_ss.add(files(
52
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
107
'block.c',
53
.bdrv_co_block_status = &raw_co_block_status,
108
- 'blockdev-nbd.c',
54
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
109
'blockjob.c',
55
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
110
'job.c',
111
'qemu-io-cmds.c',
112
@@ -XXX,XX +XXX,XX @@ subdir('block')
113
114
blockdev_ss.add(files(
115
'blockdev.c',
116
+ 'blockdev-nbd.c',
117
'iothread.c',
118
'job-qmp.c',
119
))
120
@@ -XXX,XX +XXX,XX @@ if have_tools
121
qemu_io = executable('qemu-io', files('qemu-io.c'),
122
dependencies: [block, qemuutil], install: true)
123
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
124
- dependencies: [block, qemuutil], install: true)
125
+ dependencies: [blockdev, qemuutil], install: true)
126
127
subdir('storage-daemon')
128
subdir('contrib/rdmacm-mux')
129
diff --git a/nbd/meson.build b/nbd/meson.build
130
index XXXXXXX..XXXXXXX 100644
131
--- a/nbd/meson.build
132
+++ b/nbd/meson.build
133
@@ -XXX,XX +XXX,XX @@
134
block_ss.add(files(
135
'client.c',
136
'common.c',
137
+))
138
+blockdev_ss.add(files(
139
'server.c',
140
))
141
diff --git a/stubs/meson.build b/stubs/meson.build
142
index XXXXXXX..XXXXXXX 100644
143
--- a/stubs/meson.build
144
+++ b/stubs/meson.build
145
@@ -XXX,XX +XXX,XX @@
146
stub_ss.add(files('arch_type.c'))
147
stub_ss.add(files('bdrv-next-monitor-owned.c'))
148
stub_ss.add(files('blk-commit-all.c'))
149
+stub_ss.add(files('blk-exp-close-all.c'))
150
stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
151
stub_ss.add(files('change-state-handler.c'))
152
stub_ss.add(files('cmos.c'))
--
2.39.2

--
2.26.2
From: Sam Li <faithilikerun@gmail.com>

Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
whether the zone is a conventional zone.

A zone's wp can change due to the following operations:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230407081657.17947-2-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block-common.h | 14 +++
include/block/block_int-common.h | 5 +
block/file-posix.c | 173 ++++++++++++++++++++++++++++++-
3 files changed, 189 insertions(+), 3 deletions(-)

Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qapi/block-export.json | 11 ++++++++++
block/export/export.c | 31 +++++++++++++++++++++++++++-
block/export/vhost-user-blk-server.c | 5 ++++-
nbd/server.c | 2 --
4 files changed, 45 insertions(+), 4 deletions(-)
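As a rough illustration of the write-pointer encoding described in Sam Li's
message above (a minimal standalone sketch, not code from the patch; the
helper names are invented, only the 1ULL << 63 convention and the
BDRV_ZT_IS_CONV test come from the series):

#include <stdbool.h>
#include <stdint.h>

/* The most significant bit of a wp entry marks a conventional zone; the
 * lower 63 bits hold the zone's write pointer as a byte offset. */
#define WP_CONV_FLAG (1ULL << 63)

static inline void wp_mark_conventional(uint64_t *wp)
{
    *wp |= WP_CONV_FLAG;
}

static inline bool wp_is_conventional(uint64_t wp)
{
    return wp & WP_CONV_FLAG;   /* same test as BDRV_ZT_IS_CONV */
}

static inline uint64_t wp_byte_offset(uint64_t wp)
{
    return wp & ~WP_CONV_FLAG;
}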
diff --git a/include/block/block-common.h b/include/block/block-common.h
20
diff --git a/qapi/block-export.json b/qapi/block-export.json
26
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
27
--- a/include/block/block-common.h
22
--- a/qapi/block-export.json
28
+++ b/include/block/block-common.h
23
+++ b/qapi/block-export.json
29
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
24
@@ -XXX,XX +XXX,XX @@
30
BlockZoneState state;
25
# export before completion is signalled. (since: 5.2;
31
} BlockZoneDescriptor;
26
# default: false)
32
27
#
33
+/*
28
+# @iothread: The name of the iothread object where the export will run. The
34
+ * Track write pointers of a zone in bytes.
29
+# default is to use the thread currently associated with the
35
+ */
30
+# block node. (since: 5.2)
36
+typedef struct BlockZoneWps {
31
+#
37
+ CoMutex colock;
32
+# @fixed-iothread: True prevents the block node from being moved to another
38
+ uint64_t wp[];
33
+# thread while the export is active. If true and @iothread is
39
+} BlockZoneWps;
34
+# given, export creation fails if the block node cannot be
35
+# moved to the iothread. The default is false. (since: 5.2)
36
+#
37
# Since: 4.2
38
##
39
{ 'union': 'BlockExportOptions',
40
'base': { 'type': 'BlockExportType',
41
'id': 'str',
42
+     '*fixed-iothread': 'bool',
43
+     '*iothread': 'str',
44
'node-name': 'str',
45
'*writable': 'bool',
46
'*writethrough': 'bool' },
47
diff --git a/block/export/export.c b/block/export/export.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/block/export/export.c
50
+++ b/block/export/export.c
51
@@ -XXX,XX +XXX,XX @@
52
53
#include "block/block.h"
54
#include "sysemu/block-backend.h"
55
+#include "sysemu/iothread.h"
56
#include "block/export.h"
57
#include "block/nbd.h"
58
#include "qapi/error.h"
59
@@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
60
61
BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
62
{
63
+ bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
64
const BlockExportDriver *drv;
65
BlockExport *exp = NULL;
66
BlockDriverState *bs;
67
- BlockBackend *blk;
68
+ BlockBackend *blk = NULL;
69
AioContext *ctx;
70
uint64_t perm;
71
int ret;
72
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
73
ctx = bdrv_get_aio_context(bs);
74
aio_context_acquire(ctx);
75
76
+ if (export->has_iothread) {
77
+ IOThread *iothread;
78
+ AioContext *new_ctx;
40
+
79
+
41
typedef struct BlockDriverInfo {
80
+ iothread = iothread_by_id(export->iothread);
42
/* in bytes, 0 if irrelevant */
81
+ if (!iothread) {
43
int cluster_size;
82
+ error_setg(errp, "iothread \"%s\" not found", export->iothread);
44
@@ -XXX,XX +XXX,XX @@ typedef enum {
83
+ goto fail;
45
#define BDRV_SECTOR_BITS 9
46
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
47
48
+/*
49
+ * Get the first most significant bit of wp. If it is zero, then
50
+ * the zone type is SWR.
51
+ */
52
+#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63))
53
+
54
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
55
INT_MAX >> BDRV_SECTOR_BITS)
56
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
57
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/block_int-common.h
60
+++ b/include/block/block_int-common.h
61
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
62
63
/* maximum number of active zones */
64
int64_t max_active_zones;
65
+
66
+ int64_t write_granularity;
67
} BlockLimits;
68
69
typedef struct BdrvOpBlocker BdrvOpBlocker;
70
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
71
CoMutex bsc_modify_lock;
72
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
73
BdrvBlockStatusCache *block_status_cache;
74
+
75
+ /* array of write pointers' location of each zone in the zoned device. */
76
+ BlockZoneWps *wps;
77
};
78
79
struct BlockBackendRootState {
80
diff --git a/block/file-posix.c b/block/file-posix.c
81
index XXXXXXX..XXXXXXX 100644
82
--- a/block/file-posix.c
83
+++ b/block/file-posix.c
84
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
85
#endif
86
}
87
88
+#if defined(CONFIG_BLKZONED)
89
+/*
90
+ * If the reset_all flag is true, then the wps of zone whose state is
91
+ * not readonly or offline should be all reset to the start sector.
92
+ * Else, take the real wp of the device.
93
+ */
94
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
95
+ unsigned int nrz, bool reset_all)
96
+{
97
+ struct blk_zone *blkz;
98
+ size_t rep_size;
99
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
100
+ BlockZoneWps *wps = bs->wps;
101
+ int j = offset / bs->bl.zone_size;
102
+ int ret, n = 0, i = 0;
103
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
104
+ g_autofree struct blk_zone_report *rep = NULL;
105
+
106
+ rep = g_malloc(rep_size);
107
+ blkz = (struct blk_zone *)(rep + 1);
108
+ while (n < nrz) {
109
+ memset(rep, 0, rep_size);
110
+ rep->sector = sector;
111
+ rep->nr_zones = nrz - n;
112
+
113
+ do {
114
+ ret = ioctl(fd, BLKREPORTZONE, rep);
115
+ } while (ret != 0 && errno == EINTR);
116
+ if (ret != 0) {
117
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
118
+ fd, offset, errno);
119
+ return -errno;
120
+ }
84
+ }
121
+
85
+
122
+ if (!rep->nr_zones) {
86
+ new_ctx = iothread_get_aio_context(iothread);
123
+ break;
124
+ }
125
+
87
+
126
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
88
+ ret = bdrv_try_set_aio_context(bs, new_ctx, errp);
127
+ /*
89
+ if (ret == 0) {
128
+ * The wp tracking cares only about sequential writes required and
90
+ aio_context_release(ctx);
129
+ * sequential write preferred zones so that the wp can advance to
91
+ aio_context_acquire(new_ctx);
130
+ * the right location.
92
+ ctx = new_ctx;
131
+ * Use the most significant bit of the wp location to indicate the
93
+ } else if (fixed_iothread) {
132
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
94
+ goto fail;
133
+ */
134
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
135
+ wps->wp[j] |= 1ULL << 63;
136
+ } else {
137
+ switch(blkz[i].cond) {
138
+ case BLK_ZONE_COND_FULL:
139
+ case BLK_ZONE_COND_READONLY:
140
+ /* Zone not writable */
141
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
142
+ break;
143
+ case BLK_ZONE_COND_OFFLINE:
144
+ /* Zone not writable nor readable */
145
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
146
+ break;
147
+ default:
148
+ if (reset_all) {
149
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
150
+ } else {
151
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
152
+ }
153
+ break;
154
+ }
155
+ }
156
+ }
157
+ sector = blkz[i - 1].start + blkz[i - 1].len;
158
+ }
159
+
160
+ return 0;
161
+}
162
+
163
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
164
+ unsigned int nrz)
165
+{
166
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
167
+ error_report("update zone wp failed");
168
+ }
169
+}
170
+#endif
171
+
172
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
173
{
174
BDRVRawState *s = bs->opaque;
175
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
176
if (ret >= 0) {
177
bs->bl.max_active_zones = ret;
178
}
179
+
180
+ ret = get_sysfs_long_val(&st, "physical_block_size");
181
+ if (ret >= 0) {
182
+ bs->bl.write_granularity = ret;
183
+ }
184
+
185
+ /* The refresh_limits() function can be called multiple times. */
186
+ g_free(bs->wps);
187
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
188
+ sizeof(int64_t) * bs->bl.nr_zones);
189
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
190
+ if (ret < 0) {
191
+ error_setg_errno(errp, -ret, "report wps failed");
192
+ bs->wps = NULL;
193
+ return;
194
+ }
195
+ qemu_co_mutex_init(&bs->wps->colock);
196
return;
197
}
198
out:
199
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
200
{
201
BDRVRawState *s = bs->opaque;
202
RawPosixAIOData acb;
203
+ int ret;
204
205
if (fd_open(bs) < 0)
206
return -EIO;
207
+#if defined(CONFIG_BLKZONED)
208
+ if (type & QEMU_AIO_WRITE && bs->wps) {
209
+ qemu_co_mutex_lock(&bs->wps->colock);
210
+ }
211
+#endif
212
213
/*
214
* When using O_DIRECT, the request must be aligned to be able to use
215
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
216
} else if (s->use_linux_io_uring) {
217
LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
218
assert(qiov->size == bytes);
219
- return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
220
+ ret = luring_co_submit(bs, aio, s->fd, offset, qiov, type);
221
+ goto out;
222
#endif
223
#ifdef CONFIG_LINUX_AIO
224
} else if (s->use_linux_aio) {
225
LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
226
assert(qiov->size == bytes);
227
- return laio_co_submit(bs, aio, s->fd, offset, qiov, type,
228
+ ret = laio_co_submit(bs, aio, s->fd, offset, qiov, type,
229
s->aio_max_batch);
230
+ goto out;
231
#endif
232
}
233
234
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
235
};
236
237
assert(qiov->size == bytes);
238
- return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
239
+ ret = raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
240
+
241
+out:
242
+#if defined(CONFIG_BLKZONED)
243
+ BlockZoneWps *wps = bs->wps;
244
+ if (ret == 0) {
245
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
246
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
247
+ if (!BDRV_ZT_IS_CONV(*wp)) {
248
+ /* Advance the wp if needed */
249
+ if (offset + bytes > *wp) {
250
+ *wp = offset + bytes;
251
+ }
252
+ }
253
+ }
254
+ } else {
255
+ if (type & QEMU_AIO_WRITE) {
256
+ update_zones_wp(bs, s->fd, 0, 1);
257
+ }
95
+ }
258
+ }
96
+ }
259
+
97
+
260
+ if (type & QEMU_AIO_WRITE && wps) {
98
/*
261
+ qemu_co_mutex_unlock(&wps->colock);
99
* Block exports are used for non-shared storage migration. Make sure
262
+ }
100
* that BDRV_O_INACTIVE is cleared and the image is ready for write
263
+#endif
101
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
264
+ return ret;
265
}
266
267
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
268
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
269
BDRVRawState *s = bs->opaque;
270
271
if (s->fd >= 0) {
272
+#if defined(CONFIG_BLKZONED)
273
+ g_free(bs->wps);
274
+#endif
275
qemu_close(s->fd);
276
s->fd = -1;
277
}
102
}
278
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
103
279
const char *op_name;
104
blk = blk_new(ctx, perm, BLK_PERM_ALL);
280
unsigned long zo;
105
+
281
int ret;
106
+ if (!fixed_iothread) {
282
+ BlockZoneWps *wps = bs->wps;
107
+ blk_set_allow_aio_context_change(blk, true);
283
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
284
285
zone_size = bs->bl.zone_size;
286
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
287
return -EINVAL;
288
}
289
290
+ QEMU_LOCK_GUARD(&wps->colock);
291
+ uint32_t i = offset / bs->bl.zone_size;
292
+ uint32_t nrz = len / bs->bl.zone_size;
293
+ uint64_t *wp = &wps->wp[i];
294
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
295
+ error_report("zone mgmt operations are not allowed for conventional zones");
296
+ return -EIO;
297
+ }
108
+ }
298
+
109
+
299
switch (op) {
110
ret = blk_insert_bs(blk, bs, errp);
300
case BLK_ZO_OPEN:
111
if (ret < 0) {
301
op_name = "BLKOPENZONE";
112
goto fail;
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
113
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
303
len >> BDRV_SECTOR_BITS);
114
index XXXXXXX..XXXXXXX 100644
304
ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
115
--- a/block/export/vhost-user-blk-server.c
305
if (ret != 0) {
116
+++ b/block/export/vhost-user-blk-server.c
306
+ update_zones_wp(bs, s->fd, offset, i);
117
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = {
307
error_report("ioctl %s failed %d", op_name, ret);
118
static void blk_aio_attached(AioContext *ctx, void *opaque)
308
+ return ret;
119
{
309
+ }
120
VuBlkExport *vexp = opaque;
310
+
121
+
311
+ if (zo == BLKRESETZONE && len == capacity) {
122
+ vexp->export.ctx = ctx;
312
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
123
vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
313
+ if (ret < 0) {
124
}
314
+ error_report("reporting single wp failed");
125
315
+ return ret;
126
static void blk_aio_detach(void *opaque)
316
+ }
127
{
317
+ } else if (zo == BLKRESETZONE) {
128
VuBlkExport *vexp = opaque;
318
+ for (int j = 0; j < nrz; ++j) {
129
+
319
+ wp[j] = offset + j * zone_size;
130
vhost_user_server_detach_aio_context(&vexp->vu_server);
320
+ }
131
+ vexp->export.ctx = NULL;
321
+ } else if (zo == BLKFINISHZONE) {
132
}
322
+ for (int j = 0; j < nrz; ++j) {
133
323
+ /* The zoned device allows the last zone smaller that the
134
static void
324
+ * zone size. */
135
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
325
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
136
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
326
+ }
137
logical_block_size);
138
139
- blk_set_allow_aio_context_change(exp->blk, true);
140
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
141
vexp);
142
143
diff --git a/nbd/server.c b/nbd/server.c
144
index XXXXXXX..XXXXXXX 100644
145
--- a/nbd/server.c
146
+++ b/nbd/server.c
147
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
148
return ret;
327
}
149
}
328
150
329
return ret;
151
- blk_set_allow_aio_context_change(blk, true);
152
-
153
QTAILQ_INIT(&exp->clients);
154
exp->name = g_strdup(arg->name);
155
exp->description = g_strdup(arg->description);
--
2.39.2

--
2.26.2
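For readers who want to poke at the kernel interface used by get_zones_wp()
in the patch above, here is a hedged, standalone user-space sketch of the
BLKREPORTZONE ioctl. It is illustration only: the device path is just an
example, error handling is trimmed, and none of this is QEMU code.

#include <fcntl.h>
#include <linux/blkzoned.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Report the first few zones of a zoned block device. */
int main(int argc, char **argv)
{
    unsigned int nrz = 4;
    size_t rep_size = sizeof(struct blk_zone_report) +
                      nrz * sizeof(struct blk_zone);
    struct blk_zone_report *rep = calloc(1, rep_size);
    struct blk_zone *blkz = (struct blk_zone *)(rep + 1);
    int fd = open(argc > 1 ? argv[1] : "/dev/nullb0", O_RDONLY);

    if (!rep || fd < 0) {
        return 1;
    }
    rep->sector = 0;            /* start reporting from the device start */
    rep->nr_zones = nrz;
    if (ioctl(fd, BLKREPORTZONE, rep) != 0) {
        perror("BLKREPORTZONE");
        return 1;
    }
    for (unsigned int i = 0; i < rep->nr_zones; i++) {
        printf("zone %u: start %llu wp %llu type %u cond %u\n", i,
               (unsigned long long)blkz[i].start,
               (unsigned long long)blkz[i].wp,
               (unsigned)blkz[i].type, (unsigned)blkz[i].cond);
    }
    close(fd);
    free(rep);
    return 0;
}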
From: Philippe Mathieu-Daudé <philmd@linaro.org>

Introduce the BdrvDmgUncompressFunc type definition. To emphasize that
dmg_uncompress_bz2 and dmg_uncompress_lzfse are pointers to functions,
declare them using this new typedef.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230320152610.32052-1-philmd@linaro.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/dmg.h | 8 ++++----
block/dmg.c | 7 ++-----
2 files changed, 6 insertions(+), 9 deletions(-)

Allow the number of queues to be configured using --export
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
vhost-user-blk-pci,num-queues=N setting, but QEMU vhost-user-blk.c lowers
its own value if the vhost-user-blk backend offers fewer queues than
QEMU.

The vhost-user-blk-server.c code is already capable of multi-queue. All
virtqueue processing runs in the same AioContext. No new locking is
needed.

Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
Note that the feature bit only announces the presence of the num_queues
configuration space field. It does not promise that there is more than 1
virtqueue, so we can set it unconditionally.

I tested multi-queue by running a random read fio test with numjobs=4 on
an -smp 4 guest. After the benchmark finished, the guest /proc/interrupts
file showed activity on all 4 virtio-blk MSI-X vectors. The
/sys/block/vda/mq/ directory shows that Linux blk-mq has 4 queues
configured.

An automated test is included in the next commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001144604.559733-2-stefanha@redhat.com
[Fixed accidental tab characters as suggested by Markus Armbruster
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qapi/block-export.json | 10 +++++++---
block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------
2 files changed, 25 insertions(+), 9 deletions(-)
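As a side note on the typedef idiom in Philippe's patch, here is a tiny
standalone sketch (generic names, nothing QEMU-specific) of why a named
function type reads better than repeating a function-pointer declaration:

#include <stdio.h>

/* Named function type: the signature is stated once. */
typedef int UncompressFunc(char *in, unsigned int in_len,
                           char *out, unsigned int out_len);

/* One possible implementation selected at runtime. */
static int copy_uncompress(char *in, unsigned int in_len,
                           char *out, unsigned int out_len)
{
    unsigned int n = in_len < out_len ? in_len : out_len;
    for (unsigned int i = 0; i < n; i++) {
        out[i] = in[i];
    }
    return (int)n;
}

/* A pointer to the named type, analogous to dmg_uncompress_bz2. */
static UncompressFunc *uncompress = copy_uncompress;

int main(void)
{
    char out[8];
    int n = uncompress((char *)"hello", 5, out, sizeof(out));
    printf("%d bytes\n", n);
    return 0;
}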
diff --git a/block/dmg.h b/block/dmg.h
34
diff --git a/qapi/block-export.json b/qapi/block-export.json
16
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
17
--- a/block/dmg.h
36
--- a/qapi/block-export.json
18
+++ b/block/dmg.h
37
+++ b/qapi/block-export.json
19
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVDMGState {
38
@@ -XXX,XX +XXX,XX @@
20
z_stream zstream;
39
# SocketAddress types are supported. Passed fds must be UNIX domain
21
} BDRVDMGState;
40
# sockets.
22
41
# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
23
-extern int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
42
+# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults
24
- char *next_out, unsigned int avail_out);
43
+# to 1.
25
+typedef int BdrvDmgUncompressFunc(char *next_in, unsigned int avail_in,
44
#
26
+ char *next_out, unsigned int avail_out);
45
# Since: 5.2
27
46
##
28
-extern int (*dmg_uncompress_lzfse)(char *next_in, unsigned int avail_in,
47
{ 'struct': 'BlockExportOptionsVhostUserBlk',
29
- char *next_out, unsigned int avail_out);
48
- 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
30
+extern BdrvDmgUncompressFunc *dmg_uncompress_bz2;
49
+ 'data': { 'addr': 'SocketAddress',
31
+extern BdrvDmgUncompressFunc *dmg_uncompress_lzfse;
50
+     '*logical-block-size': 'size',
32
51
+ '*num-queues': 'uint16'} }
33
#endif
52
34
diff --git a/block/dmg.c b/block/dmg.c
53
##
54
# @NbdServerAddOptions:
55
@@ -XXX,XX +XXX,XX @@
56
{ 'union': 'BlockExportOptions',
57
'base': { 'type': 'BlockExportType',
58
'id': 'str',
59
-     '*fixed-iothread': 'bool',
60
-     '*iothread': 'str',
61
+ '*fixed-iothread': 'bool',
62
+ '*iothread': 'str',
63
'node-name': 'str',
64
'*writable': 'bool',
65
'*writethrough': 'bool' },
66
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
35
index XXXXXXX..XXXXXXX 100644
67
index XXXXXXX..XXXXXXX 100644
36
--- a/block/dmg.c
68
--- a/block/export/vhost-user-blk-server.c
37
+++ b/block/dmg.c
69
+++ b/block/export/vhost-user-blk-server.c
38
@@ -XXX,XX +XXX,XX @@
70
@@ -XXX,XX +XXX,XX @@
39
#include "qemu/memalign.h"
71
#include "util/block-helpers.h"
40
#include "dmg.h"
41
42
-int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
43
- char *next_out, unsigned int avail_out);
44
-
45
-int (*dmg_uncompress_lzfse)(char *next_in, unsigned int avail_in,
46
- char *next_out, unsigned int avail_out);
47
+BdrvDmgUncompressFunc *dmg_uncompress_bz2;
48
+BdrvDmgUncompressFunc *dmg_uncompress_lzfse;
49
72
50
enum {
73
enum {
51
/* Limit chunk sizes to prevent unreasonable amounts of memory being used
74
- VHOST_USER_BLK_MAX_QUEUES = 1,
75
+ VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
76
};
77
struct virtio_blk_inhdr {
78
unsigned char status;
79
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev)
80
1ull << VIRTIO_BLK_F_DISCARD |
81
1ull << VIRTIO_BLK_F_WRITE_ZEROES |
82
1ull << VIRTIO_BLK_F_CONFIG_WCE |
83
+ 1ull << VIRTIO_BLK_F_MQ |
84
1ull << VIRTIO_F_VERSION_1 |
85
1ull << VIRTIO_RING_F_INDIRECT_DESC |
86
1ull << VIRTIO_RING_F_EVENT_IDX |
87
@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)
88
89
static void
90
vu_blk_initialize_config(BlockDriverState *bs,
91
- struct virtio_blk_config *config, uint32_t blk_size)
92
+ struct virtio_blk_config *config,
93
+ uint32_t blk_size,
94
+ uint16_t num_queues)
95
{
96
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
97
config->blk_size = blk_size;
98
@@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs,
99
config->seg_max = 128 - 2;
100
config->min_io_size = 1;
101
config->opt_io_size = 1;
102
- config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
103
+ config->num_queues = num_queues;
104
config->max_discard_sectors = 32768;
105
config->max_discard_seg = 1;
106
config->discard_sector_alignment = config->blk_size >> 9;
107
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
108
BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
109
Error *local_err = NULL;
110
uint64_t logical_block_size;
111
+ uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;
112
113
vexp->writable = opts->writable;
114
vexp->blkcfg.wce = 0;
115
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
116
}
117
vexp->blk_size = logical_block_size;
118
blk_set_guest_block_size(exp->blk, logical_block_size);
119
+
120
+ if (vu_opts->has_num_queues) {
121
+ num_queues = vu_opts->num_queues;
122
+ }
123
+ if (num_queues == 0) {
124
+ error_setg(errp, "num-queues must be greater than 0");
125
+ return -EINVAL;
126
+ }
127
+
128
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
129
- logical_block_size);
130
+ logical_block_size, num_queues);
131
132
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
133
vexp);
134
135
if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
136
- VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
137
- errp)) {
138
+ num_queues, &vu_blk_iface, errp)) {
139
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
140
blk_aio_detach, vexp);
141
return -EADDRNOTAVAIL;
--
2.39.2

--
2.26.2
From: Sam Li <faithilikerun@gmail.com>

Use get_sysfs_str_val() to get the string value of the device's zoned
model. Then get_sysfs_zoned_model() can convert it to QEMU's
BlockZoneModel type.

Use get_sysfs_long_val() to get the long value of zoned device
information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block_int-common.h | 3 +
block/file-posix.c | 130 ++++++++++++++++++++++---------
2 files changed, 95 insertions(+), 38 deletions(-)

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

bdrv_co_block_status_above has several design problems with handling
short backing files:

1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
without the BDRV_BLOCK_ALLOCATED flag, when the short backing file that
produces these after-EOF zeros is actually inside the requested backing
sequence.

2. With want_zero=false, it may return pnum=0 prior to the actual EOF,
because of the EOF of a short backing file.

Fix these things, making the logic about short backing files clearer.

With the fixed bdrv_block_status_above we also have to improve is_zero
in the qcow2 code, otherwise iotest 154 will fail, because with this
patch we stop merging zeros of different types (produced by regions
unallocated in the whole backing chain vs produced by short backing
files).

Note also that this patch leaves for another day the general problem
around block-status: the misuse of BDRV_BLOCK_ALLOCATED as
is-fs-allocated vs go-to-backing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
[Fix s/comes/come/ as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++-----------
block/qcow2.c | 16 ++++++++++--
2 files changed, 68 insertions(+), 16 deletions(-)
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
37
diff --git a/block/io.c b/block/io.c
27
index XXXXXXX..XXXXXXX 100644
38
index XXXXXXX..XXXXXXX 100644
28
--- a/include/block/block_int-common.h
39
--- a/block/io.c
29
+++ b/include/block/block_int-common.h
40
+++ b/block/io.c
30
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
41
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
31
* an explicit monitor command to load the disk inside the guest).
42
int64_t *map,
32
*/
43
BlockDriverState **file)
33
bool has_variable_length;
44
{
45
+ int ret;
46
BlockDriverState *p;
47
- int ret = 0;
48
- bool first = true;
49
+ int64_t eof = 0;
50
51
assert(bs != base);
52
- for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
34
+
53
+
35
+ /* device zone model */
54
+ ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
36
+ BlockZoneModel zoned;
55
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
37
} BlockLimits;
38
39
typedef struct BdrvOpBlocker BdrvOpBlocker;
40
diff --git a/block/file-posix.c b/block/file-posix.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/block/file-posix.c
43
+++ b/block/file-posix.c
44
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
45
#endif
46
}
47
48
-static int hdev_get_max_segments(int fd, struct stat *st)
49
+/*
50
+ * Get a sysfs attribute value as character string.
51
+ */
52
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
53
+ char **val) {
54
+#ifdef CONFIG_LINUX
55
+ g_autofree char *sysfspath = NULL;
56
+ int ret;
57
+ size_t len;
58
+
59
+ if (!S_ISBLK(st->st_mode)) {
60
+ return -ENOTSUP;
61
+ }
62
+
63
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
64
+ major(st->st_rdev), minor(st->st_rdev),
65
+ attribute);
66
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
67
+ if (ret == -1) {
68
+ return -ENOENT;
69
+ }
70
+
71
+ /* The file is ended with '\n' */
72
+ char *p;
73
+ p = *val;
74
+ if (*(p + len - 1) == '\n') {
75
+ *(p + len - 1) = '\0';
76
+ }
77
+ return ret;
78
+#else
79
+ return -ENOTSUP;
80
+#endif
81
+}
82
+
83
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
84
+{
85
+ g_autofree char *val = NULL;
86
+ int ret;
87
+
88
+ ret = get_sysfs_str_val(st, "zoned", &val);
89
+ if (ret < 0) {
90
+ return ret;
56
+ return ret;
91
+ }
57
+ }
92
+
58
+
93
+ if (strcmp(val, "host-managed") == 0) {
59
+ if (ret & BDRV_BLOCK_EOF) {
94
+ *zoned = BLK_Z_HM;
60
+ eof = offset + *pnum;
95
+ } else if (strcmp(val, "host-aware") == 0) {
96
+ *zoned = BLK_Z_HA;
97
+ } else if (strcmp(val, "none") == 0) {
98
+ *zoned = BLK_Z_NONE;
99
+ } else {
100
+ return -ENOTSUP;
101
+ }
102
+ return 0;
103
+}
104
+
105
+/*
106
+ * Get a sysfs attribute value as a long integer.
107
+ */
108
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
109
{
110
#ifdef CONFIG_LINUX
111
- char buf[32];
112
+ g_autofree char *str = NULL;
113
const char *end;
114
- char *sysfspath = NULL;
115
+ long val;
116
+ int ret;
117
+
118
+ ret = get_sysfs_str_val(st, attribute, &str);
119
+ if (ret < 0) {
120
+ return ret;
121
+ }
61
+ }
122
+
62
+
123
+ /* The file is ended with '\n', pass 'end' to accept that. */
63
+ assert(*pnum <= bytes);
124
+ ret = qemu_strtol(str, &end, 10, &val);
64
+ bytes = *pnum;
125
+ if (ret == 0 && end && *end == '\0') {
65
+
126
+ ret = val;
66
+ for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ p = bdrv_filter_or_cow_bs(p))
68
+ {
69
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
70
file);
71
if (ret < 0) {
72
- break;
73
+ return ret;
74
}
75
- if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
76
+ if (*pnum == 0) {
77
/*
78
- * Reading beyond the end of the file continues to read
79
- * zeroes, but we can only widen the result to the
80
- * unallocated length we learned from an earlier
81
- * iteration.
82
+ * The top layer deferred to this layer, and because this layer is
83
+ * short, any zeroes that we synthesize beyond EOF behave as if they
84
+ * were allocated at this layer.
85
+ *
86
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
87
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
88
+ * below.
89
*/
90
+ assert(ret & BDRV_BLOCK_EOF);
91
*pnum = bytes;
92
+ if (file) {
93
+ *file = p;
94
+ }
95
+ ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
96
+ break;
97
}
98
- if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
99
+ if (ret & BDRV_BLOCK_ALLOCATED) {
100
+ /*
101
+ * We've found the node and the status, we must break.
102
+ *
103
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
104
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
105
+ * below.
106
+ */
107
+ ret &= ~BDRV_BLOCK_EOF;
108
break;
109
}
110
- /* [offset, pnum] unallocated on this layer, which could be only
111
- * the first part of [offset, bytes]. */
112
- bytes = MIN(bytes, *pnum);
113
- first = false;
114
+
115
+ /*
116
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
117
+ * let's continue the diving.
118
+ */
119
+ assert(*pnum <= bytes);
120
+ bytes = *pnum;
127
+ }
121
+ }
128
+ return ret;
129
+#else
130
+ return -ENOTSUP;
131
+#endif
132
+}
133
+
122
+
134
+static int hdev_get_max_segments(int fd, struct stat *st)
123
+ if (offset + *pnum == eof) {
135
+{
124
+ ret |= BDRV_BLOCK_EOF;
136
+#ifdef CONFIG_LINUX
137
int ret;
138
- int sysfd = -1;
139
- long max_segments;
140
141
if (S_ISCHR(st->st_mode)) {
142
if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
143
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
144
}
145
return -ENOTSUP;
146
}
147
-
148
- if (!S_ISBLK(st->st_mode)) {
149
- return -ENOTSUP;
150
- }
151
-
152
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
153
- major(st->st_rdev), minor(st->st_rdev));
154
- sysfd = open(sysfspath, O_RDONLY);
155
- if (sysfd == -1) {
156
- ret = -errno;
157
- goto out;
158
- }
159
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
160
- if (ret < 0) {
161
- ret = -errno;
162
- goto out;
163
- } else if (ret == 0) {
164
- ret = -EIO;
165
- goto out;
166
- }
167
- buf[ret] = 0;
168
- /* The file is ended with '\n', pass 'end' to accept that. */
169
- ret = qemu_strtol(buf, &end, 10, &max_segments);
170
- if (ret == 0 && end && *end == '\n') {
171
- ret = max_segments;
172
- }
173
-
174
-out:
175
- if (sysfd != -1) {
176
- close(sysfd);
177
- }
178
- g_free(sysfspath);
179
- return ret;
180
+ return get_sysfs_long_val(st, "max_segments");
181
#else
182
return -ENOTSUP;
183
#endif
184
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
185
{
186
BDRVRawState *s = bs->opaque;
187
struct stat st;
188
+ int ret;
189
+ BlockZoneModel zoned;
190
191
s->needs_alignment = raw_needs_alignment(bs);
192
raw_probe_alignment(bs, s->fd, errp);
193
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
194
bs->bl.max_hw_iov = ret;
195
}
196
}
125
}
197
+
126
+
198
+ ret = get_sysfs_zoned_model(&st, &zoned);
127
return ret;
199
+ if (ret < 0) {
200
+ zoned = BLK_Z_NONE;
201
+ }
202
+ bs->bl.zoned = zoned;
203
}
128
}
204
129
205
static int check_for_dasd(int fd)
130
diff --git a/block/qcow2.c b/block/qcow2.c
131
index XXXXXXX..XXXXXXX 100644
132
--- a/block/qcow2.c
133
+++ b/block/qcow2.c
134
@@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
135
if (!bytes) {
136
return true;
137
}
138
- res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
139
- return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
140
+
141
+ /*
142
+ * bdrv_block_status_above doesn't merge different types of zeros, for
143
+ * example, zeros which come from the region which is unallocated in
144
+ * the whole backing chain, and zeros which come because of a short
145
+ * backing file. So, we need a loop.
146
+ */
147
+ do {
148
+ res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
149
+ offset += nr;
150
+ bytes -= nr;
151
+ } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);
152
+
153
+ return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0;
154
}
155
156
static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
--
2.39.2

--
2.26.2
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230407081657.17947-5-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/file-posix.c | 3 +++
block/trace-events | 2 ++
2 files changed, 5 insertions(+)

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/coroutines.h | 2 ++
block/io.c | 21 ++++++++++++++-------
2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
16
diff --git a/block/coroutines.h b/block/coroutines.h
14
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
18
--- a/block/coroutines.h
16
+++ b/block/file-posix.c
19
+++ b/block/coroutines.h
17
@@ -XXX,XX +XXX,XX @@ out:
20
@@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
18
if (!BDRV_ZT_IS_CONV(*wp)) {
21
int coroutine_fn
19
if (type & QEMU_AIO_ZONE_APPEND) {
22
bdrv_co_common_block_status_above(BlockDriverState *bs,
20
*s->offset = *wp;
23
BlockDriverState *base,
21
+ trace_zbd_zone_append_complete(bs, *s->offset
24
+ bool include_base,
22
+ >> BDRV_SECTOR_BITS);
25
bool want_zero,
23
}
26
int64_t offset,
24
/* Advance the wp if needed */
27
int64_t bytes,
25
if (offset + bytes > *wp) {
28
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
29
int generated_co_wrapper
27
len += iov_len;
30
bdrv_common_block_status_above(BlockDriverState *bs,
31
BlockDriverState *base,
32
+ bool include_base,
33
bool want_zero,
34
int64_t offset,
35
int64_t bytes,
36
diff --git a/block/io.c b/block/io.c
37
index XXXXXXX..XXXXXXX 100644
38
--- a/block/io.c
39
+++ b/block/io.c
40
@@ -XXX,XX +XXX,XX @@ early_out:
41
int coroutine_fn
42
bdrv_co_common_block_status_above(BlockDriverState *bs,
43
BlockDriverState *base,
44
+ bool include_base,
45
bool want_zero,
46
int64_t offset,
47
int64_t bytes,
48
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
49
BlockDriverState *p;
50
int64_t eof = 0;
51
52
- assert(bs != base);
53
+ assert(include_base || bs != base);
54
+ assert(!include_base || base); /* Can't include NULL base */
55
56
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
57
- if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
58
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
59
return ret;
28
}
60
}
29
61
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
62
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
63
assert(*pnum <= bytes);
64
bytes = *pnum;
65
66
- for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
68
p = bdrv_filter_or_cow_bs(p))
69
{
70
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
71
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
72
break;
73
}
74
75
+ if (p == base) {
76
+ assert(include_base);
77
+ break;
78
+ }
79
+
80
/*
81
* OK, [offset, offset + *pnum) region is unallocated on this layer,
82
* let's continue the diving.
83
@@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
84
int64_t offset, int64_t bytes, int64_t *pnum,
85
int64_t *map, BlockDriverState **file)
86
{
87
- return bdrv_common_block_status_above(bs, base, true, offset, bytes,
88
+ return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
89
pnum, map, file);
32
}
90
}
33
#endif
91
34
diff --git a/block/trace-events b/block/trace-events
92
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
35
index XXXXXXX..XXXXXXX 100644
93
int ret;
36
--- a/block/trace-events
94
int64_t dummy;
37
+++ b/block/trace-events
95
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
96
- ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
39
file_flush_fdatasync_failed(int err) "errno %d"
97
- offset, bytes, pnum ? pnum : &dummy,
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
98
- NULL, NULL);
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
99
+ ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
100
+ bytes, pnum ? pnum : &dummy, NULL,
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
101
+ NULL);
44
102
if (ret < 0) {
45
# ssh.c
103
return ret;
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
104
}
47
--
105
--
48
2.39.2
106
2.26.2
107
From: Sam Li <faithilikerun@gmail.com>

A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230407081657.17947-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for example from
img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
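To make the in/out offset convention concrete, here is a hedged usage
sketch of the blk_co_zone_append() API added below. The buffer size and
the assumption that zone_start is zone-aligned are invented for the
example, and the call must run in coroutine context.

#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "qemu/iov.h"                   /* QEMUIOVector helpers */
#include "sysemu/block-backend-io.h"    /* blk_co_zone_append() */

/* Sketch: append one buffer to the zone that starts at zone_start. */
static int coroutine_fn append_one_buf(BlockBackend *blk, int64_t zone_start,
                                       void *buf, size_t len)
{
    QEMUIOVector qiov;
    int64_t offset = zone_start;   /* in: start of the target zone */
    int ret;

    qemu_iovec_init(&qiov, 1);
    qemu_iovec_add(&qiov, buf, len);

    ret = blk_co_zone_append(blk, &offset, &qiov, 0);
    /* out: on success, offset now holds where the data actually landed */

    qemu_iovec_destroy(&qiov);
    return ret;
}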
16
include/block/block-io.h | 4 +++
16
block/io.c | 6 +++++-
17
include/block/block_int-common.h | 3 ++
17
1 file changed, 5 insertions(+), 1 deletion(-)
18
include/block/raw-aio.h | 4 ++-
19
include/sysemu/block-backend-io.h | 9 +++++
20
block/block-backend.c | 60 +++++++++++++++++++++++++++++++
21
block/file-posix.c | 58 ++++++++++++++++++++++++++----
22
block/io.c | 27 ++++++++++++++
23
block/io_uring.c | 4 +++
24
block/linux-aio.c | 3 ++
25
block/raw-format.c | 8 +++++
26
10 files changed, 172 insertions(+), 8 deletions(-)
27
18
28
diff --git a/include/block/block-io.h b/include/block/block-io.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block-io.h
31
+++ b/include/block/block-io.h
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
34
BlockZoneOp op,
35
int64_t offset, int64_t len);
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ if (!blk_is_available(blk)) {
172
+ blk_dec_in_flight(blk);
173
+ return -ENOMEDIUM;
174
+ }
175
+
176
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
177
+ blk_dec_in_flight(blk);
178
+ return ret;
179
+}
180
+
181
void blk_drain(BlockBackend *blk)
182
{
183
BlockDriverState *bs = blk_bs(blk);
184
diff --git a/block/file-posix.c b/block/file-posix.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/block/file-posix.c
187
+++ b/block/file-posix.c
188
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
189
bool has_write_zeroes:1;
190
bool use_linux_aio:1;
191
bool use_linux_io_uring:1;
192
+ int64_t *offset; /* offset of zone append operation */
193
int page_cache_inconsistent; /* errno from fdatasync failure */
194
bool has_fallocate;
195
bool needs_alignment;
196
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
197
ssize_t len;
198
199
len = RETRY_ON_EINTR(
200
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
201
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
202
qemu_pwritev(aiocb->aio_fildes,
203
aiocb->io.iov,
204
aiocb->io.niov,
205
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
206
ssize_t len;
207
208
while (offset < aiocb->aio_nbytes) {
209
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
210
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
211
len = pwrite(aiocb->aio_fildes,
212
(const char *)buf + offset,
213
aiocb->aio_nbytes - offset,
214
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
215
}
216
217
nbytes = handle_aiocb_rw_linear(aiocb, buf);
218
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
219
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
220
char *p = buf;
221
size_t count = aiocb->aio_nbytes, copy;
222
int i;
223
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
224
if (fd_open(bs) < 0)
225
return -EIO;
226
#if defined(CONFIG_BLKZONED)
227
- if (type & QEMU_AIO_WRITE && bs->wps) {
228
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
229
qemu_co_mutex_lock(&bs->wps->colock);
230
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
231
+ int index = offset / bs->bl.zone_size;
232
+ offset = bs->wps->wp[index];
233
+ }
234
}
235
#endif
236
237
@@ -XXX,XX +XXX,XX @@ out:
238
#if defined(CONFIG_BLKZONED)
239
BlockZoneWps *wps = bs->wps;
240
if (ret == 0) {
241
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
242
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
243
+ && wps && bs->bl.zone_size) {
244
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
245
if (!BDRV_ZT_IS_CONV(*wp)) {
246
+ if (type & QEMU_AIO_ZONE_APPEND) {
247
+ *s->offset = *wp;
248
+ }
249
/* Advance the wp if needed */
250
if (offset + bytes > *wp) {
251
*wp = offset + bytes;
252
@@ -XXX,XX +XXX,XX @@ out:
253
}
254
}
255
} else {
256
- if (type & QEMU_AIO_WRITE) {
257
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
258
update_zones_wp(bs, s->fd, 0, 1);
259
}
260
}
261
262
- if (type & QEMU_AIO_WRITE && wps) {
263
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
264
qemu_co_mutex_unlock(&wps->colock);
265
}
266
#endif
267
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
268
}
269
#endif
270
271
+#if defined(CONFIG_BLKZONED)
272
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
273
+ int64_t *offset,
274
+ QEMUIOVector *qiov,
275
+ BdrvRequestFlags flags) {
276
+ assert(flags == 0);
277
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
278
+ int64_t iov_len = 0;
279
+ int64_t len = 0;
280
+ BDRVRawState *s = bs->opaque;
281
+ s->offset = offset;
282
+
283
+ if (*offset & zone_size_mask) {
284
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
285
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
286
+ return -EINVAL;
287
+ }
288
+
289
+ int64_t wg = bs->bl.write_granularity;
290
+ int64_t wg_mask = wg - 1;
291
+ for (int i = 0; i < qiov->niov; i++) {
292
+ iov_len = qiov->iov[i].iov_len;
293
+ if (iov_len & wg_mask) {
294
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
295
+ "block size %" PRId64 "", i, iov_len, wg);
296
+ return -EINVAL;
297
+ }
298
+ len += iov_len;
299
+ }
300
+
301
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
302
+}
303
+#endif
304
+
305
static coroutine_fn int
306
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
307
bool blkdev)
308
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
309
/* zone management operations */
310
.bdrv_co_zone_report = raw_co_zone_report,
311
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
312
+ .bdrv_co_zone_append = raw_co_zone_append,
313
#endif
314
};
315
316
diff --git a/block/io.c b/block/io.c
19
diff --git a/block/io.c b/block/io.c
317
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
318
--- a/block/io.c
21
--- a/block/io.c
319
+++ b/block/io.c
22
+++ b/block/io.c
320
@@ -XXX,XX +XXX,XX @@ out:
23
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
321
return co.ret;
24
BlockDriverState *p;
322
}
25
int64_t eof = 0;
323
26
324
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
27
- assert(include_base || bs != base);
325
+ QEMUIOVector *qiov,
28
assert(!include_base || base); /* Can't include NULL base */
326
+ BdrvRequestFlags flags)
29
327
+{
30
+ if (!include_base && bs == base) {
328
+ int ret;
31
+ *pnum = bytes;
329
+ BlockDriver *drv = bs->drv;
32
+ return 0;
330
+ CoroutineIOCompletion co = {
331
+ .coroutine = qemu_coroutine_self(),
332
+ };
333
+ IO_CODE();
334
+
335
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
336
+ if (ret < 0) {
337
+ return ret;
338
+ }
33
+ }
339
+
34
+
340
+ bdrv_inc_in_flight(bs);
35
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
341
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
36
if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
342
+ co.ret = -ENOTSUP;
37
return ret;
343
+ goto out;
344
+ }
345
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
346
+out:
347
+ bdrv_dec_in_flight(bs);
348
+ return co.ret;
349
+}
350
+
351
void *qemu_blockalign(BlockDriverState *bs, size_t size)
352
{
353
IO_CODE();
354
diff --git a/block/io_uring.c b/block/io_uring.c
355
index XXXXXXX..XXXXXXX 100644
356
--- a/block/io_uring.c
357
+++ b/block/io_uring.c
358
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
359
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
360
luringcb->qiov->niov, offset);
361
break;
362
+ case QEMU_AIO_ZONE_APPEND:
363
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
364
+ luringcb->qiov->niov, offset);
365
+ break;
366
case QEMU_AIO_READ:
367
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
368
luringcb->qiov->niov, offset);
369
diff --git a/block/linux-aio.c b/block/linux-aio.c
370
index XXXXXXX..XXXXXXX 100644
371
--- a/block/linux-aio.c
372
+++ b/block/linux-aio.c
373
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
374
case QEMU_AIO_WRITE:
375
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
376
break;
377
+ case QEMU_AIO_ZONE_APPEND:
378
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
379
+ break;
380
case QEMU_AIO_READ:
381
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
382
break;
383
diff --git a/block/raw-format.c b/block/raw-format.c
384
index XXXXXXX..XXXXXXX 100644
385
--- a/block/raw-format.c
386
+++ b/block/raw-format.c
387
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
388
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
389
}
390
391
+static int coroutine_fn GRAPH_RDLOCK
392
+raw_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
393
+ BdrvRequestFlags flags)
394
+{
395
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
396
+}
397
+
398
static int64_t coroutine_fn GRAPH_RDLOCK
399
raw_co_getlength(BlockDriverState *bs)
400
{
401
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
402
.bdrv_co_pdiscard = &raw_co_pdiscard,
403
.bdrv_co_zone_report = &raw_co_zone_report,
404
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
405
+ .bdrv_co_zone_append = &raw_co_zone_append,
406
.bdrv_co_block_status = &raw_co_block_status,
407
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
408
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
409
--
38
--
410
2.39.2
39
2.26.2
40
1
From: Sam Li <faithilikerun@gmail.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
Add zoned device option to host_device BlockDriver. It will be presented only
3
bdrv_is_allocated_above wrongly handles short backing files: it reports
4
for zoned host block devices. By adding zone management operations to the
4
after-EOF space as UNALLOCATED which is wrong, as on read the data is
5
host_device BlockDriver, users can use the new block layer APIs
5
generated on the level of short backing file (if all overlays have
6
including Report Zone and the zone management operations
6
unallocated areas at that place).
7
(open, close, finish, reset, reset_all).
8
7
9
qemu-io uses the new APIs to perform zoned storage commands on the device:
8
Reusing bdrv_common_block_status_above fixes the issue and unifies code
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
9
path.
11
zone_finish(zf).
12
10
13
For example, to test zone_report, use the following command:
11
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
14
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
12
Reviewed-by: Eric Blake <eblake@redhat.com>
15
-c "zrp offset nr_zones"
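The zone management commands follow the same pattern and take a byte offset
and length, which should be aligned to the device zone size, for example
(offset and len are placeholders, as above):
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
-c "zo offset len" -c "zf offset len"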
13
Reviewed-by: Alberto Garcia <berto@igalia.com>
16
14
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
15
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
22
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
23
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
24
<philmd@linaro.org> and remove spurious ret = -errno in
25
raw_co_zone_mgmt().
26
--Stefan]
16
--Stefan]
27
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
28
---
18
---
29
meson.build | 4 +
19
block/io.c | 43 +++++--------------------------------------
30
include/block/block-io.h | 9 +
20
1 file changed, 5 insertions(+), 38 deletions(-)
31
include/block/block_int-common.h | 21 ++
32
include/block/raw-aio.h | 6 +-
33
include/sysemu/block-backend-io.h | 18 ++
34
block/block-backend.c | 133 +++++++++++++
35
block/file-posix.c | 306 +++++++++++++++++++++++++++++-
36
block/io.c | 41 ++++
37
qemu-io-cmds.c | 149 +++++++++++++++
38
9 files changed, 684 insertions(+), 3 deletions(-)
39
21
40
diff --git a/meson.build b/meson.build
41
index XXXXXXX..XXXXXXX 100644
42
--- a/meson.build
43
+++ b/meson.build
44
@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
45
# has_header
46
config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
47
config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
48
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
49
config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
50
config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
51
config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
52
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
53
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
54
cc.has_member('struct stat', 'st_atim',
55
prefix: '#include <sys/stat.h>'))
56
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
57
+ cc.has_member('struct blk_zone', 'capacity',
58
+ prefix: '#include <linux/blkzoned.h>'))
59
60
# has_type
61
config_host_data.set('CONFIG_IOVEC',
62
diff --git a/include/block/block-io.h b/include/block/block-io.h
63
index XXXXXXX..XXXXXXX 100644
64
--- a/include/block/block-io.h
65
+++ b/include/block/block-io.h
66
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
67
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
68
int64_t bytes);
69
70
+/* Report zone information of zone block device. */
71
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
72
+ int64_t offset,
73
+ unsigned int *nr_zones,
74
+ BlockZoneDescriptor *zones);
75
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
76
+ BlockZoneOp op,
77
+ int64_t offset, int64_t len);
78
+
79
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
80
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
81
int64_t bytes, int64_t *pnum, int64_t *map,
82
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
83
index XXXXXXX..XXXXXXX 100644
84
--- a/include/block/block_int-common.h
85
+++ b/include/block/block_int-common.h
86
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
87
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
88
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
89
90
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
91
+ int64_t offset, unsigned int *nr_zones,
92
+ BlockZoneDescriptor *zones);
93
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
94
+ int64_t offset, int64_t len);
95
+
96
/* removable device specific */
97
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
98
BlockDriverState *bs);
99
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
100
101
/* device zone model */
102
BlockZoneModel zoned;
103
+
104
+ /* zone size expressed in bytes */
105
+ uint32_t zone_size;
106
+
107
+ /* total number of zones */
108
+ uint32_t nr_zones;
109
+
110
+ /* maximum sectors of a zone append write operation */
111
+ int64_t max_append_sectors;
112
+
113
+ /* maximum number of open zones */
114
+ int64_t max_open_zones;
115
+
116
+ /* maximum number of active zones */
117
+ int64_t max_active_zones;
118
} BlockLimits;
119
120
typedef struct BdrvOpBlocker BdrvOpBlocker;
121
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
122
index XXXXXXX..XXXXXXX 100644
123
--- a/include/block/raw-aio.h
124
+++ b/include/block/raw-aio.h
125
@@ -XXX,XX +XXX,XX @@
126
#define QEMU_AIO_WRITE_ZEROES 0x0020
127
#define QEMU_AIO_COPY_RANGE 0x0040
128
#define QEMU_AIO_TRUNCATE 0x0080
129
+#define QEMU_AIO_ZONE_REPORT 0x0100
130
+#define QEMU_AIO_ZONE_MGMT 0x0200
131
#define QEMU_AIO_TYPE_MASK \
132
(QEMU_AIO_READ | \
133
QEMU_AIO_WRITE | \
134
@@ -XXX,XX +XXX,XX @@
135
QEMU_AIO_DISCARD | \
136
QEMU_AIO_WRITE_ZEROES | \
137
QEMU_AIO_COPY_RANGE | \
138
- QEMU_AIO_TRUNCATE)
139
+ QEMU_AIO_TRUNCATE | \
140
+ QEMU_AIO_ZONE_REPORT | \
141
+ QEMU_AIO_ZONE_MGMT)
142
143
/* AIO flags */
144
#define QEMU_AIO_MISALIGNED 0x1000
145
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
146
index XXXXXXX..XXXXXXX 100644
147
--- a/include/sysemu/block-backend-io.h
148
+++ b/include/sysemu/block-backend-io.h
149
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
150
BlockCompletionFunc *cb, void *opaque);
151
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
152
BlockCompletionFunc *cb, void *opaque);
153
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
154
+ unsigned int *nr_zones,
155
+ BlockZoneDescriptor *zones,
156
+ BlockCompletionFunc *cb, void *opaque);
157
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
158
+ int64_t offset, int64_t len,
159
+ BlockCompletionFunc *cb, void *opaque);
160
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
161
BlockCompletionFunc *cb, void *opaque);
162
void blk_aio_cancel_async(BlockAIOCB *acb);
163
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
164
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
165
int64_t bytes, BdrvRequestFlags flags);
166
167
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
168
+ unsigned int *nr_zones,
169
+ BlockZoneDescriptor *zones);
170
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
171
+ unsigned int *nr_zones,
172
+ BlockZoneDescriptor *zones);
173
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
174
+ int64_t offset, int64_t len);
175
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
176
+ int64_t offset, int64_t len);
177
+
178
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
179
int64_t bytes);
180
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
181
diff --git a/block/block-backend.c b/block/block-backend.c
182
index XXXXXXX..XXXXXXX 100644
183
--- a/block/block-backend.c
184
+++ b/block/block-backend.c
185
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
186
return ret;
187
}
188
189
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
190
+{
191
+ BlkAioEmAIOCB *acb = opaque;
192
+ BlkRwCo *rwco = &acb->rwco;
193
+
194
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
195
+ (unsigned int*)acb->bytes,rwco->iobuf);
196
+ blk_aio_complete(acb);
197
+}
198
+
199
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
200
+ unsigned int *nr_zones,
201
+ BlockZoneDescriptor *zones,
202
+ BlockCompletionFunc *cb, void *opaque)
203
+{
204
+ BlkAioEmAIOCB *acb;
205
+ Coroutine *co;
206
+ IO_CODE();
207
+
208
+ blk_inc_in_flight(blk);
209
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
210
+ acb->rwco = (BlkRwCo) {
211
+ .blk = blk,
212
+ .offset = offset,
213
+ .iobuf = zones,
214
+ .ret = NOT_DONE,
215
+ };
216
+ acb->bytes = (int64_t)nr_zones,
217
+ acb->has_returned = false;
218
+
219
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
220
+ aio_co_enter(blk_get_aio_context(blk), co);
221
+
222
+ acb->has_returned = true;
223
+ if (acb->rwco.ret != NOT_DONE) {
224
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
225
+ blk_aio_complete_bh, acb);
226
+ }
227
+
228
+ return &acb->common;
229
+}
230
+
231
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
232
+{
233
+ BlkAioEmAIOCB *acb = opaque;
234
+ BlkRwCo *rwco = &acb->rwco;
235
+
236
+ rwco->ret = blk_co_zone_mgmt(rwco->blk, (BlockZoneOp)rwco->iobuf,
237
+ rwco->offset, acb->bytes);
238
+ blk_aio_complete(acb);
239
+}
240
+
241
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
242
+ int64_t offset, int64_t len,
243
+ BlockCompletionFunc *cb, void *opaque) {
244
+ BlkAioEmAIOCB *acb;
245
+ Coroutine *co;
246
+ IO_CODE();
247
+
248
+ blk_inc_in_flight(blk);
249
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
250
+ acb->rwco = (BlkRwCo) {
251
+ .blk = blk,
252
+ .offset = offset,
253
+ .iobuf = (void *)op,
254
+ .ret = NOT_DONE,
255
+ };
256
+ acb->bytes = len;
257
+ acb->has_returned = false;
258
+
259
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
260
+ aio_co_enter(blk_get_aio_context(blk), co);
261
+
262
+ acb->has_returned = true;
263
+ if (acb->rwco.ret != NOT_DONE) {
264
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
265
+ blk_aio_complete_bh, acb);
266
+ }
267
+
268
+ return &acb->common;
269
+}
270
+
271
+/*
272
+ * Send a zone_report command.
273
+ * offset is a byte offset from the start of the device. No alignment
274
+ * required for offset.
275
+ * nr_zones represents IN maximum and OUT actual.
276
+ */
277
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
278
+ unsigned int *nr_zones,
279
+ BlockZoneDescriptor *zones)
280
+{
281
+ int ret;
282
+ IO_CODE();
283
+
284
+ blk_inc_in_flight(blk); /* increase before waiting */
285
+ blk_wait_while_drained(blk);
286
+ if (!blk_is_available(blk)) {
287
+ blk_dec_in_flight(blk);
288
+ return -ENOMEDIUM;
289
+ }
290
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
291
+ blk_dec_in_flight(blk);
292
+ return ret;
293
+}
294
+
295
+/*
296
+ * Send a zone_management command.
297
+ * op is the zone operation;
298
+ * offset is the byte offset from the start of the zoned device;
299
+ * len is the maximum number of bytes the command should operate on. It
300
+ * should be aligned with the device zone size.
301
+ */
302
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
303
+ int64_t offset, int64_t len)
304
+{
305
+ int ret;
306
+ IO_CODE();
307
+
308
+ blk_inc_in_flight(blk);
309
+ blk_wait_while_drained(blk);
310
+
311
+ ret = blk_check_byte_request(blk, offset, len);
312
+ if (ret < 0) {
313
+ blk_dec_in_flight(blk);
314
+ return ret;
315
+ }
316
+
317
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
318
+ blk_dec_in_flight(blk);
319
+ return ret;
320
+}
321
+
322
void blk_drain(BlockBackend *blk)
323
{
324
BlockDriverState *bs = blk_bs(blk);
325
diff --git a/block/file-posix.c b/block/file-posix.c
326
index XXXXXXX..XXXXXXX 100644
327
--- a/block/file-posix.c
328
+++ b/block/file-posix.c
329
@@ -XXX,XX +XXX,XX @@
330
#include <sys/param.h>
331
#include <sys/syscall.h>
332
#include <sys/vfs.h>
333
+#if defined(CONFIG_BLKZONED)
334
+#include <linux/blkzoned.h>
335
+#endif
336
#include <linux/cdrom.h>
337
#include <linux/fd.h>
338
#include <linux/fs.h>
339
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
340
PreallocMode prealloc;
341
Error **errp;
342
} truncate;
343
+ struct {
344
+ unsigned int *nr_zones;
345
+ BlockZoneDescriptor *zones;
346
+ } zone_report;
347
+ struct {
348
+ unsigned long op;
349
+ } zone_mgmt;
350
};
351
} RawPosixAIOData;
352
353
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
354
zoned = BLK_Z_NONE;
355
}
356
bs->bl.zoned = zoned;
357
+ if (zoned != BLK_Z_NONE) {
358
+ /*
359
+ * The zoned device must at least have zone size and nr_zones fields.
360
+ */
361
+ ret = get_sysfs_long_val(&st, "chunk_sectors");
362
+ if (ret < 0) {
363
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
364
+ "sysfs attribute");
365
+ goto out;
366
+ } else if (!ret) {
367
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
368
+ goto out;
369
+ }
370
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
371
+
372
+ ret = get_sysfs_long_val(&st, "nr_zones");
373
+ if (ret < 0) {
374
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
375
+ "sysfs attribute");
376
+ goto out;
377
+ } else if (!ret) {
378
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
379
+ goto out;
380
+ }
381
+ bs->bl.nr_zones = ret;
382
+
383
+ ret = get_sysfs_long_val(&st, "zone_append_max_bytes");
384
+ if (ret > 0) {
385
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
386
+ }
387
+
388
+ ret = get_sysfs_long_val(&st, "max_open_zones");
389
+ if (ret >= 0) {
390
+ bs->bl.max_open_zones = ret;
391
+ }
392
+
393
+ ret = get_sysfs_long_val(&st, "max_active_zones");
394
+ if (ret >= 0) {
395
+ bs->bl.max_active_zones = ret;
396
+ }
397
+ return;
398
+ }
399
+out:
400
+ bs->bl.zoned = BLK_Z_NONE;
401
}
402
403
static int check_for_dasd(int fd)
404
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
405
BDRVRawState *s = bs->opaque;
406
int ret;
407
408
- /* If DASD, get blocksizes */
409
+ /* If DASD or zoned devices, get blocksizes */
410
if (check_for_dasd(s->fd) < 0) {
411
- return -ENOTSUP;
412
+ /* zoned devices are not DASD */
413
+ if (bs->bl.zoned == BLK_Z_NONE) {
414
+ return -ENOTSUP;
415
+ }
416
}
417
ret = probe_logical_blocksize(s->fd, &bsz->log);
418
if (ret < 0) {
419
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
420
}
421
#endif
422
423
+/*
424
+ * parse_zone - Fill a zone descriptor
425
+ */
426
+#if defined(CONFIG_BLKZONED)
427
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
428
+ const struct blk_zone *blkz) {
429
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
430
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
431
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
432
+
433
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
434
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
435
+#else
436
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
437
+#endif
438
+
439
+ switch (blkz->type) {
440
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
441
+ zone->type = BLK_ZT_SWR;
442
+ break;
443
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
444
+ zone->type = BLK_ZT_SWP;
445
+ break;
446
+ case BLK_ZONE_TYPE_CONVENTIONAL:
447
+ zone->type = BLK_ZT_CONV;
448
+ break;
449
+ default:
450
+ error_report("Unsupported zone type: 0x%x", blkz->type);
451
+ return -ENOTSUP;
452
+ }
453
+
454
+ switch (blkz->cond) {
455
+ case BLK_ZONE_COND_NOT_WP:
456
+ zone->state = BLK_ZS_NOT_WP;
457
+ break;
458
+ case BLK_ZONE_COND_EMPTY:
459
+ zone->state = BLK_ZS_EMPTY;
460
+ break;
461
+ case BLK_ZONE_COND_IMP_OPEN:
462
+ zone->state = BLK_ZS_IOPEN;
463
+ break;
464
+ case BLK_ZONE_COND_EXP_OPEN:
465
+ zone->state = BLK_ZS_EOPEN;
466
+ break;
467
+ case BLK_ZONE_COND_CLOSED:
468
+ zone->state = BLK_ZS_CLOSED;
469
+ break;
470
+ case BLK_ZONE_COND_READONLY:
471
+ zone->state = BLK_ZS_RDONLY;
472
+ break;
473
+ case BLK_ZONE_COND_FULL:
474
+ zone->state = BLK_ZS_FULL;
475
+ break;
476
+ case BLK_ZONE_COND_OFFLINE:
477
+ zone->state = BLK_ZS_OFFLINE;
478
+ break;
479
+ default:
480
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
481
+ return -ENOTSUP;
482
+ }
483
+ return 0;
484
+}
485
+#endif
486
+
487
+#if defined(CONFIG_BLKZONED)
488
+static int handle_aiocb_zone_report(void *opaque)
489
+{
490
+ RawPosixAIOData *aiocb = opaque;
491
+ int fd = aiocb->aio_fildes;
492
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
493
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
494
+ /* zoned block devices use 512-byte sectors */
495
+ uint64_t sector = aiocb->aio_offset / 512;
496
+
497
+ struct blk_zone *blkz;
498
+ size_t rep_size;
499
+ unsigned int nrz;
500
+ int ret, n = 0, i = 0;
501
+
502
+ nrz = *nr_zones;
503
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
504
+ g_autofree struct blk_zone_report *rep = NULL;
505
+ rep = g_malloc(rep_size);
506
+
507
+ blkz = (struct blk_zone *)(rep + 1);
508
+ while (n < nrz) {
509
+ memset(rep, 0, rep_size);
510
+ rep->sector = sector;
511
+ rep->nr_zones = nrz - n;
512
+
513
+ do {
514
+ ret = ioctl(fd, BLKREPORTZONE, rep);
515
+ } while (ret != 0 && errno == EINTR);
516
+ if (ret != 0) {
517
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
518
+ fd, sector, errno);
519
+ return -errno;
520
+ }
521
+
522
+ if (!rep->nr_zones) {
523
+ break;
524
+ }
525
+
526
+ for (i = 0; i < rep->nr_zones; i++, n++) {
527
+ ret = parse_zone(&zones[n], &blkz[i]);
528
+ if (ret != 0) {
529
+ return ret;
530
+ }
531
+
532
+ /* The next report should start after the last zone reported */
533
+ sector = blkz[i].start + blkz[i].len;
534
+ }
535
+ }
536
+
537
+ *nr_zones = n;
538
+ return 0;
539
+}
540
+#endif
541
+
542
+#if defined(CONFIG_BLKZONED)
543
+static int handle_aiocb_zone_mgmt(void *opaque)
544
+{
545
+ RawPosixAIOData *aiocb = opaque;
546
+ int fd = aiocb->aio_fildes;
547
+ uint64_t sector = aiocb->aio_offset / 512;
548
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
549
+ struct blk_zone_range range;
550
+ int ret;
551
+
552
+ /* Execute the operation */
553
+ range.sector = sector;
554
+ range.nr_sectors = nr_sectors;
555
+ do {
556
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
557
+ } while (ret != 0 && errno == EINTR);
558
+
559
+ return ret;
560
+}
561
+#endif
562
+
563
static int handle_aiocb_copy_range(void *opaque)
564
{
565
RawPosixAIOData *aiocb = opaque;
566
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
567
}
568
}
569
570
+/*
571
+ * zone report - Get a zoned block device's information in the form
572
+ * of an array of zone descriptors.
573
+ * zones is an array of zone descriptors to hold zone information on reply;
574
+ * offset can be any byte within the entire size of the device;
575
+ * nr_zones is the maximum number of zones the command should operate on.
576
+ */
577
+#if defined(CONFIG_BLKZONED)
578
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
579
+ unsigned int *nr_zones,
580
+ BlockZoneDescriptor *zones) {
581
+ BDRVRawState *s = bs->opaque;
582
+ RawPosixAIOData acb = (RawPosixAIOData) {
583
+ .bs = bs,
584
+ .aio_fildes = s->fd,
585
+ .aio_type = QEMU_AIO_ZONE_REPORT,
586
+ .aio_offset = offset,
587
+ .zone_report = {
588
+ .nr_zones = nr_zones,
589
+ .zones = zones,
590
+ },
591
+ };
592
+
593
+ return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
594
+}
595
+#endif
596
+
597
+/*
598
+ * zone management operations - Execute an operation on a zone
599
+ */
600
+#if defined(CONFIG_BLKZONED)
601
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
602
+ int64_t offset, int64_t len) {
603
+ BDRVRawState *s = bs->opaque;
604
+ RawPosixAIOData acb;
605
+ int64_t zone_size, zone_size_mask;
606
+ const char *op_name;
607
+ unsigned long zo;
608
+ int ret;
609
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
610
+
611
+ zone_size = bs->bl.zone_size;
612
+ zone_size_mask = zone_size - 1;
613
+ if (offset & zone_size_mask) {
614
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
615
+ "%" PRId64 "", offset / 512, zone_size / 512);
616
+ return -EINVAL;
617
+ }
618
+
619
+ if (((offset + len) < capacity && len & zone_size_mask) ||
620
+ offset + len > capacity) {
621
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
622
+ " %" PRId64 "", len / 512, zone_size / 512);
623
+ return -EINVAL;
624
+ }
625
+
626
+ switch (op) {
627
+ case BLK_ZO_OPEN:
628
+ op_name = "BLKOPENZONE";
629
+ zo = BLKOPENZONE;
630
+ break;
631
+ case BLK_ZO_CLOSE:
632
+ op_name = "BLKCLOSEZONE";
633
+ zo = BLKCLOSEZONE;
634
+ break;
635
+ case BLK_ZO_FINISH:
636
+ op_name = "BLKFINISHZONE";
637
+ zo = BLKFINISHZONE;
638
+ break;
639
+ case BLK_ZO_RESET:
640
+ op_name = "BLKRESETZONE";
641
+ zo = BLKRESETZONE;
642
+ break;
643
+ default:
644
+ error_report("Unsupported zone op: 0x%x", op);
645
+ return -ENOTSUP;
646
+ }
647
+
648
+ acb = (RawPosixAIOData) {
649
+ .bs = bs,
650
+ .aio_fildes = s->fd,
651
+ .aio_type = QEMU_AIO_ZONE_MGMT,
652
+ .aio_offset = offset,
653
+ .aio_nbytes = len,
654
+ .zone_mgmt = {
655
+ .op = zo,
656
+ },
657
+ };
658
+
659
+ ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
660
+ if (ret != 0) {
661
+ error_report("ioctl %s failed %d", op_name, ret);
662
+ }
663
+
664
+ return ret;
665
+}
666
+#endif
667
+
668
static coroutine_fn int
669
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
670
bool blkdev)
671
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
672
#ifdef __linux__
673
.bdrv_co_ioctl = hdev_co_ioctl,
674
#endif
675
+
676
+ /* zoned device */
677
+#if defined(CONFIG_BLKZONED)
678
+ /* zone management operations */
679
+ .bdrv_co_zone_report = raw_co_zone_report,
680
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
681
+#endif
682
};
683
684
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
685
diff --git a/block/io.c b/block/io.c
22
diff --git a/block/io.c b/block/io.c
686
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
687
--- a/block/io.c
24
--- a/block/io.c
688
+++ b/block/io.c
25
+++ b/block/io.c
689
@@ -XXX,XX +XXX,XX @@ out:
26
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
690
return co.ret;
27
* at 'offset + *pnum' may return the same allocation status (in other
28
* words, the result is not necessarily the maximum possible range);
29
* but 'pnum' will only be 0 when end of file is reached.
30
- *
31
*/
32
int bdrv_is_allocated_above(BlockDriverState *top,
33
BlockDriverState *base,
34
bool include_base, int64_t offset,
35
int64_t bytes, int64_t *pnum)
36
{
37
- BlockDriverState *intermediate;
38
- int ret;
39
- int64_t n = bytes;
40
-
41
- assert(base || !include_base);
42
-
43
- intermediate = top;
44
- while (include_base || intermediate != base) {
45
- int64_t pnum_inter;
46
- int64_t size_inter;
47
-
48
- assert(intermediate);
49
- ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
50
- if (ret < 0) {
51
- return ret;
52
- }
53
- if (ret) {
54
- *pnum = pnum_inter;
55
- return 1;
56
- }
57
-
58
- size_inter = bdrv_getlength(intermediate);
59
- if (size_inter < 0) {
60
- return size_inter;
61
- }
62
- if (n > pnum_inter &&
63
- (intermediate == top || offset + pnum_inter < size_inter)) {
64
- n = pnum_inter;
65
- }
66
-
67
- if (intermediate == base) {
68
- break;
69
- }
70
-
71
- intermediate = bdrv_filter_or_cow_bs(intermediate);
72
+ int ret = bdrv_common_block_status_above(top, base, include_base, false,
73
+ offset, bytes, pnum, NULL, NULL);
74
+ if (ret < 0) {
75
+ return ret;
76
}
77
78
- *pnum = n;
79
- return 0;
80
+ return !!(ret & BDRV_BLOCK_ALLOCATED);
691
}
81
}
692
82
693
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
83
int coroutine_fn
694
+ unsigned int *nr_zones,
695
+ BlockZoneDescriptor *zones)
696
+{
697
+ BlockDriver *drv = bs->drv;
698
+ CoroutineIOCompletion co = {
699
+ .coroutine = qemu_coroutine_self(),
700
+ };
701
+ IO_CODE();
702
+
703
+ bdrv_inc_in_flight(bs);
704
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
705
+ co.ret = -ENOTSUP;
706
+ goto out;
707
+ }
708
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
709
+out:
710
+ bdrv_dec_in_flight(bs);
711
+ return co.ret;
712
+}
713
+
714
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
715
+ int64_t offset, int64_t len)
716
+{
717
+ BlockDriver *drv = bs->drv;
718
+ CoroutineIOCompletion co = {
719
+ .coroutine = qemu_coroutine_self(),
720
+ };
721
+ IO_CODE();
722
+
723
+ bdrv_inc_in_flight(bs);
724
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
725
+ co.ret = -ENOTSUP;
726
+ goto out;
727
+ }
728
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
729
+out:
730
+ bdrv_dec_in_flight(bs);
731
+ return co.ret;
732
+}
733
+
734
void *qemu_blockalign(BlockDriverState *bs, size_t size)
735
{
736
IO_CODE();
737
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
738
index XXXXXXX..XXXXXXX 100644
739
--- a/qemu-io-cmds.c
740
+++ b/qemu-io-cmds.c
741
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
742
.oneline = "flush all in-core file state to disk",
743
};
744
745
+static inline int64_t tosector(int64_t bytes)
746
+{
747
+ return bytes >> BDRV_SECTOR_BITS;
748
+}
749
+
750
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
751
+{
752
+ int ret;
753
+ int64_t offset;
754
+ unsigned int nr_zones;
755
+
756
+ ++optind;
757
+ offset = cvtnum(argv[optind]);
758
+ ++optind;
759
+ nr_zones = cvtnum(argv[optind]);
760
+
761
+ g_autofree BlockZoneDescriptor *zones = NULL;
762
+ zones = g_new(BlockZoneDescriptor, nr_zones);
763
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
764
+ if (ret < 0) {
765
+ printf("zone report failed: %s\n", strerror(-ret));
766
+ } else {
767
+ for (int i = 0; i < nr_zones; ++i) {
768
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
769
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
770
+ "zcond:%u, [type: %u]\n",
771
+ tosector(zones[i].start), tosector(zones[i].length),
772
+ tosector(zones[i].cap), tosector(zones[i].wp),
773
+ zones[i].state, zones[i].type);
774
+ }
775
+ }
776
+ return ret;
777
+}
778
+
779
+static const cmdinfo_t zone_report_cmd = {
780
+ .name = "zone_report",
781
+ .altname = "zrp",
782
+ .cfunc = zone_report_f,
783
+ .argmin = 2,
784
+ .argmax = 2,
785
+ .args = "offset number",
786
+ .oneline = "report zone information",
787
+};
788
+
789
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
790
+{
791
+ int ret;
792
+ int64_t offset, len;
793
+ ++optind;
794
+ offset = cvtnum(argv[optind]);
795
+ ++optind;
796
+ len = cvtnum(argv[optind]);
797
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
798
+ if (ret < 0) {
799
+ printf("zone open failed: %s\n", strerror(-ret));
800
+ }
801
+ return ret;
802
+}
803
+
804
+static const cmdinfo_t zone_open_cmd = {
805
+ .name = "zone_open",
806
+ .altname = "zo",
807
+ .cfunc = zone_open_f,
808
+ .argmin = 2,
809
+ .argmax = 2,
810
+ .args = "offset len",
811
+ .oneline = "explicitly open a range of zones in a zoned block device",
812
+};
813
+
814
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
815
+{
816
+ int ret;
817
+ int64_t offset, len;
818
+ ++optind;
819
+ offset = cvtnum(argv[optind]);
820
+ ++optind;
821
+ len = cvtnum(argv[optind]);
822
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
823
+ if (ret < 0) {
824
+ printf("zone close failed: %s\n", strerror(-ret));
825
+ }
826
+ return ret;
827
+}
828
+
829
+static const cmdinfo_t zone_close_cmd = {
830
+ .name = "zone_close",
831
+ .altname = "zc",
832
+ .cfunc = zone_close_f,
833
+ .argmin = 2,
834
+ .argmax = 2,
835
+ .args = "offset len",
836
+ .oneline = "close a range of zones in a zoned block device",
837
+};
838
+
839
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
840
+{
841
+ int ret;
842
+ int64_t offset, len;
843
+ ++optind;
844
+ offset = cvtnum(argv[optind]);
845
+ ++optind;
846
+ len = cvtnum(argv[optind]);
847
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
848
+ if (ret < 0) {
849
+ printf("zone finish failed: %s\n", strerror(-ret));
850
+ }
851
+ return ret;
852
+}
853
+
854
+static const cmdinfo_t zone_finish_cmd = {
855
+ .name = "zone_finish",
856
+ .altname = "zf",
857
+ .cfunc = zone_finish_f,
858
+ .argmin = 2,
859
+ .argmax = 2,
860
+ .args = "offset len",
861
+ .oneline = "finish a range of zones in a zoned block device",
862
+};
863
+
864
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
865
+{
866
+ int ret;
867
+ int64_t offset, len;
868
+ ++optind;
869
+ offset = cvtnum(argv[optind]);
870
+ ++optind;
871
+ len = cvtnum(argv[optind]);
872
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
873
+ if (ret < 0) {
874
+ printf("zone reset failed: %s\n", strerror(-ret));
875
+ }
876
+ return ret;
877
+}
878
+
879
+static const cmdinfo_t zone_reset_cmd = {
880
+ .name = "zone_reset",
881
+ .altname = "zrs",
882
+ .cfunc = zone_reset_f,
883
+ .argmin = 2,
884
+ .argmax = 2,
885
+ .args = "offset len",
886
+ .oneline = "reset a zone write pointer in a zoned block device",
887
+};
888
+
889
static int truncate_f(BlockBackend *blk, int argc, char **argv);
890
static const cmdinfo_t truncate_cmd = {
891
.name = "truncate",
892
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
893
qemuio_add_command(&aio_write_cmd);
894
qemuio_add_command(&aio_flush_cmd);
895
qemuio_add_command(&flush_cmd);
896
+ qemuio_add_command(&zone_report_cmd);
897
+ qemuio_add_command(&zone_open_cmd);
898
+ qemuio_add_command(&zone_close_cmd);
899
+ qemuio_add_command(&zone_finish_cmd);
900
+ qemuio_add_command(&zone_reset_cmd);
901
qemuio_add_command(&truncate_cmd);
902
qemuio_add_command(&length_cmd);
903
qemuio_add_command(&info_cmd);
904
--
84
--
905
2.39.2
85
2.26.2
906
86
907
1
From: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
The event filename is an absolute path. Convert it to a relative path when
3
These cases are fixed by previous patches around block_status and
4
writing '#line' directives, to preserve reproducibility of the generated
4
is_allocated.
5
output when different base paths are used.
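
A minimal sketch of the effect (hypothetical paths; the patch itself calls
os.path.relpath() without an explicit start, i.e. relative to the current
working directory, which is the build directory):

  import os.path

  # Two checkouts of the same tree at different absolute locations.
  a = "/home/alice/src/qemu/block/trace-events"
  b = "/builder/ws/qemu/block/trace-events"

  # Relative to each tree's build directory the result is identical, so the
  # emitted '#line' directive no longer depends on the checkout location.
  print(os.path.relpath(a, start="/home/alice/src/qemu/build"))  # ../block/trace-events
  print(os.path.relpath(b, start="/builder/ws/qemu/build"))      # ../block/trace-events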
6
5
7
Signed-off-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
9
Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Message-Id: <20230406080045.21696-1-thomas.de_schampheleire@nokia.com>
10
---
11
---
11
scripts/tracetool/backend/ftrace.py | 4 +++-
12
tests/qemu-iotests/274 | 20 +++++++++++
12
scripts/tracetool/backend/log.py | 4 +++-
13
tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++
13
scripts/tracetool/backend/syslog.py | 4 +++-
14
2 files changed, 88 insertions(+)
14
3 files changed, 9 insertions(+), 3 deletions(-)
15
15
16
diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py
16
diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/274
19
+++ b/tests/qemu-iotests/274
20
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \
21
iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
22
iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
23
24
+ iotests.log('=== Testing qemu-img commit (top -> base) ===')
25
+
26
+ create_chain()
27
+ iotests.qemu_img_log('commit', '-b', base, top)
28
+ iotests.img_info_log(base)
29
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
30
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
31
+
32
+ iotests.log('=== Testing QMP active commit (top -> base) ===')
33
+
34
+ create_chain()
35
+ with create_vm() as vm:
36
+ vm.launch()
37
+ vm.qmp_log('block-commit', device='top', base_node='base',
38
+ job_id='job0', auto_dismiss=False)
39
+ vm.run_job('job0', wait=5)
40
+
41
+ iotests.img_info_log(mid)
42
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
43
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
44
45
iotests.log('== Resize tests ==')
46
47
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
17
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
18
--- a/scripts/tracetool/backend/ftrace.py
49
--- a/tests/qemu-iotests/274.out
19
+++ b/scripts/tracetool/backend/ftrace.py
50
+++ b/tests/qemu-iotests/274.out
20
@@ -XXX,XX +XXX,XX @@
51
@@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0
21
__email__ = "stefanha@redhat.com"
52
read 1048576/1048576 bytes at offset 1048576
22
53
1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
23
54
24
+import os.path
55
+=== Testing qemu-img commit (top -> base) ===
56
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
25
+
57
+
26
from tracetool import out
58
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
27
28
29
@@ -XXX,XX +XXX,XX @@ def generate_h(event, group):
30
args=event.args,
31
event_id="TRACE_" + event.name.upper(),
32
event_lineno=event.lineno,
33
- event_filename=event.filename,
34
+ event_filename=os.path.relpath(event.filename),
35
fmt=event.fmt.rstrip("\n"),
36
argnames=argnames)
37
38
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
39
index XXXXXXX..XXXXXXX 100644
40
--- a/scripts/tracetool/backend/log.py
41
+++ b/scripts/tracetool/backend/log.py
42
@@ -XXX,XX +XXX,XX @@
43
__email__ = "stefanha@redhat.com"
44
45
46
+import os.path
47
+
59
+
48
from tracetool import out
60
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
49
50
51
@@ -XXX,XX +XXX,XX @@ def generate_h(event, group):
52
' }',
53
cond=cond,
54
event_lineno=event.lineno,
55
- event_filename=event.filename,
56
+ event_filename=os.path.relpath(event.filename),
57
name=event.name,
58
fmt=event.fmt.rstrip("\n"),
59
argnames=argnames)
60
diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py
61
index XXXXXXX..XXXXXXX 100644
62
--- a/scripts/tracetool/backend/syslog.py
63
+++ b/scripts/tracetool/backend/syslog.py
64
@@ -XXX,XX +XXX,XX @@
65
__email__ = "stefanha@redhat.com"
66
67
68
+import os.path
69
+
61
+
70
from tracetool import out
62
+wrote 2097152/2097152 bytes at offset 0
71
63
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
72
64
+
73
@@ -XXX,XX +XXX,XX @@ def generate_h(event, group):
65
+Image committed.
74
' }',
66
+
75
cond=cond,
67
+image: TEST_IMG
76
event_lineno=event.lineno,
68
+file format: IMGFMT
77
- event_filename=event.filename,
69
+virtual size: 2 MiB (2097152 bytes)
78
+ event_filename=os.path.relpath(event.filename),
70
+cluster_size: 65536
79
name=event.name,
71
+Format specific information:
80
fmt=event.fmt.rstrip("\n"),
72
+ compat: 1.1
81
argnames=argnames)
73
+ compression type: zlib
74
+ lazy refcounts: false
75
+ refcount bits: 16
76
+ corrupt: false
77
+ extended l2: false
78
+
79
+read 1048576/1048576 bytes at offset 0
80
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
81
+
82
+read 1048576/1048576 bytes at offset 1048576
83
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
84
+
85
+=== Testing QMP active commit (top -> base) ===
86
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
87
+
88
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
89
+
90
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
91
+
92
+wrote 2097152/2097152 bytes at offset 0
93
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
94
+
95
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
96
+{"return": {}}
97
+{"execute": "job-complete", "arguments": {"id": "job0"}}
98
+{"return": {}}
99
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
100
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
101
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
102
+{"return": {}}
103
+image: TEST_IMG
104
+file format: IMGFMT
105
+virtual size: 1 MiB (1048576 bytes)
106
+cluster_size: 65536
107
+backing file: TEST_DIR/PID-base
108
+backing file format: IMGFMT
109
+Format specific information:
110
+ compat: 1.1
111
+ compression type: zlib
112
+ lazy refcounts: false
113
+ refcount bits: 16
114
+ corrupt: false
115
+ extended l2: false
116
+
117
+read 1048576/1048576 bytes at offset 0
118
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
119
+
120
+read 1048576/1048576 bytes at offset 1048576
121
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
122
+
123
== Resize tests ==
124
=== preallocation=off ===
125
Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16
82
--
126
--
83
2.39.2
127
2.26.2
128