The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:

  Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24

for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:

  iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)

----------------------------------------------------------------
Block patches:
- The SSH block driver now uses libssh instead of libssh2
- The VMDK block driver gets read-only support for the seSparse
  subformat
- Various fixes

---

v2:
- Squashed Pino's fix for pre-0.8 libssh into the libssh patch

----------------------------------------------------------------
Anton Nefedov (1):
  iotest 134: test cluster-misaligned encrypted write

Klaus Birkelund Jensen (1):
  nvme: do not advertise support for unsupported arbitration mechanism

Max Reitz (1):
  iotests: Fix 205 for concurrent runs

Pino Toscano (1):
  ssh: switch from libssh2 to libssh

Sam Eiderman (3):
  vmdk: Fix comment regarding max l1_size coverage
  vmdk: Reduce the max bound for L1 table size
  vmdk: Add read-only support for seSparse snapshots

Vladimir Sementsov-Ogievskiy (1):
  blockdev: enable non-root nodes for transaction drive-backup source

 configure                                  |  65 +-
 block/Makefile.objs                        |   6 +-
 block/ssh.c                                | 652 ++++++++++--------
 block/vmdk.c                               | 372 +++++++++-
 blockdev.c                                 |   2 +-
 hw/block/nvme.c                            |   1 -
 .travis.yml                                |   4 +-
 block/trace-events                         |  14 +-
 docs/qemu-block-drivers.texi               |   2 +-
 .../dockerfiles/debian-win32-cross.docker  |   1 -
 .../dockerfiles/debian-win64-cross.docker  |   1 -
 tests/docker/dockerfiles/fedora.docker     |   4 +-
 tests/docker/dockerfiles/ubuntu.docker     |   2 +-
 tests/docker/dockerfiles/ubuntu1804.docker |   2 +-
 tests/qemu-iotests/059.out                 |   2 +-
 tests/qemu-iotests/134                     |   9 +
 tests/qemu-iotests/134.out                 |  10 +
 tests/qemu-iotests/205                     |   2 +-
 tests/qemu-iotests/207                     |  54 +-
 tests/qemu-iotests/207.out                 |   2 +-
 20 files changed, 823 insertions(+), 384 deletions(-)

--
2.21.0

From: Klaus Birkelund Jensen <klaus@birkelund.eu>

The device mistakenly reports that the Weighted Round Robin with Urgent
Priority Class arbitration mechanism is supported.

It is not.

Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Message-id: 20190606092530.14206-1-klaus@birkelund.eu
Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 hw/block/nvme.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     n->bar.cap = 0;
     NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
     NVME_CAP_SET_CQR(n->bar.cap, 1);
-    NVME_CAP_SET_AMS(n->bar.cap, 1);
     NVME_CAP_SET_TO(n->bar.cap, 0xf);
     NVME_CAP_SET_CSS(n->bar.cap, 1);
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
--
2.21.0

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

We forgot to enable it for transaction .prepare, while it is already
enabled in do_drive_backup since commit a2d665c1bc362
"blockdev: loosen restrictions on drive-backup source node".

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index XXXXXXX..XXXXXXX 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
     assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
     backup = common->action->u.drive_backup.data;
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return;
     }
--
2.21.0

From: Anton Nefedov <anton.nefedov@virtuozzo.com>

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -XXX,XX +XXX,XX @@ echo
 echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
+echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
 echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
--
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
extended the l1_size check from VMDK4 to VMDK3 but did not update the
default coverage in the moved comment.

The previous vmdk4 calculation:

    (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB

The added vmdk3 calculation:

    (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB

Adding the calculation of vmdk3 to the comment.

In any case, VMware does not offer virtual disks more than 2TB for
vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is
not implemented yet in qemu.
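
(For reference, a small standalone sketch -- not part of the patch itself --
that reproduces the coverage arithmetic quoted above; the figures are the
defaults named in the comment being fixed.)

    /* Hypothetical helper, only to double-check the numbers above. */
    #include <stdio.h>
    #include <inttypes.h>

    int main(void)
    {
        uint64_t max_l1_entries = 512ULL * 1024 * 1024;  /* the 512M bound on l1_size */

        /* VMDK4 defaults: 512 L2 entries per table, 64 KiB grains */
        uint64_t vmdk4_bytes = max_l1_entries * 512 * 65536;

        /* VMDK3 defaults: 4096 L2 entries per table, 512-byte grains */
        uint64_t vmdk3_bytes = max_l1_entries * 4096 * 512;

        printf("VMDK4 coverage: %" PRIu64 " PiB\n", vmdk4_bytes >> 50);  /* prints 16 */
        printf("VMDK3 coverage: %" PRIu64 " PiB\n", vmdk3_bytes >> 50);  /* prints 1 */
        return 0;
    }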

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
Reviewed-by: yuchenlin <yuchenlin@synology.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
         return -EFBIG;
     }
     if (l1_size > 512 * 1024 * 1024) {
-        /* Although with big capacity and small l1_entry_sectors, we can get a
+        /*
+         * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is 16PB for default cluster and L2 table
-         * size */
+         * Limit it to 512M, which is:
+         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
+         *            cluster size: 64KB, L2 table size: 512 entries
+         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
+         *            cluster size: 512B, L2 table size: 4096 entries
+         */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
     }
--
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

512M of L1 entries is a very loose bound, only 32M are required to store
the maximal supported VMDK file size of 2TB.

Fixed qemu-iotest #59 - the failure now occurs earlier, on an impossible
L1 table size.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c               | 13 +++++++------
 tests/qemu-iotests/059.out |  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
         error_setg(errp, "Invalid granularity, image may be corrupt");
         return -EFBIG;
     }
-    if (l1_size > 512 * 1024 * 1024) {
+    if (l1_size > 32 * 1024 * 1024) {
         /*
          * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is:
-         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
-         *            cluster size: 64KB, L2 table size: 512 entries
-         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
-         *            cluster size: 512B, L2 table size: 4096 entries
+         * Limit it to 32M, which is enough to store:
+         *     8TB  - for both VMDK3 & VMDK4 with
+         *            minimal cluster size: 512B
+         *            minimal L2 table size: 512 entries
+         *            8 TB is still more than the maximal value supported for
+         *            VMDK3 & VMDK4 which is 2TB.
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -XXX,XX +XXX,XX @@ Offset          Length          Mapped to       File
 0x140000000     0x10000         0x50000         TEST_DIR/t-s003.vmdk
 
 === Testing afl image with a very large capacity ===
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
 *** done
--
2.21.0

New patch
1
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
3
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
4
QEMU).
5
6
This format was lacking in the following:
7
8
* Grain directory (L1) and grain table (L2) entries were 32-bit,
9
allowing access to only 2TB (slightly less) of data.
10
* The grain size (default) was 512 bytes - leading to data
11
fragmentation and many grain tables.
12
* For space reclamation purposes, it was necessary to find all the
13
grains which are not pointed to by any grain table - so a reverse
14
mapping of "offset of grain in vmdk" to "grain table" must be
15
constructed - which takes large amounts of CPU/RAM.
16
17
The format specification can be found in VMware's documentation:
18
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
19
20
In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
21
introduced: SESparse (Space Efficient).
22
23
This format fixes the above issues:
24
25
* All entries are now 64-bit.
26
* The grain size (default) is 4KB.
27
* Grain directory and grain tables are now located at the beginning
28
of the file.
29
+ seSparse format reserves space for all grain tables.
30
+ Grain tables can be addressed using an index.
31
+ Grains are located in the end of the file and can also be
32
addressed with an index.
33
- seSparse vmdks of large disks (64TB) have huge preallocated
34
headers - mainly due to L2 tables, even for empty snapshots.
35
* The header contains a reverse mapping ("backmap") of "offset of
36
grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
37
specifies for each grain - whether it is allocated or not.
38
Using these data structures we can implement space reclamation
39
efficiently.
40
* Due to the fact that the header now maintains two mappings:
41
* The regular one (grain directory & grain tables)
42
* A reverse one (backmap and free bitmap)
43
These data structures can lose consistency upon crash and result
44
in a corrupted VMDK.
45
Therefore, a journal is also added to the VMDK and is replayed
46
when the VMware reopens the file after a crash.
47
48
Since ESXi 6.7 - SESparse is the only snapshot format available.
49
50
Unfortunately, VMware does not provide documentation regarding the new
51
seSparse format.
52
53
This commit is based on black-box research of the seSparse format.
54
Various in-guest block operations and their effect on the snapshot file
55
were tested.
56
57
The only VMware provided source of information (regarding the underlying
58
implementation) was a log file on the ESXi:
59
60
/var/log/hostd.log
61
62
Whenever an seSparse snapshot is created - the log is being populated
63
with seSparse records.
64
65
Relevant log records are of the form:
66
67
[...] Const Header:
68
[...] constMagic = 0xcafebabe
69
[...] version = 2.1
70
[...] capacity = 204800
71
[...] grainSize = 8
72
[...] grainTableSize = 64
73
[...] flags = 0
74
[...] Extents:
75
[...] Header : <1 : 1>
76
[...] JournalHdr : <2 : 2>
77
[...] Journal : <2048 : 2048>
78
[...] GrainDirectory : <4096 : 2048>
79
[...] GrainTables : <6144 : 2048>
80
[...] FreeBitmap : <8192 : 2048>
81
[...] BackMap : <10240 : 2048>
82
[...] Grain : <12288 : 204800>
83
[...] Volatile Header:
84
[...] volatileMagic = 0xcafecafe
85
[...] FreeGTNumber = 0
86
[...] nextTxnSeqNumber = 0
87
[...] replayJournal = 0
88
89
The sizes that are seen in the log file are in sectors.
90
Extents are of the following format: <offset : size>
91
92
This commit is a strict implementation which enforces:
93
* magics
94
* version number 2.1
95
* grain size of 8 sectors (4KB)
96
* grain table size of 64 sectors
97
* zero flags
98
* extent locations
99
100
Additionally, this commit provides only a subset of the functionality
101
offered by seSparse's format:
102
* Read-only
103
* No journal replay
104
* No space reclamation
105
* No unmap support
106
107
Hence, journal header, journal, free bitmap and backmap extents are
108
unused, only the "classic" (L1 -> L2 -> data) grain access is
109
implemented.
110
111
However there are several differences in the grain access itself.
112
Grain directory (L1):
113
* Grain directory entries are indexes (not offsets) to grain
114
tables.
115
* Valid grain directory entries have their highest nibble set to
116
0x1.
117
* Since grain tables are always located in the beginning of the
118
file - the index can fit into 32 bits - so we can use its low
119
part if it's valid.
120
Grain table (L2):
121
* Grain table entries are indexes (not offsets) to grains.
122
* If the highest nibble of the entry is:
123
0x0:
124
The grain is not allocated.
125
The rest of the bytes are 0.
126
0x1:
127
The grain is unmapped - guest sees a zero grain.
128
The rest of the bits point to the previously mapped grain,
129
see 0x3 case.
130
0x2:
131
The grain is zero.
132
0x3:
133
The grain is allocated - to get the index calculate:
134
((entry & 0x0fff000000000000) >> 48) |
135
((entry & 0x0000ffffffffffff) << 12)
136
* The difference between 0x1 and 0x2 is that 0x1 is an unallocated
137
grain which results from the guest using sg_unmap to unmap the
138
grain - but the grain itself still exists in the grain extent - a
139
space reclamation procedure should delete it.
140
Unmapping a zero grain has no effect (0x2 will not change to 0x1)
141
but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
142
143
In order to implement seSparse some fields had to be changed to support
144
both 32-bit and 64-bit entry sizes.
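
(Not part of the patch: the following is an illustrative, standalone sketch
of the grain table entry decoding described above. The helper name
"se_sparse_decode_l2_entry" is made up for the example; the actual
implementation lives in block/vmdk.c in the diff below.)

    #include <inttypes.h>
    #include <stdio.h>

    /* Decode a 64-bit seSparse grain table (L2) entry as described above. */
    static const char *se_sparse_decode_l2_entry(uint64_t entry, uint64_t *grain_index)
    {
        switch (entry & 0xf000000000000000ULL) {
        case 0x0000000000000000ULL:
            return "unallocated";                /* remaining bits must be 0 */
        case 0x1000000000000000ULL:
            return "unmapped (reads as zeroes)"; /* low bits: previously mapped grain */
        case 0x2000000000000000ULL:
            return "zero grain";
        case 0x3000000000000000ULL:
            /* allocated: recover the grain index from the split bit-field */
            *grain_index = ((entry & 0x0fff000000000000ULL) >> 48) |
                           ((entry & 0x0000ffffffffffffULL) << 12);
            return "allocated";
        default:
            return "invalid";
        }
    }

    int main(void)
    {
        uint64_t index = 0;
        uint64_t entry = 0x3000000000000001ULL;  /* hypothetical entry value */
        printf("%s, grain index %" PRIu64 "\n",
               se_sparse_decode_l2_entry(entry, &index), index);
        return 0;
    }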
145
146
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
147
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
148
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
149
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
150
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
151
Signed-off-by: Max Reitz <mreitz@redhat.com>
152
---
153
block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
154
1 file changed, 342 insertions(+), 16 deletions(-)
155
156
diff --git a/block/vmdk.c b/block/vmdk.c
157
index XXXXXXX..XXXXXXX 100644
158
--- a/block/vmdk.c
159
+++ b/block/vmdk.c
160
@@ -XXX,XX +XXX,XX @@ typedef struct {
161
uint16_t compressAlgorithm;
162
} QEMU_PACKED VMDK4Header;
163
164
+typedef struct VMDKSESparseConstHeader {
165
+ uint64_t magic;
166
+ uint64_t version;
167
+ uint64_t capacity;
168
+ uint64_t grain_size;
169
+ uint64_t grain_table_size;
170
+ uint64_t flags;
171
+ uint64_t reserved1;
172
+ uint64_t reserved2;
173
+ uint64_t reserved3;
174
+ uint64_t reserved4;
175
+ uint64_t volatile_header_offset;
176
+ uint64_t volatile_header_size;
177
+ uint64_t journal_header_offset;
178
+ uint64_t journal_header_size;
179
+ uint64_t journal_offset;
180
+ uint64_t journal_size;
181
+ uint64_t grain_dir_offset;
182
+ uint64_t grain_dir_size;
183
+ uint64_t grain_tables_offset;
184
+ uint64_t grain_tables_size;
185
+ uint64_t free_bitmap_offset;
186
+ uint64_t free_bitmap_size;
187
+ uint64_t backmap_offset;
188
+ uint64_t backmap_size;
189
+ uint64_t grains_offset;
190
+ uint64_t grains_size;
191
+ uint8_t pad[304];
192
+} QEMU_PACKED VMDKSESparseConstHeader;
193
+
194
+typedef struct VMDKSESparseVolatileHeader {
195
+ uint64_t magic;
196
+ uint64_t free_gt_number;
197
+ uint64_t next_txn_seq_number;
198
+ uint64_t replay_journal;
199
+ uint8_t pad[480];
200
+} QEMU_PACKED VMDKSESparseVolatileHeader;
201
+
202
#define L2_CACHE_SIZE 16
203
204
typedef struct VmdkExtent {
205
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
206
bool compressed;
207
bool has_marker;
208
bool has_zero_grain;
209
+ bool sesparse;
210
+ uint64_t sesparse_l2_tables_offset;
211
+ uint64_t sesparse_clusters_offset;
212
+ int32_t entry_size;
213
int version;
214
int64_t sectors;
215
int64_t end_sector;
216
int64_t flat_start_offset;
217
int64_t l1_table_offset;
218
int64_t l1_backup_table_offset;
219
- uint32_t *l1_table;
220
+ void *l1_table;
221
uint32_t *l1_backup_table;
222
unsigned int l1_size;
223
uint32_t l1_entry_sectors;
224
225
unsigned int l2_size;
226
- uint32_t *l2_cache;
227
+ void *l2_cache;
228
uint32_t l2_cache_offsets[L2_CACHE_SIZE];
229
uint32_t l2_cache_counts[L2_CACHE_SIZE];
230
231
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
232
* minimal L2 table size: 512 entries
233
* 8 TB is still more than the maximal value supported for
234
* VMDK3 & VMDK4 which is 2TB.
235
+ * 64TB - for "ESXi seSparse Extent"
236
+ * minimal cluster size: 512B (default is 4KB)
237
+ * L2 table size: 4096 entries (const).
238
+ * 64TB is more than the maximal value supported for
239
+ * seSparse VMDKs (which is slightly less than 64TB)
240
*/
241
error_setg(errp, "L1 size too big");
242
return -EFBIG;
243
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
244
extent->l2_size = l2_size;
245
extent->cluster_sectors = flat ? sectors : cluster_sectors;
246
extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
247
+ extent->entry_size = sizeof(uint32_t);
248
249
if (s->num_extents > 1) {
250
extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
251
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
252
int i;
253
254
/* read the L1 table */
255
- l1_size = extent->l1_size * sizeof(uint32_t);
256
+ l1_size = extent->l1_size * extent->entry_size;
257
extent->l1_table = g_try_malloc(l1_size);
258
if (l1_size && extent->l1_table == NULL) {
259
return -ENOMEM;
260
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
261
goto fail_l1;
262
}
263
for (i = 0; i < extent->l1_size; i++) {
264
- le32_to_cpus(&extent->l1_table[i]);
265
+ if (extent->entry_size == sizeof(uint64_t)) {
266
+ le64_to_cpus((uint64_t *)extent->l1_table + i);
267
+ } else {
268
+ assert(extent->entry_size == sizeof(uint32_t));
269
+ le32_to_cpus((uint32_t *)extent->l1_table + i);
270
+ }
271
}
272
273
if (extent->l1_backup_table_offset) {
274
+ assert(!extent->sesparse);
275
extent->l1_backup_table = g_try_malloc(l1_size);
276
if (l1_size && extent->l1_backup_table == NULL) {
277
ret = -ENOMEM;
278
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
279
}
280
281
extent->l2_cache =
282
- g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
283
+ g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
284
return 0;
285
fail_l1b:
286
g_free(extent->l1_backup_table);
287
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
288
return ret;
289
}
290
291
+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
292
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
293
+
294
+/* Strict checks - format not officially documented */
295
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
296
+ Error **errp)
297
+{
298
+ header->magic = le64_to_cpu(header->magic);
299
+ header->version = le64_to_cpu(header->version);
300
+ header->grain_size = le64_to_cpu(header->grain_size);
301
+ header->grain_table_size = le64_to_cpu(header->grain_table_size);
302
+ header->flags = le64_to_cpu(header->flags);
303
+ header->reserved1 = le64_to_cpu(header->reserved1);
304
+ header->reserved2 = le64_to_cpu(header->reserved2);
305
+ header->reserved3 = le64_to_cpu(header->reserved3);
306
+ header->reserved4 = le64_to_cpu(header->reserved4);
307
+
308
+ header->volatile_header_offset =
309
+ le64_to_cpu(header->volatile_header_offset);
310
+ header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
311
+
312
+ header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
313
+ header->journal_header_size = le64_to_cpu(header->journal_header_size);
314
+
315
+ header->journal_offset = le64_to_cpu(header->journal_offset);
316
+ header->journal_size = le64_to_cpu(header->journal_size);
317
+
318
+ header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
319
+ header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
320
+
321
+ header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
322
+ header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
323
+
324
+ header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
325
+ header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
326
+
327
+ header->backmap_offset = le64_to_cpu(header->backmap_offset);
328
+ header->backmap_size = le64_to_cpu(header->backmap_size);
329
+
330
+ header->grains_offset = le64_to_cpu(header->grains_offset);
331
+ header->grains_size = le64_to_cpu(header->grains_size);
332
+
333
+ if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
334
+ error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
335
+ header->magic);
336
+ return -EINVAL;
337
+ }
338
+
339
+ if (header->version != 0x0000000200000001) {
340
+ error_setg(errp, "Unsupported version: 0x%016" PRIx64,
341
+ header->version);
342
+ return -ENOTSUP;
343
+ }
344
+
345
+ if (header->grain_size != 8) {
346
+ error_setg(errp, "Unsupported grain size: %" PRIu64,
347
+ header->grain_size);
348
+ return -ENOTSUP;
349
+ }
350
+
351
+ if (header->grain_table_size != 64) {
352
+ error_setg(errp, "Unsupported grain table size: %" PRIu64,
353
+ header->grain_table_size);
354
+ return -ENOTSUP;
355
+ }
356
+
357
+ if (header->flags != 0) {
358
+ error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
359
+ header->flags);
360
+ return -ENOTSUP;
361
+ }
362
+
363
+ if (header->reserved1 != 0 || header->reserved2 != 0 ||
364
+ header->reserved3 != 0 || header->reserved4 != 0) {
365
+ error_setg(errp, "Unsupported reserved bits:"
366
+ " 0x%016" PRIx64 " 0x%016" PRIx64
367
+ " 0x%016" PRIx64 " 0x%016" PRIx64,
368
+ header->reserved1, header->reserved2,
369
+ header->reserved3, header->reserved4);
370
+ return -ENOTSUP;
371
+ }
372
+
373
+ /* check that padding is 0 */
374
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
375
+ error_setg(errp, "Unsupported non-zero const header padding");
376
+ return -ENOTSUP;
377
+ }
378
+
379
+ return 0;
380
+}
381
+
382
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
383
+ Error **errp)
384
+{
385
+ header->magic = le64_to_cpu(header->magic);
386
+ header->free_gt_number = le64_to_cpu(header->free_gt_number);
387
+ header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
388
+ header->replay_journal = le64_to_cpu(header->replay_journal);
389
+
390
+ if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
391
+ error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
392
+ header->magic);
393
+ return -EINVAL;
394
+ }
395
+
396
+ if (header->replay_journal) {
397
+ error_setg(errp, "Image is dirty, Replaying journal not supported");
398
+ return -ENOTSUP;
399
+ }
400
+
401
+ /* check that padding is 0 */
402
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
403
+ error_setg(errp, "Unsupported non-zero volatile header padding");
404
+ return -ENOTSUP;
405
+ }
406
+
407
+ return 0;
408
+}
409
+
410
+static int vmdk_open_se_sparse(BlockDriverState *bs,
411
+ BdrvChild *file,
412
+ int flags, Error **errp)
413
+{
414
+ int ret;
415
+ VMDKSESparseConstHeader const_header;
416
+ VMDKSESparseVolatileHeader volatile_header;
417
+ VmdkExtent *extent;
418
+
419
+ ret = bdrv_apply_auto_read_only(bs,
420
+ "No write support for seSparse images available", errp);
421
+ if (ret < 0) {
422
+ return ret;
423
+ }
424
+
425
+ assert(sizeof(const_header) == SECTOR_SIZE);
426
+
427
+ ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
428
+ if (ret < 0) {
429
+ bdrv_refresh_filename(file->bs);
430
+ error_setg_errno(errp, -ret,
431
+ "Could not read const header from file '%s'",
432
+ file->bs->filename);
433
+ return ret;
434
+ }
435
+
436
+ /* check const header */
437
+ ret = check_se_sparse_const_header(&const_header, errp);
438
+ if (ret < 0) {
439
+ return ret;
440
+ }
441
+
442
+ assert(sizeof(volatile_header) == SECTOR_SIZE);
443
+
444
+ ret = bdrv_pread(file,
445
+ const_header.volatile_header_offset * SECTOR_SIZE,
446
+ &volatile_header, sizeof(volatile_header));
447
+ if (ret < 0) {
448
+ bdrv_refresh_filename(file->bs);
449
+ error_setg_errno(errp, -ret,
450
+ "Could not read volatile header from file '%s'",
451
+ file->bs->filename);
452
+ return ret;
453
+ }
454
+
455
+ /* check volatile header */
456
+ ret = check_se_sparse_volatile_header(&volatile_header, errp);
457
+ if (ret < 0) {
458
+ return ret;
459
+ }
460
+
461
+ ret = vmdk_add_extent(bs, file, false,
462
+ const_header.capacity,
463
+ const_header.grain_dir_offset * SECTOR_SIZE,
464
+ 0,
465
+ const_header.grain_dir_size *
466
+ SECTOR_SIZE / sizeof(uint64_t),
467
+ const_header.grain_table_size *
468
+ SECTOR_SIZE / sizeof(uint64_t),
469
+ const_header.grain_size,
470
+ &extent,
471
+ errp);
472
+ if (ret < 0) {
473
+ return ret;
474
+ }
475
+
476
+ extent->sesparse = true;
477
+ extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
478
+ extent->sesparse_clusters_offset = const_header.grains_offset;
479
+ extent->entry_size = sizeof(uint64_t);
480
+
481
+ ret = vmdk_init_tables(bs, extent, errp);
482
+ if (ret) {
483
+ /* free extent allocated by vmdk_add_extent */
484
+ vmdk_free_last_extent(bs);
485
+ }
486
+
487
+ return ret;
488
+}
489
+
490
static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
491
QDict *options, Error **errp);
492
493
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
494
* RW [size in sectors] SPARSE "file-name.vmdk"
495
* RW [size in sectors] VMFS "file-name.vmdk"
496
* RW [size in sectors] VMFSSPARSE "file-name.vmdk"
497
+ * RW [size in sectors] SESPARSE "file-name.vmdk"
498
*/
499
flat_offset = -1;
500
matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
501
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
502
503
if (sectors <= 0 ||
504
(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
505
- strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
506
+ strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
507
+ strcmp(type, "SESPARSE")) ||
508
(strcmp(access, "RW"))) {
509
continue;
510
}
511
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
512
return ret;
513
}
514
extent = &s->extents[s->num_extents - 1];
515
+ } else if (!strcmp(type, "SESPARSE")) {
516
+ ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
517
+ if (ret) {
518
+ bdrv_unref_child(bs, extent_file);
519
+ return ret;
520
+ }
521
+ extent = &s->extents[s->num_extents - 1];
522
} else {
523
error_setg(errp, "Unsupported extent type '%s'", type);
524
bdrv_unref_child(bs, extent_file);
525
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
526
if (strcmp(ct, "monolithicFlat") &&
527
strcmp(ct, "vmfs") &&
528
strcmp(ct, "vmfsSparse") &&
529
+ strcmp(ct, "seSparse") &&
530
strcmp(ct, "twoGbMaxExtentSparse") &&
531
strcmp(ct, "twoGbMaxExtentFlat")) {
532
error_setg(errp, "Unsupported image type '%s'", ct);
533
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
534
{
535
unsigned int l1_index, l2_offset, l2_index;
536
int min_index, i, j;
537
- uint32_t min_count, *l2_table;
538
+ uint32_t min_count;
539
+ void *l2_table;
540
bool zeroed = false;
541
int64_t ret;
542
int64_t cluster_sector;
543
+ unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
544
545
if (m_data) {
546
m_data->valid = 0;
547
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
548
if (l1_index >= extent->l1_size) {
549
return VMDK_ERROR;
550
}
551
- l2_offset = extent->l1_table[l1_index];
552
+ if (extent->sesparse) {
553
+ uint64_t l2_offset_u64;
554
+
555
+ assert(extent->entry_size == sizeof(uint64_t));
556
+
557
+ l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
558
+ if (l2_offset_u64 == 0) {
559
+ l2_offset = 0;
560
+ } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
561
+ /*
562
+ * Top most nibble is 0x1 if grain table is allocated.
563
+ * strict check - top most 4 bytes must be 0x10000000 since max
564
+ * supported size is 64TB for disk - so no more than 64TB / 16MB
565
+ * grain directories which is smaller than uint32,
566
+ * where 16MB is the only supported default grain table coverage.
567
+ */
568
+ return VMDK_ERROR;
569
+ } else {
570
+ l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
571
+ l2_offset_u64 = extent->sesparse_l2_tables_offset +
572
+ l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
573
+ if (l2_offset_u64 > 0x00000000ffffffff) {
574
+ return VMDK_ERROR;
575
+ }
576
+ l2_offset = (unsigned int)(l2_offset_u64);
577
+ }
578
+ } else {
579
+ assert(extent->entry_size == sizeof(uint32_t));
580
+ l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
581
+ }
582
if (!l2_offset) {
583
return VMDK_UNALLOC;
584
}
585
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
586
extent->l2_cache_counts[j] >>= 1;
587
}
588
}
589
- l2_table = extent->l2_cache + (i * extent->l2_size);
590
+ l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
591
goto found;
592
}
593
}
594
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
595
min_index = i;
596
}
597
}
598
- l2_table = extent->l2_cache + (min_index * extent->l2_size);
599
+ l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
600
BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
601
if (bdrv_pread(extent->file,
602
(int64_t)l2_offset * 512,
603
l2_table,
604
- extent->l2_size * sizeof(uint32_t)
605
- ) != extent->l2_size * sizeof(uint32_t)) {
606
+ l2_size_bytes
607
+ ) != l2_size_bytes) {
608
return VMDK_ERROR;
609
}
610
611
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
612
extent->l2_cache_counts[min_index] = 1;
613
found:
614
l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
615
- cluster_sector = le32_to_cpu(l2_table[l2_index]);
616
617
- if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
618
- zeroed = true;
619
+ if (extent->sesparse) {
620
+ cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
621
+ switch (cluster_sector & 0xf000000000000000) {
622
+ case 0x0000000000000000:
623
+ /* unallocated grain */
624
+ if (cluster_sector != 0) {
625
+ return VMDK_ERROR;
626
+ }
627
+ break;
628
+ case 0x1000000000000000:
629
+ /* scsi-unmapped grain - fallthrough */
630
+ case 0x2000000000000000:
631
+ /* zero grain */
632
+ zeroed = true;
633
+ break;
634
+ case 0x3000000000000000:
635
+ /* allocated grain */
636
+ cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
637
+ ((cluster_sector & 0x0000ffffffffffff) << 12));
638
+ cluster_sector = extent->sesparse_clusters_offset +
639
+ cluster_sector * extent->cluster_sectors;
640
+ break;
641
+ default:
642
+ return VMDK_ERROR;
643
+ }
644
+ } else {
645
+ cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
646
+
647
+ if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
648
+ zeroed = true;
649
+ }
650
}
651
652
if (!cluster_sector || zeroed) {
653
if (!allocate) {
654
return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
655
}
656
+ assert(!extent->sesparse);
657
658
if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
659
return VMDK_ERROR;
660
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
661
m_data->l1_index = l1_index;
662
m_data->l2_index = l2_index;
663
m_data->l2_offset = l2_offset;
664
- m_data->l2_cache_entry = &l2_table[l2_index];
665
+ m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
666
}
667
}
668
*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
669
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
670
if (!extent) {
671
return -EIO;
672
}
673
+ if (extent->sesparse) {
674
+ return -ENOTSUP;
675
+ }
676
offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
677
n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
678
- offset_in_cluster);
679
--
680
2.21.0
681
682
1
From: Kashyap Chamarthy <kchamart@redhat.com>
1
From: Pino Toscano <ptoscano@redhat.com>
2
2
3
This patch documents (including their QMP invocations) all the four
3
Rewrite the implementation of the ssh block driver to use libssh instead
4
major kinds of live block operations:
4
of libssh2. The libssh library has various advantages over libssh2:
5
- easier API for authentication (for example for using ssh-agent)
6
- easier API for known_hosts handling
7
- supports newer types of keys in known_hosts
5
8
6
- `block-stream`
9
Use APIs/features available in libssh 0.8 conditionally, to support
7
- `block-commit`
10
older versions (which are not recommended though).
8
- `drive-mirror` (& `blockdev-mirror`)
9
- `drive-backup` (& `blockdev-backup`)
10
11
11
Things considered while writing this document:
12
Adjust the iotest 207 according to the different error message, and to
13
find the default key type for localhost (to properly compare the
14
fingerprint with).
15
Contributed-by: Max Reitz <mreitz@redhat.com>
12
16
13
- Use reStructuredText as markup language (with the goal of generating
17
Adjust the various Docker/Travis scripts to use libssh when available
14
the HTML output using the Sphinx Documentation Generator). It is
18
instead of libssh2. The mingw/mxe testing is dropped for now, as there
15
gentler on the eye, and can be trivially converted to different
19
are no packages for it.
16
formats. (Another reason: upstream QEMU is considering to switch to
17
Sphinx, which uses reStructuredText as its markup language.)
18
20
19
- Raw QMP JSON output vs. 'qmp-shell'. I debated with myself whether
21
Signed-off-by: Pino Toscano <ptoscano@redhat.com>
20
to only show raw QMP JSON output (as that is the canonical
22
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
21
representation), or use 'qmp-shell', which takes key-value pairs. I
23
Acked-by: Alex Bennée <alex.bennee@linaro.org>
22
settled on the approach of: for the first occurrence of a command,
24
Message-id: 20190620200840.17655-1-ptoscano@redhat.com
23
use raw JSON; for subsequent occurrences, use 'qmp-shell', with an
25
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
24
occasional exception.
26
Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
27
Signed-off-by: Max Reitz <mreitz@redhat.com>
28
---
29
configure | 65 +-
30
block/Makefile.objs | 6 +-
31
block/ssh.c | 652 ++++++++++--------
32
.travis.yml | 4 +-
33
block/trace-events | 14 +-
34
docs/qemu-block-drivers.texi | 2 +-
35
.../dockerfiles/debian-win32-cross.docker | 1 -
36
.../dockerfiles/debian-win64-cross.docker | 1 -
37
tests/docker/dockerfiles/fedora.docker | 4 +-
38
tests/docker/dockerfiles/ubuntu.docker | 2 +-
39
tests/docker/dockerfiles/ubuntu1804.docker | 2 +-
40
tests/qemu-iotests/207 | 54 +-
41
tests/qemu-iotests/207.out | 2 +-
42
13 files changed, 449 insertions(+), 360 deletions(-)
25
43
26
- Usage of `-blockdev` command-line.
44
diff --git a/configure b/configure
27
45
index XXXXXXX..XXXXXXX 100755
28
- Usage of 'node-name' vs. file path to refer to disks. While we have
46
--- a/configure
29
`blockdev-{mirror, backup}` as 'node-name'-alternatives for
47
+++ b/configure
30
`drive-{mirror, backup}`, the `block-commit` command still operates
48
@@ -XXX,XX +XXX,XX @@ auth_pam=""
31
on file names for parameters 'base' and 'top'. So I added a caveat
49
vte=""
32
at the beginning to that effect.
50
virglrenderer=""
33
51
tpm=""
34
Refer this related thread that I started (where I learnt
52
-libssh2=""
35
`block-stream` was recently reworked to accept 'node-name' for 'top'
53
+libssh=""
36
and 'base' parameters):
54
live_block_migration="yes"
37
https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html
55
numa=""
38
"[RFC] Making 'block-stream', and 'block-commit' accept node-name"
56
tcmalloc="no"
39
57
@@ -XXX,XX +XXX,XX @@ for opt do
40
All commands showed in this document were tested while documenting.
58
;;
41
59
--enable-tpm) tpm="yes"
42
Thanks: Eric Blake for the section: "A note on points-in-time vs file
60
;;
43
names". This useful bit was originally articulated by Eric in his
61
- --disable-libssh2) libssh2="no"
44
KVMForum 2015 presentation, so I included that specific bit in this
62
+ --disable-libssh) libssh="no"
45
document.
63
;;
46
64
- --enable-libssh2) libssh2="yes"
47
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
65
+ --enable-libssh) libssh="yes"
48
Reviewed-by: Eric Blake <eblake@redhat.com>
66
;;
49
Message-id: 20170717105205.32639-3-kchamart@redhat.com
67
--disable-live-block-migration) live_block_migration="no"
50
Signed-off-by: Jeff Cody <jcody@redhat.com>
68
;;
51
---
69
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
52
docs/interop/live-block-operations.rst | 1088 ++++++++++++++++++++++++++++++++
70
coroutine-pool coroutine freelist (better performance)
53
docs/live-block-ops.txt | 72 ---
71
glusterfs GlusterFS backend
54
2 files changed, 1088 insertions(+), 72 deletions(-)
72
tpm TPM support
55
create mode 100644 docs/interop/live-block-operations.rst
73
- libssh2 ssh block device support
56
delete mode 100644 docs/live-block-ops.txt
74
+ libssh ssh block device support
57
75
numa libnuma support
58
diff --git a/docs/interop/live-block-operations.rst b/docs/interop/live-block-operations.rst
76
libxml2 for Parallels image format
59
new file mode 100644
77
tcmalloc tcmalloc support
60
index XXXXXXX..XXXXXXX
78
@@ -XXX,XX +XXX,XX @@ EOF
61
--- /dev/null
79
fi
62
+++ b/docs/interop/live-block-operations.rst
80
81
##########################################
82
-# libssh2 probe
83
-min_libssh2_version=1.2.8
84
-if test "$libssh2" != "no" ; then
85
- if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
86
- libssh2_cflags=$($pkg_config libssh2 --cflags)
87
- libssh2_libs=$($pkg_config libssh2 --libs)
88
- libssh2=yes
89
+# libssh probe
90
+if test "$libssh" != "no" ; then
91
+ if $pkg_config --exists libssh; then
92
+ libssh_cflags=$($pkg_config libssh --cflags)
93
+ libssh_libs=$($pkg_config libssh --libs)
94
+ libssh=yes
95
else
96
- if test "$libssh2" = "yes" ; then
97
- error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2"
98
+ if test "$libssh" = "yes" ; then
99
+ error_exit "libssh required for --enable-libssh"
100
fi
101
- libssh2=no
102
+ libssh=no
103
fi
104
fi
105
106
##########################################
107
-# libssh2_sftp_fsync probe
108
+# Check for libssh 0.8
109
+# This is done like this instead of using the LIBSSH_VERSION_* and
110
+# SSH_VERSION_* macros because some distributions in the past shipped
111
+# snapshots of the future 0.8 from Git, and those snapshots did not
112
+# have updated version numbers (still referring to 0.7.0).
113
114
-if test "$libssh2" = "yes"; then
115
+if test "$libssh" = "yes"; then
116
cat > $TMPC <<EOF
117
-#include <stdio.h>
118
-#include <libssh2.h>
119
-#include <libssh2_sftp.h>
120
-int main(void) {
121
- LIBSSH2_SESSION *session;
122
- LIBSSH2_SFTP *sftp;
123
- LIBSSH2_SFTP_HANDLE *sftp_handle;
124
- session = libssh2_session_init ();
125
- sftp = libssh2_sftp_init (session);
126
- sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0);
127
- libssh2_sftp_fsync (sftp_handle);
128
- return 0;
129
-}
130
+#include <libssh/libssh.h>
131
+int main(void) { return ssh_get_server_publickey(NULL, NULL); }
132
EOF
133
- # libssh2_cflags/libssh2_libs defined in previous test.
134
- if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then
135
- QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS"
136
+ if compile_prog "$libssh_cflags" "$libssh_libs"; then
137
+ libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags"
138
fi
139
fi
140
141
@@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs"
142
echo "gcov $gcov_tool"
143
echo "gcov enabled $gcov"
144
echo "TPM support $tpm"
145
-echo "libssh2 support $libssh2"
146
+echo "libssh support $libssh"
147
echo "QOM debugging $qom_cast_debug"
148
echo "Live block migration $live_block_migration"
149
echo "lzo support $lzo"
150
@@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then
151
echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak
152
fi
153
154
-if test "$libssh2" = "yes" ; then
155
- echo "CONFIG_LIBSSH2=m" >> $config_host_mak
156
- echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
157
- echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
158
+if test "$libssh" = "yes" ; then
159
+ echo "CONFIG_LIBSSH=m" >> $config_host_mak
160
+ echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
161
+ echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
162
fi
163
164
if test "$live_block_migration" = "yes" ; then
165
diff --git a/block/Makefile.objs b/block/Makefile.objs
166
index XXXXXXX..XXXXXXX 100644
167
--- a/block/Makefile.objs
168
+++ b/block/Makefile.objs
169
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o
170
block-obj-$(CONFIG_RBD) += rbd.o
171
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
172
block-obj-$(CONFIG_VXHS) += vxhs.o
173
-block-obj-$(CONFIG_LIBSSH2) += ssh.o
174
+block-obj-$(CONFIG_LIBSSH) += ssh.o
175
block-obj-y += accounting.o dirty-bitmap.o
176
block-obj-y += write-threshold.o
177
block-obj-y += backup.o
178
@@ -XXX,XX +XXX,XX @@ rbd.o-libs := $(RBD_LIBS)
179
gluster.o-cflags := $(GLUSTERFS_CFLAGS)
180
gluster.o-libs := $(GLUSTERFS_LIBS)
181
vxhs.o-libs := $(VXHS_LIBS)
182
-ssh.o-cflags := $(LIBSSH2_CFLAGS)
183
-ssh.o-libs := $(LIBSSH2_LIBS)
184
+ssh.o-cflags := $(LIBSSH_CFLAGS)
185
+ssh.o-libs := $(LIBSSH_LIBS)
186
block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o
187
block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y)
188
dmg-bz2.o-libs := $(BZIP2_LIBS)
189
diff --git a/block/ssh.c b/block/ssh.c
190
index XXXXXXX..XXXXXXX 100644
191
--- a/block/ssh.c
192
+++ b/block/ssh.c
63
@@ -XXX,XX +XXX,XX @@
193
@@ -XXX,XX +XXX,XX @@
64
+..
194
65
+ Copyright (C) 2017 Red Hat Inc.
195
#include "qemu/osdep.h"
66
+
196
67
+ This work is licensed under the terms of the GNU GPL, version 2 or
197
-#include <libssh2.h>
68
+ later. See the COPYING file in the top-level directory.
198
-#include <libssh2_sftp.h>
69
+
199
+#include <libssh/libssh.h>
70
+============================
200
+#include <libssh/sftp.h>
71
+Live Block Device Operations
201
72
+============================
202
#include "block/block_int.h"
73
+
203
#include "block/qdict.h"
74
+QEMU Block Layer currently (as of QEMU 2.9) supports four major kinds of
204
@@ -XXX,XX +XXX,XX @@
75
+live block device jobs -- stream, commit, mirror, and backup. These can
205
#include "trace.h"
76
+be used to manipulate disk image chains to accomplish certain tasks,
206
77
+namely: live copy data from backing files into overlays; shorten long
207
/*
78
+disk image chains by merging data from overlays into backing files; live
208
- * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself. Note
79
+synchronize data from a disk image chain (including current active disk)
209
- * that this requires that libssh2 was specially compiled with the
80
+to another target image; and point-in-time (and incremental) backups of
210
- * `./configure --enable-debug' option, so most likely you will have
81
+a block device. Below is a description of the said block (QMP)
211
- * to compile it yourself. The meaning of <bitmask> is described
82
+primitives, and some (non-exhaustive list of) examples to illustrate
212
- * here: http://www.libssh2.org/libssh2_trace.html
83
+their use.
213
+ * TRACE_LIBSSH=<level> enables tracing in libssh itself.
84
+
214
+ * The meaning of <level> is described here:
85
+.. note::
215
+ * http://api.libssh.org/master/group__libssh__log.html
86
+ The file ``qapi/block-core.json`` in the QEMU source tree has the
216
*/
87
+ canonical QEMU API (QAPI) schema documentation for the QMP
217
-#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */
88
+ primitives discussed here.
218
+#define TRACE_LIBSSH 0 /* see: SSH_LOG_* */
89
+
219
90
+.. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is
220
typedef struct BDRVSSHState {
91
+ integrated.
221
/* Coroutine. */
92
+
222
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState {
93
+.. contents::
223
94
+
224
/* SSH connection. */
95
+Disk image backing chain notation
225
int sock; /* socket */
96
+---------------------------------
226
- LIBSSH2_SESSION *session; /* ssh session */
97
+
227
- LIBSSH2_SFTP *sftp; /* sftp session */
98
+A simple disk image chain. (This can be created live using QMP
228
- LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */
99
+``blockdev-snapshot-sync``, or offline via ``qemu-img``)::
229
+ ssh_session session; /* ssh session */
100
+
230
+ sftp_session sftp; /* sftp session */
101
+ (Live QEMU)
231
+ sftp_file sftp_handle; /* sftp remote file handle */
102
+ |
232
103
+ .
233
- /* See ssh_seek() function below. */
104
+ V
234
- int64_t offset;
105
+
235
- bool offset_op_read;
106
+ [A] <----- [B]
236
-
107
+
237
- /* File attributes at open. We try to keep the .filesize field
108
+ (backing file) (overlay)
238
+ /*
109
+
239
+ * File attributes at open. We try to keep the .size field
110
+The arrow can be read as: Image [A] is the backing file of disk image
240
* updated if it changes (eg by writing at the end of the file).
111
+[B]. And live QEMU is currently writing to image [B], consequently, it
241
*/
112
+is also referred to as the "active layer".
242
- LIBSSH2_SFTP_ATTRIBUTES attrs;
113
+
243
+ sftp_attributes attrs;
114
+There are two kinds of terminology that are common when referring to
244
115
+files in a disk image backing chain:
245
InetSocketAddress *inet;
116
+
246
117
+(1) Directional: 'base' and 'top'. Given the simple disk image chain
247
@@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s)
118
+ above, image [A] can be referred to as 'base', and image [B] as
248
{
119
+ 'top'. (This terminology can be seen in the QAPI schema file,
249
memset(s, 0, sizeof *s);
120
+ block-core.json.)
250
s->sock = -1;
121
+
251
- s->offset = -1;
122
+(2) Relational: 'backing file' and 'overlay'. Again, taking the same
252
qemu_co_mutex_init(&s->lock);
123
+ simple disk image chain from the above, disk image [A] is referred
253
}
124
+ to as the backing file, and image [B] as overlay.
254
125
+
255
@@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s)
126
+ Throughout this document, we will use the relational terminology.
256
{
127
+
257
g_free(s->user);
128
+.. important::
258
129
+ The overlay files can generally be any format that supports a
259
+ if (s->attrs) {
130
+ backing file, although QCOW2 is the preferred format and the one
260
+ sftp_attributes_free(s->attrs);
131
+ used in this document.
261
+ }
132
+
262
if (s->sftp_handle) {
133
+
263
- libssh2_sftp_close(s->sftp_handle);
134
+Brief overview of live block QMP primitives
264
+ sftp_close(s->sftp_handle);
135
+-------------------------------------------
265
}
136
+
266
if (s->sftp) {
137
+The following are the four different kinds of live block operations that
267
- libssh2_sftp_shutdown(s->sftp);
138
+QEMU block layer supports.
268
+ sftp_free(s->sftp);
139
+
269
}
140
+(1) ``block-stream``: Live copy of data from backing files into overlay
270
if (s->session) {
141
+ files.
271
- libssh2_session_disconnect(s->session,
142
+
272
- "from qemu ssh client: "
143
+ .. note:: Once the 'stream' operation has finished, three things to
273
- "user closed the connection");
144
+ note:
274
- libssh2_session_free(s->session);
145
+
275
- }
146
+ (a) QEMU rewrites the backing chain to remove
276
- if (s->sock >= 0) {
147
+ reference to the now-streamed and redundant backing
277
- close(s->sock);
148
+ file;
278
+ ssh_disconnect(s->session);
149
+
279
+ ssh_free(s->session); /* This frees s->sock */
150
+ (b) the streamed file *itself* won't be removed by QEMU,
280
}
151
+ and must be explicitly discarded by the user;
281
}
152
+
282
153
+ (c) the streamed file remains valid -- i.e. further
283
@@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
154
+ overlays can be created based on it. Refer the
284
va_end(args);
155
+ ``block-stream`` section further below for more
285
156
+ details.
286
if (s->session) {
157
+
287
- char *ssh_err;
158
+(2) ``block-commit``: Live merge of data from overlay files into backing
288
+ const char *ssh_err;
159
+ files (with the optional goal of removing the overlay file from the
289
int ssh_err_code;
160
+ chain). Since QEMU 2.0, this includes "active ``block-commit``"
290
161
+ (i.e. merge the current active layer into the base image).
291
- /* This is not an errno. See <libssh2.h>. */
162
+
292
- ssh_err_code = libssh2_session_last_error(s->session,
163
+ .. note:: Once the 'commit' operation has finished, there are three
293
- &ssh_err, NULL, 0);
164
+ things to note here as well:
294
- error_setg(errp, "%s: %s (libssh2 error code: %d)",
165
+
295
+ /* This is not an errno. See <libssh/libssh.h>. */
166
+ (a) QEMU rewrites the backing chain to remove reference
296
+ ssh_err = ssh_get_error(s->session);
167
+ to now-redundant overlay images that have been
297
+ ssh_err_code = ssh_get_error_code(s->session);
168
+ committed into a backing file;
298
+ error_setg(errp, "%s: %s (libssh error code: %d)",
169
+
299
msg, ssh_err, ssh_err_code);
170
+ (b) the committed file *itself* won't be removed by QEMU
300
} else {
171
+ -- it ought to be manually removed;
301
error_setg(errp, "%s", msg);
172
+
302
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
173
+ (c) however, unlike in the case of ``block-stream``, the
303
va_end(args);
174
+ intermediate images will be rendered invalid -- i.e.
304
175
+ no more further overlays can be created based on
305
if (s->sftp) {
176
+ them. Refer the ``block-commit`` section further
306
- char *ssh_err;
177
+ below for more details.
307
+ const char *ssh_err;
178
+
308
int ssh_err_code;
179
+(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running
309
- unsigned long sftp_err_code;
180
+ disk to another image.
310
+ int sftp_err_code;
181
+
311
182
+(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
312
- /* This is not an errno. See <libssh2.h>. */
183
+ of a block device to a destination.
313
- ssh_err_code = libssh2_session_last_error(s->session,
184
+
314
- &ssh_err, NULL, 0);
185
+
315
- /* See <libssh2_sftp.h>. */
186
+.. _`Interacting with a QEMU instance`:
316
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
187
+
317
+ /* This is not an errno. See <libssh/libssh.h>. */
188
+Interacting with a QEMU instance
318
+ ssh_err = ssh_get_error(s->session);
189
+--------------------------------
319
+ ssh_err_code = ssh_get_error_code(s->session);
190
+
320
+ /* See <libssh/sftp.h>. */
191
+To show some example invocations of command-line, we will use the
321
+ sftp_err_code = sftp_get_error(s->sftp);
192
+following invocation of QEMU, with a QMP server running over UNIX
322
193
+socket::
323
error_setg(errp,
194
+
324
- "%s: %s (libssh2 error code: %d, sftp error code: %lu)",
195
+ $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \
325
+ "%s: %s (libssh error code: %d, sftp error code: %d)",
196
+ -M q35 -nodefaults -m 512 \
326
msg, ssh_err, ssh_err_code, sftp_err_code);
197
+ -blockdev node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \
327
} else {
198
+ -device virtio-blk,drive=node-A,id=virtio0 \
328
error_setg(errp, "%s", msg);
199
+ -monitor stdio -qmp unix:/tmp/qmp-sock,server,nowait
329
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
200
+
330
201
+The ``-blockdev`` command-line option, used above, is available from
331
static void sftp_error_trace(BDRVSSHState *s, const char *op)
202
+QEMU 2.9 onwards. In the above invocation, notice the ``node-name``
332
{
203
+parameter that is used to refer to the disk image a.qcow2 ('node-A') --
333
- char *ssh_err;
204
+this is a cleaner way to refer to a disk image (as opposed to referring
334
+ const char *ssh_err;
205
+to it by spelling out file paths). So, we will continue to designate a
335
int ssh_err_code;
206
+``node-name`` to each further disk image created (either via
336
- unsigned long sftp_err_code;
207
+``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk
337
+ int sftp_err_code;
208
+image chain, and continue to refer to the disks using their
338
209
+``node-name`` (where possible, because ``block-commit`` does not yet, as
339
- /* This is not an errno. See <libssh2.h>. */
210
+of QEMU 2.9, accept a ``node-name`` parameter) when performing various
340
- ssh_err_code = libssh2_session_last_error(s->session,
211
+block operations.
341
- &ssh_err, NULL, 0);
212
+
342
- /* See <libssh2_sftp.h>. */
213
+To interact with the QEMU instance launched above, we will use the
343
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
214
+``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the
344
+ /* This is not an errno. See <libssh/libssh.h>. */
215
+QEMU source directory), which takes key-value pairs for QMP commands.
345
+ ssh_err = ssh_get_error(s->session);
216
+Invoke it as below (which will also print out the complete raw JSON
346
+ ssh_err_code = ssh_get_error_code(s->session);
217
+syntax for reference -- examples in the following sections)::
347
+ /* See <libssh/sftp.h>. */
218
+
348
+ sftp_err_code = sftp_get_error(s->sftp);
219
+ $ ./qmp-shell -v -p /tmp/qmp-sock
349
220
+ (QEMU)
350
trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code);
221
+
351
}
222
+.. note::
352
@@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options,
223
+ In the event we have to repeat a certain QMP command, we will: for
353
parse_uri(filename, options, errp);
224
+ the first occurrence of it, show the ``qmp-shell`` invocation, *and*
354
}
225
+ the corresponding raw JSON QMP syntax; but for subsequent
355
226
+ invocations, present just the ``qmp-shell`` syntax, and omit the
356
-static int check_host_key_knownhosts(BDRVSSHState *s,
227
+ equivalent JSON output.
357
- const char *host, int port, Error **errp)
228
+
358
+static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp)
229
+
359
{
230
+Example disk image chain
360
- const char *home;
231
+------------------------
361
- char *knh_file = NULL;
232
+
362
- LIBSSH2_KNOWNHOSTS *knh = NULL;
233
+We will use the below disk image chain (and occasionally spelling it
363
- struct libssh2_knownhost *found;
234
+out where appropriate) when discussing various primitives::
364
- int ret, r;
235
+
365
- const char *hostkey;
236
+ [A] <-- [B] <-- [C] <-- [D]
366
- size_t len;
237
+
367
- int type;
238
+Where [A] is the original base image; [B] and [C] are intermediate
368
-
239
+overlay images; image [D] is the active layer -- i.e. live QEMU is
369
- hostkey = libssh2_session_hostkey(s->session, &len, &type);
240
+writing to it. (The rule of thumb is: live QEMU will always be pointing
370
- if (!hostkey) {
241
+to the rightmost image in a disk image chain.)
371
+ int ret;
242
+
372
+#ifdef HAVE_LIBSSH_0_8
243
+The above image chain can be created by invoking
373
+ enum ssh_known_hosts_e state;
244
+``blockdev-snapshot-sync`` commands as follows (which shows the
374
+ int r;
245
+creation of overlay image [B]) using the ``qmp-shell`` (our invocation
375
+ ssh_key pubkey;
246
+also prints the raw JSON invocation of it)::
376
+ enum ssh_keytypes_e pubkey_type;
247
+
377
+ unsigned char *server_hash = NULL;
248
+ (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
378
+ size_t server_hash_len;
249
+ {
379
+ char *fingerprint = NULL;
250
+ "execute": "blockdev-snapshot-sync",
380
+
251
+ "arguments": {
381
+ state = ssh_session_is_known_server(s->session);
252
+ "node-name": "node-A",
382
+ trace_ssh_server_status(state);
253
+ "snapshot-file": "b.qcow2",
383
+
254
+ "format": "qcow2",
384
+ switch (state) {
255
+ "snapshot-node-name": "node-B"
385
+ case SSH_KNOWN_HOSTS_OK:
386
+ /* OK */
387
+ trace_ssh_check_host_key_knownhosts();
388
+ break;
389
+ case SSH_KNOWN_HOSTS_CHANGED:
390
ret = -EINVAL;
391
- session_error_setg(errp, s, "failed to read remote host key");
392
+ r = ssh_get_server_publickey(s->session, &pubkey);
393
+ if (r == 0) {
394
+ r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256,
395
+ &server_hash, &server_hash_len);
396
+ pubkey_type = ssh_key_type(pubkey);
397
+ ssh_key_free(pubkey);
398
+ }
399
+ if (r == 0) {
400
+ fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256,
401
+ server_hash,
402
+ server_hash_len);
403
+ ssh_clean_pubkey_hash(&server_hash);
404
+ }
405
+ if (fingerprint) {
406
+ error_setg(errp,
407
+ "host key (%s key with fingerprint %s) does not match "
408
+ "the one in known_hosts; this may be a possible attack",
409
+ ssh_key_type_to_char(pubkey_type), fingerprint);
410
+ ssh_string_free_char(fingerprint);
411
+ } else {
412
+ error_setg(errp,
413
+ "host key does not match the one in known_hosts; this "
414
+ "may be a possible attack");
415
+ }
416
goto out;
417
- }
418
-
419
- knh = libssh2_knownhost_init(s->session);
420
- if (!knh) {
421
+ case SSH_KNOWN_HOSTS_OTHER:
422
ret = -EINVAL;
423
- session_error_setg(errp, s,
424
- "failed to initialize known hosts support");
425
+ error_setg(errp,
426
+ "host key for this server not found, another type exists");
427
+ goto out;
428
+ case SSH_KNOWN_HOSTS_UNKNOWN:
429
+ ret = -EINVAL;
430
+ error_setg(errp, "no host key was found in known_hosts");
431
+ goto out;
432
+ case SSH_KNOWN_HOSTS_NOT_FOUND:
433
+ ret = -ENOENT;
434
+ error_setg(errp, "known_hosts file not found");
435
+ goto out;
436
+ case SSH_KNOWN_HOSTS_ERROR:
437
+ ret = -EINVAL;
438
+ error_setg(errp, "error while checking the host");
439
+ goto out;
440
+ default:
441
+ ret = -EINVAL;
442
+ error_setg(errp, "error while checking for known server (%d)", state);
443
goto out;
444
}
445
+#else /* !HAVE_LIBSSH_0_8 */
446
+ int state;
447
448
- home = getenv("HOME");
449
- if (home) {
450
- knh_file = g_strdup_printf("%s/.ssh/known_hosts", home);
451
- } else {
452
- knh_file = g_strdup_printf("/root/.ssh/known_hosts");
453
- }
454
-
455
- /* Read all known hosts from OpenSSH-style known_hosts file. */
456
- libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH);
457
+ state = ssh_is_server_known(s->session);
458
+ trace_ssh_server_status(state);
459
460
- r = libssh2_knownhost_checkp(knh, host, port, hostkey, len,
461
- LIBSSH2_KNOWNHOST_TYPE_PLAIN|
462
- LIBSSH2_KNOWNHOST_KEYENC_RAW,
463
- &found);
464
- switch (r) {
465
- case LIBSSH2_KNOWNHOST_CHECK_MATCH:
466
+ switch (state) {
467
+ case SSH_SERVER_KNOWN_OK:
468
/* OK */
469
- trace_ssh_check_host_key_knownhosts(found->key);
470
+ trace_ssh_check_host_key_knownhosts();
471
break;
472
- case LIBSSH2_KNOWNHOST_CHECK_MISMATCH:
473
+ case SSH_SERVER_KNOWN_CHANGED:
474
ret = -EINVAL;
475
- session_error_setg(errp, s,
476
- "host key does not match the one in known_hosts"
477
- " (found key %s)", found->key);
478
+ error_setg(errp,
479
+ "host key does not match the one in known_hosts; this "
480
+ "may be a possible attack");
481
goto out;
482
- case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND:
483
+ case SSH_SERVER_FOUND_OTHER:
484
ret = -EINVAL;
485
- session_error_setg(errp, s, "no host key was found in known_hosts");
486
+ error_setg(errp,
487
+ "host key for this server not found, another type exists");
488
+ goto out;
489
+ case SSH_SERVER_FILE_NOT_FOUND:
490
+ ret = -ENOENT;
491
+ error_setg(errp, "known_hosts file not found");
492
goto out;
493
- case LIBSSH2_KNOWNHOST_CHECK_FAILURE:
494
+ case SSH_SERVER_NOT_KNOWN:
495
ret = -EINVAL;
496
- session_error_setg(errp, s,
497
- "failure matching the host key with known_hosts");
498
+ error_setg(errp, "no host key was found in known_hosts");
499
+ goto out;
500
+ case SSH_SERVER_ERROR:
501
+ ret = -EINVAL;
502
+ error_setg(errp, "server error");
503
goto out;
504
default:
505
ret = -EINVAL;
506
- session_error_setg(errp, s, "unknown error matching the host key"
507
- " with known_hosts (%d)", r);
508
+ error_setg(errp, "error while checking for known server (%d)", state);
509
goto out;
510
}
511
+#endif /* !HAVE_LIBSSH_0_8 */
512
513
/* known_hosts checking successful. */
514
ret = 0;
515
516
out:
517
- if (knh != NULL) {
518
- libssh2_knownhost_free(knh);
519
- }
520
- g_free(knh_file);
521
return ret;
522
}
523
524
@@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len,
525
526
static int
527
check_host_key_hash(BDRVSSHState *s, const char *hash,
528
- int hash_type, size_t fingerprint_len, Error **errp)
529
+ enum ssh_publickey_hash_type type, Error **errp)
530
{
531
- const char *fingerprint;
532
-
533
- fingerprint = libssh2_hostkey_hash(s->session, hash_type);
534
- if (!fingerprint) {
535
+ int r;
536
+ ssh_key pubkey;
537
+ unsigned char *server_hash;
538
+ size_t server_hash_len;
539
+
540
+#ifdef HAVE_LIBSSH_0_8
541
+ r = ssh_get_server_publickey(s->session, &pubkey);
542
+#else
543
+ r = ssh_get_publickey(s->session, &pubkey);
544
+#endif
545
+ if (r != SSH_OK) {
546
session_error_setg(errp, s, "failed to read remote host key");
547
return -EINVAL;
548
}
549
550
- if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len,
551
- hash) != 0) {
552
+ r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
553
+ ssh_key_free(pubkey);
554
+ if (r != 0) {
555
+ session_error_setg(errp, s,
556
+ "failed reading the hash of the server SSH key");
557
+ return -EINVAL;
558
+ }
559
+
560
+ r = compare_fingerprint(server_hash, server_hash_len, hash);
561
+ ssh_clean_pubkey_hash(&server_hash);
562
+ if (r != 0) {
563
error_setg(errp, "remote host key does not match host_key_check '%s'",
564
hash);
565
return -EPERM;
566
@@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
567
return 0;
568
}
569
570
-static int check_host_key(BDRVSSHState *s, const char *host, int port,
571
- SshHostKeyCheck *hkc, Error **errp)
572
+static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp)
573
{
574
SshHostKeyCheckMode mode;
575
576
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
577
case SSH_HOST_KEY_CHECK_MODE_HASH:
578
if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
579
return check_host_key_hash(s, hkc->u.hash.hash,
580
- LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
581
+ SSH_PUBLICKEY_HASH_MD5, errp);
582
} else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
583
return check_host_key_hash(s, hkc->u.hash.hash,
584
- LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
585
+ SSH_PUBLICKEY_HASH_SHA1, errp);
586
}
587
g_assert_not_reached();
588
break;
589
case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
590
- return check_host_key_knownhosts(s, host, port, errp);
591
+ return check_host_key_knownhosts(s, errp);
592
default:
593
g_assert_not_reached();
594
}
595
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
596
return -EINVAL;
597
}
598
599
-static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
600
+static int authenticate(BDRVSSHState *s, Error **errp)
601
{
602
int r, ret;
603
- const char *userauthlist;
604
- LIBSSH2_AGENT *agent = NULL;
605
- struct libssh2_agent_publickey *identity;
606
- struct libssh2_agent_publickey *prev_identity = NULL;
607
+ int method;
608
609
- userauthlist = libssh2_userauth_list(s->session, user, strlen(user));
610
- if (strstr(userauthlist, "publickey") == NULL) {
611
+ /* Try to authenticate with the "none" method. */
612
+ r = ssh_userauth_none(s->session, NULL);
613
+ if (r == SSH_AUTH_ERROR) {
614
ret = -EPERM;
615
- error_setg(errp,
616
- "remote server does not support \"publickey\" authentication");
617
+ session_error_setg(errp, s, "failed to authenticate using none "
618
+ "authentication");
619
goto out;
620
- }
621
-
622
- /* Connect to ssh-agent and try each identity in turn. */
623
- agent = libssh2_agent_init(s->session);
624
- if (!agent) {
625
- ret = -EINVAL;
626
- session_error_setg(errp, s, "failed to initialize ssh-agent support");
627
- goto out;
628
- }
629
- if (libssh2_agent_connect(agent)) {
630
- ret = -ECONNREFUSED;
631
- session_error_setg(errp, s, "failed to connect to ssh-agent");
632
- goto out;
633
- }
634
- if (libssh2_agent_list_identities(agent)) {
635
- ret = -EINVAL;
636
- session_error_setg(errp, s,
637
- "failed requesting identities from ssh-agent");
638
+ } else if (r == SSH_AUTH_SUCCESS) {
639
+ /* Authenticated! */
640
+ ret = 0;
641
goto out;
642
}
643
644
- for(;;) {
645
- r = libssh2_agent_get_identity(agent, &identity, prev_identity);
646
- if (r == 1) { /* end of list */
647
- break;
648
- }
649
- if (r < 0) {
650
+ method = ssh_userauth_list(s->session, NULL);
651
+ trace_ssh_auth_methods(method);
652
+
653
+ /*
654
+ * Try to authenticate with publickey, using the ssh-agent
655
+ * if available.
656
+ */
657
+ if (method & SSH_AUTH_METHOD_PUBLICKEY) {
658
+ r = ssh_userauth_publickey_auto(s->session, NULL, NULL);
659
+ if (r == SSH_AUTH_ERROR) {
660
ret = -EINVAL;
661
- session_error_setg(errp, s,
662
- "failed to obtain identity from ssh-agent");
663
+ session_error_setg(errp, s, "failed to authenticate using "
664
+ "publickey authentication");
665
goto out;
666
- }
667
- r = libssh2_agent_userauth(agent, user, identity);
668
- if (r == 0) {
669
+ } else if (r == SSH_AUTH_SUCCESS) {
670
/* Authenticated! */
671
ret = 0;
672
goto out;
673
}
674
- /* Failed to authenticate with this identity, try the next one. */
675
- prev_identity = identity;
676
}
677
678
ret = -EPERM;
679
@@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
680
"and the identities held by your ssh-agent");
681
682
out:
683
- if (agent != NULL) {
684
- /* Note: libssh2 implementation implicitly calls
685
- * libssh2_agent_disconnect if necessary.
686
- */
687
- libssh2_agent_free(agent);
688
- }
689
-
690
return ret;
691
}
692
693
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
694
int ssh_flags, int creat_mode, Error **errp)
695
{
696
int r, ret;
697
- long port = 0;
698
+ unsigned int port = 0;
699
+ int new_sock = -1;
700
701
if (opts->has_user) {
702
s->user = g_strdup(opts->user);
703
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
704
s->inet = opts->server;
705
opts->server = NULL;
706
707
- if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
708
+ if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) {
709
error_setg(errp, "Use only numeric port value");
710
ret = -EINVAL;
711
goto err;
712
}
713
714
/* Open the socket and connect. */
715
- s->sock = inet_connect_saddr(s->inet, errp);
716
- if (s->sock < 0) {
717
+ new_sock = inet_connect_saddr(s->inet, errp);
718
+ if (new_sock < 0) {
719
ret = -EIO;
720
goto err;
721
}
722
723
+ /*
724
+ * Try to disable the Nagle algorithm on TCP sockets to reduce latency,
725
+ * but do not fail if it cannot be disabled.
726
+ */
727
+ r = socket_set_nodelay(new_sock);
728
+ if (r < 0) {
729
+ warn_report("can't set TCP_NODELAY for the ssh server %s: %s",
730
+ s->inet->host, strerror(errno));
731
+ }
732
+
733
/* Create SSH session. */
734
- s->session = libssh2_session_init();
735
+ s->session = ssh_new();
736
if (!s->session) {
737
ret = -EINVAL;
738
- session_error_setg(errp, s, "failed to initialize libssh2 session");
739
+ session_error_setg(errp, s, "failed to initialize libssh session");
740
goto err;
741
}
742
743
-#if TRACE_LIBSSH2 != 0
744
- libssh2_trace(s->session, TRACE_LIBSSH2);
745
-#endif
746
+ /*
747
+ * Make sure we are in blocking mode during the connection and
748
+ * authentication phases.
749
+ */
750
+ ssh_set_blocking(s->session, 1);
751
752
- r = libssh2_session_handshake(s->session, s->sock);
753
- if (r != 0) {
754
+ r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user);
755
+ if (r < 0) {
756
+ ret = -EINVAL;
757
+ session_error_setg(errp, s,
758
+ "failed to set the user in the libssh session");
759
+ goto err;
760
+ }
761
+
762
+ r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host);
763
+ if (r < 0) {
764
+ ret = -EINVAL;
765
+ session_error_setg(errp, s,
766
+ "failed to set the host in the libssh session");
767
+ goto err;
768
+ }
769
+
770
+ if (port > 0) {
771
+ r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port);
772
+ if (r < 0) {
773
+ ret = -EINVAL;
774
+ session_error_setg(errp, s,
775
+ "failed to set the port in the libssh session");
776
+ goto err;
256
+ }
777
+ }
257
+ }
778
+ }
258
+
779
+
259
+Here, "node-A" is the name QEMU internally uses to refer to the base
780
+ r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none");
260
+image [A] -- it is the backing file, based on which the overlay image,
781
+ if (r < 0) {
261
+[B], is created.
782
+ ret = -EINVAL;
262
+
783
+ session_error_setg(errp, s,
263
+To create the rest of the overlay images, [C], and [D] (omitting the raw
784
+ "failed to disable the compression in the libssh "
264
+JSON output for brevity)::
785
+ "session");
265
+
786
+ goto err;
266
+ (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
787
+ }
267
+ (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
788
+
268
+
789
+ /* Read ~/.ssh/config. */
269
+
790
+ r = ssh_options_parse_config(s->session, NULL);
270
+A note on points-in-time vs file names
791
+ if (r < 0) {
271
+--------------------------------------
792
+ ret = -EINVAL;
272
+
793
+ session_error_setg(errp, s, "failed to parse ~/.ssh/config");
273
+In our disk image chain::
794
+ goto err;
274
+
795
+ }
275
+ [A] <-- [B] <-- [C] <-- [D]
796
+
276
+
797
+ r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock);
277
+We have *three* points in time and an active layer:
798
+ if (r < 0) {
278
+
799
+ ret = -EINVAL;
279
+- Point 1: Guest state when [B] was created is contained in file [A]
800
+ session_error_setg(errp, s,
280
+- Point 2: Guest state when [C] was created is contained in [A] + [B]
801
+ "failed to set the socket in the libssh session");
281
+- Point 3: Guest state when [D] was created is contained in
802
+ goto err;
282
+ [A] + [B] + [C]
803
+ }
283
+- Active layer: Current guest state is contained in [A] + [B] + [C] +
804
+ /* libssh took ownership of the socket. */
284
+ [D]
805
+ s->sock = new_sock;
285
+
806
+ new_sock = -1;
286
+Therefore, be aware with naming choices:
807
+
287
+
808
+ /* Connect. */
288
+- Naming a file after the time it is created is misleading -- the
809
+ r = ssh_connect(s->session);
289
+ guest data for that point in time is *not* contained in that file
810
+ if (r != SSH_OK) {
290
+ (as explained earlier)
811
ret = -EINVAL;
291
+- Rather, think of files as a *delta* from the backing file
812
session_error_setg(errp, s, "failed to establish SSH session");
292
+
813
goto err;
293
+
814
}
294
+Live block streaming --- ``block-stream``
815
295
+-----------------------------------------
816
/* Check the remote host's key against known_hosts. */
296
+
817
- ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp);
297
+The ``block-stream`` command allows you to live-copy data from backing
818
+ ret = check_host_key(s, opts->host_key_check, errp);
298
+files into overlay images.
819
if (ret < 0) {
299
+
820
goto err;
300
+Given our original example disk image chain from earlier::
821
}
301
+
822
302
+ [A] <-- [B] <-- [C] <-- [D]
823
/* Authenticate. */
303
+
824
- ret = authenticate(s, s->user, errp);
304
+The disk image chain can be shortened in one of the following different
825
+ ret = authenticate(s, errp);
305
+ways (not an exhaustive list).
826
if (ret < 0) {
306
+
827
goto err;
307
+.. _`Case-1`:
828
}
308
+
829
309
+(1) Merge everything into the active layer: I.e. copy all contents from
830
/* Start SFTP. */
310
+ the base image, [A], and overlay images, [B] and [C], into [D],
831
- s->sftp = libssh2_sftp_init(s->session);
311
+ *while* the guest is running. The resulting chain will be a
832
+ s->sftp = sftp_new(s->session);
312
+ standalone image, [D] -- with contents from [A], [B] and [C] merged
833
if (!s->sftp) {
313
+ into it (where live QEMU writes go to)::
834
- session_error_setg(errp, s, "failed to initialize sftp handle");
314
+
835
+ session_error_setg(errp, s, "failed to create sftp handle");
315
+ [D]
836
+ ret = -EINVAL;
316
+
837
+ goto err;
317
+.. _`Case-2`:
838
+ }
318
+
839
+
319
+(2) Taking the same example disk image chain mentioned earlier, merge
840
+ r = sftp_init(s->sftp);
320
+ only images [B] and [C] into [D], the active layer. The result will
841
+ if (r < 0) {
321
+ be that the contents of images [B] and [C] are copied into [D], and the
842
+ sftp_error_setg(errp, s, "failed to initialize sftp handle");
322
+ backing file pointer of image [D] will be adjusted to point to image
843
ret = -EINVAL;
323
+ [A]. The resulting chain will be::
844
goto err;
324
+
845
}
325
+ [A] <-- [D]
846
326
+
847
/* Open the remote file. */
327
+.. _`Case-3`:
848
trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode);
328
+
849
- s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags,
329
+(3) Intermediate streaming (available since QEMU 2.8): Starting afresh
850
- creat_mode);
330
+ with the original example disk image chain, with a total of four
851
+ s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode);
331
+ images, it is possible to copy contents from image [B] into image
852
if (!s->sftp_handle) {
332
+ [C]. Once the copy is finished, image [B] can now be (optionally)
853
- session_error_setg(errp, s, "failed to open remote file '%s'",
333
+ discarded; and the backing file pointer of image [C] will be
854
- opts->path);
334
+ adjusted to point to [A]. I.e. after performing "intermediate
855
+ sftp_error_setg(errp, s, "failed to open remote file '%s'",
335
+ streaming" of [B] into [C], the resulting image chain will be (where
856
+ opts->path);
336
+ live QEMU is writing to [D])::
857
ret = -EINVAL;
337
+
858
goto err;
338
+ [A] <-- [C] <-- [D]
859
}
339
+
860
340
+
861
- r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
341
+QMP invocation for ``block-stream``
862
- if (r < 0) {
342
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
863
+ /* Make sure the SFTP file is handled in blocking mode. */
343
+
864
+ sftp_file_set_blocking(s->sftp_handle);
344
+For `Case-1`_, to merge contents of all the backing files into the
865
+
345
+active layer, where 'node-D' is the current active image (by default
866
+ s->attrs = sftp_fstat(s->sftp_handle);
346
+``block-stream`` will flatten the entire chain); ``qmp-shell`` (and its
867
+ if (!s->attrs) {
347
+corresponding JSON output)::
868
sftp_error_setg(errp, s, "failed to read file attributes");
348
+
869
return -EINVAL;
349
+ (QEMU) block-stream device=node-D job-id=job0
870
}
350
+ {
871
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
351
+ "execute": "block-stream",
872
return 0;
352
+ "arguments": {
873
353
+ "device": "node-D",
874
err:
354
+ "job-id": "job0"
875
+ if (s->attrs) {
876
+ sftp_attributes_free(s->attrs);
877
+ }
878
+ s->attrs = NULL;
879
if (s->sftp_handle) {
880
- libssh2_sftp_close(s->sftp_handle);
881
+ sftp_close(s->sftp_handle);
882
}
883
s->sftp_handle = NULL;
884
if (s->sftp) {
885
- libssh2_sftp_shutdown(s->sftp);
886
+ sftp_free(s->sftp);
887
}
888
s->sftp = NULL;
889
if (s->session) {
890
- libssh2_session_disconnect(s->session,
891
- "from qemu ssh client: "
892
- "error opening connection");
893
- libssh2_session_free(s->session);
894
+ ssh_disconnect(s->session);
895
+ ssh_free(s->session);
896
}
897
s->session = NULL;
898
+ s->sock = -1;
899
+ if (new_sock >= 0) {
900
+ close(new_sock);
901
+ }
902
903
return ret;
904
}
905
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
906
907
ssh_state_init(s);
908
909
- ssh_flags = LIBSSH2_FXF_READ;
910
+ ssh_flags = 0;
911
if (bdrv_flags & BDRV_O_RDWR) {
912
- ssh_flags |= LIBSSH2_FXF_WRITE;
913
+ ssh_flags |= O_RDWR;
914
+ } else {
915
+ ssh_flags |= O_RDONLY;
916
}
917
918
opts = ssh_parse_options(options, errp);
919
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
920
}
921
922
/* Go non-blocking. */
923
- libssh2_session_set_blocking(s->session, 0);
924
+ ssh_set_blocking(s->session, 0);
925
926
qapi_free_BlockdevOptionsSsh(opts);
927
928
return 0;
929
930
err:
931
- if (s->sock >= 0) {
932
- close(s->sock);
933
- }
934
- s->sock = -1;
935
-
936
qapi_free_BlockdevOptionsSsh(opts);
937
938
return ret;
939
@@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
940
{
941
ssize_t ret;
942
char c[1] = { '\0' };
943
- int was_blocking = libssh2_session_get_blocking(s->session);
944
+ int was_blocking = ssh_is_blocking(s->session);
945
946
/* offset must be strictly greater than the current size so we do
947
* not overwrite anything */
948
- assert(offset > 0 && offset > s->attrs.filesize);
949
+ assert(offset > 0 && offset > s->attrs->size);
950
951
- libssh2_session_set_blocking(s->session, 1);
952
+ ssh_set_blocking(s->session, 1);
953
954
- libssh2_sftp_seek64(s->sftp_handle, offset - 1);
955
- ret = libssh2_sftp_write(s->sftp_handle, c, 1);
956
+ sftp_seek64(s->sftp_handle, offset - 1);
957
+ ret = sftp_write(s->sftp_handle, c, 1);
958
959
- libssh2_session_set_blocking(s->session, was_blocking);
960
+ ssh_set_blocking(s->session, was_blocking);
961
962
if (ret < 0) {
963
sftp_error_setg(errp, s, "Failed to grow file");
964
return -EIO;
965
}
966
967
- s->attrs.filesize = offset;
968
+ s->attrs->size = offset;
969
return 0;
970
}
971
972
@@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
973
ssh_state_init(&s);
974
975
ret = connect_to_ssh(&s, opts->location,
976
- LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
977
- LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
978
+ O_RDWR | O_CREAT | O_TRUNC,
979
0644, errp);
980
if (ret < 0) {
981
goto fail;
982
@@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs)
983
/* Assume false, unless we can positively prove it's true. */
984
int has_zero_init = 0;
985
986
- if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
987
- if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
988
- has_zero_init = 1;
989
- }
990
+ if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
991
+ has_zero_init = 1;
992
}
993
994
return has_zero_init;
995
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
996
.co = qemu_coroutine_self()
997
};
998
999
- r = libssh2_session_block_directions(s->session);
1000
+ r = ssh_get_poll_flags(s->session);
1001
1002
- if (r & LIBSSH2_SESSION_BLOCK_INBOUND) {
1003
+ if (r & SSH_READ_PENDING) {
1004
rd_handler = restart_coroutine;
1005
}
1006
- if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) {
1007
+ if (r & SSH_WRITE_PENDING) {
1008
wr_handler = restart_coroutine;
1009
}
1010
1011
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
1012
trace_ssh_co_yield_back(s->sock);
1013
}
1014
1015
-/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
1016
- * in the remote file. Notice that it just updates a field in the
1017
- * sftp_handle structure, so there is no network traffic and it cannot
1018
- * fail.
1019
- *
1020
- * However, `libssh2_sftp_seek64' does have a catastrophic effect on
1021
- * performance since it causes the handle to throw away all in-flight
1022
- * reads and buffered readahead data. Therefore this function tries
1023
- * to be intelligent about when to call the underlying libssh2 function.
1024
- */
1025
-#define SSH_SEEK_WRITE 0
1026
-#define SSH_SEEK_READ 1
1027
-#define SSH_SEEK_FORCE 2
1028
-
1029
-static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
1030
-{
1031
- bool op_read = (flags & SSH_SEEK_READ) != 0;
1032
- bool force = (flags & SSH_SEEK_FORCE) != 0;
1033
-
1034
- if (force || op_read != s->offset_op_read || offset != s->offset) {
1035
- trace_ssh_seek(offset);
1036
- libssh2_sftp_seek64(s->sftp_handle, offset);
1037
- s->offset = offset;
1038
- s->offset_op_read = op_read;
1039
- }
1040
-}
1041
-
1042
static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1043
int64_t offset, size_t size,
1044
QEMUIOVector *qiov)
1045
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1046
1047
trace_ssh_read(offset, size);
1048
1049
- ssh_seek(s, offset, SSH_SEEK_READ);
1050
+ trace_ssh_seek(offset);
1051
+ sftp_seek64(s->sftp_handle, offset);
1052
1053
/* This keeps track of the current iovec element ('i'), where we
1054
* will write to next ('buf'), and the end of the current iovec
1055
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1056
buf = i->iov_base;
1057
end_of_vec = i->iov_base + i->iov_len;
1058
1059
- /* libssh2 has a hard-coded limit of 2000 bytes per request,
1060
- * although it will also do readahead behind our backs. Therefore
1061
- * we may have to do repeated reads here until we have read 'size'
1062
- * bytes.
1063
- */
1064
for (got = 0; got < size; ) {
1065
+ size_t request_read_size;
1066
again:
1067
- trace_ssh_read_buf(buf, end_of_vec - buf);
1068
- r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf);
1069
- trace_ssh_read_return(r);
1070
+ /*
1071
+ * The size of SFTP packets is limited to 32K bytes, so limit
1072
+ * the amount of data requested to 16K, as libssh currently
1073
+ * does not handle multiple requests on its own.
1074
+ */
1075
+ request_read_size = MIN(end_of_vec - buf, 16384);
1076
+ trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size);
1077
+ r = sftp_read(s->sftp_handle, buf, request_read_size);
1078
+ trace_ssh_read_return(r, sftp_get_error(s->sftp));
1079
1080
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1081
+ if (r == SSH_AGAIN) {
1082
co_yield(s, bs);
1083
goto again;
1084
}
1085
- if (r < 0) {
1086
- sftp_error_trace(s, "read");
1087
- s->offset = -1;
1088
- return -EIO;
1089
- }
1090
- if (r == 0) {
1091
+ if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) {
1092
/* EOF: Short read so pad the buffer with zeroes and return it. */
1093
qemu_iovec_memset(qiov, got, 0, size - got);
1094
return 0;
1095
}
1096
+ if (r <= 0) {
1097
+ sftp_error_trace(s, "read");
1098
+ return -EIO;
355
+ }
1099
+ }
1100
1101
got += r;
1102
buf += r;
1103
- s->offset += r;
1104
if (buf >= end_of_vec && got < size) {
1105
i++;
1106
buf = i->iov_base;
1107
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1108
1109
trace_ssh_write(offset, size);
1110
1111
- ssh_seek(s, offset, SSH_SEEK_WRITE);
1112
+ trace_ssh_seek(offset);
1113
+ sftp_seek64(s->sftp_handle, offset);
1114
1115
/* This keeps track of the current iovec element ('i'), where we
1116
* will read from next ('buf'), and the end of the current iovec
1117
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1118
end_of_vec = i->iov_base + i->iov_len;
1119
1120
for (written = 0; written < size; ) {
1121
+ size_t request_write_size;
1122
again:
1123
- trace_ssh_write_buf(buf, end_of_vec - buf);
1124
- r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf);
1125
- trace_ssh_write_return(r);
1126
+ /*
1127
+ * Avoid too large data packets, as libssh currently does not
1128
+ * handle multiple requests on its own.
1129
+ */
1130
+ request_write_size = MIN(end_of_vec - buf, 131072);
1131
+ trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size);
1132
+ r = sftp_write(s->sftp_handle, buf, request_write_size);
1133
+ trace_ssh_write_return(r, sftp_get_error(s->sftp));
1134
1135
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1136
+ if (r == SSH_AGAIN) {
1137
co_yield(s, bs);
1138
goto again;
1139
}
1140
if (r < 0) {
1141
sftp_error_trace(s, "write");
1142
- s->offset = -1;
1143
return -EIO;
1144
}
1145
- /* The libssh2 API is very unclear about this. A comment in
1146
- * the code says "nothing was acked, and no EAGAIN was
1147
- * received!" which apparently means that no data got sent
1148
- * out, and the underlying channel didn't return any EAGAIN
1149
- * indication. I think this is a bug in either libssh2 or
1150
- * OpenSSH (server-side). In any case, forcing a seek (to
1151
- * discard libssh2 internal buffers), and then trying again
1152
- * works for me.
1153
- */
1154
- if (r == 0) {
1155
- ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
1156
- co_yield(s, bs);
1157
- goto again;
1158
- }
1159
1160
written += r;
1161
buf += r;
1162
- s->offset += r;
1163
if (buf >= end_of_vec && written < size) {
1164
i++;
1165
buf = i->iov_base;
1166
end_of_vec = i->iov_base + i->iov_len;
1167
}
1168
1169
- if (offset + written > s->attrs.filesize)
1170
- s->attrs.filesize = offset + written;
1171
+ if (offset + written > s->attrs->size) {
1172
+ s->attrs->size = offset + written;
1173
+ }
1174
}
1175
1176
return 0;
1177
@@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
1178
}
1179
}
1180
1181
-#ifdef HAS_LIBSSH2_SFTP_FSYNC
1182
+#ifdef HAVE_LIBSSH_0_8
1183
1184
static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
1185
{
1186
int r;
1187
1188
trace_ssh_flush();
1189
+
1190
+ if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) {
1191
+ unsafe_flush_warning(s, "OpenSSH >= 6.3");
1192
+ return 0;
356
+ }
1193
+ }
357
+
1194
again:
358
+For `Case-2`_, merge contents of the images [B] and [C] into [D], where
1195
- r = libssh2_sftp_fsync(s->sftp_handle);
359
+image [D] ends up referring to image [A] as its backing file::
1196
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
360
+
1197
+ r = sftp_fsync(s->sftp_handle);
361
+ (QEMU) block-stream device=node-D base-node=node-A job-id=job0
1198
+ if (r == SSH_AGAIN) {
362
+
1199
co_yield(s, bs);
363
+And for `Case-3`_, of "intermediate" streaming", merge contents of
1200
goto again;
364
+images [B] into [C], where [C] ends up referring to [A] as its backing
1201
}
365
+image::
1202
- if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
366
+
1203
- libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) {
367
+ (QEMU) block-stream device=node-C base-node=node-A job-id=job0
1204
- unsafe_flush_warning(s, "OpenSSH >= 6.3");
368
+
1205
- return 0;
369
+Progress of a ``block-stream`` operation can be monitored via the QMP
1206
- }
370
+command::
1207
if (r < 0) {
371
+
1208
sftp_error_trace(s, "fsync");
372
+ (QEMU) query-block-jobs
1209
return -EIO;
373
+ {
1210
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
374
+ "execute": "query-block-jobs",
1211
return ret;
375
+ "arguments": {}
1212
}
376
+ }
1213
377
+
1214
-#else /* !HAS_LIBSSH2_SFTP_FSYNC */
378
+
1215
+#else /* !HAVE_LIBSSH_0_8 */
379
+Once the ``block-stream`` operation has completed, QEMU will emit an
1216
380
+event, ``BLOCK_JOB_COMPLETED``. The intermediate overlays remain valid,
1217
static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
381
+and can now be (optionally) discarded, or retained to create further
1218
{
382
+overlays based on them. Finally, the ``block-stream`` jobs can be
1219
BDRVSSHState *s = bs->opaque;
383
+restarted at anytime.
1220
384
+
1221
- unsafe_flush_warning(s, "libssh2 >= 1.4.4");
385
+
1222
+ unsafe_flush_warning(s, "libssh >= 0.8.0");
386
+Live block commit --- ``block-commit``
1223
return 0;
387
+--------------------------------------
1224
}
388
+
1225
389
+The ``block-commit`` command lets you merge live data from overlay
1226
-#endif /* !HAS_LIBSSH2_SFTP_FSYNC */
390
+images into backing file(s). Since QEMU 2.0, this includes "live active
1227
+#endif /* !HAVE_LIBSSH_0_8 */
391
+commit" (i.e. it is possible to merge the "active layer", the right-most
1228
392
+image in a disk image chain where live QEMU will be writing to, into the
1229
static int64_t ssh_getlength(BlockDriverState *bs)
393
+base image). This is analogous to ``block-stream``, but in the opposite
1230
{
394
+direction.
1231
BDRVSSHState *s = bs->opaque;
395
+
1232
int64_t length;
396
+Again, starting afresh with our example disk image chain, where live
1233
397
+QEMU is writing to the right-most image in the chain, [D]::
1234
- /* Note we cannot make a libssh2 call here. */
398
+
1235
- length = (int64_t) s->attrs.filesize;
399
+ [A] <-- [B] <-- [C] <-- [D]
1236
+ /* Note we cannot make a libssh call here. */
400
+
1237
+ length = (int64_t) s->attrs->size;
401
+The disk image chain can be shortened in one of the following ways:
1238
trace_ssh_getlength(length);
402
+
1239
403
+.. _`block-commit_Case-1`:
1240
return length;
404
+
1241
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
405
+(1) Commit content from only image [B] into image [A]. The resulting
1242
return -ENOTSUP;
406
+ chain is the following, where image [C] is adjusted to point at [A]
1243
}
407
+ as its new backing file::
1244
408
+
1245
- if (offset < s->attrs.filesize) {
409
+ [A] <-- [C] <-- [D]
1246
+ if (offset < s->attrs->size) {
410
+
1247
error_setg(errp, "ssh driver does not support shrinking files");
411
+(2) Commit content from images [B] and [C] into image [A]. The
1248
return -ENOTSUP;
412
+ resulting chain, where image [D] is adjusted to point to image [A]
1249
}
413
+ as its new backing file::
1250
414
+
1251
- if (offset == s->attrs.filesize) {
415
+ [A] <-- [D]
1252
+ if (offset == s->attrs->size) {
416
+
1253
return 0;
417
+.. _`block-commit_Case-3`:
1254
}
418
+
1255
419
+(3) Commit content from images [B], [C], and the active layer [D] into
1256
@@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void)
420
+ image [A]. The resulting chain (in this case, a consolidated single
1257
{
421
+ image)::
1258
int r;
422
+
1259
423
+ [A]
1260
- r = libssh2_init(0);
424
+
1261
+ r = ssh_init();
425
+(4) Commit content from only image [C] into image [B]. The
1262
if (r != 0) {
426
+ resulting chain::
1263
- fprintf(stderr, "libssh2 initialization failed, %d\n", r);
427
+
1264
+ fprintf(stderr, "libssh initialization failed, %d\n", r);
428
+    [A] <-- [B] <-- [D]
1265
exit(EXIT_FAILURE);
429
+
1266
}
430
+(5) Commit content from image [C] and the active layer [D] into image
1267
431
+ [B]. The resulting chain::
1268
+#if TRACE_LIBSSH != 0
432
+
1269
+ ssh_set_log_level(TRACE_LIBSSH);
433
+    [A] <-- [B]
1270
+#endif
434
+
1271
+
435
+
1272
bdrv_register(&bdrv_ssh);
436
+QMP invocation for ``block-commit``
1273
}
437
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1274
438
+
1275
diff --git a/.travis.yml b/.travis.yml
439
+For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from
1276
index XXXXXXX..XXXXXXX 100644
440
+image [B] into image [A], the invocation is as follows::
1277
--- a/.travis.yml
441
+
1278
+++ b/.travis.yml
442
+ (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0
1279
@@ -XXX,XX +XXX,XX @@ addons:
443
+ {
1280
- libseccomp-dev
444
+ "execute": "block-commit",
1281
- libspice-protocol-dev
445
+ "arguments": {
1282
- libspice-server-dev
446
+ "device": "node-D",
1283
- - libssh2-1-dev
447
+ "job-id": "job0",
1284
+ - libssh-dev
448
+ "top": "b.qcow2",
1285
- liburcu-dev
449
+ "base": "a.qcow2"
1286
- libusb-1.0-0-dev
450
+ }
1287
- libvte-2.91-dev
451
+ }
1288
@@ -XXX,XX +XXX,XX @@ matrix:
452
+
1289
- libseccomp-dev
453
+Once the above ``block-commit`` operation has completed, a
1290
- libspice-protocol-dev
454
+``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is
1291
- libspice-server-dev
455
+required. As the end result, the backing file of image [C] is adjusted
1292
- - libssh2-1-dev
456
+to point to image [A], and the original 4-image chain will end up being
1293
+ - libssh-dev
457
+transformed to::
1294
- liburcu-dev
458
+
1295
- libusb-1.0-0-dev
459
+ [A] <-- [C] <-- [D]
1296
- libvte-2.91-dev
460
+
1297
diff --git a/block/trace-events b/block/trace-events
461
+.. note::
1298
index XXXXXXX..XXXXXXX 100644
462
+ The intermediate image [B] is invalid (as in: no further
1299
--- a/block/trace-events
463
+ overlays based on it can be created).
1300
+++ b/block/trace-events
464
+
1301
@@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'"
465
+ Reasoning: An intermediate image after a 'stream' operation still
1302
# ssh.c
466
+ represents that old point-in-time, and may be valid in that context.
1303
ssh_restart_coroutine(void *co) "co=%p"
467
+ However, an intermediate image after a 'commit' operation no longer
1304
ssh_flush(void) "fsync"
468
+ represents any point-in-time, and is invalid in any context.
1305
-ssh_check_host_key_knownhosts(const char *key) "host key OK: %s"
469
+
1306
+ssh_check_host_key_knownhosts(void) "host key OK"
470
+
1307
ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o"
471
+However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active
1308
ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p"
472
+``block-commit``") is a *two-phase* operation: In the first phase, the
1309
ssh_co_yield_back(int sock) "s->sock=%d - back"
473
+content from the active overlay, along with the intermediate overlays,
1310
ssh_getlength(int64_t length) "length=%" PRIi64
474
+is copied into the backing file (also called the base image). In the
1311
ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64
475
+second phase, the said backing file is made the current active image
1312
ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
476
+-- by issuing the command ``block-job-complete``. Optionally,
1313
-ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu"
477
+the ``block-commit`` operation can be cancelled by issuing the command
1314
-ssh_read_return(ssize_t ret) "sftp_read returned %zd"
478
+``block-job-cancel``, but be careful when doing this.
1315
+ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)"
479
+
1316
+ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)"
480
+Once the ``block-commit`` operation has completed, the event
1317
ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
481
+``BLOCK_JOB_READY`` will be emitted, signalling that the synchronization
1318
-ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu"
482
+has finished. Now the job can be gracefully completed by issuing the
1319
-ssh_write_return(ssize_t ret) "sftp_write returned %zd"
483
+command ``block-job-complete`` -- until such a command is issued, the
1320
+ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)"
484
+'commit' operation remains active.
1321
+ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)"
485
+
1322
ssh_seek(int64_t offset) "seeking to offset=%" PRIi64
486
+The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to
1323
+ssh_auth_methods(int methods) "auth methods=0x%x"
487
+convert a disk image chain such as this::
1324
+ssh_server_status(int status) "server status=%d"
488
+
1325
489
+ [A] <-- [B] <-- [C] <-- [D]
1326
# curl.c
490
+
1327
curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld"
491
+Into::
1328
@@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s"
492
+
1329
sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32
493
+ [A]
1330
494
+
1331
# ssh.c
495
+Where content from all the subsequent overlays, [B], and [C], including
1332
-sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)"
496
+the active layer, [D], is committed back to [A] -- which is where live
1333
+sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
497
+QEMU is performing all its current writes.
1334
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
498
+
1335
index XXXXXXX..XXXXXXX 100644
499
+Start the "active ``block-commit``" operation::
1336
--- a/docs/qemu-block-drivers.texi
500
+
1337
+++ b/docs/qemu-block-drivers.texi
501
+ (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0
1338
@@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported:
502
+ {
1339
503
+ "execute": "block-commit",
1340
warning: ssh server @code{ssh.example.com:22} does not support fsync
504
+ "arguments": {
1341
505
+ "device": "node-D",
1342
-With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
506
+ "job-id": "job0",
1343
+With sufficiently new versions of libssh and OpenSSH, @code{fsync} is
507
+ "top": "d.qcow2",
1344
supported.
508
+ "base": "a.qcow2"
1345
509
+ }
1346
@node disk_images_nvme
510
+ }
1347
diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker
511
+
1348
index XXXXXXX..XXXXXXX 100644
512
+
1349
--- a/tests/docker/dockerfiles/debian-win32-cross.docker
513
+Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will
1350
+++ b/tests/docker/dockerfiles/debian-win32-cross.docker
514
+be emitted.
1351
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
515
+
1352
mxe-$TARGET-w64-mingw32.shared-curl \
516
+Then, optionally query for the status of the active block operations.
1353
mxe-$TARGET-w64-mingw32.shared-glib \
517
+We can see the 'commit' job is now ready to be completed, as indicated
1354
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
518
+by the line *"ready": true*::
1355
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
519
+
1356
mxe-$TARGET-w64-mingw32.shared-libusb1 \
520
+ (QEMU) query-block-jobs
1357
mxe-$TARGET-w64-mingw32.shared-lzo \
521
+ {
1358
mxe-$TARGET-w64-mingw32.shared-nettle \
522
+ "execute": "query-block-jobs",
1359
diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker
523
+ "arguments": {}
1360
index XXXXXXX..XXXXXXX 100644
524
+ }
1361
--- a/tests/docker/dockerfiles/debian-win64-cross.docker
525
+ {
1362
+++ b/tests/docker/dockerfiles/debian-win64-cross.docker
526
+ "return": [
1363
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
527
+ {
1364
mxe-$TARGET-w64-mingw32.shared-curl \
528
+ "busy": false,
1365
mxe-$TARGET-w64-mingw32.shared-glib \
529
+ "type": "commit",
1366
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
530
+ "len": 1376256,
1367
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
531
+ "paused": false,
1368
mxe-$TARGET-w64-mingw32.shared-libusb1 \
532
+ "ready": true,
1369
mxe-$TARGET-w64-mingw32.shared-lzo \
533
+ "io-status": "ok",
1370
mxe-$TARGET-w64-mingw32.shared-nettle \
534
+ "offset": 1376256,
1371
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
535
+ "device": "job0",
1372
index XXXXXXX..XXXXXXX 100644
536
+ "speed": 0
1373
--- a/tests/docker/dockerfiles/fedora.docker
537
+ }
1374
+++ b/tests/docker/dockerfiles/fedora.docker
538
+ ]
1375
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
539
+ }
1376
libpng-devel \
540
+
1377
librbd-devel \
541
+Gracefully complete the 'commit' block device job::
1378
libseccomp-devel \
542
+
1379
- libssh2-devel \
543
+ (QEMU) block-job-complete device=job0
1380
+ libssh-devel \
544
+ {
1381
libubsan \
545
+ "execute": "block-job-complete",
1382
libusbx-devel \
546
+ "arguments": {
1383
libxml2-devel \
547
+ "device": "job0"
1384
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
548
+ }
1385
mingw32-gtk3 \
549
+ }
1386
mingw32-libjpeg-turbo \
550
+ {
1387
mingw32-libpng \
551
+ "return": {}
1388
- mingw32-libssh2 \
552
+ }
1389
mingw32-libtasn1 \
553
+
1390
mingw32-nettle \
554
+Finally, once the above job is completed, an event
1391
mingw32-pixman \
555
+``BLOCK_JOB_COMPLETED`` will be emitted.
1392
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
556
+
1393
mingw64-gtk3 \
557
+.. note::
1394
mingw64-libjpeg-turbo \
558
+ The invocation for the rest of the cases (2, 4, and 5), discussed in the
1395
mingw64-libpng \
559
+ previous section, is omitted for brevity.
1396
- mingw64-libssh2 \
560
+
1397
mingw64-libtasn1 \
561
+
1398
mingw64-nettle \
562
+Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror``
1399
mingw64-pixman \
563
+----------------------------------------------------------------------
1400
diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker
564
+
1401
index XXXXXXX..XXXXXXX 100644
565
+Synchronize a running disk image chain (all or part of it) to a target
1402
--- a/tests/docker/dockerfiles/ubuntu.docker
566
+image.
1403
+++ b/tests/docker/dockerfiles/ubuntu.docker
567
+
1404
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
568
+Again, given our familiar disk image chain::
1405
libsnappy-dev \
569
+
1406
libspice-protocol-dev \
570
+ [A] <-- [B] <-- [C] <-- [D]
1407
libspice-server-dev \
571
+
1408
- libssh2-1-dev \
572
+The ``drive-mirror`` command (and its newer equivalent ``blockdev-mirror``) allows
1409
+ libssh-dev \
573
+you to copy data from the entire chain into a single target image (which
1410
libusb-1.0-0-dev \
574
+can be located on a different host).
1411
libusbredirhost-dev \
575
+
1412
libvdeplug-dev \
576
+Once a 'mirror' job has started, there are two possible actions while a
1413
diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker
577
+``drive-mirror`` job is active:
1414
index XXXXXXX..XXXXXXX 100644
578
+
1415
--- a/tests/docker/dockerfiles/ubuntu1804.docker
579
+(1) Issuing the command ``block-job-cancel`` after it emits the event
1416
+++ b/tests/docker/dockerfiles/ubuntu1804.docker
580
+ ``BLOCK_JOB_CANCELLED``: will (after completing synchronization of
1417
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
581
+ the content from the disk image chain to the target image, [E])
1418
libsnappy-dev \
582
+ create a point-in-time (which is at the time of *triggering* the
1419
libspice-protocol-dev \
583
+ cancel command) copy, contained in image [E], of the entire disk
1420
libspice-server-dev \
584
+ image chain (or only the top-most image, depending on the ``sync``
1421
- libssh2-1-dev \
585
+ mode).
1422
+ libssh-dev \
586
+
1423
libusb-1.0-0-dev \
587
+(2) Issuing the command ``block-job-complete`` after it emits the event
1424
libusbredirhost-dev \
588
+ ``BLOCK_JOB_READY``: will, after completing synchronization of
1425
libvdeplug-dev \
589
+ the content, adjust the guest device (i.e. live QEMU) to point to
1426
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
590
+ the target image, and cause all the new writes from this point on
1427
index XXXXXXX..XXXXXXX 100755
591
+ to happen there. One use case for this is live storage migration.
1428
--- a/tests/qemu-iotests/207
592
+
1429
+++ b/tests/qemu-iotests/207
593
+About synchronization modes: The synchronization mode determines
1430
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
594
+*which* part of the disk image chain will be copied to the target.
1431
595
+Currently, there are four different kinds:
1432
iotests.img_info_log(remote_path)
596
+
1433
597
+(1) ``full`` -- Synchronize the content of entire disk image chain to
1434
- md5_key = subprocess.check_output(
598
+ the target
1435
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
599
+
1436
- 'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1',
600
+(2) ``top`` -- Synchronize only the contents of the top-most disk image
1437
- shell=True).rstrip().decode('ascii')
601
+ in the chain to the target
1438
+ keys = subprocess.check_output(
602
+
1439
+ 'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
603
+(3) ``none`` -- Synchronize only the new writes from this point on.
1440
+ 'cut -d" " -f3',
604
+
1441
+ shell=True).rstrip().decode('ascii').split('\n')
605
+ .. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
1442
+
606
+ the behavior of ``none`` synchronization mode is different.
1443
+ # Mappings of base64 representations to digests
607
+ Normally, a ``backup`` job consists of two parts: Anything
1444
+ md5_keys = {}
608
+ that is overwritten by the guest is first copied out to
1445
+ sha1_keys = {}
609
+ the backup, and in the background the whole image is
1446
+
610
+ copied from start to end. With ``sync=none``, it's only
1447
+ for key in keys:
611
+ the first part.
1448
+ md5_keys[key] = subprocess.check_output(
612
+
1449
+ 'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key,
613
+(4) ``incremental`` -- Synchronize content that is described by the
1450
+ shell=True).rstrip().decode('ascii')
614
+ dirty bitmap
1451
+
615
+
1452
+ sha1_keys[key] = subprocess.check_output(
616
+.. note::
1453
+ 'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key,
617
+ Refer to the :doc:`bitmaps` document in the QEMU source
1454
+ shell=True).rstrip().decode('ascii')
618
+ tree to learn about the detailed workings of the ``incremental``
1455
619
+ synchronization mode.
1456
vm.launch()
620
+
1457
+
621
+
1458
+ # Find correct key first
622
+QMP invocation for ``drive-mirror``
1459
+ matching_key = None
623
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1460
+ for key in keys:
624
+
1461
+ result = vm.qmp('blockdev-add',
625
+To copy the contents of the entire disk image chain, from [A] all the
1462
+ driver='ssh', node_name='node0', path=disk_path,
626
+way to [D], to a new target (``drive-mirror`` will create the destination
1463
+ server={
627
+file, if it doesn't already exist), call it [E]::
1464
+ 'host': '127.0.0.1',
628
+
1465
+ 'port': '22',
629
+ (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0
1466
+ }, host_key_check={
630
+ {
1467
+ 'mode': 'hash',
631
+ "execute": "drive-mirror",
1468
+ 'type': 'md5',
632
+ "arguments": {
1469
+ 'hash': md5_keys[key],
633
+ "device": "node-D",
1470
+ })
634
+ "job-id": "job0",
1471
+
635
+ "target": "e.qcow2",
1472
+ if 'error' not in result:
636
+ "sync": "full"
1473
+ vm.qmp('blockdev-del', node_name='node0')
637
+ }
1474
+ matching_key = key
638
+ }
1475
+ break
639
+
1476
+
640
+The ``"sync": "full"``, from the above, means: copy the *entire* chain
1477
+ if matching_key is None:
641
+to the destination.
1478
+ vm.shutdown()
642
+
1479
+ iotests.notrun('Did not find a key that fits 127.0.0.1')
643
+Following the above, querying for active block jobs will show that a
1480
+
644
+'mirror' job is "ready" to be completed (and QEMU will also emit an
1481
blockdev_create(vm, { 'driver': 'ssh',
645
+event, ``BLOCK_JOB_READY``)::
1482
'location': {
646
+
1483
'path': disk_path,
647
+ (QEMU) query-block-jobs
1484
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
648
+ {
1485
'host-key-check': {
649
+ "execute": "query-block-jobs",
1486
'mode': 'hash',
650
+ "arguments": {}
1487
'type': 'md5',
651
+ }
1488
- 'hash': md5_key,
652
+ {
1489
+ 'hash': md5_keys[matching_key],
653
+ "return": [
1490
}
654
+ {
1491
},
655
+ "busy": false,
1492
'size': 8388608 })
656
+ "type": "mirror",
1493
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
657
+ "len": 21757952,
1494
658
+ "paused": false,
1495
iotests.img_info_log(remote_path)
659
+ "ready": true,
1496
660
+ "io-status": "ok",
1497
- sha1_key = subprocess.check_output(
661
+ "offset": 21757952,
1498
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
662
+ "device": "job0",
1499
- 'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1',
663
+ "speed": 0
1500
- shell=True).rstrip().decode('ascii')
664
+ }
665
+ ]
666
+ }
667
+
668
+And, as noted in the previous section, there are two possible actions
669
+at this point:
670
+
671
+(a) Create a point-in-time snapshot by ending the synchronization. The
672
+ point-in-time is at the time of *ending* the sync. (The result of
673
+ the following being: the target image, [E], will be populated with
674
+ content from the entire chain, [A] to [D])::
675
+
676
+ (QEMU) block-job-cancel device=job0
677
+ {
678
+ "execute": "block-job-cancel",
679
+ "arguments": {
680
+ "device": "job0"
681
+ }
682
+ }
683
+
684
+(b) Or, complete the operation and pivot the live QEMU to the target
685
+ copy::
686
+
687
+ (QEMU) block-job-complete device=job0
688
+
689
+In either of the above cases, if you once again run the
690
+``query-block-jobs`` command, there should not be any active block
691
+operation.
692
+
693
+Comparing 'commit' and 'mirror': In both these cases, the overlay images
694
+can be discarded. However, with 'commit', the *existing* base image
695
+will be modified (by updating it with contents from overlays); while in
696
+the case of 'mirror', a *new* target image is populated with the data
697
+from the disk image chain.
698
+
699
+
700
+QMP invocation for live storage migration with ``drive-mirror`` + NBD
701
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
702
+
703
+Live storage migration (without shared storage setup) is one of the most
704
+common use-cases that takes advantage of the ``drive-mirror`` primitive
705
+and QEMU's built-in Network Block Device (NBD) server. Here's a quick
706
+walk-through of this setup.
707
+
708
+Given the disk image chain::
709
+
710
+ [A] <-- [B] <-- [C] <-- [D]
711
+
712
+Instead of copying content from the entire chain, synchronize *only* the
713
+contents of the *top*-most disk image (i.e. the active layer), [D], to a
714
+target, say, [TargetDisk].
715
+
716
+.. important::
717
+ The destination host must already have the contents of the backing
718
+ chain, involving images [A], [B], and [C], visible via other means
719
+ -- whether by ``cp``, ``rsync``, or by some storage array-specific
720
+    command.
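+
+For instance (a rough sketch; it assumes the source chain files are named
+``a.qcow2`` to ``d.qcow2``, live in the current directory, and that the
+destination host -- here called ``destination-host`` -- is reachable over
+SSH), the read-only part of the chain could be flattened into a single
+image and copied across. Since [D] is the active layer, [C] is no longer
+being written to and is safe to read::
+
+    $ # Merge [A], [B] and [C] into one standalone image
+    $ qemu-img convert -f qcow2 -O qcow2 c.qcow2 Contents-of-A-B-C.qcow2
+    $ # Transfer it to the destination host
+    $ scp Contents-of-A-B-C.qcow2 destination-host: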
721
+
722
+Sometimes, this is also referred to as "shallow copy" -- because only
723
+the "active layer", and not the rest of the image chain, is copied to
724
+the destination.
725
+
726
+.. note::
727
+ In this example, for the sake of simplicity, we'll be using the same
728
+ ``localhost`` as both source and destination.
729
+
730
+As noted earlier, on the destination host the contents of the backing
731
+chain -- from images [A] to [C] -- are already expected to exist in some
732
+form (e.g. in a file called ``Contents-of-A-B-C.qcow2``). Now, on the
733
+destination host, let's create a target overlay image (with the image
734
+``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents
735
+of image [D] (from the source QEMU) will be mirrored::
736
+
737
+ $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \
738
+ -F qcow2 ./target-disk.qcow2
739
+
740
+And start the destination QEMU instance (we already have the source QEMU
741
+running -- discussed in the section: `Interacting with a QEMU instance`_)
742
+with the following invocation. (As noted earlier, for
743
+simplicity's sake, the destination QEMU is started on the same host, but
744
+it could be located elsewhere)::
745
+
746
+ $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \
747
+ -M q35 -nodefaults -m 512 \
748
+ -blockdev node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \
749
+ -device virtio-blk,drive=node-TargetDisk,id=virtio0 \
750
+ -S -monitor stdio -qmp unix:./qmp-sock2,server,nowait \
751
+ -incoming tcp:localhost:6666
752
+
753
+Given the disk image chain on source QEMU::
754
+
755
+ [A] <-- [B] <-- [C] <-- [D]
756
+
757
+On the destination host, it is expected that the contents of the chain
758
+``[A] <-- [B] <-- [C]`` are *already* present; therefore, only the
759
+content of image [D] needs to be copied.
760
+
761
+(1) [On *destination* QEMU] As part of the first step, start the
762
+ built-in NBD server on a given host (local host, represented by
763
+    ``::``) and port::
764
+
765
+ (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}}
766
+ {
767
+ "execute": "nbd-server-start",
768
+ "arguments": {
769
+ "addr": {
770
+ "data": {
771
+ "host": "::",
772
+ "port": "49153"
773
+ },
774
+ "type": "inet"
775
+ }
776
+ }
777
+ }
778
+
779
+(2) [On *destination* QEMU] And export the destination disk image using
780
+ QEMU's built-in NBD server::
781
+
782
+ (QEMU) nbd-server-add device=node-TargetDisk writable=true
783
+ {
784
+ "execute": "nbd-server-add",
785
+ "arguments": {
786
+          "device": "node-TargetDisk",
+          "writable": true
787
+ }
788
+ }
789
+
790
+(3) [On *source* QEMU] Then, invoke ``drive-mirror``. Note that it is run
791
+    with ``mode=existing`` (meaning: synchronize to a pre-created --
792
+    therefore 'existing' -- file on the target host), and with the
793
+    synchronization mode 'top' (``"sync": "top"``)::
794
795
+
796
+ (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0
797
+ {
798
+ "execute": "drive-mirror",
799
+ "arguments": {
800
+ "device": "node-D",
801
+ "mode": "existing",
802
+ "job-id": "job0",
803
+ "target": "nbd:localhost:49153:exportname=node-TargetDisk",
804
+ "sync": "top"
805
+ }
806
+ }
807
+
808
+(4) [On *source* QEMU] Once ``drive-mirror`` has copied all the data, and
809
+    the event ``BLOCK_JOB_READY`` is emitted, issue ``block-job-cancel``
810
+    to gracefully end the synchronization::
811
+
812
+ (QEMU) block-job-cancel device=job0
813
+ {
814
+ "execute": "block-job-cancel",
815
+ "arguments": {
816
+ "device": "job0"
817
+ }
818
+ }
819
+
820
+(5) [On *destination* QEMU] Then, stop the NBD server::
821
+
822
+ (QEMU) nbd-server-stop
823
+ {
824
+ "execute": "nbd-server-stop",
825
+ "arguments": {}
826
+ }
827
+
828
+(6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the
829
+ QMP command `cont`::
830
+
831
+ (QEMU) cont
832
+ {
833
+ "execute": "cont",
834
+ "arguments": {}
835
+ }
836
+
837
+.. note::
838
+ Higher-level libraries (e.g. libvirt) automate the entire above
839
+ process (although note that libvirt does not allow same-host
840
+ migrations to localhost for other reasons).
841
+
842
+
843
+Notes on ``blockdev-mirror``
844
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
845
+
846
+The ``blockdev-mirror`` command is equivalent in core functionality to
847
+``drive-mirror``, except that it operates at node-level in a BDS graph.
848
+
849
+Also: for ``blockdev-mirror``, the 'target' image needs to be explicitly
850
+created (using ``qemu-img``) and attached to the live QEMU via
851
+``blockdev-add``, which assigns a name to the to-be-created target node.
852
+
853
+E.g. the sequence of actions to create a point-in-time backup of an
854
+entire disk image chain, to a target, using ``blockdev-mirror`` would be:
855
+
856
+(0) Create the QCOW2 overlays, to arrive at a backing chain of desired
857
+ depth
858
+
859
+(1) Create the target image (using ``qemu-img``), say, ``e.qcow2``
860
+
861
+(2) Attach the above-created file (``e.qcow2``) to the live QEMU at run
862
+    time, using ``blockdev-add``
863
+
864
+(3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the
865
+ entire chain to the target). And notice the event
866
+ ``BLOCK_JOB_READY``
867
+
868
+(4) Optionally, query for active block jobs; there should be a 'mirror'
869
+ job ready to be completed
870
+
871
+(5) Gracefully complete the 'mirror' block device job, and notice the
872
+    event ``BLOCK_JOB_COMPLETED``
873
+
874
+(6) Shut down the guest by issuing the QMP ``quit`` command so that
875
+ caches are flushed
876
+
877
+(7) Then, finally, compare the contents of the disk image chain and the
878
+    target copy with ``qemu-img compare`` (see the example at the end of
879
+    the next section). You should notice: "Images are identical"
880
+
881
+
882
+QMP invocation for ``blockdev-mirror``
883
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
884
+
885
+Given the disk image chain::
886
+
887
+ [A] <-- [B] <-- [C] <-- [D]
888
+
889
+To copy the contents of the entire disk image chain, from [A] all the
890
+way to [D], to a new target, call it [E]. The following is the flow.
891
+
892
+Create the overlay images, [B], [C], and [D]::
893
+
894
+ (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
895
+ (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
896
+ (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
897
+
898
+Create the target image, [E]::
899
+
900
+ $ qemu-img create -f qcow2 e.qcow2 39M
901
+
902
+Add the above created target image to QEMU, via ``blockdev-add``::
903
+
904
+ (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"}
905
+ {
906
+ "execute": "blockdev-add",
907
+ "arguments": {
908
+ "node-name": "node-E",
909
+ "driver": "qcow2",
910
+ "file": {
911
+ "driver": "file",
912
+ "filename": "e.qcow2"
913
+ }
914
+ }
915
+ }
916
+
917
+Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``::
918
+
919
+    (QEMU) blockdev-mirror device=node-D target=node-E sync=full job-id=job0
920
+ {
921
+ "execute": "blockdev-mirror",
922
+ "arguments": {
923
+ "device": "node-D",
924
+ "job-id": "job0",
925
+ "target": "node-E",
926
+ "sync": "full"
927
+ }
928
+ }
929
+
930
+Query for active block jobs; there should be a 'mirror' job ready::
931
+
932
+ (QEMU) query-block-jobs
933
+ {
934
+ "execute": "query-block-jobs",
935
+ "arguments": {}
936
+ }
937
+ {
938
+ "return": [
939
+ {
940
+ "busy": false,
941
+ "type": "mirror",
942
+ "len": 21561344,
943
+ "paused": false,
944
+ "ready": true,
945
+ "io-status": "ok",
946
+ "offset": 21561344,
947
+ "device": "job0",
948
+ "speed": 0
949
+ }
950
+ ]
951
+ }
952
+
953
+Gracefully complete the block device job operation, and notice the
954
+event ``BLOCK_JOB_COMPLETED``::
955
+
956
+ (QEMU) block-job-complete device=job0
957
+ {
958
+ "execute": "block-job-complete",
959
+ "arguments": {
960
+ "device": "job0"
961
+ }
962
+ }
963
+ {
964
+ "return": {}
965
+ }
966
+
967
+Shut down the guest by issuing the ``quit`` QMP command::
968
+
969
+ (QEMU) quit
970
+ {
971
+ "execute": "quit",
972
+ "arguments": {}
973
+ }
974
+
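+As described in step (7) of the previous section, the contents of the
+disk image chain and of the target copy can now be compared with
+``qemu-img compare``. For example (a sketch, reusing the file names from
+this walkthrough; run it only after QEMU has exited, and expect a
+size-mismatch warning if [E] was created with a different virtual size)::
+
+    $ # d.qcow2 pulls in its backing chain ([A], [B], [C]) automatically
+    $ qemu-img compare d.qcow2 e.qcow2
+
+You should see "Images are identical".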
975
+
976
+Live disk backup --- ``drive-backup`` and ``blockdev-backup``
977
+-------------------------------------------------------------
978
+
979
+The ``drive-backup`` command (and its newer equivalent, ``blockdev-backup``)
980
+allows you to create a point-in-time snapshot.
981
+
982
+In this case, the point-in-time is when you *start* the ``drive-backup``
983
+(or its newer equivalent ``blockdev-backup``) command.
984
+
985
+
986
+QMP invocation for ``drive-backup``
987
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
988
+
989
+Yet again, starting afresh with our example disk image chain::
990
+
991
+ [A] <-- [B] <-- [C] <-- [D]
992
+
993
+To create a target image [E], with content populated from images [A] to
994
+[D] of the above chain, the following is the syntax. (If the target
995
+image does not exist, ``drive-backup`` will create it)::
996
+
997
+ (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0
998
+ {
999
+ "execute": "drive-backup",
1000
+ "arguments": {
1001
+ "device": "node-D",
1002
+ "job-id": "job0",
1003
+ "sync": "full",
1004
+ "target": "e.qcow2"
1005
+ }
1006
+ }
1007
+
1008
+Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event
1009
+will be issued, indicating the live block device job operation has
1010
+completed, and no further action is required.
1011
+
1012
+
1013
+Notes on ``blockdev-backup``
1014
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1015
+
1016
+The ``blockdev-backup`` command is equivalent in functionality to
1017
+``drive-backup``, except that it operates at node-level in a Block Driver
1018
+State (BDS) graph.
1019
+
1020
+E.g. the sequence of actions to create a point-in-time backup
1021
+of an entire disk image chain, to a target, using ``blockdev-backup``
1022
+would be:
1023
+
1024
+(0) Create the QCOW2 overlays, to arrive at a backing chain of desired
1025
+ depth
1026
+
1027
+(1) Create the target image (using ``qemu-img``), say, ``e.qcow2``
1028
+
1029
+(2) Attach the above-created file (``e.qcow2``) to the live QEMU at run
1030
+    time, using ``blockdev-add``
1031
+
1032
+(3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the
1033
+ entire chain to the target). And notice the event
1034
+ ``BLOCK_JOB_COMPLETED``
1035
+
1036
+(4) Shut down the guest by issuing the QMP ``quit`` command, so that
1037
+ caches are flushed
1038
+
1039
+(5) Then, finally, compare the contents of the disk image chain, and
1040
+ the target copy with ``qemu-img compare``. You should notice:
1041
+ "Images are identical"
1042
+
1043
+The following section shows an example QMP invocation for
1044
+``blockdev-backup``.
1045
+
1046
+QMP invocation for ``blockdev-backup``
1047
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1048
+
1049
+Given a disk image chain of depth 1 where image [B] is the active
1050
+overlay (live QEMU is writing to it)::
1051
+
1052
+ [A] <-- [B]
1053
+
1054
+The following is the procedure to copy the content from the entire chain
1055
+to a target image (say, [E]), which will then hold the full content from
1056
+[A] and [B].
1057
+
1058
+Create the overlay [B]::
1059
+
1060
+ (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
1061
+ {
1062
+ "execute": "blockdev-snapshot-sync",
1063
+ "arguments": {
1064
+ "node-name": "node-A",
1065
+ "snapshot-file": "b.qcow2",
1066
+ "format": "qcow2",
1067
+ "snapshot-node-name": "node-B"
1068
+ }
1069
+ }
1070
+
1071
+
1072
+Create a target image that will contain the copy::
1073
+
1074
+ $ qemu-img create -f qcow2 e.qcow2 39M
1075
+
1076
+Then add it to QEMU via ``blockdev-add``::
1077
+
1078
+ (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"}
1079
+ {
1080
+ "execute": "blockdev-add",
1081
+ "arguments": {
1082
+ "node-name": "node-E",
1083
+ "driver": "qcow2",
1084
+ "file": {
1085
+ "driver": "file",
1086
+ "filename": "e.qcow2"
1087
+ }
1088
+ }
1089
+ }
1090
+
1091
+Then invoke ``blockdev-backup`` to copy the contents from the entire
1092
+image chain, consisting of images [A] and [B], to the target image
1093
+'e.qcow2'::
1094
+
1095
+ (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0
1096
+ {
1097
+ "execute": "blockdev-backup",
1098
+ "arguments": {
1099
+ "device": "node-B",
1100
+ "job-id": "job0",
1101
+ "target": "node-E",
1102
+ "sync": "full"
1103
+ }
1104
+ }
1105
+
1106
+Once the above 'backup' operation has completed, the event
1107
+``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful
1108
+completion.
1109
+
1110
+Next, query for any active block device jobs (there should be none)::
1111
+
1112
+ (QEMU) query-block-jobs
1113
+ {
1114
+ "execute": "query-block-jobs",
1115
+ "arguments": {}
1116
+ }
1117
+
1118
+Shut down the guest::
1119
+
1120
+ (QEMU) quit
1121
+ {
1122
+ "execute": "quit",
1123
+ "arguments": {}
1124
+ }
1125
+    {
1125
+        "return": {}
+    }
1127
+
1128
+.. note::
1129
+ The above step is really important; if forgotten, an error, "Failed
1130
+ to get shared "write" lock on e.qcow2", will be thrown when you do
1131
+ ``qemu-img compare`` to verify the integrity of the disk image
1132
+ with the backup content.
1133
+
1134
+
1135
+The end result will be the image 'e.qcow2' containing a
1136
+point-in-time backup of the disk image chain -- i.e. contents from
1137
+images [A] and [B] at the time the ``blockdev-backup`` command was
1138
+initiated.
1139
+
1140
+One way to confirm that the backup disk image contains the same content
1141
+as the disk image chain is to compare the backup against the contents of
1142
+the chain; you should see "Images are identical". (NB: this assumes QEMU
1143
+was launched with the ``-S`` option, which does not start the CPUs at
1144
+guest boot up)::
1145
+
1146
+ $ qemu-img compare b.qcow2 e.qcow2
1147
+ Warning: Image size mismatch!
1148
+ Images are identical.
1149
+
1150
+NOTE: The "Warning: Image size mismatch!" message is expected, as we
1151
+created the target image (e.qcow2) with a size of 39M.
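+
+If you would rather avoid the size-mismatch warning altogether, one
+option (a sketch; it assumes a Python 3 interpreter is available and that
+the size is read before the guest is started) is to create the target
+with the same virtual size as the chain it will be compared against::
+
+    $ # Read the chain's virtual size (in bytes) from its top image
+    $ size=$(qemu-img info --output=json b.qcow2 |
+             python3 -c 'import json, sys; print(json.load(sys.stdin)["virtual-size"])')
+    $ # Create the backup target with exactly that size
+    $ qemu-img create -f qcow2 e.qcow2 "$size"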
1152
diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
1153
deleted file mode 100644
1154
index XXXXXXX..XXXXXXX
1155
--- a/docs/live-block-ops.txt
1156
+++ /dev/null
1157
@@ -XXX,XX +XXX,XX @@
1158
-LIVE BLOCK OPERATIONS
1159
-=====================
1160
-
1501
-
1161
-High level description of live block operations. Note these are not
1502
vm.launch()
1162
-supported for use with the raw format at the moment.
1503
blockdev_create(vm, { 'driver': 'ssh',
1163
-
1504
'location': {
1164
-Note also that this document is incomplete and it currently only
1505
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1165
-covers the 'stream' operation. Other operations supported by QEMU such
1506
'host-key-check': {
1166
-as 'commit', 'mirror' and 'backup' are not described here yet. Please
1507
'mode': 'hash',
1167
-refer to the qapi/block-core.json file for an overview of those.
1508
'type': 'sha1',
1168
-
1509
- 'hash': sha1_key,
1169
-Snapshot live merge
1510
+ 'hash': sha1_keys[matching_key],
1170
-===================
1511
}
1171
-
1512
},
1172
-Given a snapshot chain, described in this document in the following
1513
'size': 4194304 })
1173
-format:
1514
diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
1174
-
1515
index XXXXXXX..XXXXXXX 100644
1175
-[A] <- [B] <- [C] <- [D] <- [E]
1516
--- a/tests/qemu-iotests/207.out
1176
-
1517
+++ b/tests/qemu-iotests/207.out
1177
-Where the rightmost object ([E] in the example) described is the current
1518
@@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes)
1178
-image which the guest OS has write access to. To the left of it is its base
1519
1179
-image, and so on accordingly until the leftmost image, which has no
1520
{"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}}
1180
-base.
1521
{"return": {}}
1181
-
1522
-Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31)
1182
-The snapshot live merge operation transforms such a chain into a
1523
+Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2)
1183
-smaller one with fewer elements, such as this transformation relative
1524
{"execute": "job-dismiss", "arguments": {"id": "job0"}}
1184
-to the first example:
1525
{"return": {}}
1185
-
1526
1186
-[A] <- [E]
1187
-
1188
-Data is copied in the right direction with destination being the
1189
-rightmost image, but any other intermediate image can be specified
1190
-instead. In this example data is copied from [C] into [D], so [D] can
1191
-be backed by [B]:
1192
-
1193
-[A] <- [B] <- [D] <- [E]
1194
-
1195
-The operation is implemented in QEMU through image streaming facilities.
1196
-
1197
-The basic idea is to execute 'block_stream virtio0' while the guest is
1198
-running. Progress can be monitored using 'info block-jobs'. When the
1199
-streaming operation completes it raises a QMP event. 'block_stream'
1200
-copies data from the backing file(s) into the active image. When finished,
1201
-it adjusts the backing file pointer.
1202
-
1203
-The 'base' parameter specifies an image which data need not be
1204
-streamed from. This image will be used as the backing file for the
1205
-destination image when the operation is finished.
1206
-
1207
-In the first example above, the command would be:
1208
-
1209
-(qemu) block_stream virtio0 file-A.img
1210
-
1211
-In order to specify a destination image different from the active
1212
-(rightmost) one we can use its node name instead.
1213
-
1214
-In the second example above, the command would be:
1215
-
1216
-(qemu) block_stream node-D file-B.img
1217
-
1218
-Live block copy
1219
-===============
1220
-
1221
-To copy an in use image to another destination in the filesystem, one
1222
-should create a live snapshot in the desired destination, then stream
1223
-into that image. Example:
1224
-
1225
-(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img qcow2
1226
-
1227
-(qemu) block_stream ide0-hd0
1228
-
1229
-
1230
--
1527
--
1231
2.9.4
1528
2.21.0
1232
1529
1233
1530
diff view generated by jsdifflib
1
From: Kashyap Chamarthy <kchamart@redhat.com>
1
Tests should place their files into the test directory. This includes
2
Unix sockets. 205 currently fails to do so, which prevents it from
3
being run concurrently.
2
4
3
This is part of the on-going effort to convert QEMU upstream
5
Signed-off-by: Max Reitz <mreitz@redhat.com>
4
documentation syntax to reStructuredText (rST).
6
Message-id: 20190618210238.9524-1-mreitz@redhat.com
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Signed-off-by: Max Reitz <mreitz@redhat.com>
9
---
10
tests/qemu-iotests/205 | 2 +-
11
1 file changed, 1 insertion(+), 1 deletion(-)
5
12
6
The conversion to rST was done using:
13
diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
7
14
index XXXXXXX..XXXXXXX 100755
8
$ pandoc -f markdown -t rst bitmaps.md -o bitmaps.rst
15
--- a/tests/qemu-iotests/205
9
16
+++ b/tests/qemu-iotests/205
10
Then, make a couple of small syntactical adjustments. While at it,
17
@@ -XXX,XX +XXX,XX @@ import iotests
11
reword a statement to avoid ambiguity. Addressing the feedback from
18
import time
12
this thread:
19
from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive
13
20
14
https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg05428.html
21
-nbd_sock = 'nbd_sock'
15
22
+nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
16
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
23
nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
17
Reviewed-by: John Snow <jsnow@redhat.com>
24
disk = os.path.join(iotests.test_dir, 'disk')
18
Reviewed-by: Eric Blake <eblake@redhat.com>
25
19
Message-id: 20170717105205.32639-2-kchamart@redhat.com
20
Signed-off-by: Jeff Cody <jcody@redhat.com>
21
---
22
docs/devel/bitmaps.md | 505 ------------------------------------------
23
docs/interop/bitmaps.rst | 555 +++++++++++++++++++++++++++++++++++++++++++++++
24
2 files changed, 555 insertions(+), 505 deletions(-)
25
delete mode 100644 docs/devel/bitmaps.md
26
create mode 100644 docs/interop/bitmaps.rst
27
28
diff --git a/docs/devel/bitmaps.md b/docs/devel/bitmaps.md
29
deleted file mode 100644
30
index XXXXXXX..XXXXXXX
31
--- a/docs/devel/bitmaps.md
32
+++ /dev/null
33
@@ -XXX,XX +XXX,XX @@
34
-<!--
35
-Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc.
36
-All rights reserved.
37
-
38
-This file is licensed via The FreeBSD Documentation License, the full text of
39
-which is included at the end of this document.
40
--->
41
-
42
-# Dirty Bitmaps and Incremental Backup
43
-
44
-* Dirty Bitmaps are objects that track which data needs to be backed up for the
45
- next incremental backup.
46
-
47
-* Dirty bitmaps can be created at any time and attached to any node
48
- (not just complete drives.)
49
-
50
-## Dirty Bitmap Names
51
-
52
-* A dirty bitmap's name is unique to the node, but bitmaps attached to different
53
- nodes can share the same name.
54
-
55
-* Dirty bitmaps created for internal use by QEMU may be anonymous and have no
56
- name, but any user-created bitmaps may not be. There can be any number of
57
- anonymous bitmaps per node.
58
-
59
-* The name of a user-created bitmap must not be empty ("").
60
-
61
-## Bitmap Modes
62
-
63
-* A Bitmap can be "frozen," which means that it is currently in-use by a backup
64
- operation and cannot be deleted, renamed, written to, reset,
65
- etc.
66
-
67
-* The normal operating mode for a bitmap is "active."
68
-
69
-## Basic QMP Usage
70
-
71
-### Supported Commands ###
72
-
73
-* block-dirty-bitmap-add
74
-* block-dirty-bitmap-remove
75
-* block-dirty-bitmap-clear
76
-
77
-### Creation
78
-
79
-* To create a new bitmap, enabled, on the drive with id=drive0:
80
-
81
-```json
82
-{ "execute": "block-dirty-bitmap-add",
83
- "arguments": {
84
- "node": "drive0",
85
- "name": "bitmap0"
86
- }
87
-}
88
-```
89
-
90
-* This bitmap will have a default granularity that matches the cluster size of
91
- its associated drive, if available, clamped to between [4KiB, 64KiB].
92
- The current default for qcow2 is 64KiB.
93
-
94
-* To create a new bitmap that tracks changes in 32KiB segments:
95
-
96
-```json
97
-{ "execute": "block-dirty-bitmap-add",
98
- "arguments": {
99
- "node": "drive0",
100
- "name": "bitmap0",
101
- "granularity": 32768
102
- }
103
-}
104
-```
105
-
106
-### Deletion
107
-
108
-* Bitmaps that are frozen cannot be deleted.
109
-
110
-* Deleting the bitmap does not impact any other bitmaps attached to the same
111
- node, nor does it affect any backups already created from this node.
112
-
113
-* Because bitmaps are only unique to the node to which they are attached,
114
- you must specify the node/drive name here, too.
115
-
116
-```json
117
-{ "execute": "block-dirty-bitmap-remove",
118
- "arguments": {
119
- "node": "drive0",
120
- "name": "bitmap0"
121
- }
122
-}
123
-```
124
-
125
-### Resetting
126
-
127
-* Resetting a bitmap will clear all information it holds.
128
-
129
-* An incremental backup created from an empty bitmap will copy no data,
130
- as if nothing has changed.
131
-
132
-```json
133
-{ "execute": "block-dirty-bitmap-clear",
134
- "arguments": {
135
- "node": "drive0",
136
- "name": "bitmap0"
137
- }
138
-}
139
-```
140
-
141
-## Transactions
142
-
143
-### Justification
144
-
145
-Bitmaps can be safely modified when the VM is paused or halted by using
146
-the basic QMP commands. For instance, you might perform the following actions:
147
-
148
-1. Boot the VM in a paused state.
149
-2. Create a full drive backup of drive0.
150
-3. Create a new bitmap attached to drive0.
151
-4. Resume execution of the VM.
152
-5. Incremental backups are ready to be created.
153
-
154
-At this point, the bitmap and drive backup would be correctly in sync,
155
-and incremental backups made from this point forward would be correctly aligned
156
-to the full drive backup.
157
-
158
-This is not particularly useful if we decide we want to start incremental
159
-backups after the VM has been running for a while, for which we will need to
160
-perform actions such as the following:
161
-
162
-1. Boot the VM and begin execution.
163
-2. Using a single transaction, perform the following operations:
164
- * Create bitmap0.
165
- * Create a full drive backup of drive0.
166
-3. Incremental backups are now ready to be created.
167
-
168
-### Supported Bitmap Transactions
169
-
170
-* block-dirty-bitmap-add
171
-* block-dirty-bitmap-clear
172
-
173
-The usages are identical to their respective QMP commands, but see below
174
-for examples.
175
-
176
-### Example: New Incremental Backup
177
-
178
-As outlined in the justification, perhaps we want to create a new incremental
179
-backup chain attached to a drive.
180
-
181
-```json
182
-{ "execute": "transaction",
183
- "arguments": {
184
- "actions": [
185
- {"type": "block-dirty-bitmap-add",
186
- "data": {"node": "drive0", "name": "bitmap0"} },
187
- {"type": "drive-backup",
188
- "data": {"device": "drive0", "target": "/path/to/full_backup.img",
189
- "sync": "full", "format": "qcow2"} }
190
- ]
191
- }
192
-}
193
-```
194
-
195
-### Example: New Incremental Backup Anchor Point
196
-
197
-Maybe we just want to create a new full backup with an existing bitmap and
198
-want to reset the bitmap to track the new chain.
199
-
200
-```json
201
-{ "execute": "transaction",
202
- "arguments": {
203
- "actions": [
204
- {"type": "block-dirty-bitmap-clear",
205
- "data": {"node": "drive0", "name": "bitmap0"} },
206
- {"type": "drive-backup",
207
- "data": {"device": "drive0", "target": "/path/to/new_full_backup.img",
208
- "sync": "full", "format": "qcow2"} }
209
- ]
210
- }
211
-}
212
-```
213
-
214
-## Incremental Backups
215
-
216
-The star of the show.
217
-
218
-**Nota Bene!** Only incremental backups of entire drives are supported for now.
219
-So despite the fact that you can attach a bitmap to any arbitrary node, they are
220
-only currently useful when attached to the root node. This is because
221
-drive-backup only supports drives/devices instead of arbitrary nodes.
222
-
223
-### Example: First Incremental Backup
224
-
225
-1. Create a full backup and sync it to the dirty bitmap, as in the transactional
226
-examples above; or with the VM offline, manually create a full copy and then
227
-create a new bitmap before the VM begins execution.
228
-
229
- * Let's assume the full backup is named 'full_backup.img'.
230
- * Let's assume the bitmap you created is 'bitmap0' attached to 'drive0'.
231
-
232
-2. Create a destination image for the incremental backup that utilizes the
233
-full backup as a backing image.
234
-
235
- * Let's assume it is named 'incremental.0.img'.
236
-
237
- ```sh
238
- # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
239
- ```
240
-
241
-3. Issue the incremental backup command:
242
-
243
- ```json
244
- { "execute": "drive-backup",
245
- "arguments": {
246
- "device": "drive0",
247
- "bitmap": "bitmap0",
248
- "target": "incremental.0.img",
249
- "format": "qcow2",
250
- "sync": "incremental",
251
- "mode": "existing"
252
- }
253
- }
254
- ```
255
-
256
-### Example: Second Incremental Backup
257
-
258
-1. Create a new destination image for the incremental backup that points to the
259
- previous one, e.g.: 'incremental.1.img'
260
-
261
- ```sh
262
- # qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2
263
- ```
264
-
265
-2. Issue a new incremental backup command. The only difference here is that we
266
- have changed the target image below.
267
-
268
- ```json
269
- { "execute": "drive-backup",
270
- "arguments": {
271
- "device": "drive0",
272
- "bitmap": "bitmap0",
273
- "target": "incremental.1.img",
274
- "format": "qcow2",
275
- "sync": "incremental",
276
- "mode": "existing"
277
- }
278
- }
279
- ```
280
-
281
-## Errors
282
-
283
-* In the event of an error that occurs after a backup job is successfully
284
- launched, either by a direct QMP command or a QMP transaction, the user
285
- will receive a BLOCK_JOB_COMPLETE event with a failure message, accompanied
286
- by a BLOCK_JOB_ERROR event.
287
-
288
-* In the case of an event being cancelled, the user will receive a
289
- BLOCK_JOB_CANCELLED event instead of a pair of COMPLETE and ERROR events.
290
-
291
-* In either case, the incremental backup data contained within the bitmap is
292
- safely rolled back, and the data within the bitmap is not lost. The image
293
- file created for the failed attempt can be safely deleted.
294
-
295
-* Once the underlying problem is fixed (e.g. more storage space is freed up),
296
- you can simply retry the incremental backup command with the same bitmap.
297
-
298
-### Example
299
-
300
-1. Create a target image:
301
-
302
- ```sh
303
- # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
304
- ```
305
-
306
-2. Attempt to create an incremental backup via QMP:
307
-
308
- ```json
309
- { "execute": "drive-backup",
310
- "arguments": {
311
- "device": "drive0",
312
- "bitmap": "bitmap0",
313
- "target": "incremental.0.img",
314
- "format": "qcow2",
315
- "sync": "incremental",
316
- "mode": "existing"
317
- }
318
- }
319
- ```
320
-
321
-3. Receive an event notifying us of failure:
322
-
323
- ```json
324
- { "timestamp": { "seconds": 1424709442, "microseconds": 844524 },
325
- "data": { "speed": 0, "offset": 0, "len": 67108864,
326
- "error": "No space left on device",
327
- "device": "drive1", "type": "backup" },
328
- "event": "BLOCK_JOB_COMPLETED" }
329
- ```
330
-
331
-4. Delete the failed incremental, and re-create the image.
332
-
333
- ```sh
334
- # rm incremental.0.img
335
- # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
336
- ```
337
-
338
-5. Retry the command after fixing the underlying problem,
339
- such as freeing up space on the backup volume:
340
-
341
- ```json
342
- { "execute": "drive-backup",
343
- "arguments": {
344
- "device": "drive0",
345
- "bitmap": "bitmap0",
346
- "target": "incremental.0.img",
347
- "format": "qcow2",
348
- "sync": "incremental",
349
- "mode": "existing"
350
- }
351
- }
352
- ```
353
-
354
-6. Receive confirmation that the job completed successfully:
355
-
356
- ```json
357
- { "timestamp": { "seconds": 1424709668, "microseconds": 526525 },
358
- "data": { "device": "drive1", "type": "backup",
359
- "speed": 0, "len": 67108864, "offset": 67108864},
360
- "event": "BLOCK_JOB_COMPLETED" }
361
- ```
362
-
363
-### Partial Transactional Failures
364
-
365
-* Sometimes, a transaction will succeed in launching and return success,
366
- but then later the backup jobs themselves may fail. It is possible that
367
- a management application may have to deal with a partial backup failure
368
- after a successful transaction.
369
-
370
-* If multiple backup jobs are specified in a single transaction, when one of
371
- them fails, it will not interact with the other backup jobs in any way.
372
-
373
-* The job(s) that succeeded will clear the dirty bitmap associated with the
374
- operation, but the job(s) that failed will not. It is not "safe" to delete
375
- any incremental backups that were created successfully in this scenario,
376
- even though others failed.
377
-
378
-#### Example
379
-
380
-* QMP example highlighting two backup jobs:
381
-
382
- ```json
383
- { "execute": "transaction",
384
- "arguments": {
385
- "actions": [
386
- { "type": "drive-backup",
387
- "data": { "device": "drive0", "bitmap": "bitmap0",
388
- "format": "qcow2", "mode": "existing",
389
- "sync": "incremental", "target": "d0-incr-1.qcow2" } },
390
- { "type": "drive-backup",
391
- "data": { "device": "drive1", "bitmap": "bitmap1",
392
- "format": "qcow2", "mode": "existing",
393
- "sync": "incremental", "target": "d1-incr-1.qcow2" } },
394
- ]
395
- }
396
- }
397
- ```
398
-
399
-* QMP example response, highlighting one success and one failure:
400
- * Acknowledgement that the Transaction was accepted and jobs were launched:
401
- ```json
402
- { "return": {} }
403
- ```
404
-
405
- * Later, QEMU sends notice that the first job was completed:
406
- ```json
407
- { "timestamp": { "seconds": 1447192343, "microseconds": 615698 },
408
- "data": { "device": "drive0", "type": "backup",
409
- "speed": 0, "len": 67108864, "offset": 67108864 },
410
- "event": "BLOCK_JOB_COMPLETED"
411
- }
412
- ```
413
-
414
- * Later yet, QEMU sends notice that the second job has failed:
415
- ```json
416
- { "timestamp": { "seconds": 1447192399, "microseconds": 683015 },
417
- "data": { "device": "drive1", "action": "report",
418
- "operation": "read" },
419
- "event": "BLOCK_JOB_ERROR" }
420
- ```
421
-
422
- ```json
423
- { "timestamp": { "seconds": 1447192399, "microseconds": 685853 },
424
- "data": { "speed": 0, "offset": 0, "len": 67108864,
425
- "error": "Input/output error",
426
- "device": "drive1", "type": "backup" },
427
- "event": "BLOCK_JOB_COMPLETED" }
428
-
429
-* In the above example, "d0-incr-1.qcow2" is valid and must be kept,
430
- but "d1-incr-1.qcow2" is invalid and should be deleted. If a VM-wide
431
- incremental backup of all drives at a point-in-time is to be made,
432
- new backups for both drives will need to be made, taking into account
433
- that a new incremental backup for drive0 needs to be based on top of
434
- "d0-incr-1.qcow2."
435
-
436
-### Grouped Completion Mode
437
-
438
-* While jobs launched by transactions normally complete or fail on their own,
439
- it is possible to instruct them to complete or fail together as a group.
440
-
441
-* QMP transactions take an optional properties structure that can affect
442
- the semantics of the transaction.
443
-
444
-* The "completion-mode" transaction property can be either "individual"
445
- which is the default, legacy behavior described above, or "grouped,"
446
- a new behavior detailed below.
447
-
448
-* Delayed Completion: In grouped completion mode, no jobs will report
449
- success until all jobs are ready to report success.
450
-
451
-* Grouped failure: If any job fails in grouped completion mode, all remaining
452
- jobs will be cancelled. Any incremental backups will restore their dirty
453
- bitmap objects as if no backup command was ever issued.
454
-
455
- * Regardless of if QEMU reports a particular incremental backup job as
456
- CANCELLED or as an ERROR, the in-memory bitmap will be restored.
457
-
458
-#### Example
459
-
460
-* Here's the same example scenario from above with the new property:
461
-
462
- ```json
463
- { "execute": "transaction",
464
- "arguments": {
465
- "actions": [
466
- { "type": "drive-backup",
467
- "data": { "device": "drive0", "bitmap": "bitmap0",
468
- "format": "qcow2", "mode": "existing",
469
- "sync": "incremental", "target": "d0-incr-1.qcow2" } },
470
- { "type": "drive-backup",
471
- "data": { "device": "drive1", "bitmap": "bitmap1",
472
- "format": "qcow2", "mode": "existing",
473
- "sync": "incremental", "target": "d1-incr-1.qcow2" } },
474
- ],
475
- "properties": {
476
- "completion-mode": "grouped"
477
- }
478
- }
479
- }
480
- ```
481
-
482
-* QMP example response, highlighting a failure for drive2:
483
- * Acknowledgement that the Transaction was accepted and jobs were launched:
484
- ```json
485
- { "return": {} }
486
- ```
487
-
488
- * Later, QEMU sends notice that the second job has errored out,
489
- but that the first job was also cancelled:
490
- ```json
491
- { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
492
- "data": { "device": "drive1", "action": "report",
493
- "operation": "read" },
494
- "event": "BLOCK_JOB_ERROR" }
495
- ```
496
-
497
- ```json
498
- { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
499
- "data": { "speed": 0, "offset": 0, "len": 67108864,
500
- "error": "Input/output error",
501
- "device": "drive1", "type": "backup" },
502
- "event": "BLOCK_JOB_COMPLETED" }
503
- ```
504
-
505
- ```json
506
- { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
507
- "data": { "device": "drive0", "type": "backup", "speed": 0,
508
- "len": 67108864, "offset": 16777216 },
509
- "event": "BLOCK_JOB_CANCELLED" }
510
- ```
511
-
512
-<!--
513
-The FreeBSD Documentation License
514
-
515
-Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML,
516
-PDF, PostScript, RTF and so forth) with or without modification, are permitted
517
-provided that the following conditions are met:
518
-
519
-Redistributions of source code (Markdown) must retain the above copyright
520
-notice, this list of conditions and the following disclaimer of this file
521
-unmodified.
522
-
523
-Redistributions in compiled form (transformed to other DTDs, converted to PDF,
524
-PostScript, RTF and other formats) must reproduce the above copyright notice,
525
-this list of conditions and the following disclaimer in the documentation and/or
526
-other materials provided with the distribution.
527
-
528
-THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
529
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
530
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
531
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
532
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
533
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
534
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
535
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
536
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
537
-THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
538
--->
539
diff --git a/docs/interop/bitmaps.rst b/docs/interop/bitmaps.rst
540
new file mode 100644
541
index XXXXXXX..XXXXXXX
542
--- /dev/null
543
+++ b/docs/interop/bitmaps.rst
544
@@ -XXX,XX +XXX,XX @@
545
+..
546
+ Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc.
547
+ All rights reserved.
548
+
549
+ This file is licensed via The FreeBSD Documentation License, the full
550
+ text of which is included at the end of this document.
551
+
552
+====================================
553
+Dirty Bitmaps and Incremental Backup
554
+====================================
555
+
556
+- Dirty Bitmaps are objects that track which data needs to be backed up
557
+ for the next incremental backup.
558
+
559
+- Dirty bitmaps can be created at any time and attached to any node
560
+ (not just complete drives).
561
+
562
+.. contents::
563
+
564
+Dirty Bitmap Names
565
+------------------
566
+
567
+- A dirty bitmap's name is unique to the node, but bitmaps attached to
568
+ different nodes can share the same name.
569
+
570
+- Dirty bitmaps created for internal use by QEMU may be anonymous and
571
+ have no name, but any user-created bitmaps must have a name. There
572
+ can be any number of anonymous bitmaps per node.
573
+
574
+- The name of a user-created bitmap must not be empty ("").
575
+
576
+Bitmap Modes
577
+------------
578
+
579
+- A bitmap can be "frozen," which means that it is currently in-use by
580
+ a backup operation and cannot be deleted, renamed, written to, reset,
581
+ etc.
582
+
583
+- The normal operating mode for a bitmap is "active."
584
+
585
+Basic QMP Usage
586
+---------------
587
+
588
+Supported Commands
589
+~~~~~~~~~~~~~~~~~~
590
+
591
+- ``block-dirty-bitmap-add``
592
+- ``block-dirty-bitmap-remove``
593
+- ``block-dirty-bitmap-clear``
594
+
595
+Creation
596
+~~~~~~~~
597
+
598
+- To create a new bitmap, enabled, on the drive with id=drive0:
599
+
600
+.. code:: json
601
+
602
+ { "execute": "block-dirty-bitmap-add",
603
+ "arguments": {
604
+ "node": "drive0",
605
+ "name": "bitmap0"
606
+ }
607
+ }
608
+
609
+- This bitmap will have a default granularity that matches the cluster
610
+ size of its associated drive, if available, clamped to between [4KiB,
611
+ 64KiB]. The current default for qcow2 is 64KiB.
612
+
613
+- To create a new bitmap that tracks changes in 32KiB segments:
614
+
615
+.. code:: json
616
+
617
+ { "execute": "block-dirty-bitmap-add",
618
+ "arguments": {
619
+ "node": "drive0",
620
+ "name": "bitmap0",
621
+ "granularity": 32768
622
+ }
623
+ }
624
+
625
+Deletion
626
+~~~~~~~~
627
+
628
+- Bitmaps that are frozen cannot be deleted.
629
+
630
+- Deleting the bitmap does not impact any other bitmaps attached to the
631
+ same node, nor does it affect any backups already created from this
632
+ node.
633
+
634
+- Because bitmaps are only unique to the node to which they are
635
+ attached, you must specify the node/drive name here, too.
636
+
637
+.. code:: json
638
+
639
+ { "execute": "block-dirty-bitmap-remove",
640
+ "arguments": {
641
+ "node": "drive0",
642
+ "name": "bitmap0"
643
+ }
644
+ }
645
+
646
+Resetting
647
+~~~~~~~~~
648
+
649
+- Resetting a bitmap will clear all information it holds.
650
+
651
+- An incremental backup created from an empty bitmap will copy no data,
652
+ as if nothing has changed.
653
+
654
+.. code:: json
655
+
656
+ { "execute": "block-dirty-bitmap-clear",
657
+ "arguments": {
658
+ "node": "drive0",
659
+ "name": "bitmap0"
660
+ }
661
+ }
662
+
663
+Transactions
664
+------------
665
+
666
+Justification
667
+~~~~~~~~~~~~~
668
+
669
+Bitmaps can be safely modified when the VM is paused or halted by using
670
+the basic QMP commands. For instance, you might perform the following
671
+actions:
672
+
673
+1. Boot the VM in a paused state.
674
+2. Create a full drive backup of drive0.
675
+3. Create a new bitmap attached to drive0.
676
+4. Resume execution of the VM.
677
+5. Incremental backups are ready to be created.
678
+
679
+At this point, the bitmap and drive backup would be correctly in sync,
680
+and incremental backups made from this point forward would be correctly
681
+aligned to the full drive backup.
682
+
683
+This is not particularly useful if we decide we want to start
684
+incremental backups after the VM has been running for a while, for which
685
+we will need to perform actions such as the following:
686
+
687
+1. Boot the VM and begin execution.
688
+2. Using a single transaction, perform the following operations:
689
+
690
+ - Create ``bitmap0``.
691
+ - Create a full drive backup of ``drive0``.
692
+
693
+3. Incremental backups are now ready to be created.
694
+
695
+Supported Bitmap Transactions
696
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
697
+
698
+- ``block-dirty-bitmap-add``
699
+- ``block-dirty-bitmap-clear``
700
+
701
+The usages are identical to their respective QMP commands, but see below
702
+for examples.
703
+
704
+Example: New Incremental Backup
705
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
706
+
707
+As outlined in the justification, perhaps we want to create a new
708
+incremental backup chain attached to a drive.
709
+
710
+.. code:: json
711
+
712
+ { "execute": "transaction",
713
+ "arguments": {
714
+ "actions": [
715
+ {"type": "block-dirty-bitmap-add",
716
+ "data": {"node": "drive0", "name": "bitmap0"} },
717
+ {"type": "drive-backup",
718
+ "data": {"device": "drive0", "target": "/path/to/full_backup.img",
719
+ "sync": "full", "format": "qcow2"} }
720
+ ]
721
+ }
722
+ }
723
+
724
+Example: New Incremental Backup Anchor Point
725
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
726
+
727
+Maybe we just want to create a new full backup with an existing bitmap
728
+and want to reset the bitmap to track the new chain.
729
+
730
+.. code:: json
731
+
732
+ { "execute": "transaction",
733
+ "arguments": {
734
+ "actions": [
735
+ {"type": "block-dirty-bitmap-clear",
736
+ "data": {"node": "drive0", "name": "bitmap0"} },
737
+ {"type": "drive-backup",
738
+ "data": {"device": "drive0", "target": "/path/to/new_full_backup.img",
739
+ "sync": "full", "format": "qcow2"} }
740
+ ]
741
+ }
742
+ }
743
+
744
+Incremental Backups
745
+-------------------
746
+
747
+The star of the show.
748
+
749
+**Nota Bene!** Only incremental backups of entire drives are supported
750
+for now. So despite the fact that you can attach a bitmap to any
751
+arbitrary node, they are only currently useful when attached to the root
752
+node. This is because drive-backup only supports drives/devices instead
753
+of arbitrary nodes.
754
+
755
+Example: First Incremental Backup
756
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
757
+
758
+1. Create a full backup and sync it to the dirty bitmap, as in the
759
+ transactional examples above; or with the VM offline, manually create
760
+ a full copy and then create a new bitmap before the VM begins
761
+ execution.
762
+
763
+ - Let's assume the full backup is named ``full_backup.img``.
764
+ - Let's assume the bitmap you created is ``bitmap0`` attached to
765
+ ``drive0``.
766
+
767
+2. Create a destination image for the incremental backup that utilizes
768
+ the full backup as a backing image.
769
+
770
+ - Let's assume the new incremental image is named
771
+ ``incremental.0.img``.
772
+
773
+ .. code:: bash
774
+
775
+ $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
776
+
777
+3. Issue the incremental backup command:
778
+
779
+ .. code:: json
780
+
781
+ { "execute": "drive-backup",
782
+ "arguments": {
783
+ "device": "drive0",
784
+ "bitmap": "bitmap0",
785
+ "target": "incremental.0.img",
786
+ "format": "qcow2",
787
+ "sync": "incremental",
788
+ "mode": "existing"
789
+ }
790
+ }
791
+
792
+Example: Second Incremental Backup
793
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
794
+
795
+1. Create a new destination image for the incremental backup that points
796
+ to the previous one, e.g.: ``incremental.1.img``
797
+
798
+ .. code:: bash
799
+
800
+ $ qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2
801
+
802
+2. Issue a new incremental backup command. The only difference here is
803
+ that we have changed the target image below.
804
+
805
+ .. code:: json
806
+
807
+ { "execute": "drive-backup",
808
+ "arguments": {
809
+ "device": "drive0",
810
+ "bitmap": "bitmap0",
811
+ "target": "incremental.1.img",
812
+ "format": "qcow2",
813
+ "sync": "incremental",
814
+ "mode": "existing"
815
+ }
816
+ }
817
+
818
+Errors
819
+------
820
+
821
+- In the event of an error that occurs after a backup job is
822
+ successfully launched, either by a direct QMP command or a QMP
823
+  transaction, the user will receive a ``BLOCK_JOB_COMPLETED`` event with
824
+ a failure message, accompanied by a ``BLOCK_JOB_ERROR`` event.
825
+
826
+- In the case of a job being cancelled, the user will receive a
827
+ ``BLOCK_JOB_CANCELLED`` event instead of a pair of COMPLETE and ERROR
828
+ events.
829
+
830
+- In either case, the incremental backup data contained within the
831
+ bitmap is safely rolled back, and the data within the bitmap is not
832
+ lost. The image file created for the failed attempt can be safely
833
+ deleted.
834
+
835
+- Once the underlying problem is fixed (e.g. more storage space is
836
+ freed up), you can simply retry the incremental backup command with
837
+ the same bitmap.
838
+
839
+Example
840
+~~~~~~~
841
+
842
+1. Create a target image:
843
+
844
+ .. code:: bash
845
+
846
+ $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
847
+
848
+2. Attempt to create an incremental backup via QMP:
849
+
850
+ .. code:: json
851
+
852
+ { "execute": "drive-backup",
853
+ "arguments": {
854
+ "device": "drive0",
855
+ "bitmap": "bitmap0",
856
+ "target": "incremental.0.img",
857
+ "format": "qcow2",
858
+ "sync": "incremental",
859
+ "mode": "existing"
860
+ }
861
+ }
862
+
863
+3. Receive an event notifying us of failure:
864
+
865
+ .. code:: json
866
+
867
+ { "timestamp": { "seconds": 1424709442, "microseconds": 844524 },
868
+ "data": { "speed": 0, "offset": 0, "len": 67108864,
869
+ "error": "No space left on device",
870
+ "device": "drive1", "type": "backup" },
871
+ "event": "BLOCK_JOB_COMPLETED" }
872
+
873
+4. Delete the failed incremental, and re-create the image.
874
+
875
+ .. code:: bash
876
+
877
+ $ rm incremental.0.img
878
+ $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
879
+
880
+5. Retry the command after fixing the underlying problem, such as
881
+ freeing up space on the backup volume:
882
+
883
+ .. code:: json
884
+
885
+ { "execute": "drive-backup",
886
+ "arguments": {
887
+ "device": "drive0",
888
+ "bitmap": "bitmap0",
889
+ "target": "incremental.0.img",
890
+ "format": "qcow2",
891
+ "sync": "incremental",
892
+ "mode": "existing"
893
+ }
894
+ }
895
+
896
+6. Receive confirmation that the job completed successfully:
897
+
898
+ .. code:: json
899
+
900
+ { "timestamp": { "seconds": 1424709668, "microseconds": 526525 },
901
+ "data": { "device": "drive1", "type": "backup",
902
+ "speed": 0, "len": 67108864, "offset": 67108864},
903
+ "event": "BLOCK_JOB_COMPLETED" }
904
+
905
+Partial Transactional Failures
906
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
907
+
908
+- Sometimes, a transaction will succeed in launching and return
909
+ success, but then later the backup jobs themselves may fail. It is
910
+ possible that a management application may have to deal with a
911
+ partial backup failure after a successful transaction.
912
+
913
+- If multiple backup jobs are specified in a single transaction, when
914
+ one of them fails, it will not interact with the other backup jobs in
915
+ any way.
916
+
917
+- The job(s) that succeeded will clear the dirty bitmap associated with
918
+ the operation, but the job(s) that failed will not. It is not "safe"
919
+ to delete any incremental backups that were created successfully in
920
+ this scenario, even though others failed.
921
+
922
+Example
923
+^^^^^^^
924
+
925
+- QMP example highlighting two backup jobs:
926
+
927
+ .. code:: json
928
+
929
+ { "execute": "transaction",
930
+ "arguments": {
931
+ "actions": [
932
+ { "type": "drive-backup",
933
+ "data": { "device": "drive0", "bitmap": "bitmap0",
934
+ "format": "qcow2", "mode": "existing",
935
+ "sync": "incremental", "target": "d0-incr-1.qcow2" } },
936
+ { "type": "drive-backup",
937
+ "data": { "device": "drive1", "bitmap": "bitmap1",
938
+ "format": "qcow2", "mode": "existing",
939
+ "sync": "incremental", "target": "d1-incr-1.qcow2" } },
940
+ ]
941
+ }
942
+ }
943
+
944
+- QMP example response, highlighting one success and one failure:
945
+
946
+ - Acknowledgement that the Transaction was accepted and jobs were
947
+ launched:
948
+
949
+ .. code:: json
950
+
951
+ { "return": {} }
952
+
953
+ - Later, QEMU sends notice that the first job was completed:
954
+
955
+ .. code:: json
956
+
957
+ { "timestamp": { "seconds": 1447192343, "microseconds": 615698 },
958
+ "data": { "device": "drive0", "type": "backup",
959
+ "speed": 0, "len": 67108864, "offset": 67108864 },
960
+ "event": "BLOCK_JOB_COMPLETED"
961
+ }
962
+
963
+ - Later yet, QEMU sends notice that the second job has failed:
964
+
965
+ .. code:: json
966
+
967
+ { "timestamp": { "seconds": 1447192399, "microseconds": 683015 },
968
+ "data": { "device": "drive1", "action": "report",
969
+ "operation": "read" },
970
+ "event": "BLOCK_JOB_ERROR" }
971
+
972
+ .. code:: json
973
+
974
+      { "timestamp": { "seconds": 1447192399, "microseconds": 685853 },
975
+        "data": { "speed": 0, "offset": 0, "len": 67108864,
976
+          "error": "Input/output error",
+          "device": "drive1", "type": "backup" },
977
+        "event": "BLOCK_JOB_COMPLETED" }
978
+
979
+- In the above example, ``d0-incr-1.qcow2`` is valid and must be kept,
980
+ but ``d1-incr-1.qcow2`` is invalid and should be deleted. If a VM-wide
981
+ incremental backup of all drives at a point-in-time is to be made,
982
+ new backups for both drives will need to be made, taking into account
983
+ that a new incremental backup for drive0 needs to be based on top of
984
+ ``d0-incr-1.qcow2``.
985
+
986
+Grouped Completion Mode
987
+~~~~~~~~~~~~~~~~~~~~~~~
988
+
989
+- While jobs launched by transactions normally complete or fail on
990
+ their own, it is possible to instruct them to complete or fail
991
+ together as a group.
992
+
993
+- QMP transactions take an optional properties structure that can
994
+ affect the semantics of the transaction.
995
+
996
+- The "completion-mode" transaction property can be either "individual"
997
+ which is the default, legacy behavior described above, or "grouped,"
998
+ a new behavior detailed below.
999
+
1000
+- Delayed Completion: In grouped completion mode, no jobs will report
1001
+ success until all jobs are ready to report success.
1002
+
1003
+- Grouped failure: If any job fails in grouped completion mode, all
1004
+ remaining jobs will be cancelled. Any incremental backups will
1005
+ restore their dirty bitmap objects as if no backup command was ever
1006
+ issued.
1007
+
1008
+  - Regardless of whether QEMU reports a particular incremental backup job
1009
+ as CANCELLED or as an ERROR, the in-memory bitmap will be
1010
+ restored.
1011
+
1012
+Example
1013
+^^^^^^^
1014
+
1015
+- Here's the same example scenario from above with the new property:
1016
+
1017
+ .. code:: json
1018
+
1019
+ { "execute": "transaction",
1020
+ "arguments": {
1021
+ "actions": [
1022
+ { "type": "drive-backup",
1023
+ "data": { "device": "drive0", "bitmap": "bitmap0",
1024
+ "format": "qcow2", "mode": "existing",
1025
+ "sync": "incremental", "target": "d0-incr-1.qcow2" } },
1026
+ { "type": "drive-backup",
1027
+ "data": { "device": "drive1", "bitmap": "bitmap1",
1028
+ "format": "qcow2", "mode": "existing",
1029
+ "sync": "incremental", "target": "d1-incr-1.qcow2" } },
1030
+ ],
1031
+ "properties": {
1032
+ "completion-mode": "grouped"
1033
+ }
1034
+ }
1035
+ }
1036
+
1037
+- QMP example response, highlighting a failure for ``drive1``:
1038
+
1039
+ - Acknowledgement that the Transaction was accepted and jobs were
1040
+ launched:
1041
+
1042
+ .. code:: json
1043
+
1044
+ { "return": {} }
1045
+
1046
+ - Later, QEMU sends notice that the second job has errored out, but
1047
+ that the first job was also cancelled:
1048
+
1049
+ .. code:: json
1050
+
1051
+ { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
1052
+ "data": { "device": "drive1", "action": "report",
1053
+ "operation": "read" },
1054
+ "event": "BLOCK_JOB_ERROR" }
1055
+
1056
+ .. code:: json
1057
+
1058
+ { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
1059
+ "data": { "speed": 0, "offset": 0, "len": 67108864,
1060
+ "error": "Input/output error",
1061
+ "device": "drive1", "type": "backup" },
1062
+ "event": "BLOCK_JOB_COMPLETED" }
1063
+
1064
+ .. code:: json
1065
+
1066
+ { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
1067
+ "data": { "device": "drive0", "type": "backup", "speed": 0,
1068
+ "len": 67108864, "offset": 16777216 },
1069
+ "event": "BLOCK_JOB_CANCELLED" }
1070
+
1071
+.. raw:: html
1072
+
1073
+ <!--
1074
+ The FreeBSD Documentation License
1075
+
1076
+ Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML,
1077
+ PDF, PostScript, RTF and so forth) with or without modification, are permitted
1078
+ provided that the following conditions are met:
1079
+
1080
+ Redistributions of source code (Markdown) must retain the above copyright
1081
+ notice, this list of conditions and the following disclaimer of this file
1082
+ unmodified.
1083
+
1084
+ Redistributions in compiled form (transformed to other DTDs, converted to PDF,
1085
+ PostScript, RTF and other formats) must reproduce the above copyright notice,
1086
+ this list of conditions and the following disclaimer in the documentation and/or
1087
+ other materials provided with the distribution.
1088
+
1089
+ THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
1090
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
1091
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
1092
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
1093
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
1094
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
1095
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
1096
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
1097
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
1098
+ THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1099
+ -->
1100
--
26
--
1101
2.9.4
27
2.21.0
1102
28
1103
29
diff view generated by jsdifflib