The following changes since commit e0175b71638cf4398903c0d25f93fe62e0606389:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200228' into staging (2020-02-28 16:39:27 +0000)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 41aa2e3f9b27fd259a13711545d933a20f1d2f16:

  l2tpv3: fix RFC number typo in qemu-options.hx (2020-03-02 15:30:08 +0800)

----------------------------------------------------------------

----------------------------------------------------------------
Bin Meng (1):
      hw: net: cadence_gem: Fix build errors in DB_PRINT()

Finn Thain (14):
      dp8393x: Mask EOL bit from descriptor addresses
      dp8393x: Always use 32-bit accesses
      dp8393x: Clean up endianness hacks
      dp8393x: Have dp8393x_receive() return the packet size
      dp8393x: Update LLFA and CRDA registers from rx descriptor
      dp8393x: Clear RRRA command register bit only when appropriate
      dp8393x: Implement packet size limit and RBAE interrupt
      dp8393x: Don't clobber packet checksum
      dp8393x: Use long-word-aligned RRA pointers in 32-bit mode
      dp8393x: Pad frames to word or long word boundary
      dp8393x: Clear descriptor in_use field to release packet
      dp8393x: Always update RRA pointers and sequence numbers
      dp8393x: Don't reset Silicon Revision register
      dp8393x: Don't stop reception upon RBE interrupt assertion

Lukas Straub (4):
      block/replication.c: Ignore requests after failover
      tests/test-replication.c: Add test for for secondary node continuing replication
      net/filter.c: Add Options to insert filters anywhere in the filter list
      colo: Update Documentation for continuous replication

Stefan Hajnoczi (1):
      l2tpv3: fix RFC number typo in qemu-options.hx

Yuri Benditovich (3):
      e1000e: Avoid hw_error if legacy mode used
      NetRxPkt: Introduce support for additional hash types
      NetRxPkt: fix hash calculation of IPV6 TCP

 block/replication.c        |  35 ++++++-
 docs/COLO-FT.txt           | 224 +++++++++++++++++++++++++++++++++------------
 docs/block-replication.txt |  28 ++++--
 hw/net/cadence_gem.c       |  11 ++-
 hw/net/dp8393x.c           | 200 ++++++++++++++++++++++++++--------------
 hw/net/e1000e_core.c       |  15 +--
 hw/net/net_rx_pkt.c        |  44 ++++++++-
 hw/net/net_rx_pkt.h        |   6 +-
 hw/net/trace-events        |   4 +
 include/net/filter.h       |   2 +
 net/filter.c               |  92 ++++++++++++++++-
 qemu-options.hx            |  35 +++++--
 tests/test-replication.c   |  52 +++++++++++
 13 files changed, 591 insertions(+), 157 deletions(-)


The following changes since commit 352998df1c53b366413690d95b35f76d0721ebed:

  Merge tag 'i2c-20220314' of https://github.com/philmd/qemu into staging (2022-03-14 14:39:33 +0000)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 12a195fa343aae2ead1301ce04727bd0ae25eb15:

  vdpa: Expose VHOST_F_LOG_ALL on SVQ (2022-03-15 13:57:44 +0800)

----------------------------------------------------------------

Changes since V2:
- fix 32-bit build errors

----------------------------------------------------------------
Eugenio Pérez (14):
      vhost: Add VhostShadowVirtqueue
      vhost: Add Shadow VirtQueue kick forwarding capabilities
      vhost: Add Shadow VirtQueue call forwarding capabilities
      vhost: Add vhost_svq_valid_features to shadow vq
      virtio: Add vhost_svq_get_vring_addr
      vdpa: adapt vhost_ops callbacks to svq
      vhost: Shadow virtqueue buffers forwarding
      util: Add iova_tree_alloc_map
      util: add iova_tree_find_iova
      vhost: Add VhostIOVATree
      vdpa: Add custom IOTLB translations to SVQ
      vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
      vdpa: Never set log_base addr if SVQ is enabled
      vdpa: Expose VHOST_F_LOG_ALL on SVQ

Jason Wang (1):
      virtio-net: fix map leaking on error during receive

 hw/net/virtio-net.c                |   1 +
 hw/virtio/meson.build              |   2 +-
 hw/virtio/vhost-iova-tree.c        | 110 +++++++
 hw/virtio/vhost-iova-tree.h        |  27 ++
 hw/virtio/vhost-shadow-virtqueue.c | 636 +++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  87 +++++
 hw/virtio/vhost-vdpa.c             | 522 +++++++++++++++++++++++++++++-
 include/hw/virtio/vhost-vdpa.h     |   8 +
 include/qemu/iova-tree.h           |  38 ++-
 util/iova-tree.c                   | 170 ++++++++++
 10 files changed, 1584 insertions(+), 17 deletions(-)
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
Deleted patch

From: Finn Thain <fthain@telegraphics.com.au>

The Least Significant bit of a descriptor address register is used as
an EOL flag. It has to be masked when the register value is to be used
as an actual address for copying memory around. But when the registers
are to be updated the EOL bit should not be masked.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
---
 hw/net/dp8393x.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ do { printf("sonic ERROR: %s: " fmt, __func__ , ## __VA_ARGS__); } while (0)
 #define SONIC_ISR_PINT   0x0800
 #define SONIC_ISR_LCD    0x1000
 
+#define SONIC_DESC_EOL   0x0001
+#define SONIC_DESC_ADDR  0xFFFE
+
 #define TYPE_DP8393X "dp8393x"
 #define DP8393X(obj) OBJECT_CHECK(dp8393xState, (obj), TYPE_DP8393X)
 
@@ -XXX,XX +XXX,XX @@ static uint32_t dp8393x_crba(dp8393xState *s)
 
 static uint32_t dp8393x_crda(dp8393xState *s)
 {
-    return (s->regs[SONIC_URDA] << 16) | s->regs[SONIC_CRDA];
+    return (s->regs[SONIC_URDA] << 16) |
+           (s->regs[SONIC_CRDA] & SONIC_DESC_ADDR);
 }
 
 static uint32_t dp8393x_rbwc(dp8393xState *s)
@@ -XXX,XX +XXX,XX @@ static uint32_t dp8393x_tsa(dp8393xState *s)
 
 static uint32_t dp8393x_ttda(dp8393xState *s)
 {
-    return (s->regs[SONIC_UTDA] << 16) | s->regs[SONIC_TTDA];
+    return (s->regs[SONIC_UTDA] << 16) |
+           (s->regs[SONIC_TTDA] & SONIC_DESC_ADDR);
 }
 
 static uint32_t dp8393x_wt(dp8393xState *s)
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_transmit_packets(dp8393xState *s)
                          MEMTXATTRS_UNSPECIFIED, s->data,
                          size);
         s->regs[SONIC_CTDA] = dp8393x_get(s, width, 0) & ~0x1;
-        if (dp8393x_get(s, width, 0) & 0x1) {
+        if (dp8393x_get(s, width, 0) & SONIC_DESC_EOL) {
             /* EOL detected */
             break;
         }
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     /* XXX: Check byte ordering */
 
     /* Check for EOL */
-    if (s->regs[SONIC_LLFA] & 0x1) {
+    if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
         /* Are we still in resource exhaustion? */
         size = sizeof(uint16_t) * 1 * width;
         address = dp8393x_crda(s) + sizeof(uint16_t) * 5 * width;
         address_space_read(&s->as, address, MEMTXATTRS_UNSPECIFIED,
                            s->data, size);
-        if (dp8393x_get(s, width, 0) & 0x1) {
+        if (dp8393x_get(s, width, 0) & SONIC_DESC_EOL) {
            /* Still EOL ; stop reception */
            return -1;
        } else {
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
                        dp8393x_crda(s) + sizeof(uint16_t) * 5 * width,
                        MEMTXATTRS_UNSPECIFIED, s->data, size);
     s->regs[SONIC_LLFA] = dp8393x_get(s, width, 0);
-    if (s->regs[SONIC_LLFA] & 0x1) {
+    if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
         /* EOL detected */
         s->regs[SONIC_ISR] |= SONIC_ISR_RDE;
     } else {
-- 
2.5.0
From: Yuri Benditovich <yuri.benditovich@daynix.com>

https://bugzilla.redhat.com/show_bug.cgi?id=1787142
The emulation issues hw_error if the PSRCTL register
is written, for example, with zero value.
Such configuration does not present any problem when
the DTYP bits of the RCTL register define the legacy format of
transfer descriptors. Current commit discards the check
for BSIZE0 and BSIZE1 when legacy mode is used.

Acked-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/e1000e_core.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -XXX,XX +XXX,XX @@ e1000e_set_eitr(E1000ECore *core, int index, uint32_t val)
 static void
 e1000e_set_psrctl(E1000ECore *core, int index, uint32_t val)
 {
-    if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
-        hw_error("e1000e: PSRCTL.BSIZE0 cannot be zero");
-    }
+    if (core->mac[RCTL] & E1000_RCTL_DTYP_MASK) {
+
+        if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
+            hw_error("e1000e: PSRCTL.BSIZE0 cannot be zero");
+        }
 
-    if ((val & E1000_PSRCTL_BSIZE1_MASK) == 0) {
-        hw_error("e1000e: PSRCTL.BSIZE1 cannot be zero");
+        if ((val & E1000_PSRCTL_BSIZE1_MASK) == 0) {
+            hw_error("e1000e: PSRCTL.BSIZE1 cannot be zero");
+        }
     }
 
     core->mac[PSRCTL] = val;
-- 
2.5.0


Commit bedd7e93d0196 ("virtio-net: fix use after unmap/free for sg")
tries to fix the use after free of the sg by caching the virtqueue
elements in an array and unmapping them at once after receiving the
packets, but it forgot to unmap the cached elements on error, which
will lead to leaking of mappings and other unexpected results.

Fix this by detaching the cached elements on error. This addresses
CVE-2022-26353.

Reported-by: Victor Tom <vv474172261@gmail.com>
Cc: qemu-stable@nongnu.org
Fixes: CVE-2022-26353
Fixes: bedd7e93d0196 ("virtio-net: fix use after unmap/free for sg")
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -XXX,XX +XXX,XX @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 
 err:
     for (j = 0; j < i; j++) {
+        virtqueue_detach_element(q->rx_vq, elems[j], lens[j]);
         g_free(elems[j]);
-- 
2.7.4
From: Finn Thain <fthain@telegraphics.com.au>

The existing code has a bug where the Remaining Buffer Word Count (RBWC)
is calculated with a truncating division, which gives the wrong result
for odd-sized packets.

Section 1.4.1 of the datasheet says,

    Once the end of the packet has been reached, the serializer will
    fill out the last word (16-bit mode) or long word (32-bit mode)
    if the last byte did not end on a word or long word boundary
    respectively. The fill byte will be 0FFh.

Implement buffer padding so that buffer limits are correctly enforced.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     dp8393xState *s = qemu_get_nic_opaque(nc);
     int packet_type;
     uint32_t available, address;
-    int width, rx_len = pkt_size;
+    int width, rx_len, padded_len;
     uint32_t checksum;
     int size;
 
-    width = (s->regs[SONIC_DCR] & SONIC_DCR_DW) ? 2 : 1;
-
     s->regs[SONIC_RCR] &= ~(SONIC_RCR_PRX | SONIC_RCR_LBK | SONIC_RCR_FAER |
         SONIC_RCR_CRCR | SONIC_RCR_LPKT | SONIC_RCR_BC | SONIC_RCR_MC);
 
-    if (pkt_size + 4 > dp8393x_rbwc(s) * 2) {
+    rx_len = pkt_size + sizeof(checksum);
+    if (s->regs[SONIC_DCR] & SONIC_DCR_DW) {
+        width = 2;
+        padded_len = ((rx_len - 1) | 3) + 1;
+    } else {
+        width = 1;
+        padded_len = ((rx_len - 1) | 1) + 1;
+    }
+
+    if (padded_len > dp8393x_rbwc(s) * 2) {
         DPRINTF("oversize packet, pkt_size is %d\n", pkt_size);
         s->regs[SONIC_ISR] |= SONIC_ISR_RBAE;
         dp8393x_update_irq(s);
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     s->regs[SONIC_TRBA0] = s->regs[SONIC_CRBA0];
 
     /* Calculate the ethernet checksum */
-    checksum = cpu_to_le32(crc32(0, buf, rx_len));
+    checksum = cpu_to_le32(crc32(0, buf, pkt_size));
 
     /* Put packet into RBA */
     DPRINTF("Receive packet at %08x\n", dp8393x_crba(s));
     address = dp8393x_crba(s);
     address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
-                        buf, rx_len);
-    address += rx_len;
+                        buf, pkt_size);
+    address += pkt_size;
+
+    /* Put frame checksum into RBA */
     address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
-                        &checksum, 4);
-    address += 4;
-    rx_len += 4;
+                        &checksum, sizeof(checksum));
+    address += sizeof(checksum);
+
+    /* Pad short packets to keep pointers aligned */
+    if (rx_len < padded_len) {
+        size = padded_len - rx_len;
+        address_space_rw(&s->as, address, MEMTXATTRS_UNSPECIFIED,
+                         (uint8_t *)"\xFF\xFF\xFF", size, 1);
+        address += size;
+    }
+
     s->regs[SONIC_CRBA1] = address >> 16;
     s->regs[SONIC_CRBA0] = address & 0xffff;
     available = dp8393x_rbwc(s);
-    available -= rx_len / 2;
+    available -= padded_len >> 1;
     s->regs[SONIC_RBWC1] = available >> 16;
     s->regs[SONIC_RBWC0] = available & 0xffff;
-- 
2.5.0


From: Eugenio Pérez <eperezma@redhat.com>

Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
notifications and buffers, allowing qemu to track them. While qemu is
forwarding the buffers and virtqueue changes, it is able to commit the
memory it's being dirtied, the same way regular qemu's VirtIO devices
do.

This commit only exposes basic SVQ allocation and free. Next patches of
the series add functionality like notifications and buffers forwarding.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/meson.build              |  2 +-
 hw/virtio/vhost-shadow-virtqueue.c | 62 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h | 28 +++++++++++++++++
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
+
+#include "qemu/error-report.h"
+
+/**
+ * Creates vhost shadow virtqueue, and instructs the vhost device to use the
+ * shadow methods and file descriptors.
+ *
+ * Returns the new virtqueue or NULL.
+ *
+ * In case of error, reason is reported through error_report.
+ */
+VhostShadowVirtqueue *vhost_svq_new(void)
+{
+    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
+    int r;
+
+    r = event_notifier_init(&svq->hdev_kick, 0);
+    if (r != 0) {
+        error_report("Couldn't create kick event notifier: %s (%d)",
+                     g_strerror(errno), errno);
+        goto err_init_hdev_kick;
+    }
+
+    r = event_notifier_init(&svq->hdev_call, 0);
+    if (r != 0) {
+        error_report("Couldn't create call event notifier: %s (%d)",
+                     g_strerror(errno), errno);
+        goto err_init_hdev_call;
+    }
+
+    return g_steal_pointer(&svq);
+
+err_init_hdev_call:
+    event_notifier_cleanup(&svq->hdev_kick);
+
+err_init_hdev_kick:
+    return NULL;
+}
+
+/**
+ * Free the resources of the shadow virtqueue.
+ *
+ * @pvq: gpointer to SVQ so it can be used by autofree functions.
+ */
+void vhost_svq_free(gpointer pvq)
+{
+    VhostShadowVirtqueue *vq = pvq;
+    event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_cleanup(&vq->hdev_call);
+    g_free(vq);
+}
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * vhost shadow virtqueue
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef VHOST_SHADOW_VIRTQUEUE_H
+#define VHOST_SHADOW_VIRTQUEUE_H
+
+#include "qemu/event_notifier.h"
+
+/* Shadow virtqueue to relay notifications */
+typedef struct VhostShadowVirtqueue {
+    /* Shadow kick notifier, sent to vhost */
+    EventNotifier hdev_kick;
+    /* Shadow call notifier, sent to vhost */
+    EventNotifier hdev_call;
+} VhostShadowVirtqueue;
+
+VhostShadowVirtqueue *vhost_svq_new(void);
+
+void vhost_svq_free(gpointer vq);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
+
+#endif
-- 
2.7.4
From: Lukas Straub <lukasstraub2@web.de>

After failover the Secondary side of replication shouldn't change state, because
it now functions as our primary disk.

In replication_start, replication_do_checkpoint, replication_stop, ignore
the request if current state is BLOCK_REPLICATION_DONE (successful failover) or
BLOCK_REPLICATION_FAILOVER (failover in progress, i.e. currently merging active
and hidden images into the base image).

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 block/replication.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/replication.c b/block/replication.c
index XXXXXXX..XXXXXXX 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -XXX,XX +XXX,XX @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     aio_context_acquire(aio_context);
     s = bs->opaque;
 
+    if (s->stage == BLOCK_REPLICATION_DONE ||
+        s->stage == BLOCK_REPLICATION_FAILOVER) {
+        /*
+         * This case happens when a secondary is promoted to primary.
+         * Ignore the request because the secondary side of replication
+         * doesn't have to do anything anymore.
+         */
+        aio_context_release(aio_context);
+        return;
+    }
+
     if (s->stage != BLOCK_REPLICATION_NONE) {
         error_setg(errp, "Block replication is running or done");
         aio_context_release(aio_context);
@@ -XXX,XX +XXX,XX @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
     aio_context_acquire(aio_context);
     s = bs->opaque;
 
+    if (s->stage == BLOCK_REPLICATION_DONE ||
+        s->stage == BLOCK_REPLICATION_FAILOVER) {
+        /*
+         * This case happens when a secondary was promoted to primary.
+         * Ignore the request because the secondary side of replication
+         * doesn't have to do anything anymore.
+         */
+        aio_context_release(aio_context);
+        return;
+    }
+
     if (s->mode == REPLICATION_MODE_SECONDARY) {
         secondary_do_checkpoint(s, errp);
     }
@@ -XXX,XX +XXX,XX @@ static void replication_get_error(ReplicationState *rs, Error **errp)
     aio_context_acquire(aio_context);
     s = bs->opaque;
 
-    if (s->stage != BLOCK_REPLICATION_RUNNING) {
+    if (s->stage == BLOCK_REPLICATION_NONE) {
         error_setg(errp, "Block replication is not running");
         aio_context_release(aio_context);
         return;
@@ -XXX,XX +XXX,XX @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
     aio_context_acquire(aio_context);
     s = bs->opaque;
 
+    if (s->stage == BLOCK_REPLICATION_DONE ||
+        s->stage == BLOCK_REPLICATION_FAILOVER) {
+        /*
+         * This case happens when a secondary was promoted to primary.
+         * Ignore the request because the secondary side of replication
+         * doesn't have to do anything anymore.
+         */
+        aio_context_release(aio_context);
+        return;
+    }
+
     if (s->stage != BLOCK_REPLICATION_RUNNING) {
         error_setg(errp, "Block replication is not running");
         aio_context_release(aio_context);


From: Eugenio Pérez <eperezma@redhat.com>

In this mode no buffer forwarding will be performed in SVQ mode: Qemu
will just forward the guest's kicks to the device.

Host memory notifiers regions are left out for simplicity, and they will
not be addressed in this series.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c |  55 ++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  14 ++++
 hw/virtio/vhost-vdpa.c             | 144 ++++++++++++++++++++++++++++++++++++-
 include/hw/virtio/vhost-vdpa.h     |   4 ++
 4 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "linux-headers/linux/vhost.h"
+
+/**
+ * Forward guest notifications.
+ *
+ * @n: guest kick event notifier, the one that guest set to notify svq.
+ */
+static void vhost_handle_guest_kick(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue, svq_kick);
+    event_notifier_test_and_clear(n);
+    event_notifier_set(&svq->hdev_kick);
+}
+
+/**
+ * Set a new file descriptor for the guest to kick the SVQ and notify for avail
+ *
+ * @svq: The svq
+ * @svq_kick_fd: The svq kick fd
+ *
+ * Note that the SVQ will never close the old file descriptor.
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+    EventNotifier *svq_kick = &svq->svq_kick;
+    bool poll_stop = VHOST_FILE_UNBIND != event_notifier_get_fd(svq_kick);
+    bool poll_start = svq_kick_fd != VHOST_FILE_UNBIND;
+
+    if (poll_stop) {
+        event_notifier_set_handler(svq_kick, NULL);
+    }
+
+    /*
+     * event_notifier_set_handler already checks for guest's notifications if
+     * they arrive at the new file descriptor in the switch, so there is no
+     * need to explicitly check for them.
+     */
+    if (poll_start) {
+        event_notifier_init_fd(svq_kick, svq_kick_fd);
+        event_notifier_set(svq_kick);
+        event_notifier_set_handler(svq_kick, vhost_handle_guest_kick);
+    }
+}
+
+/**
+ * Stop the shadow virtqueue operation.
+ * @svq: Shadow Virtqueue
+ */
+void vhost_svq_stop(VhostShadowVirtqueue *svq)
+{
+    event_notifier_set_handler(&svq->svq_kick, NULL);
+}
 
 /**
  * Creates vhost shadow virtqueue, and instructs the vhost device to use the
@@ -XXX,XX +XXX,XX @@ VhostShadowVirtqueue *vhost_svq_new(void)
         goto err_init_hdev_call;
     }
 
+    event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
@@ -XXX,XX +XXX,XX @@ err_init_hdev_kick:
 void vhost_svq_free(gpointer pvq)
 {
     VhostShadowVirtqueue *vq = pvq;
+    vhost_svq_stop(vq);
     event_notifier_cleanup(&vq->hdev_kick);
     event_notifier_cleanup(&vq->hdev_call);
     g_free(vq);
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
     EventNotifier hdev_call;
+
+    /*
+     * Borrowed virtqueue's guest to host notifier. To borrow it in this event
+     * notifier allows to recover the VhostShadowVirtqueue from the event loop
+     * easily. If we use the VirtQueue's one, we don't have an easy way to
+     * retrieve VhostShadowVirtqueue.
+     *
+     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
+     */
+    EventNotifier svq_kick;
 } VhostShadowVirtqueue;
 
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+
+void vhost_svq_stop(VhostShadowVirtqueue *svq);
+
 VhostShadowVirtqueue *vhost_svq_new(void);
 
 void vhost_svq_free(gpointer vq);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio-net.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "exec/address-spaces.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
 #include "trace.h"
 #include "qemu-common.h"
+#include "qapi/error.h"
 
 /*
  * Return one past the end of the end of section. Be careful with uint64_t
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
     return v->index != 0;
 }
 
+static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
+                               Error **errp)
+{
+    g_autoptr(GPtrArray) shadow_vqs = NULL;
+
+    if (!v->shadow_vqs_enabled) {
+        return 0;
+    }
+
+    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
+    for (unsigned n = 0; n < hdev->nvqs; ++n) {
+        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();
+
+        if (unlikely(!svq)) {
+            error_setg(errp, "Cannot create svq %u", n);
+            return -1;
+        }
+        g_ptr_array_add(shadow_vqs, g_steal_pointer(&svq));
+    }
+
+    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
+    return 0;
+}
+
 static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
 {
     struct vhost_vdpa *v;
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     dev->opaque = opaque ;
     v->listener = vhost_vdpa_memory_listener;
     v->msg_type = VHOST_IOTLB_MSG_V2;
+    ret = vhost_vdpa_init_svq(dev, v, errp);
+    if (ret) {
+        goto err;
+    }
 
     vhost_vdpa_get_iova_range(v);
 
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
                                VIRTIO_CONFIG_S_DRIVER);
 
     return 0;
+
+err:
+    ram_block_discard_disable(false);
+    return ret;
 }
 
 static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
@@ -XXX,XX +XXX,XX @@ static void vhost_vdpa_host_notifiers_uninit(struct vhost_dev *dev, int n)
 
 static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int i;
 
+    if (v->shadow_vqs_enabled) {
+        /* FIXME SVQ is not compatible with host notifiers mr */
+        return;
+    }
+
     for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
         if (vhost_vdpa_host_notifier_init(dev, i)) {
             goto err;
@@ -XXX,XX +XXX,XX @@ err:
     return;
 }
 
+static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    size_t idx;
+
+    if (!v->shadow_vqs) {
+        return;
+    }
+
+    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
+        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
+    }
+    g_ptr_array_free(v->shadow_vqs, true);
+}
+
 static int vhost_vdpa_cleanup(struct vhost_dev *dev)
 {
     struct vhost_vdpa *v;
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
     trace_vhost_vdpa_cleanup(dev, v);
     vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
     memory_listener_unregister(&v->listener);
+    vhost_vdpa_svq_cleanup(dev);
 
     dev->opaque = NULL;
     ram_block_discard_disable(false);
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
     return ret;
 }
 
+static void vhost_vdpa_reset_svq(struct vhost_vdpa *v)
+{
+    if (!v->shadow_vqs_enabled) {
+        return;
+    }
+
+    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+        vhost_svq_stop(svq);
+    }
+}
+
 static int vhost_vdpa_reset_device(struct vhost_dev *dev)
 {
+    struct vhost_vdpa *v = dev->opaque;
     int ret;
     uint8_t status = 0;
 
+    vhost_vdpa_reset_svq(v);
+
     ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
     trace_vhost_vdpa_reset_device(dev, status);
     return ret;
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
     return ret;
 }
 
+static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
+{
+    trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+}
+
+/**
+ * Set the shadow virtqueue descriptors to the device
+ *
+ * @dev: The vhost device model
+ * @svq: The shadow virtqueue
+ * @idx: The index of the virtqueue in the vhost device
+ * @errp: Error
+ */
+static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
+                                 VhostShadowVirtqueue *svq, unsigned idx,
+                                 Error **errp)
+{
+    struct vhost_vring_file file = {
+        .index = dev->vq_index + idx,
+    };
+    const EventNotifier *event_notifier = &svq->hdev_kick;
+    int r;
+
+    file.fd = event_notifier_get_fd(event_notifier);
+    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Can't set device kick fd");
+    }
+
+    return r == 0;
+}
+
+static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    Error *err = NULL;
+    unsigned i;
+
+    if (!v->shadow_vqs) {
+        return true;
+    }
+
+    for (i = 0; i < v->shadow_vqs->len; ++i) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
+        bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);
+        if (unlikely(!ok)) {
+            error_reportf_err(err, "Cannot setup SVQ %u: ", i);
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
     struct vhost_vdpa *v = dev->opaque;
+    bool ok;
     trace_vhost_vdpa_dev_start(dev, started);
 
     if (started) {
         vhost_vdpa_host_notifiers_init(dev);
+        ok = vhost_vdpa_svqs_start(dev);
+        if (unlikely(!ok)) {
+            return -1;
+        }
         vhost_vdpa_set_vring_ready(dev);
     } else {
         vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
                                      struct vhost_vring_file *file)
 {
-    trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
+    struct vhost_vdpa *v = dev->opaque;
+    int vdpa_idx = file->index - dev->vq_index;
+
+    if (v->shadow_vqs_enabled) {
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+        vhost_svq_set_svq_kick_fd(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_kick(dev, file);
+    }
 }
 
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -XXX,XX +XXX,XX @@
 #ifndef HW_VIRTIO_VHOST_VDPA_H
 #define HW_VIRTIO_VHOST_VDPA_H
 
+#include <gmodule.h>
+
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
378
bool iotlb_batch_begin_sent;
379
MemoryListener listener;
380
struct vhost_vdpa_iova_range iova_range;
381
+ bool shadow_vqs_enabled;
382
+ GPtrArray *shadow_vqs;
383
struct vhost_dev *dev;
384
VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
385
} VhostVDPA;
86
--
386
--
87
2.5.0
387
2.7.4
88
388
89
389
From: Eugenio Pérez <eperezma@redhat.com>

This will make qemu aware of the device used buffers, allowing it to
write the guest memory with its contents if needed.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
 hw/virtio/vhost-vdpa.c             | 31 +++++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(EventNotifier *n)
 }

 /**
+ * Forward vhost notifications
+ *
+ * @n: hdev call event notifier, the one that device set to notify svq.
+ */
+static void vhost_svq_handle_call(EventNotifier *n)
+{
+    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+                                             hdev_call);
+    event_notifier_test_and_clear(n);
+    event_notifier_set(&svq->svq_call);
+}
+
+/**
+ * Set the call notifier for the SVQ to call the guest
+ *
+ * @svq: Shadow virtqueue
+ * @call_fd: call notifier
+ *
+ * Called on BQL context.
+ */
+void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd)
+{
+    if (call_fd == VHOST_FILE_UNBIND) {
+        /*
+         * Fail event_notifier_set if called handling device call.
+         *
+         * SVQ still needs device notifications, since it needs to keep
+         * forwarding used buffers even with the unbind.
+         */
+        memset(&svq->svq_call, 0, sizeof(svq->svq_call));
+    } else {
+        event_notifier_init_fd(&svq->svq_call, call_fd);
+    }
+}
+
+/**
  * Set a new file descriptor for the guest to kick the SVQ and notify for avail
  *
  * @svq: The svq
@@ -XXX,XX +XXX,XX @@ VhostShadowVirtqueue *vhost_svq_new(void)
     }

     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
+    event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     return g_steal_pointer(&svq);

 err_init_hdev_call:
@@ -XXX,XX +XXX,XX @@ void vhost_svq_free(gpointer pvq)
     VhostShadowVirtqueue *vq = pvq;
     vhost_svq_stop(vq);
     event_notifier_cleanup(&vq->hdev_kick);
+    event_notifier_set_handler(&vq->hdev_call, NULL);
     event_notifier_cleanup(&vq->hdev_call);
     g_free(vq);
 }
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
      * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
      */
     EventNotifier svq_kick;
+
+    /* Guest's call notifier, where the SVQ calls guest. */
+    EventNotifier svq_call;
 } VhostShadowVirtqueue;

 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);

 void vhost_svq_stop(VhostShadowVirtqueue *svq);

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
 }

+static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
+                                         struct vhost_vring_file *file)
+{
+    trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+}
+
 /**
  * Set the shadow virtqueue descriptors to the device
  *
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
  * @svq: The shadow virtqueue
  * @idx: The index of the virtqueue in the vhost device
  * @errp: Error
+ *
+ * Note that this function does not rewind kick file descriptor if cannot set
+ * call one.
  */
 static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
                                  VhostShadowVirtqueue *svq, unsigned idx,
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
     r = vhost_vdpa_set_vring_dev_kick(dev, &file);
     if (unlikely(r != 0)) {
         error_setg_errno(errp, -r, "Can't set device kick fd");
+        return false;
+    }
+
+    event_notifier = &svq->hdev_call;
+    file.fd = event_notifier_get_fd(event_notifier);
+    r = vhost_vdpa_set_vring_dev_call(dev, &file);
+    if (unlikely(r != 0)) {
+        error_setg_errno(errp, -r, "Can't set device call fd");
     }

     return r == 0;
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
                                      struct vhost_vring_file *file)
 {
-    trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        int vdpa_idx = file->index - dev->vq_index;
+        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+
+        vhost_svq_set_svq_call_fd(svq, file->fd);
+        return 0;
+    } else {
+        return vhost_vdpa_set_vring_dev_call(dev, file);
+    }
 }

 static int vhost_vdpa_get_features(struct vhost_dev *dev,
--
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

This allows SVQ to negotiate features with the guest and the device. For
the device, SVQ is a driver. While this function bypasses all
non-transport features, it needs to disable the features that SVQ does
not support when forwarding buffers. This includes packed vq layout,
indirect descriptors or event idx.

Future changes can add support to offer more features to the guest,
since the use of VirtQueue gives this for free. This is left out at the
moment for simplicity.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  2 ++
 hw/virtio/vhost-vdpa.c             | 15 +++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"

 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "qemu/main-loop.h"
 #include "linux-headers/linux/vhost.h"

 /**
+ * Validate the transport device features that both guests can use with the SVQ
+ * and SVQs can use with the device.
+ *
+ * @dev_features: The features
+ * @errp: Error pointer
+ */
+bool vhost_svq_valid_features(uint64_t features, Error **errp)
+{
+    bool ok = true;
+    uint64_t svq_features = features;
+
+    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
+         ++b) {
+        switch (b) {
+        case VIRTIO_F_ANY_LAYOUT:
+            continue;
+
+        case VIRTIO_F_ACCESS_PLATFORM:
+            /* SVQ trust in the host's IOMMU to translate addresses */
+        case VIRTIO_F_VERSION_1:
+            /* SVQ trust that the guest vring is little endian */
+            if (!(svq_features & BIT_ULL(b))) {
+                svq_features |= BIT_ULL(b);
+                ok = false;
+            }
+            continue;
+
+        default:
+            if (svq_features & BIT_ULL(b)) {
+                svq_features &= ~BIT_ULL(b);
+                ok = false;
+            }
+        }
+    }
+
+    if (!ok) {
+        error_setg(errp, "SVQ Invalid device feature flags, offer: 0x%"PRIx64
+                   ", ok: 0x%"PRIx64, features, svq_features);
+    }
+    return ok;
+}
+
+/**
  * Forward guest notifications.
  *
  * @n: guest kick event notifier, the one that guest set to notify svq.
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
     EventNotifier svq_call;
 } VhostShadowVirtqueue;

+bool vhost_svq_valid_features(uint64_t features, Error **errp);
+
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
                                Error **errp)
 {
     g_autoptr(GPtrArray) shadow_vqs = NULL;
+    uint64_t dev_features, svq_features;
+    int r;
+    bool ok;

     if (!v->shadow_vqs_enabled) {
         return 0;
     }

+    r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);
+    if (r != 0) {
+        error_setg_errno(errp, -r, "Can't get vdpa device features");
+        return r;
+    }
+
+    svq_features = dev_features;
+    ok = vhost_svq_valid_features(svq_features, errp);
+    if (unlikely(!ok)) {
+        return -1;
+    }
+
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();
--
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

It reports the shadow virtqueue address from qemu virtual address space.

Since this will be different from the guest's vaddr, but the device can
access it, SVQ takes special care about its alignment & lack of garbage
data. It assumes that IOMMU will work in host_page_size ranges for that.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 29 +++++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  9 +++++++++
 2 files changed, 38 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd)
 }

 /**
+ * Get the shadow vq vring address.
+ * @svq: Shadow virtqueue
+ * @addr: Destination to store address
+ */
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr)
+{
+    addr->desc_user_addr = (uint64_t)(intptr_t)svq->vring.desc;
+    addr->avail_user_addr = (uint64_t)(intptr_t)svq->vring.avail;
+    addr->used_user_addr = (uint64_t)(intptr_t)svq->vring.used;
+}
+
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
+{
+    size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
+    size_t avail_size = offsetof(vring_avail_t, ring) +
+                                             sizeof(uint16_t) * svq->vring.num;
+
+    return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size);
+}
+
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
+{
+    size_t used_size = offsetof(vring_used_t, ring) +
+                                    sizeof(vring_used_elem_t) * svq->vring.num;
+    return ROUND_UP(used_size, qemu_real_host_page_size);
+}
+
+/**
  * Set a new file descriptor for the guest to kick the SVQ and notify for avail
  *
  * @svq: The svq
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@
 #define VHOST_SHADOW_VIRTQUEUE_H

 #include "qemu/event_notifier.h"
+#include "hw/virtio/virtio.h"
+#include "standard-headers/linux/vhost_types.h"

 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
+    /* Shadow vring */
+    struct vring vring;
+
     /* Shadow kick notifier, sent to vhost */
     EventNotifier hdev_kick;
     /* Shadow call notifier, sent to vhost */
@@ -XXX,XX +XXX,XX @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);

 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
+void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+                              struct vhost_vring_addr *addr);
+size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
+size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);

 void vhost_svq_stop(VhostShadowVirtqueue *svq);
--
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

First half of the buffers forwarding part, preparing vhost-vdpa
callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
this is effectively dead code at the moment, but it helps to reduce
patch size.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c | 48 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
     return ret;
 }

+static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
+                                         struct vhost_vring_state *ring)
+{
+    trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+}
+
 static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
                                          struct vhost_vring_file *file)
 {
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
     return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }

+static int vhost_vdpa_set_vring_dev_addr(struct vhost_dev *dev,
+                                         struct vhost_vring_addr *addr)
+{
+    trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
+                                    addr->desc_user_addr, addr->used_user_addr,
+                                    addr->avail_user_addr,
+                                    addr->log_guest_addr);
+
+    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+
+}
+
 /**
  * Set the shadow virtqueue descriptors to the device
  *
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
 static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
                                      struct vhost_vring_addr *addr)
 {
-    trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
-                                    addr->desc_user_addr, addr->used_user_addr,
-                                    addr->avail_user_addr,
-                                    addr->log_guest_addr);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        /*
+         * Device vring addr was set at device start. SVQ base is handled by
+         * VirtQueue code.
+         */
+        return 0;
+    }
+
+    return vhost_vdpa_set_vring_dev_addr(dev, addr);
 }

 static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
 static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
                                      struct vhost_vring_state *ring)
 {
-    trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
-    return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+    struct vhost_vdpa *v = dev->opaque;
+
+    if (v->shadow_vqs_enabled) {
+        /*
+         * Device vring base was set at device start. SVQ base is handled by
+         * VirtQueue code.
+         */
+        return 0;
+    }
+
+    return vhost_vdpa_set_dev_vring_base(dev, ring);
 }

 static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
--
2.7.4

1
From: Finn Thain <fthain@telegraphics.com.au>
1
From: Eugenio Pérez <eperezma@redhat.com>
2
2
3
The DP83932 and DP83934 have 32 data lines. The datasheet says,
3
Initial version of shadow virtqueue that actually forward buffers. There
4
4
is no iommu support at the moment, and that will be addressed in future
5
Data Bus: These bidirectional lines are used to transfer data on the
5
patches of this series. Since all vhost-vdpa devices use forced IOMMU,
6
system bus. When the SONIC is a bus master, 16-bit data is transferred
6
this means that SVQ is not usable at this point of the series on any
7
on D15-D0 and 32-bit data is transferred on D31-D0. When the SONIC is
7
device.
8
accessed as a slave, register data is driven onto lines D15-D0.
8
9
D31-D16 are held TRI-STATE if SONIC is in 16-bit mode. If SONIC is in
9
For simplicity it only supports modern devices, that expects vring
10
32-bit mode, they are driven, but invalid.
10
in little endian, with split ring and no event idx or indirect
11
11
descriptors. Support for them will not be added in this series.
12
Always use 32-bit accesses both as bus master and bus slave.
12
13
13
It reuses the VirtQueue code for the device part. The driver part is
14
Force the MSW to zero in bus master mode.
14
based on Linux's virtio_ring driver, but with stripped functionality
15
15
and optimizations so it's easier to review.
16
This gets the Linux 'jazzsonic' driver working, and avoids the need for
16
17
prior hacks to make the NetBSD 'sn' driver work.
17
However, forwarding buffers have some particular pieces: One of the most
18
18
unexpected ones is that a guest's buffer can expand through more than
19
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
19
one descriptor in SVQ. While this is handled gracefully by qemu's
20
Tested-by: Laurent Vivier <laurent@vivier.eu>
20
emulated virtio devices, it may cause the SVQ to fill unexpectedly. This
21
patch also solves it by checking for this condition at both guest's
22
kicks and device's calls. The code may be more elegant in the future if
23
SVQ code runs in its own iocontext.
24
25
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
26
Acked-by: Michael S. Tsirkin <mst@redhat.com>
21
Signed-off-by: Jason Wang <jasowang@redhat.com>
27
Signed-off-by: Jason Wang <jasowang@redhat.com>
22
---
28
---
23
hw/net/dp8393x.c | 47 +++++++++++++++++++++++++++++------------------
29
hw/virtio/vhost-shadow-virtqueue.c | 352 ++++++++++++++++++++++++++++++++++++-
24
1 file changed, 29 insertions(+), 18 deletions(-)
30
hw/virtio/vhost-shadow-virtqueue.h | 26 +++
25
31
hw/virtio/vhost-vdpa.c | 155 +++++++++++++++-
26
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
32
3 files changed, 522 insertions(+), 11 deletions(-)
33
34
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
27
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
28
--- a/hw/net/dp8393x.c
36
--- a/hw/virtio/vhost-shadow-virtqueue.c
29
+++ b/hw/net/dp8393x.c
37
+++ b/hw/virtio/vhost-shadow-virtqueue.c
30
@@ -XXX,XX +XXX,XX @@ static void dp8393x_put(dp8393xState *s, int width, int offset,
38
@@ -XXX,XX +XXX,XX @@
31
uint16_t val)
39
#include "qemu/error-report.h"
40
#include "qapi/error.h"
41
#include "qemu/main-loop.h"
42
+#include "qemu/log.h"
43
+#include "qemu/memalign.h"
44
#include "linux-headers/linux/vhost.h"
45
46
/**
47
@@ -XXX,XX +XXX,XX @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
48
}
49
50
/**
51
- * Forward guest notifications.
52
+ * Number of descriptors that the SVQ can make available from the guest.
53
+ *
54
+ * @svq: The svq
55
+ */
56
+static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
57
+{
58
+ return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
59
+}
60
+
61
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
62
+ const struct iovec *iovec, size_t num,
63
+ bool more_descs, bool write)
64
+{
65
+ uint16_t i = svq->free_head, last = svq->free_head;
66
+ unsigned n;
67
+ uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
68
+ vring_desc_t *descs = svq->vring.desc;
69
+
70
+ if (num == 0) {
71
+ return;
72
+ }
73
+
74
+ for (n = 0; n < num; n++) {
75
+ if (more_descs || (n + 1 < num)) {
76
+ descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
77
+ } else {
78
+ descs[i].flags = flags;
79
+ }
80
+ descs[i].addr = cpu_to_le64((hwaddr)(intptr_t)iovec[n].iov_base);
81
+ descs[i].len = cpu_to_le32(iovec[n].iov_len);
82
+
83
+ last = i;
84
+ i = cpu_to_le16(descs[i].next);
85
+ }
86
+
87
+ svq->free_head = le16_to_cpu(descs[last].next);
88
+}
89
+
90
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
91
+ VirtQueueElement *elem, unsigned *head)
92
+{
93
+ unsigned avail_idx;
94
+ vring_avail_t *avail = svq->vring.avail;
95
+
96
+ *head = svq->free_head;
97
+
98
+ /* We need some descriptors here */
99
+ if (unlikely(!elem->out_num && !elem->in_num)) {
100
+ qemu_log_mask(LOG_GUEST_ERROR,
101
+ "Guest provided element with no descriptors");
102
+ return false;
103
+ }
104
+
105
+ vhost_vring_write_descs(svq, elem->out_sg, elem->out_num, elem->in_num > 0,
106
+ false);
107
+ vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
108
+
109
+ /*
110
+ * Put the entry in the available array (but don't update avail->idx until
111
+ * they do sync).
112
+ */
113
+ avail_idx = svq->shadow_avail_idx & (svq->vring.num - 1);
114
+ avail->ring[avail_idx] = cpu_to_le16(*head);
115
+ svq->shadow_avail_idx++;
116
+
117
+ /* Update the avail index after writing the descriptor */
118
+ smp_wmb();
119
+ avail->idx = cpu_to_le16(svq->shadow_avail_idx);
120
+
121
+ return true;
122
+}
123
+
124
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
125
+{
126
+ unsigned qemu_head;
127
+ bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
128
+ if (unlikely(!ok)) {
129
+ return false;
130
+ }
131
+
132
+ svq->ring_id_maps[qemu_head] = elem;
133
+ return true;
134
+}
135
+
136
+static void vhost_svq_kick(VhostShadowVirtqueue *svq)
137
+{
138
+ /*
139
+ * We need to expose the available array entries before checking the used
140
+ * flags
141
+ */
142
+ smp_mb();
143
+ if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
144
+ return;
145
+ }
146
+
147
+ event_notifier_set(&svq->hdev_kick);
148
+}
149
+
150
+/**
151
+ * Forward available buffers.
152
+ *
153
+ * @svq: Shadow VirtQueue
154
+ *
155
+ * Note that this function does not guarantee that all guest's available
156
+ * buffers are available to the device in SVQ avail ring. The guest may have
157
+ * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in
158
+ * qemu vaddr.
159
+ *
160
+ * If that happens, guest's kick notifications will be disabled until the
161
+ * device uses some buffers.
162
+ */
163
+static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
164
+{
165
+ /* Clear event notifier */
166
+ event_notifier_test_and_clear(&svq->svq_kick);
167
+
168
+ /* Forward to the device as many available buffers as possible */
169
+ do {
170
+ virtio_queue_set_notification(svq->vq, false);
171
+
172
+ while (true) {
173
+ VirtQueueElement *elem;
174
+ bool ok;
175
+
176
+ if (svq->next_guest_avail_elem) {
177
+ elem = g_steal_pointer(&svq->next_guest_avail_elem);
178
+ } else {
179
+ elem = virtqueue_pop(svq->vq, sizeof(*elem));
180
+ }
181
+
182
+ if (!elem) {
183
+ break;
184
+ }
185
+
186
+ if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
187
+ /*
188
+ * This condition is possible since a contiguous buffer in GPA
189
+ * does not imply a contiguous buffer in qemu's VA
190
+ * scatter-gather segments. If that happens, the buffer exposed
191
+ * to the device needs to be a chain of descriptors at this
192
+ * moment.
193
+ *
194
+ * SVQ cannot hold more available buffers if we are here:
195
+ * queue the current guest descriptor and ignore further kicks
196
+ * until some elements are used.
197
+ */
198
+ svq->next_guest_avail_elem = elem;
199
+ return;
200
+ }
201
+
202
+ ok = vhost_svq_add(svq, elem);
203
+ if (unlikely(!ok)) {
204
+ /* VQ is broken, just return and ignore any other kicks */
205
+ return;
206
+ }
207
+ vhost_svq_kick(svq);
208
+ }
209
+
210
+ virtio_queue_set_notification(svq->vq, true);
211
+ } while (!virtio_queue_empty(svq->vq));
212
+}
213
+
214
+/**
215
+ * Handle guest's kick.
216
*
217
* @n: guest kick event notifier, the one that guest set to notify svq.
218
*/
219
-static void vhost_handle_guest_kick(EventNotifier *n)
220
+static void vhost_handle_guest_kick_notifier(EventNotifier *n)
32
{
221
{
33
if (s->big_endian) {
222
VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue, svq_kick);
34
- s->data[offset * width + width - 1] = cpu_to_be16(val);
223
event_notifier_test_and_clear(n);
35
+ if (width == 2) {
224
- event_notifier_set(&svq->hdev_kick);
36
+ s->data[offset * 2] = 0;
225
+ vhost_handle_guest_kick(svq);
37
+ s->data[offset * 2 + 1] = cpu_to_be16(val);
226
+}
38
+ } else {
227
+
39
+ s->data[offset] = cpu_to_be16(val);
228
+static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
40
+ }
229
+{
41
} else {
230
+ if (svq->last_used_idx != svq->shadow_used_idx) {
42
- s->data[offset * width] = cpu_to_le16(val);
231
+ return true;
43
+ if (width == 2) {
232
+ }
44
+ s->data[offset * 2] = cpu_to_le16(val);
233
+
45
+ s->data[offset * 2 + 1] = 0;
234
+ svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
46
+ } else {
235
+
47
+ s->data[offset] = cpu_to_le16(val);
236
+ return svq->last_used_idx != svq->shadow_used_idx;
48
+ }
237
}
238
239
/**
240
- * Forward vhost notifications
241
+ * Enable vhost device calls after disabling them.
242
+ *
243
+ * @svq: The svq
244
+ *
245
+ * It returns false if there are pending used buffers from the vhost device,
246
+ * avoiding the possible races between SVQ checking for more work and enabling
247
+ * callbacks. True if SVQ used vring has no more pending buffers.
248
+ */
249
+static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
250
+{
251
+ svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
252
+ /* Make sure the flag is written before the read of used_idx */
253
+ smp_mb();
254
+ return !vhost_svq_more_used(svq);
255
+}
256
+
257
+static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
258
+{
259
+ svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
260
+}
261
+
262
+static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
263
+ uint32_t *len)
264
+{
265
+ vring_desc_t *descs = svq->vring.desc;
266
+ const vring_used_t *used = svq->vring.used;
267
+ vring_used_elem_t used_elem;
268
+ uint16_t last_used;
269
+
270
+ if (!vhost_svq_more_used(svq)) {
271
+ return NULL;
272
+ }
273
+
274
+ /* Only get used array entries after they have been exposed by dev */
275
+ smp_rmb();
276
+ last_used = svq->last_used_idx & (svq->vring.num - 1);
277
+ used_elem.id = le32_to_cpu(used->ring[last_used].id);
278
+ used_elem.len = le32_to_cpu(used->ring[last_used].len);
279
+
280
+ svq->last_used_idx++;
281
+ if (unlikely(used_elem.id >= svq->vring.num)) {
282
+ qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used",
283
+ svq->vdev->name, used_elem.id);
284
+ return NULL;
285
+ }
286
+
287
+ if (unlikely(!svq->ring_id_maps[used_elem.id])) {
288
+ qemu_log_mask(LOG_GUEST_ERROR,
289
+ "Device %s says index %u is used, but it was not available",
290
+ svq->vdev->name, used_elem.id);
291
+ return NULL;
292
+ }
293
+
294
+ descs[used_elem.id].next = svq->free_head;
295
+ svq->free_head = used_elem.id;
296
+
297
+ *len = used_elem.len;
298
+ return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
299
+}
300
+
301
+static void vhost_svq_flush(VhostShadowVirtqueue *svq,
302
+ bool check_for_avail_queue)
303
+{
304
+ VirtQueue *vq = svq->vq;
305
+
306
+ /* Forward as many used buffers as possible. */
307
+ do {
308
+ unsigned i = 0;
309
+
310
+ vhost_svq_disable_notification(svq);
311
+ while (true) {
312
+ uint32_t len;
313
+ g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
314
+ if (!elem) {
315
+ break;
316
+ }
317
+
318
+ if (unlikely(i >= svq->vring.num)) {
319
+ qemu_log_mask(LOG_GUEST_ERROR,
320
+ "More than %u used buffers obtained in a %u size SVQ",
321
+ i, svq->vring.num);
322
+ virtqueue_fill(vq, elem, len, i);
323
+ virtqueue_flush(vq, i);
324
+ return;
325
+ }
326
+ virtqueue_fill(vq, elem, len, i++);
327
+ }
328
+
329
+ virtqueue_flush(vq, i);
330
+ event_notifier_set(&svq->svq_call);
331
+
332
+ if (check_for_avail_queue && svq->next_guest_avail_elem) {
333
+ /*
334
+ * Avail ring was full when vhost_svq_flush was called, so it's a
335
+ * good moment to make more descriptors available if possible.
336
+ */
337
+ vhost_handle_guest_kick(svq);
338
+ }
339
+ } while (!vhost_svq_enable_notification(svq));
340
+}
341
+
342
+/**
343
+ * Forward used buffers.
344
*
345
* @n: hdev call event notifier, the one that device set to notify svq.
346
+ *
347
+ * Note that we are not making any buffers available in the loop, so there is
348
+ * no way it runs more than virtqueue-size times.
349
*/
350
static void vhost_svq_handle_call(EventNotifier *n)
351
{
352
VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
353
hdev_call);
354
event_notifier_test_and_clear(n);
355
- event_notifier_set(&svq->svq_call);
356
+ vhost_svq_flush(svq, true);
357
}
358
359
/**
360
@@ -XXX,XX +XXX,XX @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
361
if (poll_start) {
362
event_notifier_init_fd(svq_kick, svq_kick_fd);
363
event_notifier_set(svq_kick);
364
- event_notifier_set_handler(svq_kick, vhost_handle_guest_kick);
365
+ event_notifier_set_handler(svq_kick, vhost_handle_guest_kick_notifier);
366
+ }
367
+}
368
+
369
+/**
370
+ * Start the shadow virtqueue operation.
371
+ *
372
+ * @svq: Shadow Virtqueue
373
+ * @vdev: VirtIO device
374
+ * @vq: Virtqueue to shadow
375
+ */
376
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
377
+ VirtQueue *vq)
378
+{
379
+ size_t desc_size, driver_size, device_size;
380
+
381
+ svq->next_guest_avail_elem = NULL;
382
+ svq->shadow_avail_idx = 0;
383
+ svq->shadow_used_idx = 0;
384
+ svq->last_used_idx = 0;
385
+ svq->vdev = vdev;
386
+ svq->vq = vq;
387
+
388
+ svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
389
+ driver_size = vhost_svq_driver_area_size(svq);
390
+ device_size = vhost_svq_device_area_size(svq);
391
+ svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
392
+ desc_size = sizeof(vring_desc_t) * svq->vring.num;
393
+ svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
394
+ memset(svq->vring.desc, 0, driver_size);
395
+ svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
396
+ memset(svq->vring.used, 0, device_size);
397
+ svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
398
+ for (unsigned i = 0; i < svq->vring.num - 1; i++) {
399
+ svq->vring.desc[i].next = cpu_to_le16(i + 1);
49
}
400
}
50
}
401
}
51
402
52
@@ -XXX,XX +XXX,XX @@ static uint64_t dp8393x_read(void *opaque, hwaddr addr, unsigned int size)
403
@@ -XXX,XX +XXX,XX @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
53
404
void vhost_svq_stop(VhostShadowVirtqueue *svq)
54
DPRINTF("read 0x%04x from reg %s\n", val, reg_names[reg]);
405
{
55
406
event_notifier_set_handler(&svq->svq_kick, NULL);
56
- return val;
407
+ g_autofree VirtQueueElement *next_avail_elem = NULL;
57
+ return s->big_endian ? val << 16 : val;
408
+
409
+ if (!svq->vq) {
410
+ return;
411
+ }
412
+
413
+ /* Send all pending used descriptors to guest */
414
+ vhost_svq_flush(svq, false);
415
+
416
+ for (unsigned i = 0; i < svq->vring.num; ++i) {
417
+ g_autofree VirtQueueElement *elem = NULL;
418
+ elem = g_steal_pointer(&svq->ring_id_maps[i]);
419
+ if (elem) {
420
+ virtqueue_detach_element(svq->vq, elem, 0);
421
+ }
422
+ }
423
+
424
+ next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
425
+ if (next_avail_elem) {
426
+ virtqueue_detach_element(svq->vq, next_avail_elem, 0);
427
+ }
428
+ svq->vq = NULL;
429
+ g_free(svq->ring_id_maps);
430
+ qemu_vfree(svq->vring.desc);
431
+ qemu_vfree(svq->vring.used);
58
}
432
}
59
433
60
static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
434
/**
61
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
435
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
436
index XXXXXXX..XXXXXXX 100644
437
--- a/hw/virtio/vhost-shadow-virtqueue.h
438
+++ b/hw/virtio/vhost-shadow-virtqueue.h
439
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
440
441
/* Guest's call notifier, where the SVQ calls guest. */
442
EventNotifier svq_call;
443
+
444
+ /* Virtio queue shadowing */
445
+ VirtQueue *vq;
446
+
447
+ /* Virtio device */
448
+ VirtIODevice *vdev;
449
+
450
+ /* Map for using the guest's descriptors */
451
+ VirtQueueElement **ring_id_maps;
452
+
453
+ /* Next VirtQueue element that guest made available */
454
+ VirtQueueElement *next_guest_avail_elem;
455
+
456
+ /* Next head to expose to the device */
457
+ uint16_t shadow_avail_idx;
458
+
459
+ /* Next free descriptor */
460
+ uint16_t free_head;
461
+
462
+ /* Last seen used idx */
463
+ uint16_t shadow_used_idx;
464
+
465
+ /* Next head to consume from the device */
466
+ uint16_t last_used_idx;
467
} VhostShadowVirtqueue;
468
469
bool vhost_svq_valid_features(uint64_t features, Error **errp);
470
@@ -XXX,XX +XXX,XX @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
471
size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
472
size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
473
474
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
475
+ VirtQueue *vq);
476
void vhost_svq_stop(VhostShadowVirtqueue *svq);
477
478
VhostShadowVirtqueue *vhost_svq_new(void);
479
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
480
index XXXXXXX..XXXXXXX 100644
481
--- a/hw/virtio/vhost-vdpa.c
482
+++ b/hw/virtio/vhost-vdpa.c
483
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_dev_addr(struct vhost_dev *dev,
484
* Note that this function does not rewind kick file descriptor if cannot set
485
* call one.
486
*/
487
-static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
488
- VhostShadowVirtqueue *svq, unsigned idx,
489
- Error **errp)
490
+static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
491
+ VhostShadowVirtqueue *svq, unsigned idx,
492
+ Error **errp)
62
{
493
{
63
dp8393xState *s = opaque;
494
struct vhost_vring_file file = {
64
int reg = addr >> s->it_shift;
495
.index = dev->vq_index + idx,
65
+ uint32_t val = s->big_endian ? data >> 16 : data;
496
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
66
497
r = vhost_vdpa_set_vring_dev_kick(dev, &file);
67
- DPRINTF("write 0x%04x to reg %s\n", (uint16_t)data, reg_names[reg]);
498
if (unlikely(r != 0)) {
68
+ DPRINTF("write 0x%04x to reg %s\n", (uint16_t)val, reg_names[reg]);
499
error_setg_errno(errp, -r, "Can't set device kick fd");
69
500
- return false;
70
switch (reg) {
501
+ return r;
71
/* Command register */
72
case SONIC_CR:
73
- dp8393x_do_command(s, data);
74
+ dp8393x_do_command(s, val);
75
break;
76
/* Prevent write to read-only registers */
77
case SONIC_CAP2:
78
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
79
/* Accept write to some registers only when in reset mode */
80
case SONIC_DCR:
81
if (s->regs[SONIC_CR] & SONIC_CR_RST) {
82
- s->regs[reg] = data & 0xbfff;
83
+ s->regs[reg] = val & 0xbfff;
84
} else {
85
DPRINTF("writing to DCR invalid\n");
86
}
87
break;
88
case SONIC_DCR2:
89
if (s->regs[SONIC_CR] & SONIC_CR_RST) {
90
- s->regs[reg] = data & 0xf017;
91
+ s->regs[reg] = val & 0xf017;
92
} else {
93
DPRINTF("writing to DCR2 invalid\n");
94
}
95
break;
96
/* 12 lower bytes are Read Only */
97
case SONIC_TCR:
98
- s->regs[reg] = data & 0xf000;
99
+ s->regs[reg] = val & 0xf000;
100
break;
101
/* 9 lower bytes are Read Only */
102
case SONIC_RCR:
103
- s->regs[reg] = data & 0xffe0;
104
+ s->regs[reg] = val & 0xffe0;
105
break;
106
/* Ignore most significant bit */
107
case SONIC_IMR:
108
- s->regs[reg] = data & 0x7fff;
109
+ s->regs[reg] = val & 0x7fff;
110
dp8393x_update_irq(s);
111
break;
112
/* Clear bits by writing 1 to them */
113
case SONIC_ISR:
114
- data &= s->regs[reg];
115
- s->regs[reg] &= ~data;
116
- if (data & SONIC_ISR_RBE) {
117
+ val &= s->regs[reg];
118
+ s->regs[reg] &= ~val;
119
+ if (val & SONIC_ISR_RBE) {
120
dp8393x_do_read_rra(s);
121
}
122
dp8393x_update_irq(s);
123
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
124
case SONIC_REA:
125
case SONIC_RRP:
126
case SONIC_RWP:
127
- s->regs[reg] = data & 0xfffe;
128
+ s->regs[reg] = val & 0xfffe;
129
break;
130
/* Invert written value for some registers */
131
case SONIC_CRCT:
132
case SONIC_FAET:
133
case SONIC_MPT:
134
- s->regs[reg] = data ^ 0xffff;
135
+ s->regs[reg] = val ^ 0xffff;
136
break;
137
/* All other registers have no special constraint */
138
default:
139
- s->regs[reg] = data;
140
+ s->regs[reg] = val;
141
}
502
}
142
503
143
if (reg == SONIC_WT0 || reg == SONIC_WT1) {
504
event_notifier = &svq->hdev_call;
144
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
505
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
145
static const MemoryRegionOps dp8393x_ops = {
506
error_setg_errno(errp, -r, "Can't set device call fd");
146
.read = dp8393x_read,
507
}
147
.write = dp8393x_write,
508
148
- .impl.min_access_size = 2,
509
+ return r;
149
- .impl.max_access_size = 2,
510
+}
150
+ .impl.min_access_size = 4,
511
+
151
+ .impl.max_access_size = 4,
512
+/**
152
.endianness = DEVICE_NATIVE_ENDIAN,
513
+ * Unmap an SVQ area in the device
153
};
514
+ */
515
+static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
516
+ hwaddr size)
517
+{
518
+ int r;
519
+
520
+ size = ROUND_UP(size, qemu_real_host_page_size);
521
+ r = vhost_vdpa_dma_unmap(v, iova, size);
522
+ return r == 0;
523
+}
524
+
525
+static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
526
+ const VhostShadowVirtqueue *svq)
527
+{
528
+ struct vhost_vdpa *v = dev->opaque;
529
+ struct vhost_vring_addr svq_addr;
530
+ size_t device_size = vhost_svq_device_area_size(svq);
531
+ size_t driver_size = vhost_svq_driver_area_size(svq);
532
+ bool ok;
533
+
534
+ vhost_svq_get_vring_addr(svq, &svq_addr);
535
+
536
+ ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
537
+ if (unlikely(!ok)) {
538
+ return false;
539
+ }
540
+
541
+ return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
542
+}
543
+
544
+/**
545
+ * Map the shadow virtqueue rings in the device
546
+ *
547
+ * @dev: The vhost device
548
+ * @svq: The shadow virtqueue
549
+ * @addr: Assigned IOVA addresses
550
+ * @errp: Error pointer
551
+ */
552
+static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
553
+ const VhostShadowVirtqueue *svq,
554
+ struct vhost_vring_addr *addr,
555
+ Error **errp)
556
+{
557
+ struct vhost_vdpa *v = dev->opaque;
558
+ size_t device_size = vhost_svq_device_area_size(svq);
559
+ size_t driver_size = vhost_svq_driver_area_size(svq);
560
+ int r;
561
+
562
+ ERRP_GUARD();
563
+ vhost_svq_get_vring_addr(svq, addr);
564
+
565
+ r = vhost_vdpa_dma_map(v, addr->desc_user_addr, driver_size,
566
+ (void *)(uintptr_t)addr->desc_user_addr, true);
567
+ if (unlikely(r != 0)) {
568
+ error_setg_errno(errp, -r, "Cannot create vq driver region: ");
569
+ return false;
570
+ }
571
+
572
+ r = vhost_vdpa_dma_map(v, addr->used_user_addr, device_size,
573
+ (void *)(intptr_t)addr->used_user_addr, false);
574
+ if (unlikely(r != 0)) {
575
+ error_setg_errno(errp, -r, "Cannot create vq device region: ");
576
+ }
577
+
578
+ return r == 0;
579
+}
580
+
581
+static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
582
+ VhostShadowVirtqueue *svq, unsigned idx,
583
+ Error **errp)
584
+{
585
+ uint16_t vq_index = dev->vq_index + idx;
586
+ struct vhost_vring_state s = {
587
+ .index = vq_index,
588
+ };
589
+ int r;
590
+
591
+ r = vhost_vdpa_set_dev_vring_base(dev, &s);
592
+ if (unlikely(r)) {
593
+ error_setg_errno(errp, -r, "Cannot set vring base");
594
+ return false;
595
+ }
596
+
597
+ r = vhost_vdpa_svq_set_fds(dev, svq, idx, errp);
598
return r == 0;
599
}
600
601
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
602
}
603
604
for (i = 0; i < v->shadow_vqs->len; ++i) {
605
+ VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
606
VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
607
+ struct vhost_vring_addr addr = {
608
+ .index = i,
609
+ };
610
+ int r;
611
bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err);
612
if (unlikely(!ok)) {
613
- error_reportf_err(err, "Cannot setup SVQ %u: ", i);
614
+ goto err;
615
+ }
616
+
617
+ vhost_svq_start(svq, dev->vdev, vq);
618
+ ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
619
+ if (unlikely(!ok)) {
620
+ goto err_map;
621
+ }
622
+
623
+ /* Override vring GPA set by vhost subsystem */
624
+ r = vhost_vdpa_set_vring_dev_addr(dev, &addr);
625
+ if (unlikely(r != 0)) {
626
+ error_setg_errno(&err, -r, "Cannot set device address");
627
+ goto err_set_addr;
628
+ }
629
+ }
630
+
631
+ return true;
632
+
633
+err_set_addr:
634
+ vhost_vdpa_svq_unmap_rings(dev, g_ptr_array_index(v->shadow_vqs, i));
635
+
636
+err_map:
637
+ vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i));
638
+
639
+err:
640
+ error_reportf_err(err, "Cannot setup SVQ %u: ", i);
641
+ for (unsigned j = 0; j < i; ++j) {
642
+ VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, j);
643
+ vhost_vdpa_svq_unmap_rings(dev, svq);
644
+ vhost_svq_stop(svq);
645
+ }
646
+
647
+ return false;
648
+}
649
+
650
+static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
651
+{
652
+ struct vhost_vdpa *v = dev->opaque;
653
+
654
+ if (!v->shadow_vqs) {
655
+ return true;
656
+ }
657
+
658
+ for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
659
+ VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
660
+ bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
661
+ if (unlikely(!ok)) {
662
return false;
663
}
664
}
665
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
666
}
667
vhost_vdpa_set_vring_ready(dev);
668
} else {
669
+ ok = vhost_vdpa_svqs_stop(dev);
670
+ if (unlikely(!ok)) {
671
+ return -1;
672
+ }
673
vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
674
}
154
675
155
--
676
--
156
2.5.0
677
2.7.4
157
678
158
679
1
From: Finn Thain <fthain@telegraphics.com.au>
1
From: Eugenio Pérez <eperezma@redhat.com>
2
2
3
It doesn't make sense to clear the command register bit unless the
3
This iova tree function allows it to look for a hole in allocated
4
command was actually issued.
4
regions and return a totally new translation for a given translated
5
5
address.
6
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
6
7
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
7
Its usage is mainly to allow devices to access qemu's address space,
8
Tested-by: Laurent Vivier <laurent@vivier.eu>
8
remapping the guest's into a new iova space where qemu can add chunks of
9
addresses.
10
11
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
12
Reviewed-by: Peter Xu <peterx@redhat.com>
13
Acked-by: Michael S. Tsirkin <mst@redhat.com>
9
Signed-off-by: Jason Wang <jasowang@redhat.com>
14
Signed-off-by: Jason Wang <jasowang@redhat.com>
10
---
15
---
11
hw/net/dp8393x.c | 7 +++----
16
include/qemu/iova-tree.h | 18 +++++++
12
1 file changed, 3 insertions(+), 4 deletions(-)
17
util/iova-tree.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++
13
18
2 files changed, 154 insertions(+)
14
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
19
20
diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
15
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
16
--- a/hw/net/dp8393x.c
22
--- a/include/qemu/iova-tree.h
17
+++ b/hw/net/dp8393x.c
23
+++ b/include/qemu/iova-tree.h
18
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_read_rra(dp8393xState *s)
24
@@ -XXX,XX +XXX,XX @@
19
s->regs[SONIC_ISR] |= SONIC_ISR_RBE;
25
#define IOVA_OK (0)
20
dp8393x_update_irq(s);
26
#define IOVA_ERR_INVALID (-1) /* Invalid parameters */
21
}
27
#define IOVA_ERR_OVERLAP (-2) /* IOVA range overlapped */
22
-
28
+#define IOVA_ERR_NOMEM (-3) /* Cannot allocate */
23
- /* Done */
29
24
- s->regs[SONIC_CR] &= ~SONIC_CR_RRRA;
30
typedef struct IOVATree IOVATree;
31
typedef struct DMAMap {
32
@@ -XXX,XX +XXX,XX @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
33
void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
34
35
/**
36
+ * iova_tree_alloc_map:
37
+ *
38
+ * @tree: the iova tree to allocate from
39
+ * @map: the new map (as translated addr & size) to allocate in the iova region
40
+ * @iova_begin: the minimum address of the allocation
41
+ * @iova_end: the maximum address allowed for the allocation
42
+ *
43
+ * Allocates a new region of a given size, between iova_begin and iova_end.
44
+ *
45
+ * Return: Same as iova_tree_insert, but the result never overlaps an existing
46
+ * map, and an error is returned if the iova tree has no free contiguous range
47
+ * large enough. The caller gets the assigned iova in map->iova.
48
+ */
49
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
50
+ hwaddr iova_end);
51
+
52
+/**
53
* iova_tree_destroy:
54
*
55
* @tree: the iova tree to destroy
56
diff --git a/util/iova-tree.c b/util/iova-tree.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/util/iova-tree.c
59
+++ b/util/iova-tree.c
60
@@ -XXX,XX +XXX,XX @@ struct IOVATree {
61
GTree *tree;
62
};
63
64
+/* Args to pass to iova_tree_alloc foreach function. */
65
+struct IOVATreeAllocArgs {
66
+ /* Size of the desired allocation */
67
+ size_t new_size;
68
+
69
+ /* The minimum address allowed in the allocation */
70
+ hwaddr iova_begin;
71
+
72
+ /* Map at the left of the hole, can be NULL if "this" is first one */
73
+ const DMAMap *prev;
74
+
75
+ /* Map at the right of the hole, can be NULL if "prev" is the last one */
76
+ const DMAMap *this;
77
+
78
+ /* If found, we fill in the IOVA here */
79
+ hwaddr iova_result;
80
+
81
+ /* Whether have we found a valid IOVA */
82
+ bool iova_found;
83
+};
84
+
85
+/**
86
+ * Iterate args to the next hole
87
+ *
88
+ * @args: The alloc arguments
89
+ * @next: The next mapping in the tree. Can be NULL to signal the last one
90
+ */
91
+static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
92
+ const DMAMap *next)
93
+{
94
+ args->prev = args->this;
95
+ args->this = next;
96
+}
97
+
98
static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
99
{
100
const DMAMap *m1 = a, *m2 = b;
101
@@ -XXX,XX +XXX,XX @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
102
return IOVA_OK;
25
}
103
}
26
104
27
static void dp8393x_do_software_reset(dp8393xState *s)
105
+/**
28
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_command(dp8393xState *s, uint16_t command)
106
+ * Try to find an unallocated IOVA range between prev and this elements.
29
dp8393x_do_start_timer(s);
107
+ *
30
if (command & SONIC_CR_RST)
108
+ * @args: Arguments to allocation
31
dp8393x_do_software_reset(s);
109
+ *
32
- if (command & SONIC_CR_RRRA)
110
+ * Cases:
33
+ if (command & SONIC_CR_RRRA) {
111
+ *
34
dp8393x_do_read_rra(s);
112
+ * (1) !prev, !this: No entries allocated, always succeed
35
+ s->regs[SONIC_CR] &= ~SONIC_CR_RRRA;
113
+ *
36
+ }
114
+ * (2) !prev, this: We're iterating at the 1st element.
37
if (command & SONIC_CR_LCAM)
115
+ *
38
dp8393x_do_load_cam(s);
116
+ * (3) prev, !this: We're iterating at the last element.
39
}
117
+ *
118
+ * (4) prev, this: this is the most common case, we'll try to find a hole
119
+ * between "prev" and "this" mapping.
120
+ *
121
+ * Note that this function assumes the last valid iova is HWADDR_MAX, but it
122
+ * searches linearly so it's easy to discard the result if it's not the case.
123
+ */
124
+static void iova_tree_alloc_map_in_hole(struct IOVATreeAllocArgs *args)
125
+{
126
+ const DMAMap *prev = args->prev, *this = args->this;
127
+ uint64_t hole_start, hole_last;
128
+
129
+ if (this && this->iova + this->size < args->iova_begin) {
130
+ return;
131
+ }
132
+
133
+ hole_start = MAX(prev ? prev->iova + prev->size + 1 : 0, args->iova_begin);
134
+ hole_last = this ? this->iova : HWADDR_MAX;
135
+
136
+ if (hole_last - hole_start > args->new_size) {
137
+ args->iova_result = hole_start;
138
+ args->iova_found = true;
139
+ }
140
+}
141
+
142
+/**
143
+ * Foreach dma node in the tree, compare if there is a hole with its previous
144
+ * node (or minimum iova address allowed) and the node.
145
+ *
146
+ * @key: Node iterating
147
+ * @value: Node iterating
148
+ * @pargs: Struct to communicate with the outside world
149
+ *
150
+ * Return: false to keep iterating, true to stop.
151
+ */
152
+static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
153
+ gpointer pargs)
154
+{
155
+ struct IOVATreeAllocArgs *args = pargs;
156
+ DMAMap *node = value;
157
+
158
+ assert(key == value);
159
+
160
+ iova_tree_alloc_args_iterate(args, node);
161
+ iova_tree_alloc_map_in_hole(args);
162
+ return args->iova_found;
163
+}
164
+
165
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
166
+ hwaddr iova_last)
167
+{
168
+ struct IOVATreeAllocArgs args = {
169
+ .new_size = map->size,
170
+ .iova_begin = iova_begin,
171
+ };
172
+
173
+ if (unlikely(iova_last < iova_begin)) {
174
+ return IOVA_ERR_INVALID;
175
+ }
176
+
177
+ /*
178
+ * Find a valid hole for the mapping
179
+ *
180
+ * Assuming low iova_begin, so no need to do a binary search to
181
+ * locate the first node.
182
+ *
183
+ * TODO: Replace all this with g_tree_node_first/next/last when available
184
+ * (from glib since 2.68). To do it with g_tree_foreach complicates the
185
+ * code a lot.
186
+ *
187
+ */
188
+ g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
189
+ if (!args.iova_found) {
190
+ /*
191
+ * Either tree is empty or the last hole is still not checked.
192
+ * g_tree_foreach does not compare (last, iova_last] range, so we check
193
+ * it here.
194
+ */
195
+ iova_tree_alloc_args_iterate(&args, NULL);
196
+ iova_tree_alloc_map_in_hole(&args);
197
+ }
198
+
199
+ if (!args.iova_found || args.iova_result + map->size > iova_last) {
200
+ return IOVA_ERR_NOMEM;
201
+ }
202
+
203
+ map->iova = args.iova_result;
204
+ return iova_tree_insert(tree, map);
205
+}
206
+
207
void iova_tree_destroy(IOVATree *tree)
208
{
209
g_tree_destroy(tree->tree);
40
--
210
--
41
2.5.0
211
2.7.4
42
212
43
213
diff view generated by jsdifflib
From: Lukas Straub <lukasstraub2@web.de>

This simulates the case that happens when we resume COLO after failover.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 tests/test-replication.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/tests/test-replication.c b/tests/test-replication.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/test-replication.c
+++ b/tests/test-replication.c
@@ -XXX,XX +XXX,XX @@ static void test_secondary_stop(void)
     teardown_secondary();
 }

+static void test_secondary_continuous_replication(void)
+{
+    BlockBackend *top_blk, *local_blk;
+    Error *local_err = NULL;
+
+    top_blk = start_secondary();
+    replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
+    g_assert(!local_err);
+
+    /* write 0x22 to s_local_disk (IMG_SIZE / 2, IMG_SIZE) */
+    local_blk = blk_by_name(S_LOCAL_DISK_ID);
+    test_blk_write(local_blk, 0x22, IMG_SIZE / 2, IMG_SIZE / 2, false);
+
+    /* replication will backup s_local_disk to s_hidden_disk */
+    test_blk_read(top_blk, 0x11, IMG_SIZE / 2,
+                  IMG_SIZE / 2, 0, IMG_SIZE, false);
+
+    /* write 0x33 to s_active_disk (0, IMG_SIZE / 2) */
+    test_blk_write(top_blk, 0x33, 0, IMG_SIZE / 2, false);
+
+    /* do failover (active commit) */
+    replication_stop_all(true, &local_err);
+    g_assert(!local_err);
+
+    /* it should ignore all requests from now on */
+
+    /* start after failover */
+    replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
+    g_assert(!local_err);
+
+    /* checkpoint */
+    replication_do_checkpoint_all(&local_err);
+    g_assert(!local_err);
+
+    /* stop */
+    replication_stop_all(true, &local_err);
+    g_assert(!local_err);
+
+    /* read from s_local_disk (0, IMG_SIZE / 2) */
+    test_blk_read(top_blk, 0x33, 0, IMG_SIZE / 2,
+                  0, IMG_SIZE / 2, false);
+
+
+    /* read from s_local_disk (IMG_SIZE / 2, IMG_SIZE) */
+    test_blk_read(top_blk, 0x22, IMG_SIZE / 2,
+                  IMG_SIZE / 2, 0, IMG_SIZE, false);
+
+    teardown_secondary();
+}
+
 static void test_secondary_do_checkpoint(void)
 {
     BlockBackend *top_blk, *local_blk;
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     g_test_add_func("/replication/secondary/write", test_secondary_write);
     g_test_add_func("/replication/secondary/start", test_secondary_start);
     g_test_add_func("/replication/secondary/stop", test_secondary_stop);
+    g_test_add_func("/replication/secondary/continuous_replication",
+                    test_secondary_continuous_replication);
     g_test_add_func("/replication/secondary/do_checkpoint",
                     test_secondary_do_checkpoint);
     g_test_add_func("/replication/secondary/get_error_all",
--
2.5.0

From: Eugenio Pérez <eperezma@redhat.com>

This function does the reverse operation of iova_tree_find: to look for
a mapping that matches a translated address, so we can do the reverse.

This has linear complexity instead of logarithmic, but it supports
overlapping HVA. Future developments could reduce it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/qemu/iova-tree.h | 20 +++++++++++++++++++-
 util/iova-tree.c         | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -XXX,XX +XXX,XX @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map);
  * @tree: the iova tree to search from
  * @map: the mapping to search
  *
- * Search for a mapping in the iova tree that overlaps with the
+ * Search for a mapping in the iova tree that iova overlaps with the
  * mapping range specified. Only the first found mapping will be
  * returned.
  *
@@ -XXX,XX +XXX,XX @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map);
 const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map);

 /**
+ * iova_tree_find_iova:
+ *
+ * @tree: the iova tree to search from
+ * @map: the mapping to search
+ *
+ * Search for a mapping in the iova tree that translated_addr overlaps with the
+ * mapping range specified. Only the first found mapping will be
+ * returned.
+ *
+ * Return: DMAMap pointer if found, or NULL if not found. Note that
+ * the returned DMAMap pointer is maintained internally. User should
+ * only read the content but never modify or free the content. Also,
+ * user is responsible to make sure the pointer is valid (say, no
+ * concurrent deletion in progress).
+ */
+const DMAMap *iova_tree_find_iova(const IOVATree *tree, const DMAMap *map);
+
+/**
  * iova_tree_find_address:
  *
  * @tree: the iova tree to search from
diff --git a/util/iova-tree.c b/util/iova-tree.c
index XXXXXXX..XXXXXXX 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -XXX,XX +XXX,XX @@ struct IOVATreeAllocArgs {
     bool iova_found;
 };

+typedef struct IOVATreeFindIOVAArgs {
+    const DMAMap *needle;
+    const DMAMap *result;
+} IOVATreeFindIOVAArgs;
+
 /**
  * Iterate args to the next hole
  *
@@ -XXX,XX +XXX,XX @@ const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map)
     return g_tree_lookup(tree->tree, map);
 }

+static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
+                                                gpointer data)
+{
+    const DMAMap *map = key;
+    IOVATreeFindIOVAArgs *args = data;
+    const DMAMap *needle;
+
+    g_assert(key == value);
+
+    needle = args->needle;
+    if (map->translated_addr + map->size < needle->translated_addr ||
+        needle->translated_addr + needle->size < map->translated_addr) {
+        return false;
+    }
+
+    args->result = map;
+    return true;
+}
+
+const DMAMap *iova_tree_find_iova(const IOVATree *tree, const DMAMap *map)
+{
+    IOVATreeFindIOVAArgs args = {
+        .needle = map,
+    };
+
+    g_tree_foreach(tree->tree, iova_tree_find_address_iterator, &args);
+    return args.result;
+}
+
 const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova)
 {
     const DMAMap map = { .iova = iova, .size = 0 };
--
2.7.4
From: Lukas Straub <lukasstraub2@web.de>

To switch the Secondary to Primary, we need to insert new filters
before the filter-rewriter.

Add the options insert= and position= to be able to insert filters
anywhere in the filter list.

position should be "head" or "tail" to insert at the head or
tail of the filter list, or it should be "id=<id>" to specify
the id of another filter.
insert should be either "before" or "behind" to specify where to
insert the new filter relative to the one specified with position.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/net/filter.h |  2 ++
 net/filter.c         | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 qemu-options.hx      | 31 +++++++++++++++---
 3 files changed, 119 insertions(+), 6 deletions(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index XXXXXXX..XXXXXXX 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -XXX,XX +XXX,XX @@ struct NetFilterState {
     NetClientState *netdev;
     NetFilterDirection direction;
     bool on;
+    char *position;
+    bool insert_before_flag;
     QTAILQ_ENTRY(NetFilterState) next;
 };

diff --git a/net/filter.c b/net/filter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -XXX,XX +XXX,XX @@ static void netfilter_set_status(Object *obj, const char *str, Error **errp)
     }
 }

+static char *netfilter_get_position(Object *obj, Error **errp)
+{
+    NetFilterState *nf = NETFILTER(obj);
+
+    return g_strdup(nf->position);
+}
+
+static void netfilter_set_position(Object *obj, const char *str, Error **errp)
+{
+    NetFilterState *nf = NETFILTER(obj);
+
+    nf->position = g_strdup(str);
+}
+
+static char *netfilter_get_insert(Object *obj, Error **errp)
+{
+    NetFilterState *nf = NETFILTER(obj);
+
+    return nf->insert_before_flag ? g_strdup("before") : g_strdup("behind");
+}
+
+static void netfilter_set_insert(Object *obj, const char *str, Error **errp)
+{
+    NetFilterState *nf = NETFILTER(obj);
+
+    if (strcmp(str, "before") && strcmp(str, "behind")) {
+        error_setg(errp, "Invalid value for netfilter insert, "
+                   "should be 'before' or 'behind'");
+        return;
+    }
+
+    nf->insert_before_flag = !strcmp(str, "before");
+}
+
 static void netfilter_init(Object *obj)
 {
     NetFilterState *nf = NETFILTER(obj);

     nf->on = true;
+    nf->insert_before_flag = false;
+    nf->position = g_strdup("tail");

     object_property_add_str(obj, "netdev",
                             netfilter_get_netdev_id, netfilter_set_netdev_id,
@@ -XXX,XX +XXX,XX @@ static void netfilter_init(Object *obj)
     object_property_add_str(obj, "status",
                             netfilter_get_status, netfilter_set_status,
                             NULL);
+    object_property_add_str(obj, "position",
+                            netfilter_get_position, netfilter_set_position,
+                            NULL);
+    object_property_add_str(obj, "insert",
+                            netfilter_get_insert, netfilter_set_insert,
+                            NULL);
 }

 static void netfilter_complete(UserCreatable *uc, Error **errp)
 {
     NetFilterState *nf = NETFILTER(uc);
+    NetFilterState *position = NULL;
     NetClientState *ncs[MAX_QUEUE_NUM];
     NetFilterClass *nfc = NETFILTER_GET_CLASS(uc);
     int queues;
@@ -XXX,XX +XXX,XX @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
         return;
     }

+    if (strcmp(nf->position, "head") && strcmp(nf->position, "tail")) {
+        Object *container;
+        Object *obj;
+        char *position_id;
+
+        if (!g_str_has_prefix(nf->position, "id=")) {
+            error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "position",
+                       "'head', 'tail' or 'id=<id>'");
+            return;
+        }
+
+        /* get the id from the string */
+        position_id = g_strndup(nf->position + 3, strlen(nf->position) - 3);
+
+        /* Search for the position to insert before/behind */
+        container = object_get_objects_root();
+        obj = object_resolve_path_component(container, position_id);
+        if (!obj) {
+            error_setg(errp, "filter '%s' not found", position_id);
+            g_free(position_id);
+            return;
+        }
+
+        position = NETFILTER(obj);
+
+        if (position->netdev != ncs[0]) {
+            error_setg(errp, "filter '%s' belongs to a different netdev",
+                       position_id);
+            g_free(position_id);
+            return;
+        }
+
+        g_free(position_id);
+    }
+
     nf->netdev = ncs[0];

     if (nfc->setup) {
@@ -XXX,XX +XXX,XX @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
             return;
         }
     }
-    QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
+
+    if (position) {
+        if (nf->insert_before_flag) {
+            QTAILQ_INSERT_BEFORE(position, nf, next);
+        } else {
+            QTAILQ_INSERT_AFTER(&nf->netdev->filters, position, nf, next);
+        }
+    } else if (!strcmp(nf->position, "head")) {
+        QTAILQ_INSERT_HEAD(&nf->netdev->filters, nf, next);
+    } else if (!strcmp(nf->position, "tail")) {
+        QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
+    }
 }

 static void netfilter_finalize(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void netfilter_finalize(Object *obj)
         QTAILQ_REMOVE(&nf->netdev->filters, nf, next);
     }
     g_free(nf->netdev_id);
+    g_free(nf->position);
 }

 static void default_handle_event(NetFilterState *nf, int event, Error **errp)
diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ applications, they can do this through this parameter. Its format is
 a gnutls priority string as described at
 @url{https://gnutls.org/manual/html_node/Priority-Strings.html}.

-@item -object filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}][,status=@var{on|off}]
+@item -object filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}][,status=@var{on|off}][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]

 Interval @var{t} can't be 0, this filter batches the packet delivery: all
 packets arriving in a given interval on netdev @var{netdevid} are delayed
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 @option{tx}: the filter is attached to the transmit queue of the netdev,
              where it will receive packets sent by the netdev.

-@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
+position @var{head|tail|id=<id>} is an option to specify where the
+filter should be inserted in the filter list. It can be applied to any
+netfilter.
+
+@option{head}: the filter is inserted at the head of the filter
+               list, before any existing filters.
+
+@option{tail}: the filter is inserted at the tail of the filter
+               list, behind any existing filters (default).
+
+@option{id=<id>}: the filter is inserted before or behind the filter
+                  specified by <id>, see the insert option below.
+
+insert @var{behind|before} is an option to specify where to insert the
+new filter relative to the one specified with position=id=<id>. It can
+be applied to any netfilter.
+
+@option{before}: insert before the specified filter.
+
+@option{behind}: insert behind the specified filter (default).
+
+@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]

 filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.

-@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
+@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]

 filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
 @var{chardevid},and redirect indev's packet to filter.if it has the vnet_hdr_support flag,
@@ -XXX,XX +XXX,XX @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.

-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support]
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]

 Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
 secondary from primary to keep secondary tcp connection,and rewrite
@@ -XXX,XX +XXX,XX @@ colo secondary:
 -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
 -object filter-rewriter,id=rew0,netdev=hn0,queue=all

-@item -object filter-dump,id=@var{id},netdev=@var{dev}[,file=@var(unknown)][,maxlen=@var{len}]
+@item -object filter-dump,id=@var{id},netdev=@var{dev}[,file=@var(unknown)][,maxlen=@var{len}][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]

 Dump the network traffic on netdev @var{dev} to the file specified by
 @var(unknown). At most @var{len} bytes (64k by default) per packet are stored.
--
2.5.0

From: Eugenio Pérez <eperezma@redhat.com>

This tree is able to look for a translated address from an IOVA address.

At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space needs more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).

The allocation capability, as "assign a free IOVA address to this chunk
of memory in qemu's address space", allows the shadow virtqueue to create
a new address space that is not restricted by the guest's addressable one,
so we can allocate shadow vqs vrings outside of it.

It duplicates the tree so it can search efficiently in both directions,
and it will signal overlap if iova or the translated address is present
in any tree.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/meson.build       |   2 +-
 hw/virtio/vhost-iova-tree.c | 110 ++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-iova-tree.h |  27 +++++++++++
 3 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-iova-tree.h

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))

 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
-virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
+virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
 virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * vhost software live migration iova tree
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iova-tree.h"
+#include "vhost-iova-tree.h"
+
+#define iova_min_addr qemu_real_host_page_size
+
+/**
+ * VhostIOVATree, able to:
+ * - Translate iova address
+ * - Reverse translate iova address (from translated to iova)
+ * - Allocate IOVA regions for translated range (linear operation)
+ */
+struct VhostIOVATree {
+    /* First addressable iova address in the device */
+    uint64_t iova_first;
+
+    /* Last addressable iova address in the device */
+    uint64_t iova_last;
+
+    /* IOVA address to qemu memory maps. */
+    IOVATree *iova_taddr_map;
+};
+
+/**
+ * Create a new IOVA tree
+ *
+ * Returns the new IOVA tree
+ */
+VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
+{
+    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
+
+    /* Some devices do not like 0 addresses */
+    tree->iova_first = MAX(iova_first, iova_min_addr);
+    tree->iova_last = iova_last;
+
+    tree->iova_taddr_map = iova_tree_new();
+    return tree;
+}
+
+/**
+ * Delete an iova tree
+ */
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+{
+    iova_tree_destroy(iova_tree->iova_taddr_map);
+    g_free(iova_tree);
+}
+
+/**
+ * Find the IOVA address stored from a memory address
+ *
+ * @tree: The iova tree
+ * @map: The map with the memory address
+ *
+ * Return the stored mapping, or NULL if not found.
+ */
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
+                                        const DMAMap *map)
+{
+    return iova_tree_find_iova(tree->iova_taddr_map, map);
+}
+
+/**
+ * Allocate a new mapping
+ *
+ * @tree: The iova tree
+ * @map: The iova map
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_NOMEM if tree cannot allocate more space.
+ *
+ * It returns assignated iova in map->iova if return value is VHOST_DMA_MAP_OK.
+ */
+int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
+{
+    /* Some vhost devices do not like addr 0. Skip first page */
+    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
+
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* Allocate a node in IOVA address */
+    return iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first,
+                               tree->iova_last);
+}
+
+/**
+ * Remove existing mappings from iova tree
+ *
+ * @iova_tree: The vhost iova tree
+ * @map: The map to remove
+ */
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
+{
+    iova_tree_remove(iova_tree->iova_taddr_map, map);
+}
diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * vhost software live migration iova tree
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
+#define HW_VIRTIO_VHOST_IOVA_TREE_H
+
+#include "qemu/iova-tree.h"
+#include "exec/memory.h"
+
+typedef struct VhostIOVATree VhostIOVATree;
+
+VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
+                                        const DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
+
+#endif
--
2.7.4
From: Finn Thain <fthain@telegraphics.com.au>

According to the datasheet, section 3.4.4, "in 32-bit mode ... the SONIC
always writes long words".

Therefore, use the same technique for the 'in_use' field that is used
everywhere else, and write the full long word.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
         return -1;
     }

-    /* XXX: Check byte ordering */
-
     /* Check for EOL */
     if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
         /* Are we still in resource exhaustion? */
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
         /* EOL detected */
         s->regs[SONIC_ISR] |= SONIC_ISR_RDE;
     } else {
-        /* Clear in_use, but it is always 16bit wide */
-        int offset = dp8393x_crda(s) + sizeof(uint16_t) * 6 * width;
-        if (s->big_endian && width == 2) {
-            /* we need to adjust the offset of the 16bit field */
-            offset += sizeof(uint16_t);
-        }
-        s->data[0] = 0;
-        address_space_write(&s->as, offset, MEMTXATTRS_UNSPECIFIED,
-                            s->data, sizeof(uint16_t));
+        /* Clear in_use */
+        size = sizeof(uint16_t) * width;
+        address = dp8393x_crda(s) + sizeof(uint16_t) * 6 * width;
+        dp8393x_put(s, width, 0, 0);
+        address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
+                            s->data, size);
         s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
         s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
         s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) | (((s->regs[SONIC_RSC] & 0x00ff) + 1) & 0x00ff);
--
2.5.0

From: Eugenio Pérez <eperezma@redhat.com>

Use translations added in VhostIOVATree in SVQ.

Only introduce usage here, not allocation and deallocation. As with
previous patches, we use the dead code paths of shadow_vqs_enabled to
avoid committing too many changes at once. These are impossible to take
at the moment.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c |  86 +++++++++++++++++++++++---
 hw/virtio/vhost-shadow-virtqueue.h |   6 +-
 hw/virtio/vhost-vdpa.c             | 122 +++++++++++++++++++++++++++++++------
 include/hw/virtio/vhost-vdpa.h     |   3 +
 4 files changed, 187 insertions(+), 30 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
     return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
 }

-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+/**
+ * Translate addresses between the qemu's virtual address and the SVQ IOVA
+ *
+ * @svq: Shadow VirtQueue
+ * @vaddr: Translated IOVA addresses
+ * @iovec: Source qemu's VA addresses
+ * @num: Length of iovec and minimum length of vaddr
+ */
+static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
+                                     hwaddr *addrs, const struct iovec *iovec,
+                                     size_t num)
+{
+    if (num == 0) {
+        return true;
+    }
+
+    for (size_t i = 0; i < num; ++i) {
+        DMAMap needle = {
+            .translated_addr = (hwaddr)(uintptr_t)iovec[i].iov_base,
+            .size = iovec[i].iov_len,
+        };
+        Int128 needle_last, map_last;
+        size_t off;
+
+        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
+        /*
+         * Map cannot be NULL since iova map contains all guest space and
+         * qemu already has a physical address mapped
+         */
+        if (unlikely(!map)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "Invalid address 0x%"HWADDR_PRIx" given by guest",
+                          needle.translated_addr);
+            return false;
+        }
+
+        off = needle.translated_addr - map->translated_addr;
+        addrs[i] = map->iova + off;
+
+        needle_last = int128_add(int128_make64(needle.translated_addr),
+                                 int128_make64(iovec[i].iov_len));
+        map_last = int128_make64(map->translated_addr + map->size);
+        if (unlikely(int128_gt(needle_last, map_last))) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "Guest buffer expands over iova range");
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
                                     const struct iovec *iovec, size_t num,
                                     bool more_descs, bool write)
 {
@@ -XXX,XX +XXX,XX @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
         } else {
             descs[i].flags = flags;
         }
-        descs[i].addr = cpu_to_le64((hwaddr)(intptr_t)iovec[n].iov_base);
+        descs[i].addr = cpu_to_le64(sg[n]);
         descs[i].len = cpu_to_le32(iovec[n].iov_len);

         last = i;
@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 {
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
+    bool ok;
+    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));

     *head = svq->free_head;

@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }

-    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num, elem->in_num > 0,
-                            false);
-    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
+    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+                            elem->in_num > 0, false);
+
+    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);

     /*
      * Put the entry in the available array (but don't update avail->idx until
@@ -XXX,XX +XXX,XX @@ void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd)
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
                               struct vhost_vring_addr *addr)
 {
-    addr->desc_user_addr = (uint64_t)(intptr_t)svq->vring.desc;
-    addr->avail_user_addr = (uint64_t)(intptr_t)svq->vring.avail;
-    addr->used_user_addr = (uint64_t)(intptr_t)svq->vring.used;
+    addr->desc_user_addr = (uint64_t)(uintptr_t)svq->vring.desc;
+    addr->avail_user_addr = (uint64_t)(uintptr_t)svq->vring.avail;
+    addr->used_user_addr = (uint64_t)(uintptr_t)svq->vring.used;
 }

 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * Creates vhost shadow virtqueue, and instructs the vhost device to use the
  * shadow methods and file descriptors.
  *
+ * @iova_tree: Tree to perform descriptors translations
+ *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(void)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -XXX,XX +XXX,XX @@ VhostShadowVirtqueue *vhost_svq_new(void)

     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
+    svq->iova_tree = iova_tree;
     return g_steal_pointer(&svq);

 err_init_hdev_call:
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@
 #include "qemu/event_notifier.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
+#include "hw/virtio/vhost-iova-tree.h"

 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
     /* Virtio device */
     VirtIODevice *vdev;

+    /* IOVA mapping */
+    VhostIOVATree *iova_tree;
+
     /* Map for use the guest's descriptors */
     VirtQueueElement **ring_id_maps;

@@ -XXX,XX +XXX,XX @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);

-VhostShadowVirtqueue *vhost_svq_new(void);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);

 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
                                          vaddr, section->readonly);

     llsize = int128_sub(llend, int128_make64(iova));
+    if (v->shadow_vqs_enabled) {
+        DMAMap mem_region = {
+            .translated_addr = (hwaddr)(uintptr_t)vaddr,
+            .size = int128_get64(llsize) - 1,
+            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
+        };
+
+        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
+        if (unlikely(r != IOVA_OK)) {
+            error_report("Can't allocate a mapping (%d)", r);
212
+ goto fail;
213
+ }
214
+
215
+ iova = mem_region.iova;
216
+ }
217
218
vhost_vdpa_iotlb_batch_begin_once(v);
219
ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
220
@@ -XXX,XX +XXX,XX @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
221
222
llsize = int128_sub(llend, int128_make64(iova));
223
224
+ if (v->shadow_vqs_enabled) {
225
+ const DMAMap *result;
226
+ const void *vaddr = memory_region_get_ram_ptr(section->mr) +
227
+ section->offset_within_region +
228
+ (iova - section->offset_within_address_space);
229
+ DMAMap mem_region = {
230
+ .translated_addr = (hwaddr)(uintptr_t)vaddr,
231
+ .size = int128_get64(llsize) - 1,
232
+ };
233
+
234
+ result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region);
235
+ iova = result->iova;
236
+ vhost_iova_tree_remove(v->iova_tree, &mem_region);
237
+ }
238
vhost_vdpa_iotlb_batch_begin_once(v);
239
ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
240
if (ret) {
241
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
242
243
shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
244
for (unsigned n = 0; n < hdev->nvqs; ++n) {
245
- g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();
246
+ g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
247
248
if (unlikely(!svq)) {
249
error_setg(errp, "Cannot create svq %u", n);
250
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
251
/**
252
* Unmap a SVQ area in the device
253
*/
254
-static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
255
- hwaddr size)
256
+static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
257
+ const DMAMap *needle)
258
{
259
+ const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
260
+ hwaddr size;
261
int r;
262
263
- size = ROUND_UP(size, qemu_real_host_page_size);
264
- r = vhost_vdpa_dma_unmap(v, iova, size);
265
+ if (unlikely(!result)) {
266
+ error_report("Unable to find SVQ address to unmap");
267
+ return false;
268
+ }
269
+
270
+ size = ROUND_UP(result->size, qemu_real_host_page_size);
271
+ r = vhost_vdpa_dma_unmap(v, result->iova, size);
272
return r == 0;
273
}
274
275
static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
276
const VhostShadowVirtqueue *svq)
277
{
278
+ DMAMap needle = {};
279
struct vhost_vdpa *v = dev->opaque;
280
struct vhost_vring_addr svq_addr;
281
- size_t device_size = vhost_svq_device_area_size(svq);
282
- size_t driver_size = vhost_svq_driver_area_size(svq);
283
bool ok;
284
285
vhost_svq_get_vring_addr(svq, &svq_addr);
286
287
- ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
288
+ needle.translated_addr = svq_addr.desc_user_addr;
289
+ ok = vhost_vdpa_svq_unmap_ring(v, &needle);
290
if (unlikely(!ok)) {
291
return false;
292
}
293
294
- return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
295
+ needle.translated_addr = svq_addr.used_user_addr;
296
+ return vhost_vdpa_svq_unmap_ring(v, &needle);
297
+}
298
+
299
+/**
300
+ * Map the SVQ area in the device
301
+ *
302
+ * @v: Vhost-vdpa device
303
+ * @needle: The area to search iova
304
+ * @errp: Error pointer
305
+ */
306
+static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
307
+ Error **errp)
308
+{
309
+ int r;
310
+
311
+ r = vhost_iova_tree_map_alloc(v->iova_tree, needle);
312
+ if (unlikely(r != IOVA_OK)) {
313
+ error_setg(errp, "Cannot allocate iova (%d)", r);
314
+ return false;
315
+ }
316
+
317
+ r = vhost_vdpa_dma_map(v, needle->iova, needle->size + 1,
318
+ (void *)(uintptr_t)needle->translated_addr,
319
+ needle->perm == IOMMU_RO);
320
+ if (unlikely(r != 0)) {
321
+ error_setg_errno(errp, -r, "Cannot map region to device");
322
+ vhost_iova_tree_remove(v->iova_tree, needle);
323
+ }
324
+
325
+ return r == 0;
326
}
327
328
/**
329
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
330
struct vhost_vring_addr *addr,
331
Error **errp)
332
{
333
+ DMAMap device_region, driver_region;
334
+ struct vhost_vring_addr svq_addr;
335
struct vhost_vdpa *v = dev->opaque;
336
size_t device_size = vhost_svq_device_area_size(svq);
337
size_t driver_size = vhost_svq_driver_area_size(svq);
338
- int r;
339
+ size_t avail_offset;
340
+ bool ok;
341
342
ERRP_GUARD();
343
- vhost_svq_get_vring_addr(svq, addr);
344
+ vhost_svq_get_vring_addr(svq, &svq_addr);
345
346
- r = vhost_vdpa_dma_map(v, addr->desc_user_addr, driver_size,
347
- (void *)(uintptr_t)addr->desc_user_addr, true);
348
- if (unlikely(r != 0)) {
349
- error_setg_errno(errp, -r, "Cannot create vq driver region: ");
350
+ driver_region = (DMAMap) {
351
+ .translated_addr = svq_addr.desc_user_addr,
352
+ .size = driver_size - 1,
353
+ .perm = IOMMU_RO,
354
+ };
355
+ ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);
356
+ if (unlikely(!ok)) {
357
+ error_prepend(errp, "Cannot create vq driver region: ");
358
return false;
359
}
360
+ addr->desc_user_addr = driver_region.iova;
361
+ avail_offset = svq_addr.avail_user_addr - svq_addr.desc_user_addr;
362
+ addr->avail_user_addr = driver_region.iova + avail_offset;
363
364
- r = vhost_vdpa_dma_map(v, addr->used_user_addr, device_size,
365
- (void *)(intptr_t)addr->used_user_addr, false);
366
- if (unlikely(r != 0)) {
367
- error_setg_errno(errp, -r, "Cannot create vq device region: ");
368
+ device_region = (DMAMap) {
369
+ .translated_addr = svq_addr.used_user_addr,
370
+ .size = device_size - 1,
371
+ .perm = IOMMU_RW,
372
+ };
373
+ ok = vhost_vdpa_svq_map_ring(v, &device_region, errp);
374
+ if (unlikely(!ok)) {
375
+ error_prepend(errp, "Cannot create vq device region: ");
376
+ vhost_vdpa_svq_unmap_ring(v, &driver_region);
377
}
378
+ addr->used_user_addr = device_region.iova;
379
380
- return r == 0;
381
+ return ok;
382
}
383
384
static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
385
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
386
index XXXXXXX..XXXXXXX 100644
387
--- a/include/hw/virtio/vhost-vdpa.h
388
+++ b/include/hw/virtio/vhost-vdpa.h
389
@@ -XXX,XX +XXX,XX @@
390
391
#include <gmodule.h>
392
393
+#include "hw/virtio/vhost-iova-tree.h"
394
#include "hw/virtio/virtio.h"
395
#include "standard-headers/linux/vhost_types.h"
396
397
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
398
MemoryListener listener;
399
struct vhost_vdpa_iova_range iova_range;
400
bool shadow_vqs_enabled;
401
+ /* IOVA mapping used by the Shadow Virtqueue */
402
+ VhostIOVATree *iova_tree;
403
GPtrArray *shadow_vqs;
404
struct vhost_dev *dev;
405
VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
52
--
406
--
53
2.5.0
407
2.7.4
54
408
55
409
1
From: Finn Thain <fthain@telegraphics.com.au>
1
From: Eugenio Pérez <eperezma@redhat.com>
2
2
3
Add a bounds check to prevent a large packet from causing a buffer
3
This is needed to achieve migration, so the destination can restore its
4
overflow. This is defensive programming -- I haven't actually tried
4
index.
5
sending an oversized packet or a jumbo ethernet frame.
6
5
7
The SONIC handles packets that are too big for the buffer by raising
6
Setting base as last used idx, so destination will see as available all
8
the RBAE interrupt and dropping them. Linux uses that interrupt to
7
the entries that the device did not use, including the in-flight
9
count dropped packets.
8
processing ones.
10
9
11
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
10
This is ok for networking, but other kinds of devices might have
12
Tested-by: Laurent Vivier <laurent@vivier.eu>
11
problems with these retransmissions.
12
13
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
14
Acked-by: Michael S. Tsirkin <mst@redhat.com>
13
Signed-off-by: Jason Wang <jasowang@redhat.com>
15
Signed-off-by: Jason Wang <jasowang@redhat.com>
14
---
16
---
15
hw/net/dp8393x.c | 9 +++++++++
17
hw/virtio/vhost-vdpa.c | 17 +++++++++++++++++
16
1 file changed, 9 insertions(+)
18
1 file changed, 17 insertions(+)
17
19
18
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
20
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
19
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
20
--- a/hw/net/dp8393x.c
22
--- a/hw/virtio/vhost-vdpa.c
21
+++ b/hw/net/dp8393x.c
23
+++ b/hw/virtio/vhost-vdpa.c
22
@@ -XXX,XX +XXX,XX @@ do { printf("sonic ERROR: %s: " fmt, __func__ , ## __VA_ARGS__); } while (0)
24
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
23
#define SONIC_TCR_CRCI 0x2000
25
static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
24
#define SONIC_TCR_PINT 0x8000
26
struct vhost_vring_state *ring)
25
27
{
26
+#define SONIC_ISR_RBAE 0x0010
28
+ struct vhost_vdpa *v = dev->opaque;
27
#define SONIC_ISR_RBE 0x0020
29
int ret;
28
#define SONIC_ISR_RDE 0x0040
30
29
#define SONIC_ISR_TC 0x0080
31
+ if (v->shadow_vqs_enabled) {
30
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
32
+ VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
31
s->regs[SONIC_RCR] &= ~(SONIC_RCR_PRX | SONIC_RCR_LBK | SONIC_RCR_FAER |
33
+ ring->index);
32
SONIC_RCR_CRCR | SONIC_RCR_LPKT | SONIC_RCR_BC | SONIC_RCR_MC);
34
+
33
35
+ /*
34
+ if (pkt_size + 4 > dp8393x_rbwc(s) * 2) {
36
+ * Setting base as last used idx, so destination will see as available
35
+ DPRINTF("oversize packet, pkt_size is %d\n", pkt_size);
37
+ * all the entries that the device did not use, including the in-flight
36
+ s->regs[SONIC_ISR] |= SONIC_ISR_RBAE;
38
+ * processing ones.
37
+ dp8393x_update_irq(s);
39
+ *
38
+ dp8393x_do_read_rra(s);
40
+ * TODO: This is ok for networking, but other kinds of devices might
39
+ return pkt_size;
41
+ * have problems with these retransmissions.
42
+ */
43
+ ring->num = svq->last_used_idx;
44
+ return 0;
40
+ }
45
+ }
41
+
46
+
42
packet_type = dp8393x_receive_filter(s, buf, pkt_size);
47
ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
43
if (packet_type < 0) {
48
trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
44
DPRINTF("packet not for netcard\n");
49
return ret;
45
--
50
--
46
2.5.0
51
2.7.4
47
52
48
53
1
From: Finn Thain <fthain@telegraphics.com.au>
1
From: Eugenio Pérez <eperezma@redhat.com>
2
2
3
These operations need to take place regardless of whether or not
3
Setting the log address would make the device start reporting invalid
4
rx descriptors have been used up (that is, EOL flag was observed).
4
dirty memory because the SVQ vrings are located in qemu's memory.
5
5
6
The algorithm is now the same for a packet that was withheld as for
6
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
7
a packet that was not.
7
Acked-by: Michael S. Tsirkin <mst@redhat.com>
8
9
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
10
Tested-by: Laurent Vivier <laurent@vivier.eu>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
8
Signed-off-by: Jason Wang <jasowang@redhat.com>
12
---
9
---
13
hw/net/dp8393x.c | 12 +++++++-----
10
hw/virtio/vhost-vdpa.c | 3 ++-
14
1 file changed, 7 insertions(+), 5 deletions(-)
11
1 file changed, 2 insertions(+), 1 deletion(-)
15
12
16
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
13
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
17
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
18
--- a/hw/net/dp8393x.c
15
--- a/hw/virtio/vhost-vdpa.c
19
+++ b/hw/net/dp8393x.c
16
+++ b/hw/virtio/vhost-vdpa.c
20
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
17
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
21
/* Move to next descriptor */
18
static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
22
s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
19
struct vhost_log *log)
23
s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
20
{
24
- s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) | (((s->regs[SONIC_RSC] & 0x00ff) + 1) & 0x00ff);
21
- if (vhost_vdpa_one_time_request(dev)) {
25
+ }
22
+ struct vhost_vdpa *v = dev->opaque;
26
23
+ if (v->shadow_vqs_enabled || vhost_vdpa_one_time_request(dev)) {
27
- if (s->regs[SONIC_RCR] & SONIC_RCR_LPKT) {
24
return 0;
28
- /* Read next RRA */
29
- dp8393x_do_read_rra(s);
30
- }
31
+ s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) |
32
+ ((s->regs[SONIC_RSC] + 1) & 0x00ff);
33
+
34
+ if (s->regs[SONIC_RCR] & SONIC_RCR_LPKT) {
35
+ /* Read next RRA */
36
+ dp8393x_do_read_rra(s);
37
}
25
}
38
26
39
/* Done */
40
--
27
--
41
2.5.0
28
2.7.4
42
29
43
30
1
From: Finn Thain <fthain@telegraphics.com.au>
1
From: Eugenio Pérez <eperezma@redhat.com>
2
2
3
Follow the algorithm given in the National Semiconductor DP83932C
3
SVQ is able to log the dirty bits by itself, so let's use it to not
4
datasheet in section 3.4.7:
4
block migration.
5
5
6
At the next reception, the SONIC re-reads the last RXpkt.link field,
6
Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
7
and updates its CRDA register to point to the next descriptor.
7
enabled. Even if the device supports it, the reports would be nonsense
8
because SVQ memory is in the qemu region.
8
9
9
The chip is designed to allow the host to provide a new list of
10
The log region is still allocated. Future changes might skip that, but
10
descriptors in this way.
11
this series is already long enough.
11
12
12
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
13
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
13
Tested-by: Laurent Vivier <laurent@vivier.eu>
14
Acked-by: Michael S. Tsirkin <mst@redhat.com>
14
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
15
Signed-off-by: Jason Wang <jasowang@redhat.com>
15
Signed-off-by: Jason Wang <jasowang@redhat.com>
16
---
16
---
17
hw/net/dp8393x.c | 11 +++++++----
17
hw/virtio/vhost-vdpa.c | 39 +++++++++++++++++++++++++++++++++++----
18
1 file changed, 7 insertions(+), 4 deletions(-)
18
include/hw/virtio/vhost-vdpa.h | 1 +
19
2 files changed, 36 insertions(+), 4 deletions(-)
19
20
20
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
21
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
21
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
22
--- a/hw/net/dp8393x.c
23
--- a/hw/virtio/vhost-vdpa.c
23
+++ b/hw/net/dp8393x.c
24
+++ b/hw/virtio/vhost-vdpa.c
24
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
25
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
25
address = dp8393x_crda(s) + sizeof(uint16_t) * 5 * width;
26
return v->index != 0;
26
address_space_read(&s->as, address, MEMTXATTRS_UNSPECIFIED,
27
}
27
s->data, size);
28
28
- if (dp8393x_get(s, width, 0) & SONIC_DESC_EOL) {
29
+static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
29
+ s->regs[SONIC_LLFA] = dp8393x_get(s, width, 0);
30
+ uint64_t *features)
30
+ if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
31
+{
31
/* Still EOL ; stop reception */
32
+ int ret;
32
return -1;
33
+
33
- } else {
34
+ ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
34
- s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
35
+ trace_vhost_vdpa_get_features(dev, *features);
35
}
36
+ return ret;
36
+ /* Link has been updated by host */
37
+}
37
+ s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
38
+
39
static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
40
Error **errp)
41
{
42
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
43
return 0;
38
}
44
}
39
45
40
/* Save current position */
46
- r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);
41
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
47
+ r = vhost_vdpa_get_dev_features(hdev, &dev_features);
42
MEMTXATTRS_UNSPECIFIED,
48
if (r != 0) {
43
s->data, size);
49
error_setg_errno(errp, -r, "Can't get vdpa device features");
44
50
return r;
45
- /* Move to next descriptor */
51
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
46
+ /* Check link field */
52
static int vhost_vdpa_set_features(struct vhost_dev *dev,
47
size = sizeof(uint16_t) * width;
53
uint64_t features)
48
address_space_read(&s->as,
54
{
49
dp8393x_crda(s) + sizeof(uint16_t) * 5 * width,
55
+ struct vhost_vdpa *v = dev->opaque;
50
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
56
int ret;
51
dp8393x_put(s, width, 0, 0);
57
52
address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
58
if (vhost_vdpa_one_time_request(dev)) {
53
s->data, size);
59
return 0;
60
}
61
62
+ if (v->shadow_vqs_enabled) {
63
+ if ((v->acked_features ^ features) == BIT_ULL(VHOST_F_LOG_ALL)) {
64
+ /*
65
+ * QEMU is just trying to enable or disable logging. SVQ handles
66
+ * this sepparately, so no need to forward this.
67
+ */
68
+ v->acked_features = features;
69
+ return 0;
70
+ }
54
+
71
+
55
+ /* Move to next descriptor */
72
+ v->acked_features = features;
56
s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
73
+
57
s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
74
+ /* We must not ack _F_LOG if SVQ is enabled */
58
s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) | (((s->regs[SONIC_RSC] & 0x00ff) + 1) & 0x00ff);
75
+ features &= ~BIT_ULL(VHOST_F_LOG_ALL);
76
+ }
77
+
78
trace_vhost_vdpa_set_features(dev, features);
79
ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
80
if (ret) {
81
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
82
static int vhost_vdpa_get_features(struct vhost_dev *dev,
83
uint64_t *features)
84
{
85
- int ret;
86
+ struct vhost_vdpa *v = dev->opaque;
87
+ int ret = vhost_vdpa_get_dev_features(dev, features);
88
+
89
+ if (ret == 0 && v->shadow_vqs_enabled) {
90
+ /* Add SVQ logging capabilities */
91
+ *features |= BIT_ULL(VHOST_F_LOG_ALL);
92
+ }
93
94
- ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
95
- trace_vhost_vdpa_get_features(dev, *features);
96
return ret;
97
}
98
99
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
100
index XXXXXXX..XXXXXXX 100644
101
--- a/include/hw/virtio/vhost-vdpa.h
102
+++ b/include/hw/virtio/vhost-vdpa.h
103
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
104
bool iotlb_batch_begin_sent;
105
MemoryListener listener;
106
struct vhost_vdpa_iova_range iova_range;
107
+ uint64_t acked_features;
108
bool shadow_vqs_enabled;
109
/* IOVA mapping used by the Shadow Virtqueue */
110
VhostIOVATree *iova_tree;
59
--
111
--
60
2.5.0
112
2.7.4
61
113
62
114
Deleted patch
1
From: Finn Thain <fthain@telegraphics.com.au>
2
1
3
A received packet consumes pkt_size bytes in the buffer and the frame
4
checksum that's appended to it consumes another 4 bytes. The Receive
5
Buffer Address register takes the former quantity into account but
6
not the latter. So the next packet written to the buffer overwrites
7
the frame checksum. Fix this.
8
9
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
10
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
11
Tested-by: Laurent Vivier <laurent@vivier.eu>
12
Signed-off-by: Jason Wang <jasowang@redhat.com>
13
---
14
hw/net/dp8393x.c | 1 +
15
1 file changed, 1 insertion(+)
16
17
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
18
index XXXXXXX..XXXXXXX 100644
19
--- a/hw/net/dp8393x.c
20
+++ b/hw/net/dp8393x.c
21
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
22
address += rx_len;
23
address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
24
&checksum, 4);
25
+ address += 4;
26
rx_len += 4;
27
s->regs[SONIC_CRBA1] = address >> 16;
28
s->regs[SONIC_CRBA0] = address & 0xffff;
29
--
30
2.5.0
31
32
Deleted patch
1
From: Finn Thain <fthain@telegraphics.com.au>
2
1
3
Section 3.4.1 of the datasheet says,
4
5
The alignment of the RRA is confined to either word or long word
6
boundaries, depending upon the data width mode. In 16-bit mode,
7
the RRA must be aligned to a word boundary (A0 is always zero)
8
and in 32-bit mode, the RRA is aligned to a long word boundary
9
(A0 and A1 are always zero).
10
11
This constraint has been implemented for 16-bit mode; implement it
12
for 32-bit mode too.
13
14
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
15
Tested-by: Laurent Vivier <laurent@vivier.eu>
16
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
17
Signed-off-by: Jason Wang <jasowang@redhat.com>
18
---
19
hw/net/dp8393x.c | 8 ++++++--
20
1 file changed, 6 insertions(+), 2 deletions(-)
21
22
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
23
index XXXXXXX..XXXXXXX 100644
24
--- a/hw/net/dp8393x.c
25
+++ b/hw/net/dp8393x.c
26
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
27
qemu_flush_queued_packets(qemu_get_queue(s->nic));
28
}
29
break;
30
- /* Ignore least significant bit */
31
+ /* The guest is required to store aligned pointers here */
32
case SONIC_RSA:
33
case SONIC_REA:
34
case SONIC_RRP:
35
case SONIC_RWP:
36
- s->regs[reg] = val & 0xfffe;
37
+ if (s->regs[SONIC_DCR] & SONIC_DCR_DW) {
38
+ s->regs[reg] = val & 0xfffc;
39
+ } else {
40
+ s->regs[reg] = val & 0xfffe;
41
+ }
42
break;
43
/* Invert written value for some registers */
44
case SONIC_CRCT:
45
--
46
2.5.0
47
48
Deleted patch
1
From: Finn Thain <fthain@telegraphics.com.au>
2
1
3
The jazzsonic driver in Linux uses the Silicon Revision register value
4
to probe the chip. The driver fails unless the SR register contains 4.
5
Unfortunately, reading this register in QEMU usually returns 0 because
6
the s->regs[] array gets wiped after a software reset.
7
8
Fixes: bd8f1ebce4 ("net/dp8393x: fix hardware reset")
9
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
10
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
12
---
13
hw/net/dp8393x.c | 2 +-
14
1 file changed, 1 insertion(+), 1 deletion(-)
15
16
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/hw/net/dp8393x.c
19
+++ b/hw/net/dp8393x.c
20
@@ -XXX,XX +XXX,XX @@ static void dp8393x_reset(DeviceState *dev)
21
timer_del(s->watchdog);
22
23
memset(s->regs, 0, sizeof(s->regs));
24
+ s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux/mips */
25
s->regs[SONIC_CR] = SONIC_CR_RST | SONIC_CR_STP | SONIC_CR_RXDIS;
26
s->regs[SONIC_DCR] &= ~(SONIC_DCR_EXBUS | SONIC_DCR_LBR);
27
s->regs[SONIC_RCR] &= ~(SONIC_RCR_LB0 | SONIC_RCR_LB1 | SONIC_RCR_BRD | SONIC_RCR_RNT);
28
@@ -XXX,XX +XXX,XX @@ static void dp8393x_realize(DeviceState *dev, Error **errp)
29
qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
30
31
s->watchdog = timer_new_ns(QEMU_CLOCK_VIRTUAL, dp8393x_watchdog, s);
32
- s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux */
33
34
memory_region_init_ram(&s->prom, OBJECT(dev),
35
"dp8393x-prom", SONIC_PROM_SIZE, &local_err);
36
--
37
2.5.0
38
39
Deleted patch
1
From: Yuri Benditovich <yuri.benditovich@daynix.com>
2
1
3
When requested to calculate the hash for TCPV6 packet,
4
ignore overrides of source and destination addresses
5
in in extension headers.
6
Use these overrides when new hash type NetPktRssIpV6TcpEx
7
requested.
8
Use this type in e1000e hash calculation for IPv6 TCP, which
9
should take in account overrides of the addresses.
10
11
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
12
Acked-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
13
Signed-off-by: Jason Wang <jasowang@redhat.com>
14
---
15
hw/net/e1000e_core.c | 2 +-
16
hw/net/net_rx_pkt.c | 2 +-
17
2 files changed, 2 insertions(+), 2 deletions(-)
18
19
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
20
index XXXXXXX..XXXXXXX 100644
21
--- a/hw/net/e1000e_core.c
22
+++ b/hw/net/e1000e_core.c
23
@@ -XXX,XX +XXX,XX @@ e1000e_rss_calc_hash(E1000ECore *core,
24
type = NetPktRssIpV4Tcp;
25
break;
26
case E1000_MRQ_RSS_TYPE_IPV6TCP:
27
- type = NetPktRssIpV6Tcp;
28
+ type = NetPktRssIpV6TcpEx;
29
break;
30
case E1000_MRQ_RSS_TYPE_IPV6:
31
type = NetPktRssIpV6;
32
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
33
index XXXXXXX..XXXXXXX 100644
34
--- a/hw/net/net_rx_pkt.c
35
+++ b/hw/net/net_rx_pkt.c
36
@@ -XXX,XX +XXX,XX @@ net_rx_pkt_calc_rss_hash(struct NetRxPkt *pkt,
37
assert(pkt->isip6);
38
assert(pkt->istcp);
39
trace_net_rx_pkt_rss_ip6_tcp();
40
- _net_rx_rss_prepare_ip6(&rss_input[0], pkt, true, &rss_length);
41
+ _net_rx_rss_prepare_ip6(&rss_input[0], pkt, false, &rss_length);
42
_net_rx_rss_prepare_tcp(&rss_input[0], pkt, &rss_length);
43
break;
44
case NetPktRssIpV6:
45
--
46
2.5.0
47
48
Deleted patch
1
From: Bin Meng <bmeng.cn@gmail.com>
2
1
3
When CADENCE_GEM_ERR_DEBUG is turned on, there are several
4
compilation errors in DB_PRINT(). Fix them.
5
6
While we are here, update to use appropriate modifiers in
7
the same DB_PRINT() call.
8
9
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
10
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
12
---
13
hw/net/cadence_gem.c | 11 ++++++-----
14
1 file changed, 6 insertions(+), 5 deletions(-)
15
16
diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/hw/net/cadence_gem.c
19
+++ b/hw/net/cadence_gem.c
20
@@ -XXX,XX +XXX,XX @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
21
return -1;
22
}
23
24
- DB_PRINT("copy %d bytes to 0x%x\n", MIN(bytes_to_copy, rxbufsize),
25
- rx_desc_get_buffer(s->rx_desc[q]));
26
+ DB_PRINT("copy %u bytes to 0x%" PRIx64 "\n",
27
+ MIN(bytes_to_copy, rxbufsize),
28
+ rx_desc_get_buffer(s, s->rx_desc[q]));
29
30
/* Copy packet data to emulated DMA buffer */
31
address_space_write(&s->dma_as, rx_desc_get_buffer(s, s->rx_desc[q]) +
32
@@ -XXX,XX +XXX,XX @@ static void gem_transmit(CadenceGEMState *s)
33
34
if (tx_desc_get_length(desc) > sizeof(tx_packet) -
35
(p - tx_packet)) {
36
- DB_PRINT("TX descriptor @ 0x%x too large: size 0x%x space " \
37
- "0x%x\n", (unsigned)packet_desc_addr,
38
- (unsigned)tx_desc_get_length(desc),
39
+ DB_PRINT("TX descriptor @ 0x%" HWADDR_PRIx \
40
+ " too large: size 0x%x space 0x%zx\n",
41
+ packet_desc_addr, tx_desc_get_length(desc),
42
sizeof(tx_packet) - (p - tx_packet));
43
break;
44
}
45
--
46
2.5.0
47
48
Deleted patch
1
From: Lukas Straub <lukasstraub2@web.de>
2
1
3
Document the qemu command-line and qmp commands for continuous replication
4
5
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
6
Signed-off-by: Jason Wang <jasowang@redhat.com>
7
---
8
docs/COLO-FT.txt | 224 +++++++++++++++++++++++++++++++++------------
9
docs/block-replication.txt | 28 ++++--
10
2 files changed, 184 insertions(+), 68 deletions(-)
11
12
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
13
index XXXXXXX..XXXXXXX 100644
14
--- a/docs/COLO-FT.txt
15
+++ b/docs/COLO-FT.txt
16
@@ -XXX,XX +XXX,XX @@ The diagram just shows the main qmp command, you can get the detail
17
in test procedure.
18
19
== Test procedure ==
20
-1. Startup qemu
21
-Primary:
22
-# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name primary \
23
- -device piix3-usb-uhci -vnc :7 \
24
- -device usb-tablet -netdev tap,id=hn0,vhost=off \
25
- -device virtio-net-pci,id=net-pci0,netdev=hn0 \
26
- -drive if=virtio,id=primary-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
27
- children.0.file.filename=1.raw,\
28
- children.0.driver=raw -S
29
-Secondary:
30
-# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name secondary \
31
- -device piix3-usb-uhci -vnc :7 \
32
- -device usb-tablet -netdev tap,id=hn0,vhost=off \
33
- -device virtio-net-pci,id=net-pci0,netdev=hn0 \
34
- -drive if=none,id=secondary-disk0,file.filename=1.raw,driver=raw,node-name=node0 \
35
- -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
36
- file.driver=qcow2,top-id=active-disk0,\
37
- file.file.filename=/mnt/ramfs/active_disk.img,\
38
- file.backing.driver=qcow2,\
39
- file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
40
- file.backing.backing=secondary-disk0 \
41
- -incoming tcp:0:8888
42
-
43
-2. On Secondary VM's QEMU monitor, issue command
44
+Note: Here we are running both instances on the same host for testing,
45
+change the IP addresses if you want to run it on two hosts. Initially
46
+127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
47
+
48
+== Startup qemu ==
49
+1. Primary:
50
+Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
51
+You don't need to change any IPs here, because 0.0.0.0 listens on any
52
+interface. The chardevs with 127.0.0.1 IPs loop back to the local qemu
53
+instance.
54
+
55
+# imagefolder="/mnt/vms/colo-test-primary"
56
+
57
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
58
+ -device piix3-usb-uhci -device usb-tablet -name primary \
59
+ -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
60
+ -device rtl8139,id=e0,netdev=hn0 \
61
+ -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
62
+ -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \
63
+ -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
64
+ -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
65
+ -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait \
66
+ -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
67
+ -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
68
+ -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
69
+ -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
70
+ -object iothread,id=iothread1 \
71
+ -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
72
+outdev=compare_out0,iothread=iothread1 \
73
+ -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
74
+children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
75
+
76
+2. Secondary:
77
+Note: Active and hidden images need to be created only once and the
78
+size should be the same as primary.qcow2. Again, you don't need to change
79
+any IPs here, except for the $primary_ip variable.
80
+
81
+# imagefolder="/mnt/vms/colo-test-secondary"
82
+# primary_ip=127.0.0.1
83
+
84
+# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
85
+
86
+# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
87
+
88
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
89
+ -device piix3-usb-uhci -device usb-tablet -name secondary \
90
+ -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
91
+ -device rtl8139,id=e0,netdev=hn0 \
92
+ -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \
93
+ -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \
94
+ -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
95
+ -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
96
+ -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
97
+ -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
98
+ -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
99
+top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
100
+file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
101
+file.backing.backing=parent0 \
102
+ -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
103
+children.0=childs0 \
104
+ -incoming tcp:0.0.0.0:9998
105
+
106
+
107
+3. On Secondary VM's QEMU monitor, issue command:
108
{'execute':'qmp_capabilities'}
109
-{ 'execute': 'nbd-server-start',
110
- 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } }
111
-}
112
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
113
+{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '0.0.0.0', 'port': '9999'} } } }
114
+{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', 'writable': true } }
115
116
Note:
117
a. The qmp command nbd-server-start and nbd-server-add must be run
118
before running the qmp command migrate on primary QEMU
119
b. Active disk, hidden disk and nbd target's length should be the
120
same.
121
- c. It is better to put active disk and hidden disk in ramdisk.
122
+ c. It is better to put active disk and hidden disk in ramdisk. They
123
+ will be merged into the parent disk on failover.
124
125
-3. On Primary VM's QEMU monitor, issue command:
126
+4. On Primary VM's QEMU monitor, issue command:
127
{'execute':'qmp_capabilities'}
128
-{ 'execute': 'human-monitor-command',
129
- 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=secondary-disk0,node-name=nbd_client0'}}
130
-{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', 'node': 'nbd_client0' } }
131
-{ 'execute': 'migrate-set-capabilities',
132
- 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
133
-{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } }
134
+{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0'}}
135
+{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'replication0' } }
136
+{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
137
+{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } }
138
139
Note:
140
a. There should be only one NBD Client for each primary disk.
141
- b. xx.xx.xx.xx is the secondary physical machine's hostname or IP
142
- c. The qmp command line must be run after running qmp command line in
143
+ b. The qmp command line must be run after running qmp command line in
144
secondary qemu.
145
146
-4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
147
+5. After the above steps, you will see that whenever you make changes to PVM, SVM will be synced.
148
You can issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
149
-to change the checkpoint period time
150
+to change the idle checkpoint interval
151
+
152
+6. Failover test
153
+You can kill one of the VMs and fail over to the surviving VM:
154
+
155
+If you killed the Secondary, then follow "Primary Failover". After that,
156
+if you want to resume the replication, follow "Primary resume replication"
157
+
158
+If you killed the Primary, then follow "Secondary Failover". After that,
159
+if you want to resume the replication, follow "Secondary resume replication"
160
+
161
+== Primary Failover ==
162
+The Secondary died; resume on the Primary
163
+
164
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'child': 'children.1'} }
165
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } }
166
+{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } }
167
+{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } }
168
+{'execute': 'object-del', 'arguments':{ 'id': 'm0' } }
169
+{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } }
170
+{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } }
171
+{'execute': 'x-colo-lost-heartbeat' }
172
+
173
+== Secondary Failover ==
174
+The Primary died; resume on the Secondary and prepare to become the new Primary
175
+
176
+{'execute': 'nbd-server-stop'}
177
+{'execute': 'x-colo-lost-heartbeat'}
178
+
179
+{'execute': 'object-del', 'arguments':{ 'id': 'f2' } }
180
+{'execute': 'object-del', 'arguments':{ 'id': 'f1' } }
181
+{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } }
182
+{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } }
183
+
184
+{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '0.0.0.0', 'port': '9003' } }, 'server': true } } } }
185
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '0.0.0.0', 'port': '9004' } }, 'server': true } } } }
186
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9001' } }, 'server': true } } } }
187
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9001' } }, 'server': false } } } }
188
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } } } }
189
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false } } } }
190
+
191
+== Primary resume replication ==
192
+Resume replication after the new Secondary is up.
193
+
194
+Start the new Secondary (Steps 2 and 3 above), then on the Primary:
195
+{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', 'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': 'existing', 'format': 'raw', 'sync': 'full'} }
196
+
197
+Wait until disk is synced, then:
198
+{'execute': 'stop'}
199
+{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} }
200
+
201
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0'}}
202
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } }
203
+
204
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } }
205
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } }
206
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } }
207
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } }
208
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } }
209
+
210
+{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
211
+{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } }
212
+
213
+Note:
214
+If this Primary previously was a Secondary, then we need to insert the
215
+filters before the filter-rewriter by using the
216
+"'insert': 'before', 'position': 'id=rew0'" options. See below.
217
+
218
+== Secondary resume replication ==
219
+Become Primary and resume replication after the new Secondary is up. Note
220
+that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
221
+
222
+Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
223
+then on the old Secondary:
224
+{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', 'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': 'existing', 'format': 'raw', 'sync': 'full'} }
225
+
226
+Wait until disk is synced, then:
227
+{'execute': 'stop'}
228
+{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } }
229
230
-5. Failover test
231
-You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
232
-monitor at the same time, then SVM will failover and client will not detect this
233
-change.
234
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0'}}
235
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } }
236
237
-Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
238
-issue block related command to stop block replication.
239
-Primary:
240
- Remove the nbd child from the quorum:
241
- { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
242
- { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
243
- Note: there is no qmp command to remove the blockdev now
244
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } }
245
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } }
246
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } }
247
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } }
248
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } }
249
250
-Secondary:
251
- The primary host is down, so we should do the following thing:
252
- { 'execute': 'nbd-server-stop' }
253
+{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
254
+{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } }
255
256
== TODO ==
257
-1. Support continuous VM replication.
258
-2. Support shared storage.
259
-3. Develop the heartbeat part.
260
-4. Reduce checkpoint VM’s downtime while doing checkpoint.
261
+1. Support shared storage.
262
+2. Develop the heartbeat part.
263
+3. Reduce checkpoint VM’s downtime while doing checkpoint.
264
diff --git a/docs/block-replication.txt b/docs/block-replication.txt
265
index XXXXXXX..XXXXXXX 100644
266
--- a/docs/block-replication.txt
267
+++ b/docs/block-replication.txt
268
@@ -XXX,XX +XXX,XX @@ blocks that are already in QEMU.
269
^ || .----------
270
| || | Secondary
271
1 Quorum || '----------
272
- / \ ||
273
- / \ ||
274
- Primary 2 filter
275
- disk ^ virtio-blk
276
- | ^
277
- 3 NBD -------> 3 NBD |
278
+ / \ || virtio-blk
279
+ / \ || ^
280
+ Primary 2 filter |
281
+ disk ^ 7 Quorum
282
+ | /
283
+ 3 NBD -------> 3 NBD /
284
client || server 2 filter
285
|| ^ ^
286
--------. || | |
287
@@ -XXX,XX +XXX,XX @@ any state that would otherwise be lost by the speculative write-through
288
of the NBD server into the secondary disk. So before block replication,
289
the primary disk and secondary disk should contain the same data.
290
291
+7) The secondary also has a quorum node, so after secondary failover it
292
+can become the new primary and continue replication.
293
+
294
+
295
== Failure Handling ==
296
There are 7 internal errors when block replication is running:
297
1. I/O error on primary disk
298
@@ -XXX,XX +XXX,XX @@ Primary:
299
leading whitespace.
300
5. The qmp command line must be run after running qmp command line in
301
secondary qemu.
302
- 6. After failover we need remove children.1 (replication driver).
303
+    6. After primary failover we need to remove children.1 (replication driver).
304
305
Secondary:
306
-drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
307
- -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\
308
+ -drive if=none,id=childs1,driver=replication,mode=secondary,top-id=childs1,\
309
file.file.filename=active_disk.qcow2,\
310
file.driver=qcow2,\
311
file.backing.file.filename=hidden_disk.qcow2,\
312
file.backing.driver=qcow2,\
313
file.backing.backing=colo1
314
+ -drive if=xxx,driver=quorum,read-pattern=fifo,id=top-disk1,\
315
+ vote-threshold=1,children.0=childs1
316
317
Then run qmp command in secondary qemu:
318
{ 'execute': 'nbd-server-start',
319
@@ -XXX,XX +XXX,XX @@ Secondary:
320
The primary host is down, so we should do the following thing:
321
{ 'execute': 'nbd-server-stop' }
322
323
+Promote Secondary to Primary:
324
+ see COLO-FT.txt
325
+
326
TODO:
327
-1. Continuous block replication
328
-2. Shared disk
329
+1. Shared disk
330
--
331
2.5.0
332
333
diff view generated by jsdifflib
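The QMP failover and resume sequences in the COLO patch above are long enough that retyping them at the monitor is error-prone. As a rough illustration only (the helper function and the command list below are invented for this sketch, not part of the patch), the "Primary Failover" sequence can be serialized as the newline-delimited JSON that a QMP server expects after capability negotiation:

```python
import json

def qmp_lines(commands):
    """Serialize (command, arguments) pairs as newline-delimited JSON,
    the wire format sent to a QMP socket after the capabilities handshake."""
    lines = []
    for name, args in commands:
        msg = {"execute": name}
        if args:
            msg["arguments"] = args
        lines.append(json.dumps(msg))
    return "\n".join(lines)

# The "Primary Failover" sequence from the document, minus the
# human-monitor-command (HMP) passthrough for drive_del:
primary_failover = [
    ("qmp_capabilities", None),
    ("x-blockdev-change", {"parent": "colo-disk0", "child": "children.1"}),
    ("object-del", {"id": "comp0"}),
    ("object-del", {"id": "iothread1"}),
    ("object-del", {"id": "m0"}),
    ("object-del", {"id": "redire0"}),
    ("object-del", {"id": "redire1"}),
    ("x-colo-lost-heartbeat", None),
]

print(qmp_lines(primary_failover))
```

This only formats the requests; driving a real QMP socket also requires reading the server greeting and the per-command responses, and the authoritative sequences remain the ones spelled out in the document.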
Deleted patch
1
From: Stefan Hajnoczi <stefanha@redhat.com>
2
1
3
The L2TPv3 RFC number is 3931:
4
https://tools.ietf.org/html/rfc3931
5
6
Reported-by: Henrik Johansson <henrikjohansson@rocketmail.com>
7
Reviewed-by: Stefan Weil <sw@weilnetz.de>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Jason Wang <jasowang@redhat.com>
10
---
11
qemu-options.hx | 4 ++--
12
1 file changed, 2 insertions(+), 2 deletions(-)
13
14
diff --git a/qemu-options.hx b/qemu-options.hx
15
index XXXXXXX..XXXXXXX 100644
16
--- a/qemu-options.hx
17
+++ b/qemu-options.hx
18
@@ -XXX,XX +XXX,XX @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
19
" Linux kernel 3.3+ as well as most routers can talk\n"
20
" L2TPv3. This transport allows connecting a VM to a VM,\n"
21
" VM to a router and even VM to Host. It is a nearly-universal\n"
22
- " standard (RFC3391). Note - this implementation uses static\n"
23
+ " standard (RFC3931). Note - this implementation uses static\n"
24
" pre-configured tunnels (same as the Linux kernel).\n"
25
" use 'src=' to specify source address\n"
26
" use 'dst=' to specify destination address\n"
27
@@ -XXX,XX +XXX,XX @@ Example (send packets from host's 1.2.3.4):
28
@end example
29
30
@item -netdev l2tpv3,id=@var{id},src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
31
-Configure a L2TPv3 pseudowire host network backend. L2TPv3 (RFC3391) is a
32
+Configure a L2TPv3 pseudowire host network backend. L2TPv3 (RFC3931) is a
33
popular protocol to transport Ethernet (and other Layer 2) data frames between
34
two systems. It is present in routers, firewalls and the Linux kernel
35
(from version 3.3 onwards).
36
--
37
2.5.0
38
39
diff view generated by jsdifflib