1
The following changes since commit 6632f6ff96f0537fc34cdc00c760656fc62e23c5:
1
The following changes since commit e0175b71638cf4398903c0d25f93fe62e0606389:
2
2
3
Merge remote-tracking branch 'remotes/famz/tags/block-and-testing-pull-request' into staging (2017-07-17 11:46:36 +0100)
3
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200228' into staging (2020-02-28 16:39:27 +0000)
4
4
5
are available in the git repository at:
5
are available in the git repository at:
6
6
7
https://github.com/jasowang/qemu.git tags/net-pull-request
7
https://github.com/jasowang/qemu.git tags/net-pull-request
8
8
9
for you to fetch changes up to 189ae6bb5ce1f5a322f8691d00fe942ba43dd601:
9
for you to fetch changes up to 41aa2e3f9b27fd259a13711545d933a20f1d2f16:
10
10
11
virtio-net: fix offload ctrl endian (2017-07-17 20:13:56 +0800)
11
l2tpv3: fix RFC number typo in qemu-options.hx (2020-03-02 15:30:08 +0800)
12
12
13
----------------------------------------------------------------
13
----------------------------------------------------------------
14
14
15
- fix virtio-net ctrl offload endian
15
----------------------------------------------------------------
16
- vnet header support for various COLO netfilters and compare thread
16
Bin Meng (1):
17
hw: net: cadence_gem: Fix build errors in DB_PRINT()
17
18
18
----------------------------------------------------------------
19
Finn Thain (14):
19
Jason Wang (1):
20
dp8393x: Mask EOL bit from descriptor addresses
20
virtio-net: fix offload ctrl endian
21
dp8393x: Always use 32-bit accesses
22
dp8393x: Clean up endianness hacks
23
dp8393x: Have dp8393x_receive() return the packet size
24
dp8393x: Update LLFA and CRDA registers from rx descriptor
25
dp8393x: Clear RRRA command register bit only when appropriate
26
dp8393x: Implement packet size limit and RBAE interrupt
27
dp8393x: Don't clobber packet checksum
28
dp8393x: Use long-word-aligned RRA pointers in 32-bit mode
29
dp8393x: Pad frames to word or long word boundary
30
dp8393x: Clear descriptor in_use field to release packet
31
dp8393x: Always update RRA pointers and sequence numbers
32
dp8393x: Don't reset Silicon Revision register
33
dp8393x: Don't stop reception upon RBE interrupt assertion
21
34
22
Michal Privoznik (1):
35
Lukas Straub (4):
23
virtion-net: Prefer is_power_of_2()
36
block/replication.c: Ignore requests after failover
37
tests/test-replication.c: Add test for for secondary node continuing replication
38
net/filter.c: Add Options to insert filters anywhere in the filter list
39
colo: Update Documentation for continuous replication
24
40
25
Zhang Chen (12):
41
Stefan Hajnoczi (1):
26
net: Add vnet_hdr_len arguments in NetClientState
42
l2tpv3: fix RFC number typo in qemu-options.hx
27
net/net.c: Add vnet_hdr support in SocketReadState
28
net/filter-mirror.c: Introduce parameter for filter_send()
29
net/filter-mirror.c: Make filter mirror support vnet support.
30
net/filter-mirror.c: Add new option to enable vnet support for filter-redirector
31
net/colo.c: Make vnet_hdr_len as packet property
32
net/colo-compare.c: Introduce parameter for compare_chr_send()
33
net/colo-compare.c: Make colo-compare support vnet_hdr_len
34
net/colo.c: Add vnet packet parse feature in colo-proxy
35
net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
36
net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len
37
docs/colo-proxy.txt: Update colo-proxy usage of net driver with vnet_header
38
43
39
docs/colo-proxy.txt | 26 ++++++++++++++++
44
Yuri Benditovich (3):
40
hw/net/virtio-net.c | 4 ++-
45
e1000e: Avoid hw_error if legacy mode used
41
include/net/net.h | 10 ++++--
46
NetRxPkt: Introduce support for additional hash types
42
net/colo-compare.c | 84 ++++++++++++++++++++++++++++++++++++++++++---------
47
NetRxPkt: fix hash calculation of IPV6 TCP
43
net/colo.c | 9 +++---
48
44
net/colo.h | 4 ++-
49
block/replication.c | 35 ++++++-
45
net/filter-mirror.c | 75 +++++++++++++++++++++++++++++++++++++++++----
50
docs/COLO-FT.txt | 224 +++++++++++++++++++++++++++++++++------------
46
net/filter-rewriter.c | 37 ++++++++++++++++++++++-
51
docs/block-replication.txt | 28 ++++--
47
net/net.c | 37 ++++++++++++++++++++---
52
hw/net/cadence_gem.c | 11 ++-
48
net/socket.c | 8 ++---
53
hw/net/dp8393x.c | 200 ++++++++++++++++++++++++++--------------
49
qemu-options.hx | 19 ++++++------
54
hw/net/e1000e_core.c | 15 +--
50
11 files changed, 265 insertions(+), 48 deletions(-)
55
hw/net/net_rx_pkt.c | 44 ++++++++-
56
hw/net/net_rx_pkt.h | 6 +-
57
hw/net/trace-events | 4 +
58
include/net/filter.h | 2 +
59
net/filter.c | 92 ++++++++++++++++++-
60
qemu-options.hx | 35 +++++--
61
tests/test-replication.c | 52 +++++++++++
62
13 files changed, 591 insertions(+), 157 deletions(-)
51
63
52
64
New patch
From: Finn Thain <fthain@telegraphics.com.au>

The Least Significant bit of a descriptor address register is used as
an EOL flag. It has to be masked when the register value is to be used
as an actual address for copying memory around. But when the registers
are to be updated the EOL bit should not be masked.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
---
 hw/net/dp8393x.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ do { printf("sonic ERROR: %s: " fmt, __func__ , ## __VA_ARGS__); } while (0)
 #define SONIC_ISR_PINT 0x0800
 #define SONIC_ISR_LCD 0x1000

+#define SONIC_DESC_EOL 0x0001
+#define SONIC_DESC_ADDR 0xFFFE
+
 #define TYPE_DP8393X "dp8393x"
 #define DP8393X(obj) OBJECT_CHECK(dp8393xState, (obj), TYPE_DP8393X)

@@ -XXX,XX +XXX,XX @@ static uint32_t dp8393x_crba(dp8393xState *s)

 static uint32_t dp8393x_crda(dp8393xState *s)
 {
-    return (s->regs[SONIC_URDA] << 16) | s->regs[SONIC_CRDA];
+    return (s->regs[SONIC_URDA] << 16) |
+           (s->regs[SONIC_CRDA] & SONIC_DESC_ADDR);
 }

 static uint32_t dp8393x_rbwc(dp8393xState *s)
@@ -XXX,XX +XXX,XX @@ static uint32_t dp8393x_tsa(dp8393xState *s)

 static uint32_t dp8393x_ttda(dp8393xState *s)
 {
-    return (s->regs[SONIC_UTDA] << 16) | s->regs[SONIC_TTDA];
+    return (s->regs[SONIC_UTDA] << 16) |
+           (s->regs[SONIC_TTDA] & SONIC_DESC_ADDR);
 }

 static uint32_t dp8393x_wt(dp8393xState *s)
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_transmit_packets(dp8393xState *s)
                                MEMTXATTRS_UNSPECIFIED, s->data,
                                size);
             s->regs[SONIC_CTDA] = dp8393x_get(s, width, 0) & ~0x1;
-            if (dp8393x_get(s, width, 0) & 0x1) {
+            if (dp8393x_get(s, width, 0) & SONIC_DESC_EOL) {
                 /* EOL detected */
                 break;
             }
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     /* XXX: Check byte ordering */

     /* Check for EOL */
-    if (s->regs[SONIC_LLFA] & 0x1) {
+    if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
         /* Are we still in resource exhaustion? */
         size = sizeof(uint16_t) * 1 * width;
         address = dp8393x_crda(s) + sizeof(uint16_t) * 5 * width;
         address_space_read(&s->as, address, MEMTXATTRS_UNSPECIFIED,
                            s->data, size);
-        if (dp8393x_get(s, width, 0) & 0x1) {
+        if (dp8393x_get(s, width, 0) & SONIC_DESC_EOL) {
             /* Still EOL ; stop reception */
             return -1;
         } else {
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
                        dp8393x_crda(s) + sizeof(uint16_t) * 5 * width,
                        MEMTXATTRS_UNSPECIFIED, s->data, size);
     s->regs[SONIC_LLFA] = dp8393x_get(s, width, 0);
-    if (s->regs[SONIC_LLFA] & 0x1) {
+    if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
         /* EOL detected */
         s->regs[SONIC_ISR] |= SONIC_ISR_RDE;
     } else {
--
2.5.0
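For readers following the EOL masking patch, here is a minimal standalone sketch of the idea, not the device model itself. The names DESC_EOL, DESC_ADDR, desc_address() and desc_is_eol() are hypothetical stand-ins for the SONIC_DESC_* macros and the dp8393x_crda()/dp8393x_ttda() helpers; the sketch assumes only what the commit message states, namely that bit 0 of the lower register half is a flag and must not leak into a bus address.

```c
#include <stdint.h>

/* Hypothetical names mirroring SONIC_DESC_EOL / SONIC_DESC_ADDR. */
#define DESC_EOL  0x0001u   /* bit 0: end-of-list flag */
#define DESC_ADDR 0xFFFEu   /* remaining bits: descriptor address */

/*
 * Build a 32-bit bus address from the upper and lower 16-bit register
 * halves, dropping the EOL flag from the lower half.
 */
static uint32_t desc_address(uint16_t upper, uint16_t lower)
{
    return ((uint32_t)upper << 16) | (lower & DESC_ADDR);
}

/* Test the raw (unmasked) lower half for the end-of-list flag. */
static int desc_is_eol(uint16_t lower)
{
    return (lower & DESC_EOL) != 0;
}
```

The register itself keeps the flag; only the value used for memory access is masked, which matches the split the patch makes between address helpers and register updates.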
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Finn Thain <fthain@telegraphics.com.au>
2
2
3
This patch changes the compare_chr_send() parameter from CharBackend to CompareState,
3
The DP83932 and DP83934 have 32 data lines. The datasheet says,
4
so we can get more information like vnet_hdr (we use it to support packets with vnet_header).
5
4
6
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
5
Data Bus: These bidirectional lines are used to transfer data on the
6
system bus. When the SONIC is a bus master, 16-bit data is transferred
7
on D15-D0 and 32-bit data is transferred on D31-D0. When the SONIC is
8
accessed as a slave, register data is driven onto lines D15-D0.
9
D31-D16 are held TRI-STATE if SONIC is in 16-bit mode. If SONIC is in
10
32-bit mode, they are driven, but invalid.
11
12
Always use 32-bit accesses both as bus master and bus slave.
13
14
Force the MSW to zero in bus master mode.
15
16
This gets the Linux 'jazzsonic' driver working, and avoids the need for
17
prior hacks to make the NetBSD 'sn' driver work.
18
19
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
20
Tested-by: Laurent Vivier <laurent@vivier.eu>
7
Signed-off-by: Jason Wang <jasowang@redhat.com>
21
Signed-off-by: Jason Wang <jasowang@redhat.com>
8
---
22
---
9
net/colo-compare.c | 14 +++++++-------
23
hw/net/dp8393x.c | 47 +++++++++++++++++++++++++++++------------------
10
1 file changed, 7 insertions(+), 7 deletions(-)
24
1 file changed, 29 insertions(+), 18 deletions(-)
11
25
12
diff --git a/net/colo-compare.c b/net/colo-compare.c
26
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
13
index XXXXXXX..XXXXXXX 100644
27
index XXXXXXX..XXXXXXX 100644
14
--- a/net/colo-compare.c
28
--- a/hw/net/dp8393x.c
15
+++ b/net/colo-compare.c
29
+++ b/hw/net/dp8393x.c
16
@@ -XXX,XX +XXX,XX @@ enum {
30
@@ -XXX,XX +XXX,XX @@ static void dp8393x_put(dp8393xState *s, int width, int offset,
17
SECONDARY_IN,
31
uint16_t val)
18
};
32
{
19
33
if (s->big_endian) {
20
-static int compare_chr_send(CharBackend *out,
34
- s->data[offset * width + width - 1] = cpu_to_be16(val);
21
+static int compare_chr_send(CompareState *s,
35
+ if (width == 2) {
22
const uint8_t *buf,
36
+ s->data[offset * 2] = 0;
23
uint32_t size);
37
+ s->data[offset * 2 + 1] = cpu_to_be16(val);
24
38
+ } else {
25
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
39
+ s->data[offset] = cpu_to_be16(val);
26
}
40
+ }
27
41
} else {
28
if (result) {
42
- s->data[offset * width] = cpu_to_le16(val);
29
- ret = compare_chr_send(&s->chr_out, pkt->data, pkt->size);
43
+ if (width == 2) {
30
+ ret = compare_chr_send(s, pkt->data, pkt->size);
44
+ s->data[offset * 2] = cpu_to_le16(val);
31
if (ret < 0) {
45
+ s->data[offset * 2 + 1] = 0;
32
error_report("colo_send_primary_packet failed");
46
+ } else {
33
}
47
+ s->data[offset] = cpu_to_le16(val);
34
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
48
+ }
35
}
49
}
36
}
50
}
37
51
38
-static int compare_chr_send(CharBackend *out,
52
@@ -XXX,XX +XXX,XX @@ static uint64_t dp8393x_read(void *opaque, hwaddr addr, unsigned int size)
39
+static int compare_chr_send(CompareState *s,
53
40
const uint8_t *buf,
54
DPRINTF("read 0x%04x from reg %s\n", val, reg_names[reg]);
41
uint32_t size)
55
56
- return val;
57
+ return s->big_endian ? val << 16 : val;
58
}
59
60
static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
61
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
42
{
62
{
43
@@ -XXX,XX +XXX,XX @@ static int compare_chr_send(CharBackend *out,
63
dp8393xState *s = opaque;
44
return 0;
64
int reg = addr >> s->it_shift;
65
+ uint32_t val = s->big_endian ? data >> 16 : data;
66
67
- DPRINTF("write 0x%04x to reg %s\n", (uint16_t)data, reg_names[reg]);
68
+ DPRINTF("write 0x%04x to reg %s\n", (uint16_t)val, reg_names[reg]);
69
70
switch (reg) {
71
/* Command register */
72
case SONIC_CR:
73
- dp8393x_do_command(s, data);
74
+ dp8393x_do_command(s, val);
75
break;
76
/* Prevent write to read-only registers */
77
case SONIC_CAP2:
78
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
79
/* Accept write to some registers only when in reset mode */
80
case SONIC_DCR:
81
if (s->regs[SONIC_CR] & SONIC_CR_RST) {
82
- s->regs[reg] = data & 0xbfff;
83
+ s->regs[reg] = val & 0xbfff;
84
} else {
85
DPRINTF("writing to DCR invalid\n");
86
}
87
break;
88
case SONIC_DCR2:
89
if (s->regs[SONIC_CR] & SONIC_CR_RST) {
90
- s->regs[reg] = data & 0xf017;
91
+ s->regs[reg] = val & 0xf017;
92
} else {
93
DPRINTF("writing to DCR2 invalid\n");
94
}
95
break;
96
/* 12 lower bytes are Read Only */
97
case SONIC_TCR:
98
- s->regs[reg] = data & 0xf000;
99
+ s->regs[reg] = val & 0xf000;
100
break;
101
/* 9 lower bytes are Read Only */
102
case SONIC_RCR:
103
- s->regs[reg] = data & 0xffe0;
104
+ s->regs[reg] = val & 0xffe0;
105
break;
106
/* Ignore most significant bit */
107
case SONIC_IMR:
108
- s->regs[reg] = data & 0x7fff;
109
+ s->regs[reg] = val & 0x7fff;
110
dp8393x_update_irq(s);
111
break;
112
/* Clear bits by writing 1 to them */
113
case SONIC_ISR:
114
- data &= s->regs[reg];
115
- s->regs[reg] &= ~data;
116
- if (data & SONIC_ISR_RBE) {
117
+ val &= s->regs[reg];
118
+ s->regs[reg] &= ~val;
119
+ if (val & SONIC_ISR_RBE) {
120
dp8393x_do_read_rra(s);
121
}
122
dp8393x_update_irq(s);
123
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
124
case SONIC_REA:
125
case SONIC_RRP:
126
case SONIC_RWP:
127
- s->regs[reg] = data & 0xfffe;
128
+ s->regs[reg] = val & 0xfffe;
129
break;
130
/* Invert written value for some registers */
131
case SONIC_CRCT:
132
case SONIC_FAET:
133
case SONIC_MPT:
134
- s->regs[reg] = data ^ 0xffff;
135
+ s->regs[reg] = val ^ 0xffff;
136
break;
137
/* All other registers have no special contrainst */
138
default:
139
- s->regs[reg] = data;
140
+ s->regs[reg] = val;
45
}
141
}
46
142
47
- ret = qemu_chr_fe_write_all(out, (uint8_t *)&len, sizeof(len));
143
if (reg == SONIC_WT0 || reg == SONIC_WT1) {
48
+ ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
144
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
49
if (ret != sizeof(len)) {
145
static const MemoryRegionOps dp8393x_ops = {
50
goto err;
146
.read = dp8393x_read,
51
}
147
.write = dp8393x_write,
52
148
- .impl.min_access_size = 2,
53
- ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
149
- .impl.max_access_size = 2,
54
+ ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
150
+ .impl.min_access_size = 4,
55
if (ret != size) {
151
+ .impl.max_access_size = 4,
56
goto err;
152
.endianness = DEVICE_NATIVE_ENDIAN,
57
}
153
};
58
@@ -XXX,XX +XXX,XX @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs)
154
59
60
if (packet_enqueue(s, PRIMARY_IN)) {
61
trace_colo_compare_main("primary: unsupported packet in");
62
- compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len);
63
+ compare_chr_send(s, pri_rs->buf, pri_rs->packet_len);
64
} else {
65
/* compare connection */
66
g_queue_foreach(&s->conn_list, colo_compare_connection, s);
67
@@ -XXX,XX +XXX,XX @@ static void colo_flush_packets(void *opaque, void *user_data)
68
69
while (!g_queue_is_empty(&conn->primary_list)) {
70
pkt = g_queue_pop_head(&conn->primary_list);
71
- compare_chr_send(&s->chr_out, pkt->data, pkt->size);
72
+ compare_chr_send(s, pkt->data, pkt->size);
73
packet_destroy(pkt, NULL);
74
}
75
while (!g_queue_is_empty(&conn->secondary_list)) {
76
--
155
--
77
2.7.4
156
2.5.0
78
157
79
158
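The register-access half of the "Always use 32-bit accesses" patch can be sketched in isolation as follows. This is an illustration under stated assumptions, not the QEMU code: reg_to_bus() and bus_to_reg() are hypothetical helpers that mirror the `val << 16` and `data >> 16` expressions in the diff, where a 16-bit register value travels in the upper half of a 32-bit access on a big-endian machine and in the lower half otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Place a 16-bit register value into a 32-bit bus word (read path). */
static uint32_t reg_to_bus(uint16_t reg_val, bool big_endian)
{
    return big_endian ? (uint32_t)reg_val << 16 : reg_val;
}

/* Extract the 16-bit register value from a 32-bit bus word (write path). */
static uint16_t bus_to_reg(uint32_t bus_val, bool big_endian)
{
    return (uint16_t)(big_endian ? bus_val >> 16 : bus_val);
}
```

The two helpers are inverses of each other for any 16-bit value, so a value written by the guest reads back unchanged in either endianness.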
New patch
From: Finn Thain <fthain@telegraphics.com.au>

According to the datasheet, section 3.4.4, "in 32-bit mode ... the SONIC
always writes long words".

Therefore, use the same technique for the 'in_use' field that is used
everywhere else, and write the full long word.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
         return -1;
     }

-    /* XXX: Check byte ordering */
-
     /* Check for EOL */
     if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
         /* Are we still in resource exhaustion? */
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
         /* EOL detected */
         s->regs[SONIC_ISR] |= SONIC_ISR_RDE;
     } else {
-        /* Clear in_use, but it is always 16bit wide */
-        int offset = dp8393x_crda(s) + sizeof(uint16_t) * 6 * width;
-        if (s->big_endian && width == 2) {
-            /* we need to adjust the offset of the 16bit field */
-            offset += sizeof(uint16_t);
-        }
-        s->data[0] = 0;
-        address_space_write(&s->as, offset, MEMTXATTRS_UNSPECIFIED,
-                            s->data, sizeof(uint16_t));
+        /* Clear in_use */
+        size = sizeof(uint16_t) * width;
+        address = dp8393x_crda(s) + sizeof(uint16_t) * 6 * width;
+        dp8393x_put(s, width, 0, 0);
+        address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
+                            s->data, size);
         s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
         s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
         s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) | (((s->regs[SONIC_RSC] & 0x00ff) + 1) & 0x00ff);
--
2.5.0
New patch
From: Finn Thain <fthain@telegraphics.com.au>

This function re-uses its 'size' argument as a scratch variable.
Instead, declare a local 'size' variable for that purpose so that the
function result doesn't get messed up.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static int dp8393x_receive_filter(dp8393xState *s, const uint8_t * buf,
 }

 static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
-                               size_t size)
+                               size_t pkt_size)
 {
     dp8393xState *s = qemu_get_nic_opaque(nc);
     int packet_type;
     uint32_t available, address;
-    int width, rx_len = size;
+    int width, rx_len = pkt_size;
     uint32_t checksum;
+    int size;

     width = (s->regs[SONIC_DCR] & SONIC_DCR_DW) ? 2 : 1;

     s->regs[SONIC_RCR] &= ~(SONIC_RCR_PRX | SONIC_RCR_LBK | SONIC_RCR_FAER |
         SONIC_RCR_CRCR | SONIC_RCR_LPKT | SONIC_RCR_BC | SONIC_RCR_MC);

-    packet_type = dp8393x_receive_filter(s, buf, size);
+    packet_type = dp8393x_receive_filter(s, buf, pkt_size);
     if (packet_type < 0) {
         DPRINTF("packet not for netcard\n");
         return -1;
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     /* Done */
     dp8393x_update_irq(s);

-    return size;
+    return pkt_size;
 }

 static void dp8393x_reset(DeviceState *dev)
--
2.5.0
New patch
From: Finn Thain <fthain@telegraphics.com.au>

Follow the algorithm given in the National Semiconductor DP83932C
datasheet in section 3.4.7:

    At the next reception, the SONIC re-reads the last RXpkt.link field,
    and updates its CRDA register to point to the next descriptor.

The chip is designed to allow the host to provide a new list of
descriptors in this way.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
         address = dp8393x_crda(s) + sizeof(uint16_t) * 5 * width;
         address_space_read(&s->as, address, MEMTXATTRS_UNSPECIFIED,
                            s->data, size);
-        if (dp8393x_get(s, width, 0) & SONIC_DESC_EOL) {
+        s->regs[SONIC_LLFA] = dp8393x_get(s, width, 0);
+        if (s->regs[SONIC_LLFA] & SONIC_DESC_EOL) {
             /* Still EOL ; stop reception */
             return -1;
-        } else {
-            s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
         }
+        /* Link has been updated by host */
+        s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
     }

     /* Save current position */
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
                         MEMTXATTRS_UNSPECIFIED,
                         s->data, size);

-    /* Move to next descriptor */
+    /* Check link field */
     size = sizeof(uint16_t) * width;
     address_space_read(&s->as,
                        dp8393x_crda(s) + sizeof(uint16_t) * 5 * width,
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
         dp8393x_put(s, width, 0, 0);
         address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
                             s->data, size);
+
+        /* Move to next descriptor */
         s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
         s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
         s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) | (((s->regs[SONIC_RSC] & 0x00ff) + 1) & 0x00ff);
--
2.5.0
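The control flow introduced by the LLFA/CRDA patch can be modelled without any device state. The following is a sketch under assumptions: struct rx_state, recheck_link() and DESC_EOL are hypothetical names, and the link field that the real code fetches with address_space_read() is passed in as a plain argument.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_EOL 0x0001u    /* hypothetical stand-in for SONIC_DESC_EOL */

/* Only the two registers the patch touches are modelled. */
struct rx_state {
    uint16_t llfa;  /* last link field address, may carry the EOL flag */
    uint16_t crda;  /* current receive descriptor address */
};

/*
 * If the previous reception ended on an EOL descriptor, re-read the link
 * field (link_in_memory) and store it in LLFA. Reception stops while the
 * link is still EOL; otherwise CRDA follows the updated link.
 * Returns true when reception may proceed.
 */
static bool recheck_link(struct rx_state *st, uint16_t link_in_memory)
{
    if (!(st->llfa & DESC_EOL)) {
        return true;                /* not in resource exhaustion */
    }
    st->llfa = link_in_memory;      /* LLFA is updated in both outcomes */
    if (st->llfa & DESC_EOL) {
        return false;               /* still EOL, stop reception */
    }
    st->crda = st->llfa;            /* link has been updated by host */
    return true;
}
```

Note that LLFA is refreshed even when reception stops, which is the behavioural difference from the code the patch replaces.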
New patch
From: Finn Thain <fthain@telegraphics.com.au>

It doesn't make sense to clear the command register bit unless the
command was actually issued.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_read_rra(dp8393xState *s)
         s->regs[SONIC_ISR] |= SONIC_ISR_RBE;
         dp8393x_update_irq(s);
     }
-
-    /* Done */
-    s->regs[SONIC_CR] &= ~SONIC_CR_RRRA;
 }

 static void dp8393x_do_software_reset(dp8393xState *s)
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_command(dp8393xState *s, uint16_t command)
         dp8393x_do_start_timer(s);
     if (command & SONIC_CR_RST)
         dp8393x_do_software_reset(s);
-    if (command & SONIC_CR_RRRA)
+    if (command & SONIC_CR_RRRA) {
         dp8393x_do_read_rra(s);
+        s->regs[SONIC_CR] &= ~SONIC_CR_RRRA;
+    }
     if (command & SONIC_CR_LCAM)
         dp8393x_do_load_cam(s);
 }
--
2.5.0
New patch
From: Finn Thain <fthain@telegraphics.com.au>

Add a bounds check to prevent a large packet from causing a buffer
overflow. This is defensive programming -- I haven't actually tried
sending an oversized packet or a jumbo ethernet frame.

The SONIC handles packets that are too big for the buffer by raising
the RBAE interrupt and dropping them. Linux uses that interrupt to
count dropped packets.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ do { printf("sonic ERROR: %s: " fmt, __func__ , ## __VA_ARGS__); } while (0)
 #define SONIC_TCR_CRCI 0x2000
 #define SONIC_TCR_PINT 0x8000

+#define SONIC_ISR_RBAE 0x0010
 #define SONIC_ISR_RBE 0x0020
 #define SONIC_ISR_RDE 0x0040
 #define SONIC_ISR_TC 0x0080
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     s->regs[SONIC_RCR] &= ~(SONIC_RCR_PRX | SONIC_RCR_LBK | SONIC_RCR_FAER |
         SONIC_RCR_CRCR | SONIC_RCR_LPKT | SONIC_RCR_BC | SONIC_RCR_MC);

+    if (pkt_size + 4 > dp8393x_rbwc(s) * 2) {
+        DPRINTF("oversize packet, pkt_size is %d\n", pkt_size);
+        s->regs[SONIC_ISR] |= SONIC_ISR_RBAE;
+        dp8393x_update_irq(s);
+        dp8393x_do_read_rra(s);
+        return pkt_size;
+    }
+
     packet_type = dp8393x_receive_filter(s, buf, pkt_size);
     if (packet_type < 0) {
         DPRINTF("packet not for netcard\n");
--
2.5.0
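The arithmetic behind the RBAE bounds check is easy to get wrong because the remaining-buffer count is kept in 16-bit words while the packet size is in bytes. A minimal sketch, assuming only the expression shown in the diff (`pkt_size + 4 > rbwc * 2`); packet_fits() and FCS_LEN are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FCS_LEN 4u  /* frame checksum appended after the packet data */

/*
 * rbwc_words is the remaining buffer word count in 16-bit words, so the
 * space left in bytes is rbwc_words * 2. The packet fits only if its data
 * plus the 4-byte checksum do not exceed that space.
 */
static bool packet_fits(size_t pkt_size, uint32_t rbwc_words)
{
    return pkt_size + FCS_LEN <= (size_t)rbwc_words * 2;
}
```

With 32 words (64 bytes) remaining, a 60-byte packet is the largest that still fits; a 61-byte packet would take the RBAE path and be dropped.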
New patch
From: Finn Thain <fthain@telegraphics.com.au>

A received packet consumes pkt_size bytes in the buffer and the frame
checksum that's appended to it consumes another 4 bytes. The Receive
Buffer Address register takes the former quantity into account but
not the latter. So the next packet written to the buffer overwrites
the frame checksum. Fix this.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
     address += rx_len;
     address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
                         &checksum, 4);
+    address += 4;
     rx_len += 4;
     s->regs[SONIC_CRBA1] = address >> 16;
     s->regs[SONIC_CRBA0] = address & 0xffff;
--
2.5.0
New patch
From: Finn Thain <fthain@telegraphics.com.au>

Section 3.4.1 of the datasheet says,

    The alignment of the RRA is confined to either word or long word
    boundaries, depending upon the data width mode. In 16-bit mode,
    the RRA must be aligned to a word boundary (A0 is always zero)
    and in 32-bit mode, the RRA is aligned to a long word boundary
    (A0 and A1 are always zero).

This constraint has been implemented for 16-bit mode; implement it
for 32-bit mode too.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/dp8393x.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
             qemu_flush_queued_packets(qemu_get_queue(s->nic));
         }
         break;
-    /* Ignore least significant bit */
+    /* The guest is required to store aligned pointers here */
     case SONIC_RSA:
     case SONIC_REA:
     case SONIC_RRP:
     case SONIC_RWP:
-        s->regs[reg] = val & 0xfffe;
+        if (s->regs[SONIC_DCR] & SONIC_DCR_DW) {
+            s->regs[reg] = val & 0xfffc;
+        } else {
+            s->regs[reg] = val & 0xfffe;
+        }
         break;
     /* Invert written value for some registers */
     case SONIC_CRCT:
--
2.5.0
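The alignment rule quoted from section 3.4.1 reduces to a choice between two masks. A small sketch, using the masks from the diff (0xfffc and 0xfffe); align_rra_pointer() and its `wide` flag are hypothetical, with `wide` standing in for the SONIC_DCR_DW test:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Force a value written to an RRA pointer register onto the required
 * boundary: long word (A1 and A0 zero) in 32-bit mode, word (A0 zero)
 * in 16-bit mode.
 */
static uint16_t align_rra_pointer(uint16_t val, bool wide)
{
    return val & (wide ? 0xfffc : 0xfffe);
}
```

For example, a guest write of 0x1237 is stored as 0x1234 in 32-bit mode and as 0x1236 in 16-bit mode.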
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Finn Thain <fthain@telegraphics.com.au>
2
2
3
Add a vnet_hdr_len field to NetClientState
3
The existing code has a bug where the Remaining Buffer Word Count (RBWC)
4
so that other modules can easily get the real vnet_hdr_len.
4
is calculated with a truncating division, which gives the wrong result
5
for odd-sized packets.
5
6
6
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
7
Section 1.4.1 of the datasheet says,
8
9
Once the end of the packet has been reached, the serializer will
10
fill out the last word (16-bit mode) or long word (32-bit mode)
11
if the last byte did not end on a word or long word boundary
12
respectively. The fill byte will be 0FFh.
13
14
Implement buffer padding so that buffer limits are correctly enforced.
15
16
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
17
Tested-by: Laurent Vivier <laurent@vivier.eu>
18
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
7
Signed-off-by: Jason Wang <jasowang@redhat.com>
19
Signed-off-by: Jason Wang <jasowang@redhat.com>
8
---
20
---
9
include/net/net.h | 1 +
21
hw/net/dp8393x.c | 39 ++++++++++++++++++++++++++++-----------
10
net/net.c | 1 +
22
1 file changed, 28 insertions(+), 11 deletions(-)
11
2 files changed, 2 insertions(+)
12
23
13
diff --git a/include/net/net.h b/include/net/net.h
24
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
14
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX 100644
15
--- a/include/net/net.h
26
--- a/hw/net/dp8393x.c
16
+++ b/include/net/net.h
27
+++ b/hw/net/dp8393x.c
17
@@ -XXX,XX +XXX,XX @@ struct NetClientState {
28
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
18
unsigned int queue_index;
29
dp8393xState *s = qemu_get_nic_opaque(nc);
19
unsigned rxfilter_notify_enabled:1;
30
int packet_type;
20
int vring_enable;
31
uint32_t available, address;
21
+ int vnet_hdr_len;
32
- int width, rx_len = pkt_size;
22
QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
33
+ int width, rx_len, padded_len;
23
};
34
uint32_t checksum;
24
35
int size;
25
diff --git a/net/net.c b/net/net.c
36
26
index XXXXXXX..XXXXXXX 100644
37
- width = (s->regs[SONIC_DCR] & SONIC_DCR_DW) ? 2 : 1;
27
--- a/net/net.c
38
-
28
+++ b/net/net.c
39
s->regs[SONIC_RCR] &= ~(SONIC_RCR_PRX | SONIC_RCR_LBK | SONIC_RCR_FAER |
29
@@ -XXX,XX +XXX,XX @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
40
SONIC_RCR_CRCR | SONIC_RCR_LPKT | SONIC_RCR_BC | SONIC_RCR_MC);
30
return;
41
31
}
42
- if (pkt_size + 4 > dp8393x_rbwc(s) * 2) {
32
43
+ rx_len = pkt_size + sizeof(checksum);
33
+ nc->vnet_hdr_len = len;
44
+ if (s->regs[SONIC_DCR] & SONIC_DCR_DW) {
34
nc->info->set_vnet_hdr_len(nc, len);
45
+ width = 2;
35
}
46
+ padded_len = ((rx_len - 1) | 3) + 1;
47
+ } else {
48
+ width = 1;
49
+ padded_len = ((rx_len - 1) | 1) + 1;
50
+ }
51
+
52
+ if (padded_len > dp8393x_rbwc(s) * 2) {
53
DPRINTF("oversize packet, pkt_size is %d\n", pkt_size);
54
s->regs[SONIC_ISR] |= SONIC_ISR_RBAE;
55
dp8393x_update_irq(s);
56
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
57
s->regs[SONIC_TRBA0] = s->regs[SONIC_CRBA0];
58
59
/* Calculate the ethernet checksum */
60
- checksum = cpu_to_le32(crc32(0, buf, rx_len));
61
+ checksum = cpu_to_le32(crc32(0, buf, pkt_size));
62
63
/* Put packet into RBA */
64
DPRINTF("Receive packet at %08x\n", dp8393x_crba(s));
65
address = dp8393x_crba(s);
66
address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
67
- buf, rx_len);
68
- address += rx_len;
69
+ buf, pkt_size);
70
+ address += pkt_size;
71
+
72
+ /* Put frame checksum into RBA */
73
address_space_write(&s->as, address, MEMTXATTRS_UNSPECIFIED,
74
- &checksum, 4);
75
- address += 4;
76
- rx_len += 4;
77
+ &checksum, sizeof(checksum));
78
+ address += sizeof(checksum);
79
+
80
+ /* Pad short packets to keep pointers aligned */
81
+ if (rx_len < padded_len) {
82
+ size = padded_len - rx_len;
83
+ address_space_rw(&s->as, address, MEMTXATTRS_UNSPECIFIED,
84
+ (uint8_t *)"\xFF\xFF\xFF", size, 1);
85
+ address += size;
86
+ }
87
+
88
s->regs[SONIC_CRBA1] = address >> 16;
89
s->regs[SONIC_CRBA0] = address & 0xffff;
90
available = dp8393x_rbwc(s);
91
- available -= rx_len / 2;
92
+ available -= padded_len >> 1;
93
s->regs[SONIC_RBWC1] = available >> 16;
94
s->regs[SONIC_RBWC0] = available & 0xffff;
36
95
37
--
96
--
38
2.7.4
97
2.5.0
39
98
40
99
diff view generated by jsdifflib
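The padding arithmetic in the dp8393x_receive() hunk above can be checked in isolation. What follows is a minimal sketch, not the device code: sonic_padded_len() is a hypothetical helper name, and its bool argument stands in for the SONIC_DCR_DW test. It uses the same ((n - 1) | mask) + 1 idiom as the patch to round the frame-plus-checksum length up to the bus width.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Rounds rx_len (packet size plus the 4-byte frame checksum) up to the
 * bus width: long-word (4 byte) alignment when data_width_32 is true,
 * word (2 byte) alignment otherwise. rx_len must be at least 1.
 */
static uint32_t sonic_padded_len(uint32_t rx_len, bool data_width_32)
{
    uint32_t mask = data_width_32 ? 3 : 1;

    /* Setting the low bits of (rx_len - 1) and adding 1 rounds up. */
    return ((rx_len - 1) | mask) + 1;
}
```

For a 60-byte frame, rx_len is 64 and no padding is needed in either mode. A 61-byte frame gives rx_len 65, which pads to 68 in 32-bit mode and to 66 in 16-bit mode.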
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Finn Thain <fthain@telegraphics.com.au>
2
2
3
COLO-Proxy only looks at the packet payload, so we skip the vnet header.
3
When the SONIC receives a packet into the last available descriptor, it
4
retains ownership of that descriptor for as long as necessary.
4
5
5
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
6
Section 3.4.7 of the datasheet says,
7
8
When the system appends more descriptors, the SONIC releases ownership
9
of the descriptor after writing 0000h to the RXpkt.in_use field.
10
11
The packet can now be processed by the host, so raise a PKTRX interrupt,
12
just like the normal case.
13
14
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
15
Tested-by: Laurent Vivier <laurent@vivier.eu>
6
Signed-off-by: Jason Wang <jasowang@redhat.com>
16
Signed-off-by: Jason Wang <jasowang@redhat.com>
7
---
17
---
8
net/colo-compare.c | 8 ++++++--
18
hw/net/dp8393x.c | 10 ++++++++++
9
1 file changed, 6 insertions(+), 2 deletions(-)
19
1 file changed, 10 insertions(+)
10
20
11
diff --git a/net/colo-compare.c b/net/colo-compare.c
21
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
12
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
13
--- a/net/colo-compare.c
23
--- a/hw/net/dp8393x.c
14
+++ b/net/colo-compare.c
24
+++ b/hw/net/dp8393x.c
15
@@ -XXX,XX +XXX,XX @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
25
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
16
sec_ip_src, sec_ip_dst);
26
return -1;
27
}
28
/* Link has been updated by host */
29
+
30
+ /* Clear in_use */
31
+ size = sizeof(uint16_t) * width;
32
+ address = dp8393x_crda(s) + sizeof(uint16_t) * 6 * width;
33
+ dp8393x_put(s, width, 0, 0);
34
+ address_space_rw(&s->as, address, MEMTXATTRS_UNSPECIFIED,
35
+ (uint8_t *)s->data, size, 1);
36
+
37
+ /* Move to next descriptor */
38
s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
39
+ s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
17
}
40
}
18
41
19
+ offset = ppkt->vnet_hdr_len + offset;
42
/* Save current position */
20
+
21
if (ppkt->size == spkt->size) {
22
- return memcmp(ppkt->data + offset, spkt->data + offset,
23
+ return memcmp(ppkt->data + offset,
24
+ spkt->data + offset,
25
spkt->size - offset);
26
} else {
27
trace_colo_compare_main("Net packet size are not the same");
28
@@ -XXX,XX +XXX,XX @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
29
*/
30
if (ptcp->th_off > 5) {
31
ptrdiff_t tcp_offset;
32
+
33
tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
34
- + (ptcp->th_off * 4);
35
+ + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
36
res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
37
} else if (ptcp->th_sum == stcp->th_sum) {
38
res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
39
--
43
--
40
2.7.4
44
2.5.0
41
45
42
46
diff view generated by jsdifflib
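The colo-compare change above amounts to adding vnet_hdr_len to the comparison offset so that only payload bytes are compared. The sketch below shows that idea under stated assumptions: struct demo_packet and demo_compare_payload() are hypothetical, simplified stand-ins for the types and functions in net/colo-compare.c, and the bounds check on the offset is an addition that is not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the Packet type used by colo-compare. */
struct demo_packet {
    const uint8_t *data;
    uint32_t size;
    uint32_t vnet_hdr_len;
};

/*
 * Compare the payloads of two packets, ignoring the vnet header and the
 * first 'offset' bytes that follow it. Returns 0 when the compared bytes
 * match and non-zero otherwise (including when the sizes differ).
 */
static int demo_compare_payload(const struct demo_packet *ppkt,
                                const struct demo_packet *spkt,
                                uint32_t offset)
{
    uint32_t skip = ppkt->vnet_hdr_len + offset;

    /* The skip > size check is an assumption added for this sketch. */
    if (ppkt->size != spkt->size || skip > ppkt->size) {
        return -1;
    }
    return memcmp(ppkt->data + skip, spkt->data + skip, ppkt->size - skip);
}
```

Two packets that differ only inside the vnet header compare as equal, which is the behaviour the commit message asks for.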
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Finn Thain <fthain@telegraphics.com.au>
2
2
3
This patch changes the filter_send() parameter from CharBackend to MirrorState,
3
These operations need to take place regardless of whether or not
4
so that we can get more information such as vnet_hdr (used to support packets with a vnet header).
4
rx descriptors have been used up (that is, EOL flag was observed).
5
5
6
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
6
The algorithm is now the same for a packet that was withheld as for
7
a packet that was not.
8
9
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
10
Tested-by: Laurent Vivier <laurent@vivier.eu>
7
Signed-off-by: Jason Wang <jasowang@redhat.com>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
8
---
12
---
9
net/filter-mirror.c | 10 +++++-----
13
hw/net/dp8393x.c | 12 +++++++-----
10
1 file changed, 5 insertions(+), 5 deletions(-)
14
1 file changed, 7 insertions(+), 5 deletions(-)
11
15
12
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
16
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
13
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
14
--- a/net/filter-mirror.c
18
--- a/hw/net/dp8393x.c
15
+++ b/net/filter-mirror.c
19
+++ b/hw/net/dp8393x.c
16
@@ -XXX,XX +XXX,XX @@ typedef struct MirrorState {
20
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
17
SocketReadState rs;
21
/* Move to next descriptor */
18
} MirrorState;
22
s->regs[SONIC_CRDA] = s->regs[SONIC_LLFA];
19
23
s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
20
-static int filter_send(CharBackend *chr_out,
24
- s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) | (((s->regs[SONIC_RSC] & 0x00ff) + 1) & 0x00ff);
21
+static int filter_send(MirrorState *s,
25
+ }
22
const struct iovec *iov,
26
23
int iovcnt)
27
- if (s->regs[SONIC_RCR] & SONIC_RCR_LPKT) {
24
{
28
- /* Read next RRA */
25
@@ -XXX,XX +XXX,XX @@ static int filter_send(CharBackend *chr_out,
29
- dp8393x_do_read_rra(s);
30
- }
31
+ s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) |
32
+ ((s->regs[SONIC_RSC] + 1) & 0x00ff);
33
+
34
+ if (s->regs[SONIC_RCR] & SONIC_RCR_LPKT) {
35
+ /* Read next RRA */
36
+ dp8393x_do_read_rra(s);
26
}
37
}
27
38
28
len = htonl(size);
39
/* Done */
29
- ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len));
30
+ ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
31
if (ret != sizeof(len)) {
32
goto err;
33
}
34
35
buf = g_malloc(size);
36
iov_to_buf(iov, iovcnt, 0, buf, size);
37
- ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size);
38
+ ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
39
g_free(buf);
40
if (ret != size) {
41
goto err;
42
@@ -XXX,XX +XXX,XX @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
43
MirrorState *s = FILTER_MIRROR(nf);
44
int ret;
45
46
- ret = filter_send(&s->chr_out, iov, iovcnt);
47
+ ret = filter_send(s, iov, iovcnt);
48
if (ret) {
49
error_report("filter mirror send failed(%s)", strerror(-ret));
50
}
51
@@ -XXX,XX +XXX,XX @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
52
int ret;
53
54
if (qemu_chr_fe_backend_connected(&s->chr_out)) {
55
- ret = filter_send(&s->chr_out, iov, iovcnt);
56
+ ret = filter_send(s, iov, iovcnt);
57
if (ret) {
58
error_report("filter redirector send failed(%s)", strerror(-ret));
59
}
60
--
40
--
61
2.7.4
41
2.5.0
62
42
63
43
diff view generated by jsdifflib
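The reformatted SONIC_RSC update in the dp8393x hunk above increments only the low byte of the register. A minimal sketch of that expression, using a hypothetical helper name (rsc_next) rather than anything from the driver:

```c
#include <stdint.h>

/*
 * Hypothetical helper: advance the 8-bit sequence number kept in the low
 * byte of an RSC-style register, leaving the high byte untouched.
 * The low byte wraps from 0xff to 0x00 without carrying into the high byte.
 */
static uint16_t rsc_next(uint16_t rsc)
{
    return (uint16_t)((rsc & 0xff00) | ((rsc + 1) & 0x00ff));
}
```

Masking after the addition is enough: any carry out of the low byte is discarded by the 0x00ff mask, so the extra inner mask in the old one-line form was redundant.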
1
From: Michal Privoznik <mprivozn@redhat.com>
1
From: Finn Thain <fthain@telegraphics.com.au>
2
2
3
We have a function that checks whether a given number is a power of two.
3
The jazzsonic driver in Linux uses the Silicon Revision register value
4
We should prefer it over open-coding the check ourselves.
4
to probe the chip. The driver fails unless the SR register contains 4.
5
Unfortunately, reading this register in QEMU usually returns 0 because
6
the s->regs[] array gets wiped after a software reset.
5
7
6
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
8
Fixes: bd8f1ebce4 ("net/dp8393x: fix hardware reset")
9
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
10
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
7
Signed-off-by: Jason Wang <jasowang@redhat.com>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
8
---
12
---
9
hw/net/virtio-net.c | 2 +-
13
hw/net/dp8393x.c | 2 +-
10
1 file changed, 1 insertion(+), 1 deletion(-)
14
1 file changed, 1 insertion(+), 1 deletion(-)
11
15
12
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
16
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
13
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
14
--- a/hw/net/virtio-net.c
18
--- a/hw/net/dp8393x.c
15
+++ b/hw/net/virtio-net.c
19
+++ b/hw/net/dp8393x.c
16
@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
20
@@ -XXX,XX +XXX,XX @@ static void dp8393x_reset(DeviceState *dev)
17
*/
21
timer_del(s->watchdog);
18
if (n->net_conf.rx_queue_size < VIRTIO_NET_RX_QUEUE_MIN_SIZE ||
22
19
n->net_conf.rx_queue_size > VIRTQUEUE_MAX_SIZE ||
23
memset(s->regs, 0, sizeof(s->regs));
20
- (n->net_conf.rx_queue_size & (n->net_conf.rx_queue_size - 1))) {
24
+ s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux/mips */
21
+ !is_power_of_2(n->net_conf.rx_queue_size)) {
25
s->regs[SONIC_CR] = SONIC_CR_RST | SONIC_CR_STP | SONIC_CR_RXDIS;
22
error_setg(errp, "Invalid rx_queue_size (= %" PRIu16 "), "
26
s->regs[SONIC_DCR] &= ~(SONIC_DCR_EXBUS | SONIC_DCR_LBR);
23
"must be a power of 2 between %d and %d.",
27
s->regs[SONIC_RCR] &= ~(SONIC_RCR_LB0 | SONIC_RCR_LB1 | SONIC_RCR_BRD | SONIC_RCR_RNT);
24
n->net_conf.rx_queue_size, VIRTIO_NET_RX_QUEUE_MIN_SIZE,
28
@@ -XXX,XX +XXX,XX @@ static void dp8393x_realize(DeviceState *dev, Error **errp)
29
qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
30
31
s->watchdog = timer_new_ns(QEMU_CLOCK_VIRTUAL, dp8393x_watchdog, s);
32
- s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux */
33
34
memory_region_init_ram(&s->prom, OBJECT(dev),
35
"dp8393x-prom", SONIC_PROM_SIZE, &local_err);
25
--
36
--
26
2.7.4
37
2.5.0
27
38
28
39
diff view generated by jsdifflib
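The virtio-net hunk above replaces an open-coded bit test with a named helper. The sketch below illustrates the same range-plus-power-of-two validation. It is a local illustration only: demo_is_power_of_2() is written here rather than taken from QEMU, and the DEMO_RX_QUEUE_MIN and DEMO_RX_QUEUE_MAX bounds are hypothetical values chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local illustration; QEMU provides its own is_power_of_2() helper. */
static bool demo_is_power_of_2(uint64_t value)
{
    /* A power of two has exactly one bit set; zero is rejected. */
    return value != 0 && (value & (value - 1)) == 0;
}

/* Hypothetical bounds, chosen only for this example. */
#define DEMO_RX_QUEUE_MIN 256
#define DEMO_RX_QUEUE_MAX 1024

/* Returns true when 'size' is within bounds and a power of two. */
static bool demo_rx_queue_size_valid(uint16_t size)
{
    return size >= DEMO_RX_QUEUE_MIN && size <= DEMO_RX_QUEUE_MAX &&
           demo_is_power_of_2(size);
}
```

Note that the bare expression x & (x - 1) evaluates to zero for x == 0 as well, so on its own it would accept zero. In the original check the lower bound already excludes that case.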
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Finn Thain <fthain@telegraphics.com.au>
2
2
3
We add the vnet_hdr_support option for filter-mirror; it is disabled by default.
3
Section 3.4.7 of the datasheet explains that,
4
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
5
You can use it for example:
6
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support
7
4
8
If the vnet_hdr_support flag is set, the format of the packets we send changes from
5
The RBE bit in the Interrupt Status register is set when the
9
struct {int size; const uint8_t buf[];} to {int size; int vnet_hdr_len; const uint8_t buf[];}.
6
SONIC finishes using the second to last receive buffer and reads
10
This lets other modules (such as colo-compare) parse the network packet correctly.
7
the last RRA descriptor. Actually, the SONIC is not truly out of
8
resources, but gives the system an early warning of an impending
9
out of resources condition.
11
10
12
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
11
RBE does not mean actual receive buffer exhaustion, and reception should
12
not be stopped. This is important because Linux will not check and clear
13
the RBE interrupt until it receives another packet. But that won't
14
happen if can_receive returns false. This bug causes the SONIC to become
15
deaf (until reset).
16
17
Fix this with a new flag to indicate actual receive buffer exhaustion.
18
19
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
20
Tested-by: Laurent Vivier <laurent@vivier.eu>
13
Signed-off-by: Jason Wang <jasowang@redhat.com>
21
Signed-off-by: Jason Wang <jasowang@redhat.com>
14
---
22
---
15
net/filter-mirror.c | 42 +++++++++++++++++++++++++++++++++++++++++-
23
hw/net/dp8393x.c | 35 ++++++++++++++++++++++-------------
16
qemu-options.hx | 5 ++---
24
1 file changed, 22 insertions(+), 13 deletions(-)
17
2 files changed, 43 insertions(+), 4 deletions(-)
18
25
19
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
26
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
20
index XXXXXXX..XXXXXXX 100644
27
index XXXXXXX..XXXXXXX 100644
21
--- a/net/filter-mirror.c
28
--- a/hw/net/dp8393x.c
22
+++ b/net/filter-mirror.c
29
+++ b/hw/net/dp8393x.c
23
@@ -XXX,XX +XXX,XX @@ typedef struct MirrorState {
30
@@ -XXX,XX +XXX,XX @@ typedef struct dp8393xState {
24
CharBackend chr_in;
31
/* Hardware */
25
CharBackend chr_out;
32
uint8_t it_shift;
26
SocketReadState rs;
33
bool big_endian;
27
+ bool vnet_hdr;
34
+ bool last_rba_is_full;
28
} MirrorState;
35
qemu_irq irq;
29
36
#ifdef DEBUG_SONIC
30
static int filter_send(MirrorState *s,
37
int irq_level;
31
const struct iovec *iov,
38
@@ -XXX,XX +XXX,XX @@ static void dp8393x_do_read_rra(dp8393xState *s)
32
int iovcnt)
39
s->regs[SONIC_RRP] = s->regs[SONIC_RSA];
33
{
34
+ NetFilterState *nf = NETFILTER(s);
35
int ret = 0;
36
ssize_t size = 0;
37
uint32_t len = 0;
38
@@ -XXX,XX +XXX,XX @@ static int filter_send(MirrorState *s,
39
goto err;
40
}
40
}
41
41
42
+ if (s->vnet_hdr) {
42
- /* Check resource exhaustion */
43
+ /*
43
+ /* Warn the host if CRBA now has the last available resource */
44
+ * If vnet_hdr = on, we send vnet header len to make other
44
if (s->regs[SONIC_RRP] == s->regs[SONIC_RWP])
45
+ * module(like colo-compare) know how to parse net
45
{
46
+ * packet correctly.
46
s->regs[SONIC_ISR] |= SONIC_ISR_RBE;
47
+ */
47
dp8393x_update_irq(s);
48
+ ssize_t vnet_hdr_len;
48
}
49
+
49
+
50
+ vnet_hdr_len = nf->netdev->vnet_hdr_len;
50
+ /* Allow packet reception */
51
+
51
+ s->last_rba_is_full = false;
52
+ len = htonl(vnet_hdr_len);
52
}
53
+ ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
53
54
+ if (ret != sizeof(len)) {
54
static void dp8393x_do_software_reset(dp8393xState *s)
55
+ goto err;
55
@@ -XXX,XX +XXX,XX @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
56
+ }
56
dp8393x_do_read_rra(s);
57
}
58
dp8393x_update_irq(s);
59
- if (dp8393x_can_receive(s->nic->ncs)) {
60
- qemu_flush_queued_packets(qemu_get_queue(s->nic));
61
- }
62
break;
63
/* The guest is required to store aligned pointers here */
64
case SONIC_RSA:
65
@@ -XXX,XX +XXX,XX @@ static int dp8393x_can_receive(NetClientState *nc)
66
67
if (!(s->regs[SONIC_CR] & SONIC_CR_RXEN))
68
return 0;
69
- if (s->regs[SONIC_ISR] & SONIC_ISR_RBE)
70
- return 0;
71
return 1;
72
}
73
74
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
75
s->regs[SONIC_RCR] &= ~(SONIC_RCR_PRX | SONIC_RCR_LBK | SONIC_RCR_FAER |
76
SONIC_RCR_CRCR | SONIC_RCR_LPKT | SONIC_RCR_BC | SONIC_RCR_MC);
77
78
+ if (s->last_rba_is_full) {
79
+ return pkt_size;
57
+ }
80
+ }
58
+
81
+
59
buf = g_malloc(size);
82
rx_len = pkt_size + sizeof(checksum);
60
iov_to_buf(iov, iovcnt, 0, buf, size);
83
if (s->regs[SONIC_DCR] & SONIC_DCR_DW) {
61
ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
84
width = 2;
62
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
85
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
63
}
86
DPRINTF("oversize packet, pkt_size is %d\n", pkt_size);
87
s->regs[SONIC_ISR] |= SONIC_ISR_RBAE;
88
dp8393x_update_irq(s);
89
- dp8393x_do_read_rra(s);
90
- return pkt_size;
91
+ s->regs[SONIC_RCR] |= SONIC_RCR_LPKT;
92
+ goto done;
64
}
93
}
65
94
66
- net_socket_rs_init(&s->rs, redirector_rs_finalize, false);
95
packet_type = dp8393x_receive_filter(s, buf, pkt_size);
67
+ net_socket_rs_init(&s->rs, redirector_rs_finalize, s->vnet_hdr);
96
@@ -XXX,XX +XXX,XX @@ static ssize_t dp8393x_receive(NetClientState *nc, const uint8_t * buf,
68
97
s->regs[SONIC_ISR] |= SONIC_ISR_PKTRX;
69
if (s->indev) {
70
chr = qemu_chr_find(s->indev);
71
@@ -XXX,XX +XXX,XX @@ static void filter_mirror_set_outdev(Object *obj,
72
}
98
}
99
100
+ dp8393x_update_irq(s);
101
+
102
s->regs[SONIC_RSC] = (s->regs[SONIC_RSC] & 0xff00) |
103
((s->regs[SONIC_RSC] + 1) & 0x00ff);
104
105
+done:
106
+
107
if (s->regs[SONIC_RCR] & SONIC_RCR_LPKT) {
108
- /* Read next RRA */
109
- dp8393x_do_read_rra(s);
110
+ if (s->regs[SONIC_RRP] == s->regs[SONIC_RWP]) {
111
+ /* Stop packet reception */
112
+ s->last_rba_is_full = true;
113
+ } else {
114
+ /* Read next resource */
115
+ dp8393x_do_read_rra(s);
116
+ }
117
}
118
119
- /* Done */
120
- dp8393x_update_irq(s);
121
-
122
return pkt_size;
73
}
123
}
74
124
75
+static bool filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
76
+{
77
+ MirrorState *s = FILTER_MIRROR(obj);
78
+
79
+ return s->vnet_hdr;
80
+}
81
+
82
+static void filter_mirror_set_vnet_hdr(Object *obj, bool value, Error **errp)
83
+{
84
+ MirrorState *s = FILTER_MIRROR(obj);
85
+
86
+ s->vnet_hdr = value;
87
+}
88
+
89
static char *filter_redirector_get_outdev(Object *obj, Error **errp)
90
{
91
MirrorState *s = FILTER_REDIRECTOR(obj);
92
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_set_outdev(Object *obj,
93
94
static void filter_mirror_init(Object *obj)
95
{
96
+ MirrorState *s = FILTER_MIRROR(obj);
97
+
98
object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
99
filter_mirror_set_outdev, NULL);
100
+
101
+ s->vnet_hdr = false;
102
+ object_property_add_bool(obj, "vnet_hdr_support",
103
+ filter_mirror_get_vnet_hdr,
104
+ filter_mirror_set_vnet_hdr, NULL);
105
}
106
107
static void filter_redirector_init(Object *obj)
108
diff --git a/qemu-options.hx b/qemu-options.hx
109
index XXXXXXX..XXXXXXX 100644
110
--- a/qemu-options.hx
111
+++ b/qemu-options.hx
112
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
113
@option{tx}: the filter is attached to the transmit queue of the netdev,
114
where it will receive packets sent by the netdev.
115
116
-@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
117
+@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
118
119
-filter-mirror on netdev @var{netdevid},mirror net packet to chardev
120
-@var{chardevid}
121
+filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
122
123
@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
124
outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
125
--
125
--
126
2.7.4
126
2.5.0
127
127
128
128
diff view generated by jsdifflib
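The filter-mirror commit message above describes two wire formats, with and without the vnet_hdr_len field. A minimal sketch of that framing follows. It writes into a caller-supplied buffer instead of a chardev, and demo_frame_packet() with all of its parameters is a hypothetical helper, not a QEMU function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: serialize one packet into 'out' using the framing
 * described in the commit message. Without with_vnet_hdr the frame is
 *   [u32 size][payload]
 * and with it
 *   [u32 size][u32 vnet_hdr_len][payload]
 * Both length fields are in network byte order.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t demo_frame_packet(uint8_t *out, size_t out_len,
                                const uint8_t *payload, uint32_t size,
                                uint32_t vnet_hdr_len, int with_vnet_hdr)
{
    size_t need = sizeof(uint32_t) +
                  (with_vnet_hdr ? sizeof(uint32_t) : 0) + size;
    size_t pos = 0;
    uint32_t be;

    if (need > out_len) {
        return 0;
    }

    /* Packet size first, as in the original format. */
    be = htonl(size);
    memcpy(out + pos, &be, sizeof(be));
    pos += sizeof(be);

    /* Optional vnet header length, only when the option is enabled. */
    if (with_vnet_hdr) {
        be = htonl(vnet_hdr_len);
        memcpy(out + pos, &be, sizeof(be));
        pos += sizeof(be);
    }

    memcpy(out + pos, payload, size);
    pos += size;
    return pos;
}
```

Because the extra field is present only when the option is on, the sender and every receiver on the same chardev have to agree on the vnet_hdr_support setting.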
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Yuri Benditovich <yuri.benditovich@daynix.com>
2
2
3
We add a flag to decide whether net_fill_rstate() needs to read
3
https://bugzilla.redhat.com/show_bug.cgi?id=1787142
4
the vnet_hdr_len or not.
4
The emulation issues hw_error if the PSRCTL register
5
is written with, for example, a zero value.
6
Such a configuration does not present any problem when
7
the DTYP bits of the RCTL register select the legacy format of
8
transfer descriptors. This commit skips the checks
9
for BSIZE0 and BSIZE1 when legacy mode is used.
5
10
6
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
11
Acked-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
7
Suggested-by: Jason Wang <jasowang@redhat.com>
12
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
8
Signed-off-by: Jason Wang <jasowang@redhat.com>
13
Signed-off-by: Jason Wang <jasowang@redhat.com>
9
---
14
---
10
include/net/net.h | 9 +++++++--
15
hw/net/e1000e_core.c | 13 ++++++++-----
11
net/colo-compare.c | 4 ++--
16
1 file changed, 8 insertions(+), 5 deletions(-)
12
net/filter-mirror.c | 2 +-
13
net/net.c | 36 ++++++++++++++++++++++++++++++++----
14
net/socket.c | 8 ++++----
15
5 files changed, 46 insertions(+), 13 deletions(-)
16
17
17
diff --git a/include/net/net.h b/include/net/net.h
18
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
18
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
19
--- a/include/net/net.h
20
--- a/hw/net/e1000e_core.c
20
+++ b/include/net/net.h
21
+++ b/hw/net/e1000e_core.c
21
@@ -XXX,XX +XXX,XX @@ typedef struct NICState {
22
@@ -XXX,XX +XXX,XX @@ e1000e_set_eitr(E1000ECore *core, int index, uint32_t val)
22
} NICState;
23
static void
23
24
e1000e_set_psrctl(E1000ECore *core, int index, uint32_t val)
24
struct SocketReadState {
25
{
25
- int state; /* 0 = getting length, 1 = getting data */
26
- if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
26
+ /* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
27
- hw_error("e1000e: PSRCTL.BSIZE0 cannot be zero");
27
+ int state;
28
- }
28
+ /* This flag decides whether to read the vnet_hdr_len field */
29
+ if (core->mac[RCTL] & E1000_RCTL_DTYP_MASK) {
29
+ bool vnet_hdr;
30
+
30
uint32_t index;
31
+ if ((val & E1000_PSRCTL_BSIZE0_MASK) == 0) {
31
uint32_t packet_len;
32
+ hw_error("e1000e: PSRCTL.BSIZE0 cannot be zero");
32
+ uint32_t vnet_hdr_len;
33
+ }
33
uint8_t buf[NET_BUFSIZE];
34
34
SocketReadStateFinalize *finalize;
35
- if ((val & E1000_PSRCTL_BSIZE1_MASK) == 0) {
35
};
36
- hw_error("e1000e: PSRCTL.BSIZE1 cannot be zero");
36
@@ -XXX,XX +XXX,XX @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
37
+ if ((val & E1000_PSRCTL_BSIZE1_MASK) == 0) {
37
void print_net_client(Monitor *mon, NetClientState *nc);
38
+ hw_error("e1000e: PSRCTL.BSIZE1 cannot be zero");
38
void hmp_info_network(Monitor *mon, const QDict *qdict);
39
+ }
39
void net_socket_rs_init(SocketReadState *rs,
40
- SocketReadStateFinalize *finalize);
41
+ SocketReadStateFinalize *finalize,
42
+ bool vnet_hdr);
43
44
/* NIC info */
45
46
diff --git a/net/colo-compare.c b/net/colo-compare.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/net/colo-compare.c
49
+++ b/net/colo-compare.c
50
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
51
return;
52
}
40
}
53
41
54
- net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
42
core->mac[PSRCTL] = val;
55
- net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
56
+ net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, false);
57
+ net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, false);
58
59
g_queue_init(&s->conn_list);
60
61
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
62
index XXXXXXX..XXXXXXX 100644
63
--- a/net/filter-mirror.c
64
+++ b/net/filter-mirror.c
65
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
66
}
67
}
68
69
- net_socket_rs_init(&s->rs, redirector_rs_finalize);
70
+ net_socket_rs_init(&s->rs, redirector_rs_finalize, false);
71
72
if (s->indev) {
73
chr = qemu_chr_find(s->indev);
74
diff --git a/net/net.c b/net/net.c
75
index XXXXXXX..XXXXXXX 100644
76
--- a/net/net.c
77
+++ b/net/net.c
78
@@ -XXX,XX +XXX,XX @@ QemuOptsList qemu_net_opts = {
79
};
80
81
void net_socket_rs_init(SocketReadState *rs,
82
- SocketReadStateFinalize *finalize)
83
+ SocketReadStateFinalize *finalize,
84
+ bool vnet_hdr)
85
{
86
rs->state = 0;
87
+ rs->vnet_hdr = vnet_hdr;
88
rs->index = 0;
89
rs->packet_len = 0;
90
+ rs->vnet_hdr_len = 0;
91
memset(rs->buf, 0, sizeof(rs->buf));
92
rs->finalize = finalize;
93
}
94
@@ -XXX,XX +XXX,XX @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
95
unsigned int l;
96
97
while (size > 0) {
98
- /* reassemble a packet from the network */
99
- switch (rs->state) { /* 0 = getting length, 1 = getting data */
100
+ /* Reassemble a packet from the network.
101
+ * 0 = getting length.
102
+ * 1 = getting vnet header length.
103
+ * 2 = getting data.
104
+ */
105
+ switch (rs->state) {
106
case 0:
107
l = 4 - rs->index;
108
if (l > size) {
109
@@ -XXX,XX +XXX,XX @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
110
/* got length */
111
rs->packet_len = ntohl(*(uint32_t *)rs->buf);
112
rs->index = 0;
113
- rs->state = 1;
114
+ if (rs->vnet_hdr) {
115
+ rs->state = 1;
116
+ } else {
117
+ rs->state = 2;
118
+ rs->vnet_hdr_len = 0;
119
+ }
120
}
121
break;
122
case 1:
123
+ l = 4 - rs->index;
124
+ if (l > size) {
125
+ l = size;
126
+ }
127
+ memcpy(rs->buf + rs->index, buf, l);
128
+ buf += l;
129
+ size -= l;
130
+ rs->index += l;
131
+ if (rs->index == 4) {
132
+ /* got vnet header length */
133
+ rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf);
134
+ rs->index = 0;
135
+ rs->state = 2;
136
+ }
137
+ break;
138
+ case 2:
139
l = rs->packet_len - rs->index;
140
if (l > size) {
141
l = size;
142
diff --git a/net/socket.c b/net/socket.c
143
index XXXXXXX..XXXXXXX 100644
144
--- a/net/socket.c
145
+++ b/net/socket.c
146
@@ -XXX,XX +XXX,XX @@ static void net_socket_send(void *opaque)
147
closesocket(s->fd);
148
149
s->fd = -1;
150
- net_socket_rs_init(&s->rs, net_socket_rs_finalize);
151
+ net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
152
s->nc.link_down = true;
153
memset(s->nc.info_str, 0, sizeof(s->nc.info_str));
154
155
@@ -XXX,XX +XXX,XX @@ static NetSocketState *net_socket_fd_init_dgram(NetClientState *peer,
156
s->fd = fd;
157
s->listen_fd = -1;
158
s->send_fn = net_socket_send_dgram;
159
- net_socket_rs_init(&s->rs, net_socket_rs_finalize);
160
+ net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
161
net_socket_read_poll(s, true);
162
163
/* mcast: save bound address as dst */
164
@@ -XXX,XX +XXX,XX @@ static NetSocketState *net_socket_fd_init_stream(NetClientState *peer,
165
166
s->fd = fd;
167
s->listen_fd = -1;
168
- net_socket_rs_init(&s->rs, net_socket_rs_finalize);
169
+ net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
170
171
/* Disable Nagle algorithm on TCP sockets to reduce latency */
172
socket_set_nodelay(fd);
173
@@ -XXX,XX +XXX,XX @@ static int net_socket_listen_init(NetClientState *peer,
174
s->fd = -1;
175
s->listen_fd = fd;
176
s->nc.link_down = true;
177
- net_socket_rs_init(&s->rs, net_socket_rs_finalize);
178
+ net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
179
180
qemu_set_fd_handler(s->listen_fd, net_socket_accept, NULL, s);
181
return 0;
182
--
43
--
183
2.7.4
44
2.5.0
184
45
185
46
diff view generated by jsdifflib
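The net_fill_rstate() change above turns a two-state reader into a three-state one: length, optional vnet header length, then data. The sketch below is a simplified, standalone illustration of that state machine. The demo_ names, the DEMO_BUFSIZE limit, the packets_done counter and the rejection of zero-length packets are assumptions made for the example and do not mirror the QEMU implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define DEMO_BUFSIZE 2048  /* hypothetical limit for this sketch */

struct demo_read_state {
    int state;              /* 0 = length, 1 = vnet header length, 2 = data */
    int vnet_hdr;           /* non-zero if the stream carries vnet_hdr_len */
    uint32_t index;
    uint32_t packet_len;
    uint32_t vnet_hdr_len;
    uint8_t buf[DEMO_BUFSIZE];
    int packets_done;       /* counts completed packets, for the example */
};

static void demo_rs_init(struct demo_read_state *rs, int vnet_hdr)
{
    memset(rs, 0, sizeof(*rs));
    rs->vnet_hdr = vnet_hdr;
}

/* Feed 'size' bytes; returns 0 on success, -1 on an invalid length. */
static int demo_fill_rstate(struct demo_read_state *rs,
                            const uint8_t *buf, size_t size)
{
    while (size > 0) {
        /* States 0 and 1 each read a 4-byte field; state 2 reads payload. */
        uint32_t want = (rs->state == 2) ? rs->packet_len : 4;
        size_t l = want - rs->index;
        uint32_t field;

        if (l > size) {
            l = size;
        }
        memcpy(rs->buf + rs->index, buf, l);
        buf += l;
        size -= l;
        rs->index += (uint32_t)l;
        if (rs->index < want) {
            break; /* input exhausted before the field completed */
        }
        rs->index = 0;

        switch (rs->state) {
        case 0:
            memcpy(&field, rs->buf, sizeof(field));
            rs->packet_len = ntohl(field);
            if (rs->packet_len == 0 || rs->packet_len > DEMO_BUFSIZE) {
                return -1;
            }
            rs->vnet_hdr_len = 0;
            /* Skip state 1 entirely when the flag is off. */
            rs->state = rs->vnet_hdr ? 1 : 2;
            break;
        case 1:
            memcpy(&field, rs->buf, sizeof(field));
            rs->vnet_hdr_len = ntohl(field);
            rs->state = 2;
            break;
        default:
            /* Payload complete; rs->buf holds packet_len bytes. */
            rs->packets_done++;
            rs->state = 0;
            break;
        }
    }
    return 0;
}
```

The reader keeps its position across calls, so a packet may arrive split at any byte boundary, including in the middle of either length field.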
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Yuri Benditovich <yuri.benditovich@daynix.com>
2
2
3
We add the vnet_hdr_support option for colo-compare; it is disabled by default.
3
Add support for following hash types:
4
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
4
IPV6 TCP with extension headers
5
You can use it for example:
5
IPV4 UDP
6
-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support
6
IPV6 UDP
7
IPV6 UDP with extension headers
7
8
8
COLO-compare can get the vnet header length from the filter.
9
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
9
Add vnet_hdr_len to struct packet and output packet with
10
Acked-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
10
the vnet_hdr_len.
11
12
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
13
Signed-off-by: Jason Wang <jasowang@redhat.com>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
14
---
12
---
15
net/colo-compare.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-------
13
hw/net/net_rx_pkt.c | 42 ++++++++++++++++++++++++++++++++++++++++++
16
qemu-options.hx | 4 ++--
14
hw/net/net_rx_pkt.h | 6 +++++-
17
2 files changed, 55 insertions(+), 9 deletions(-)
15
hw/net/trace-events | 4 ++++
16
3 files changed, 51 insertions(+), 1 deletion(-)
18
17
19
diff --git a/net/colo-compare.c b/net/colo-compare.c
18
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
20
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
21
--- a/net/colo-compare.c
20
--- a/hw/net/net_rx_pkt.c
22
+++ b/net/colo-compare.c
21
+++ b/hw/net/net_rx_pkt.c
23
@@ -XXX,XX +XXX,XX @@ typedef struct CompareState {
22
@@ -XXX,XX +XXX,XX @@ _net_rx_rss_prepare_tcp(uint8_t *rss_input,
24
CharBackend chr_out;
23
&tcphdr->th_dport, sizeof(uint16_t));
25
SocketReadState pri_rs;
24
}
26
SocketReadState sec_rs;
25
27
+ bool vnet_hdr;
26
+static inline void
28
27
+_net_rx_rss_prepare_udp(uint8_t *rss_input,
29
/* connection list: the connections belonged to this NIC could be found
28
+ struct NetRxPkt *pkt,
30
* in this list.
29
+ size_t *bytes_written)
31
@@ -XXX,XX +XXX,XX @@ enum {
30
+{
32
31
+ struct udp_header *udphdr = &pkt->l4hdr_info.hdr.udp;
33
static int compare_chr_send(CompareState *s,
34
const uint8_t *buf,
35
- uint32_t size);
36
+ uint32_t size,
37
+ uint32_t vnet_hdr_len);
38
39
static gint seq_sorter(Packet *a, Packet *b, gpointer data)
40
{
41
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
42
}
43
44
if (result) {
45
- ret = compare_chr_send(s, pkt->data, pkt->size);
46
+ ret = compare_chr_send(s,
47
+ pkt->data,
48
+ pkt->size,
49
+ pkt->vnet_hdr_len);
50
if (ret < 0) {
51
error_report("colo_send_primary_packet failed");
52
}
53
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
54
55
static int compare_chr_send(CompareState *s,
56
const uint8_t *buf,
57
- uint32_t size)
58
+ uint32_t size,
59
+ uint32_t vnet_hdr_len)
60
{
61
int ret = 0;
62
uint32_t len = htonl(size);
63
@@ -XXX,XX +XXX,XX @@ static int compare_chr_send(CompareState *s,
64
goto err;
65
}
66
67
+ if (s->vnet_hdr) {
68
+ /*
69
+ * We send vnet header len make other module(like filter-redirector)
70
+ * know how to parse net packet correctly.
71
+ */
72
+ len = htonl(vnet_hdr_len);
73
+ ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
74
+ if (ret != sizeof(len)) {
75
+ goto err;
76
+ }
77
+ }
78
+
32
+
79
ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
33
+ _net_rx_rss_add_chunk(rss_input, bytes_written,
80
if (ret != size) {
34
+ &udphdr->uh_sport, sizeof(uint16_t));
81
goto err;
82
@@ -XXX,XX +XXX,XX @@ static void compare_set_outdev(Object *obj, const char *value, Error **errp)
83
s->outdev = g_strdup(value);
84
}
85
86
+static bool compare_get_vnet_hdr(Object *obj, Error **errp)
87
+{
88
+ CompareState *s = COLO_COMPARE(obj);
89
+
35
+
90
+ return s->vnet_hdr;
36
+ _net_rx_rss_add_chunk(rss_input, bytes_written,
37
+ &udphdr->uh_dport, sizeof(uint16_t));
91
+}
38
+}
92
+
39
+
93
+static void compare_set_vnet_hdr(Object *obj,
40
uint32_t
94
+ bool value,
41
net_rx_pkt_calc_rss_hash(struct NetRxPkt *pkt,
95
+ Error **errp)
42
NetRxPktRssType type,
96
+{
43
@@ -XXX,XX +XXX,XX @@ net_rx_pkt_calc_rss_hash(struct NetRxPkt *pkt,
97
+ CompareState *s = COLO_COMPARE(obj);
44
trace_net_rx_pkt_rss_ip6_ex();
98
+
45
_net_rx_rss_prepare_ip6(&rss_input[0], pkt, true, &rss_length);
99
+ s->vnet_hdr = value;
46
break;
100
+}
47
+ case NetPktRssIpV6TcpEx:
101
+
48
+ assert(pkt->isip6);
102
static void compare_pri_rs_finalize(SocketReadState *pri_rs)
49
+ assert(pkt->istcp);
103
{
50
+ trace_net_rx_pkt_rss_ip6_ex_tcp();
104
CompareState *s = container_of(pri_rs, CompareState, pri_rs);
51
+ _net_rx_rss_prepare_ip6(&rss_input[0], pkt, true, &rss_length);
105
52
+ _net_rx_rss_prepare_tcp(&rss_input[0], pkt, &rss_length);
106
if (packet_enqueue(s, PRIMARY_IN)) {
53
+ break;
107
trace_colo_compare_main("primary: unsupported packet in");
54
+ case NetPktRssIpV4Udp:
108
- compare_chr_send(s, pri_rs->buf, pri_rs->packet_len);
55
+ assert(pkt->isip4);
109
+ compare_chr_send(s,
56
+ assert(pkt->isudp);
110
+ pri_rs->buf,
57
+ trace_net_rx_pkt_rss_ip4_udp();
111
+ pri_rs->packet_len,
58
+ _net_rx_rss_prepare_ip4(&rss_input[0], pkt, &rss_length);
112
+ pri_rs->vnet_hdr_len);
59
+ _net_rx_rss_prepare_udp(&rss_input[0], pkt, &rss_length);
113
} else {
60
+ break;
114
/* compare connection */
61
+ case NetPktRssIpV6Udp:
115
g_queue_foreach(&s->conn_list, colo_compare_connection, s);
62
+ assert(pkt->isip6);
116
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
63
+ assert(pkt->isudp);
117
return;
64
+ trace_net_rx_pkt_rss_ip6_udp();
118
}
65
+ _net_rx_rss_prepare_ip6(&rss_input[0], pkt, false, &rss_length);
119
66
+ _net_rx_rss_prepare_udp(&rss_input[0], pkt, &rss_length);
120
- net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, false);
67
+ break;
121
- net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, false);
68
+ case NetPktRssIpV6UdpEx:
122
+ net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
69
+ assert(pkt->isip6);
123
+ net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
70
+ assert(pkt->isudp);
124
71
+ trace_net_rx_pkt_rss_ip6_ex_udp();
125
g_queue_init(&s->conn_list);
72
+ _net_rx_rss_prepare_ip6(&rss_input[0], pkt, true, &rss_length);
126
73
+ _net_rx_rss_prepare_udp(&rss_input[0], pkt, &rss_length);
127
@@ -XXX,XX +XXX,XX @@ static void colo_flush_packets(void *opaque, void *user_data)
74
+ break;
128
75
default:
129
while (!g_queue_is_empty(&conn->primary_list)) {
76
assert(false);
130
pkt = g_queue_pop_head(&conn->primary_list);
77
break;
131
- compare_chr_send(s, pkt->data, pkt->size);
78
diff --git a/hw/net/net_rx_pkt.h b/hw/net/net_rx_pkt.h
132
+ compare_chr_send(s,
133
+ pkt->data,
134
+ pkt->size,
135
+ pkt->vnet_hdr_len);
136
packet_destroy(pkt, NULL);
137
}
138
while (!g_queue_is_empty(&conn->secondary_list)) {
139
@@ -XXX,XX +XXX,XX @@ static void colo_compare_class_init(ObjectClass *oc, void *data)
140
141
static void colo_compare_init(Object *obj)
142
{
143
+ CompareState *s = COLO_COMPARE(obj);
144
+
145
object_property_add_str(obj, "primary_in",
146
compare_get_pri_indev, compare_set_pri_indev,
147
NULL);
148
@@ -XXX,XX +XXX,XX @@ static void colo_compare_init(Object *obj)
149
object_property_add_str(obj, "outdev",
150
compare_get_outdev, compare_set_outdev,
151
NULL);
152
+
153
+ s->vnet_hdr = false;
154
+ object_property_add_bool(obj, "vnet_hdr_support", compare_get_vnet_hdr,
155
+ compare_set_vnet_hdr, NULL);
156
}
157
158
static void colo_compare_finalize(Object *obj)
159
diff --git a/qemu-options.hx b/qemu-options.hx
160
index XXXXXXX..XXXXXXX 100644
79
index XXXXXXX..XXXXXXX 100644
161
--- a/qemu-options.hx
80
--- a/hw/net/net_rx_pkt.h
162
+++ b/qemu-options.hx
81
+++ b/hw/net/net_rx_pkt.h
163
@@ -XXX,XX +XXX,XX @@ Dump the network traffic on netdev @var{dev} to the file specified by
82
@@ -XXX,XX +XXX,XX @@ typedef enum {
164
The file format is libpcap, so it can be analyzed with tools such as tcpdump
83
NetPktRssIpV4Tcp,
165
or Wireshark.
84
NetPktRssIpV6Tcp,
166
85
NetPktRssIpV6,
167
-@item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},
86
- NetPktRssIpV6Ex
168
-outdev=@var{chardevid}
87
+ NetPktRssIpV6Ex,
169
+@item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},outdev=@var{chardevid}[,vnet_hdr_support]
88
+ NetPktRssIpV6TcpEx,
170
89
+ NetPktRssIpV4Udp,
171
Colo-compare gets packet from primary_in@var{chardevid} and secondary_in@var{chardevid}, than compare primary packet with
90
+ NetPktRssIpV6Udp,
172
secondary packet. If the packets are same, we will output primary
91
+ NetPktRssIpV6UdpEx,
173
packet to outdev@var{chardevid}, else we will notify colo-frame
92
} NetRxPktRssType;
174
do checkpoint and send primary packet to outdev@var{chardevid}.
93
175
+if it has the vnet_hdr_support flag, colo compare will send/recv packet with vnet_hdr_len.
94
/**
176
95
diff --git a/hw/net/trace-events b/hw/net/trace-events
177
we must use it with the help of filter-mirror and filter-redirector.
96
index XXXXXXX..XXXXXXX 100644
97
--- a/hw/net/trace-events
98
+++ b/hw/net/trace-events
99
@@ -XXX,XX +XXX,XX @@ net_rx_pkt_l3_csum_validate_csum(size_t l3hdr_off, uint32_t csl, uint32_t cntr,
100
101
net_rx_pkt_rss_ip4(void) "Calculating IPv4 RSS hash"
102
net_rx_pkt_rss_ip4_tcp(void) "Calculating IPv4/TCP RSS hash"
103
+net_rx_pkt_rss_ip4_udp(void) "Calculating IPv4/UDP RSS hash"
104
net_rx_pkt_rss_ip6_tcp(void) "Calculating IPv6/TCP RSS hash"
105
+net_rx_pkt_rss_ip6_udp(void) "Calculating IPv6/UDP RSS hash"
106
net_rx_pkt_rss_ip6(void) "Calculating IPv6 RSS hash"
107
net_rx_pkt_rss_ip6_ex(void) "Calculating IPv6/EX RSS hash"
108
+net_rx_pkt_rss_ip6_ex_tcp(void) "Calculating IPv6/EX/TCP RSS hash"
109
+net_rx_pkt_rss_ip6_ex_udp(void) "Calculating IPv6/EX/UDP RSS hash"
110
net_rx_pkt_rss_hash(size_t rss_length, uint32_t rss_hash) "RSS hash for %zu bytes: 0x%X"
111
net_rx_pkt_rss_add_chunk(void* ptr, size_t size, size_t input_offset) "Add RSS chunk %p, %zu bytes, RSS input offset %zu bytes"
178
112
179
--
113
--
180
2.7.4
114
2.5.0
181
115
182
116
1
The spec says offloads should be le64, so use virtio_ldq_p() to guarantee
1
From: Yuri Benditovich <yuri.benditovich@daynix.com>
2
valid endianness.
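As a rough illustration of the endianness point (a sketch only, not the QEMU code): the hypothetical helper `load_le64()` below assembles a 64-bit value from bytes in little-endian order, so the result does not depend on the host byte order. The real `virtio_ldq_p()` takes the device as its first argument, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a QEMU function): read a little-endian
 * 64-bit value from a byte buffer, independent of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    /* Start from the most significant byte, which is stored last. */
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

Reading the raw `uint64_t` directly would only give the same value on a little-endian host, which is the class of bug the patch addresses.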
3
2
4
Fixes: 644c98587d4c ("virtio-net: dynamic network offloads configuration")
3
When requested to calculate the hash for TCPV6 packet,
5
Cc: qemu-stable@nongnu.org
4
ignore overrides of source and destination addresses
6
Cc: Dmitry Fleytman <dfleytma@redhat.com>
5
in extension headers.
6
Use these overrides when new hash type NetPktRssIpV6TcpEx
7
requested.
8
Use this type in e1000e hash calculation for IPv6 TCP, which
9
should take into account overrides of the addresses.
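The distinction between the two IPv6/TCP hash types can be sketched as follows. The enum and function names here are hypothetical stand-ins, not the identifiers used in net_rx_pkt.c; the sketch only captures the rule stated above, that address overrides from extension headers are used for the "Ex" type and ignored otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the hash types named in the patch. */
typedef enum {
    RSS_SKETCH_IPV6_TCP,     /* plain IPv6/TCP: ignore overrides */
    RSS_SKETCH_IPV6_TCP_EX,  /* IPv6/TCP "Ex": honour overrides */
} RssTypeSketch;

/* Returns true when the hash input should use the source and destination
 * addresses overridden by IPv6 extension headers. */
static bool rss_use_ext_addrs(RssTypeSketch type)
{
    return type == RSS_SKETCH_IPV6_TCP_EX;
}
```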
10
11
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
12
Acked-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
7
Signed-off-by: Jason Wang <jasowang@redhat.com>
13
Signed-off-by: Jason Wang <jasowang@redhat.com>
8
---
14
---
9
hw/net/virtio-net.c | 2 ++
15
hw/net/e1000e_core.c | 2 +-
10
1 file changed, 2 insertions(+)
16
hw/net/net_rx_pkt.c | 2 +-
17
2 files changed, 2 insertions(+), 2 deletions(-)
11
18
12
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
19
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
13
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
14
--- a/hw/net/virtio-net.c
21
--- a/hw/net/e1000e_core.c
15
+++ b/hw/net/virtio-net.c
22
+++ b/hw/net/e1000e_core.c
16
@@ -XXX,XX +XXX,XX @@ static int virtio_net_handle_offloads(VirtIONet *n, uint8_t cmd,
23
@@ -XXX,XX +XXX,XX @@ e1000e_rss_calc_hash(E1000ECore *core,
17
if (cmd == VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET) {
24
type = NetPktRssIpV4Tcp;
18
uint64_t supported_offloads;
25
break;
19
26
case E1000_MRQ_RSS_TYPE_IPV6TCP:
20
+ offloads = virtio_ldq_p(vdev, &offloads);
27
- type = NetPktRssIpV6Tcp;
21
+
28
+ type = NetPktRssIpV6TcpEx;
22
if (!n->has_vnet_hdr) {
29
break;
23
return VIRTIO_NET_ERR;
30
case E1000_MRQ_RSS_TYPE_IPV6:
24
}
31
type = NetPktRssIpV6;
32
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
33
index XXXXXXX..XXXXXXX 100644
34
--- a/hw/net/net_rx_pkt.c
35
+++ b/hw/net/net_rx_pkt.c
36
@@ -XXX,XX +XXX,XX @@ net_rx_pkt_calc_rss_hash(struct NetRxPkt *pkt,
37
assert(pkt->isip6);
38
assert(pkt->istcp);
39
trace_net_rx_pkt_rss_ip6_tcp();
40
- _net_rx_rss_prepare_ip6(&rss_input[0], pkt, true, &rss_length);
41
+ _net_rx_rss_prepare_ip6(&rss_input[0], pkt, false, &rss_length);
42
_net_rx_rss_prepare_tcp(&rss_input[0], pkt, &rss_length);
43
break;
44
case NetPktRssIpV6:
25
--
45
--
26
2.7.4
46
2.5.0
27
47
28
48
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Bin Meng <bmeng.cn@gmail.com>
2
2
3
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
3
When CADENCE_GEM_ERR_DEBUG is turned on, there are several
4
compilation errors in DB_PRINT(). Fix them.
5
6
While we are here, update to use appropriate modifiers in
7
the same DB_PRINT() call.
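For illustration, the kind of modifier fix described here can be sketched with plain `snprintf()`. The helper `format_desc()` is hypothetical and not part of cadence_gem.c; it only shows `PRIx64` paired with a 64-bit value and `%zx` paired with a `size_t`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a 64-bit address and a size_t using
 * conversion specifiers that match the argument types. */
static int format_desc(char *out, size_t outlen, uint64_t addr, size_t space)
{
    return snprintf(out, outlen, "desc @ 0x%" PRIx64 " space 0x%zx",
                    addr, space);
}
```

Using `%x` for either argument would be a type mismatch, which compilers typically report once the debug macro is actually enabled.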
8
9
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
10
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
4
Signed-off-by: Jason Wang <jasowang@redhat.com>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
5
---
12
---
6
docs/colo-proxy.txt | 26 ++++++++++++++++++++++++++
13
hw/net/cadence_gem.c | 11 ++++++-----
7
1 file changed, 26 insertions(+)
14
1 file changed, 6 insertions(+), 5 deletions(-)
8
15
9
diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
16
diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
10
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
11
--- a/docs/colo-proxy.txt
18
--- a/hw/net/cadence_gem.c
12
+++ b/docs/colo-proxy.txt
19
+++ b/hw/net/cadence_gem.c
13
@@ -XXX,XX +XXX,XX @@ Secondary(ip:3.3.3.8):
20
@@ -XXX,XX +XXX,XX @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
14
-chardev socket,id=red1,host=3.3.3.3,port=9004
21
return -1;
15
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
22
}
16
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
23
17
+-object filter-rewriter,id=f3,netdev=hn0,queue=all
24
- DB_PRINT("copy %d bytes to 0x%x\n", MIN(bytes_to_copy, rxbufsize),
18
+
25
- rx_desc_get_buffer(s->rx_desc[q]));
19
+If you want to use virtio-net-pci or other driver with vnet_header:
26
+ DB_PRINT("copy %u bytes to 0x%" PRIx64 "\n",
20
+
27
+ MIN(bytes_to_copy, rxbufsize),
21
+Primary(ip:3.3.3.3):
28
+ rx_desc_get_buffer(s, s->rx_desc[q]));
22
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
29
23
+-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
30
/* Copy packet data to emulated DMA buffer */
24
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
31
address_space_write(&s->dma_as, rx_desc_get_buffer(s, s->rx_desc[q]) +
25
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
32
@@ -XXX,XX +XXX,XX @@ static void gem_transmit(CadenceGEMState *s)
26
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
33
27
+-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
34
if (tx_desc_get_length(desc) > sizeof(tx_packet) -
28
+-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
35
(p - tx_packet)) {
29
+-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
36
- DB_PRINT("TX descriptor @ 0x%x too large: size 0x%x space " \
30
+-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support
37
- "0x%x\n", (unsigned)packet_desc_addr,
31
+-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out,vnet_hdr_support
38
- (unsigned)tx_desc_get_length(desc),
32
+-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0,vnet_hdr_support
39
+ DB_PRINT("TX descriptor @ 0x%" HWADDR_PRIx \
33
+-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support
40
+ " too large: size 0x%x space 0x%zx\n",
34
+
41
+ packet_desc_addr, tx_desc_get_length(desc),
35
+Secondary(ip:3.3.3.8):
42
sizeof(tx_packet) - (p - tx_packet));
36
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,down script=/etc/qemu-ifdown
43
break;
37
+-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
44
}
38
+-chardev socket,id=red0,host=3.3.3.3,port=9003
39
+-chardev socket,id=red1,host=3.3.3.3,port=9004
40
+-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0,vnet_hdr_support
41
+-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1,vnet_hdr_support
42
+-object filter-rewriter,id=f3,netdev=hn0,queue=all,vnet_hdr_support
43
44
Note:
45
a.COLO-proxy must work with COLO-frame and Block-replication.
46
--
45
--
47
2.7.4
46
2.5.0
48
47
49
48
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Lukas Straub <lukasstraub2@web.de>
2
2
3
Make colo-compare and filter-rewriter able to parse vnet packets.
3
After failover the Secondary side of replication shouldn't change state, because
4
it now functions as our primary disk.
4
5
5
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
6
In replication_start, replication_do_checkpoint, replication_stop, ignore
7
the request if the current state is BLOCK_REPLICATION_DONE (successful failover) or
8
BLOCK_REPLICATION_FAILOVER (failover in progress, i.e. currently merging active
9
and hidden images into the base image).
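The guard described above can be sketched in isolation. The enum below is a stand-in whose members mirror the stage names in this message, and `replication_request_ignored()` is a hypothetical helper; the actual patch open-codes the check in each function.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the replication stages mentioned in the commit message. */
typedef enum {
    STAGE_SKETCH_NONE,
    STAGE_SKETCH_RUNNING,
    STAGE_SKETCH_FAILOVER,  /* failover in progress */
    STAGE_SKETCH_DONE,      /* failover finished */
} StageSketch;

/* Returns true when start/checkpoint/stop requests should be silently
 * ignored because the secondary has been (or is being) promoted. */
static bool replication_request_ignored(StageSketch stage)
{
    return stage == STAGE_SKETCH_DONE || stage == STAGE_SKETCH_FAILOVER;
}
```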
10
11
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
12
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
13
Acked-by: Max Reitz <mreitz@redhat.com>
6
Signed-off-by: Jason Wang <jasowang@redhat.com>
14
Signed-off-by: Jason Wang <jasowang@redhat.com>
7
---
15
---
8
net/colo.c | 6 +++---
16
block/replication.c | 35 ++++++++++++++++++++++++++++++++++-
9
1 file changed, 3 insertions(+), 3 deletions(-)
17
1 file changed, 34 insertions(+), 1 deletion(-)
10
18
11
diff --git a/net/colo.c b/net/colo.c
19
diff --git a/block/replication.c b/block/replication.c
12
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
13
--- a/net/colo.c
21
--- a/block/replication.c
14
+++ b/net/colo.c
22
+++ b/block/replication.c
15
@@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
23
@@ -XXX,XX +XXX,XX @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
16
{
24
aio_context_acquire(aio_context);
17
int network_length;
25
s = bs->opaque;
18
static const uint8_t vlan[] = {0x81, 0x00};
26
19
- uint8_t *data = pkt->data;
27
+ if (s->stage == BLOCK_REPLICATION_DONE ||
20
+ uint8_t *data = pkt->data + pkt->vnet_hdr_len;
28
+ s->stage == BLOCK_REPLICATION_FAILOVER) {
21
uint16_t l3_proto;
29
+ /*
22
ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
30
+ * This case happens when a secondary is promoted to primary.
23
31
+ * Ignore the request because the secondary side of replication
24
- if (pkt->size < ETH_HLEN) {
32
+ * doesn't have to do anything anymore.
25
+ if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
33
+ */
26
trace_colo_proxy_main("pkt->size < ETH_HLEN");
34
+ aio_context_release(aio_context);
27
return 1;
35
+ return;
36
+ }
37
+
38
if (s->stage != BLOCK_REPLICATION_NONE) {
39
error_setg(errp, "Block replication is running or done");
40
aio_context_release(aio_context);
41
@@ -XXX,XX +XXX,XX @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
42
aio_context_acquire(aio_context);
43
s = bs->opaque;
44
45
+ if (s->stage == BLOCK_REPLICATION_DONE ||
46
+ s->stage == BLOCK_REPLICATION_FAILOVER) {
47
+ /*
48
+ * This case happens when a secondary was promoted to primary.
49
+ * Ignore the request because the secondary side of replication
50
+ * doesn't have to do anything anymore.
51
+ */
52
+ aio_context_release(aio_context);
53
+ return;
54
+ }
55
+
56
if (s->mode == REPLICATION_MODE_SECONDARY) {
57
secondary_do_checkpoint(s, errp);
28
}
58
}
29
@@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
59
@@ -XXX,XX +XXX,XX @@ static void replication_get_error(ReplicationState *rs, Error **errp)
30
}
60
aio_context_acquire(aio_context);
31
61
s = bs->opaque;
32
network_length = pkt->ip->ip_hl * 4;
62
33
- if (pkt->size < l2hdr_len + network_length) {
63
- if (s->stage != BLOCK_REPLICATION_RUNNING) {
34
+ if (pkt->size < l2hdr_len + network_length + pkt->vnet_hdr_len) {
64
+ if (s->stage == BLOCK_REPLICATION_NONE) {
35
trace_colo_proxy_main("pkt->size < network_header + network_length");
65
error_setg(errp, "Block replication is not running");
36
return 1;
66
aio_context_release(aio_context);
37
}
67
return;
68
@@ -XXX,XX +XXX,XX @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
69
aio_context_acquire(aio_context);
70
s = bs->opaque;
71
72
+ if (s->stage == BLOCK_REPLICATION_DONE ||
73
+ s->stage == BLOCK_REPLICATION_FAILOVER) {
74
+ /*
75
+ * This case happens when a secondary was promoted to primary.
76
+ * Ignore the request because the secondary side of replication
77
+ * doesn't have to do anything anymore.
78
+ */
79
+ aio_context_release(aio_context);
80
+ return;
81
+ }
82
+
83
if (s->stage != BLOCK_REPLICATION_RUNNING) {
84
error_setg(errp, "Block replication is not running");
85
aio_context_release(aio_context);
38
--
86
--
39
2.7.4
87
2.5.0
40
88
41
89
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Lukas Straub <lukasstraub2@web.de>
2
2
3
We can use this property to flush and send packets with vnet_hdr_len.
3
This simulates the case that happens when we resume COLO after failover.
4
4
5
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
5
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
6
Signed-off-by: Jason Wang <jasowang@redhat.com>
6
Signed-off-by: Jason Wang <jasowang@redhat.com>
7
---
7
---
8
net/colo-compare.c | 8 ++++++--
8
tests/test-replication.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
9
net/colo.c | 3 ++-
9
1 file changed, 52 insertions(+)
10
net/colo.h | 4 +++-
11
net/filter-rewriter.c | 2 +-
12
4 files changed, 12 insertions(+), 5 deletions(-)
13
10
14
diff --git a/net/colo-compare.c b/net/colo-compare.c
11
diff --git a/tests/test-replication.c b/tests/test-replication.c
15
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
16
--- a/net/colo-compare.c
13
--- a/tests/test-replication.c
17
+++ b/net/colo-compare.c
14
+++ b/tests/test-replication.c
18
@@ -XXX,XX +XXX,XX @@ static int packet_enqueue(CompareState *s, int mode)
15
@@ -XXX,XX +XXX,XX @@ static void test_secondary_stop(void)
19
Connection *conn;
16
teardown_secondary();
20
21
if (mode == PRIMARY_IN) {
22
- pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
23
+ pkt = packet_new(s->pri_rs.buf,
24
+ s->pri_rs.packet_len,
25
+ s->pri_rs.vnet_hdr_len);
26
} else {
27
- pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
28
+ pkt = packet_new(s->sec_rs.buf,
29
+ s->sec_rs.packet_len,
30
+ s->sec_rs.vnet_hdr_len);
31
}
32
33
if (parse_packet_early(pkt)) {
34
diff --git a/net/colo.c b/net/colo.c
35
index XXXXXXX..XXXXXXX 100644
36
--- a/net/colo.c
37
+++ b/net/colo.c
38
@@ -XXX,XX +XXX,XX @@ void connection_destroy(void *opaque)
39
g_slice_free(Connection, conn);
40
}
17
}
41
18
42
-Packet *packet_new(const void *data, int size)
19
+static void test_secondary_continuous_replication(void)
43
+Packet *packet_new(const void *data, int size, int vnet_hdr_len)
20
+{
21
+ BlockBackend *top_blk, *local_blk;
22
+ Error *local_err = NULL;
23
+
24
+ top_blk = start_secondary();
25
+ replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
26
+ g_assert(!local_err);
27
+
28
+ /* write 0x22 to s_local_disk (IMG_SIZE / 2, IMG_SIZE) */
29
+ local_blk = blk_by_name(S_LOCAL_DISK_ID);
30
+ test_blk_write(local_blk, 0x22, IMG_SIZE / 2, IMG_SIZE / 2, false);
31
+
32
+ /* replication will backup s_local_disk to s_hidden_disk */
33
+ test_blk_read(top_blk, 0x11, IMG_SIZE / 2,
34
+ IMG_SIZE / 2, 0, IMG_SIZE, false);
35
+
36
+ /* write 0x33 to s_active_disk (0, IMG_SIZE / 2) */
37
+ test_blk_write(top_blk, 0x33, 0, IMG_SIZE / 2, false);
38
+
39
+ /* do failover (active commit) */
40
+ replication_stop_all(true, &local_err);
41
+ g_assert(!local_err);
42
+
43
+ /* it should ignore all requests from now on */
44
+
45
+ /* start after failover */
46
+ replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
47
+ g_assert(!local_err);
48
+
49
+ /* checkpoint */
50
+ replication_do_checkpoint_all(&local_err);
51
+ g_assert(!local_err);
52
+
53
+ /* stop */
54
+ replication_stop_all(true, &local_err);
55
+ g_assert(!local_err);
56
+
57
+ /* read from s_local_disk (0, IMG_SIZE / 2) */
58
+ test_blk_read(top_blk, 0x33, 0, IMG_SIZE / 2,
59
+ 0, IMG_SIZE / 2, false);
60
+
61
+
62
+ /* read from s_local_disk (IMG_SIZE / 2, IMG_SIZE) */
63
+ test_blk_read(top_blk, 0x22, IMG_SIZE / 2,
64
+ IMG_SIZE / 2, 0, IMG_SIZE, false);
65
+
66
+ teardown_secondary();
67
+}
68
+
69
static void test_secondary_do_checkpoint(void)
44
{
70
{
45
Packet *pkt = g_slice_new(Packet);
71
BlockBackend *top_blk, *local_blk;
46
72
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
47
pkt->data = g_memdup(data, size);
73
g_test_add_func("/replication/secondary/write", test_secondary_write);
48
pkt->size = size;
74
g_test_add_func("/replication/secondary/start", test_secondary_start);
49
pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
75
g_test_add_func("/replication/secondary/stop", test_secondary_stop);
50
+ pkt->vnet_hdr_len = vnet_hdr_len;
76
+ g_test_add_func("/replication/secondary/continuous_replication",
51
77
+ test_secondary_continuous_replication);
52
return pkt;
78
g_test_add_func("/replication/secondary/do_checkpoint",
53
}
79
test_secondary_do_checkpoint);
54
diff --git a/net/colo.h b/net/colo.h
80
g_test_add_func("/replication/secondary/get_error_all",
55
index XXXXXXX..XXXXXXX 100644
56
--- a/net/colo.h
57
+++ b/net/colo.h
58
@@ -XXX,XX +XXX,XX @@ typedef struct Packet {
59
int size;
60
/* Time of packet creation, in wall clock ms */
61
int64_t creation_ms;
62
+ /* Get vnet_hdr_len from filter */
63
+ uint32_t vnet_hdr_len;
64
} Packet;
65
66
typedef struct ConnectionKey {
67
@@ -XXX,XX +XXX,XX @@ Connection *connection_get(GHashTable *connection_track_table,
68
ConnectionKey *key,
69
GQueue *conn_list);
70
void connection_hashtable_reset(GHashTable *connection_track_table);
71
-Packet *packet_new(const void *data, int size);
72
+Packet *packet_new(const void *data, int size, int vnet_hdr_len);
73
void packet_destroy(void *opaque, void *user_data);
74
75
#endif /* QEMU_COLO_PROXY_H */
76
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
77
index XXXXXXX..XXXXXXX 100644
78
--- a/net/filter-rewriter.c
79
+++ b/net/filter-rewriter.c
80
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
81
char *buf = g_malloc0(size);
82
83
iov_to_buf(iov, iovcnt, 0, buf, size);
84
- pkt = packet_new(buf, size);
85
+ pkt = packet_new(buf, size, 0);
86
g_free(buf);
87
88
/*
89
--
81
--
90
2.7.4
82
2.5.0
91
83
92
84
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Lukas Straub <lukasstraub2@web.de>
2
2
3
Add the vnet_hdr_support option for filter-rewriter; it is disabled by default.
3
To switch the Secondary to Primary, we need to insert new filters
4
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
4
before the filter-rewriter.
5
For example:
5
6
-object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr_support
6
Add the options insert= and position= to be able to insert filters
7
7
anywhere in the filter list.
8
We get the vnet_hdr_len from NetClientState, which lets us
8
9
parse net packets correctly.
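A minimal sketch of what parsing with a vnet header involves, assuming the header simply precedes the Ethernet frame in the buffer. `frame_start()` and `SKETCH_ETH_HLEN` are hypothetical names, not QEMU identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14  /* Ethernet header length in bytes */

/* Hypothetical helper: return a pointer to the Ethernet frame inside buf,
 * skipping vnet_hdr_len bytes, or NULL if the buffer is too short to hold
 * the vnet header plus an Ethernet header. */
static const uint8_t *frame_start(const uint8_t *buf, size_t size,
                                  size_t vnet_hdr_len)
{
    if (size < SKETCH_ETH_HLEN + vnet_hdr_len) {
        return NULL;
    }
    return buf + vnet_hdr_len;
}
```

With `vnet_hdr_len` set to 0 the helper behaves as if no vnet header were present, which matches the option being disabled by default.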
9
position should be "head" or "tail" to insert at the head or
10
10
tail of the filter list or it should be "id=<id>" to specify
11
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
11
the id of another filter.
12
insert should be either "before" or "behind" to specify where to
13
insert the new filter relative to the one specified with position.
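The accepted values of position can be sketched as a small classifier. `parse_position()` and `PosKind` are hypothetical names used only for illustration; the patch itself performs the equivalent checks inline with `strcmp()` and `g_str_has_prefix()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { POS_HEAD, POS_TAIL, POS_ID, POS_INVALID } PosKind;

/* Hypothetical parser: classify a position string. For "id=<id>" a
 * pointer to the id part is stored in *id_out; otherwise it is NULL. */
static PosKind parse_position(const char *s, const char **id_out)
{
    *id_out = NULL;
    if (strcmp(s, "head") == 0) {
        return POS_HEAD;
    }
    if (strcmp(s, "tail") == 0) {
        return POS_TAIL;
    }
    if (strncmp(s, "id=", 3) == 0) {
        *id_out = s + 3;  /* skip the "id=" prefix */
        return POS_ID;
    }
    return POS_INVALID;
}
```

Only the `POS_ID` case needs the insert option, since head and tail already fix the place in the list.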
14
15
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
16
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
12
Signed-off-by: Jason Wang <jasowang@redhat.com>
17
Signed-off-by: Jason Wang <jasowang@redhat.com>
13
---
18
---
14
net/filter-rewriter.c | 37 ++++++++++++++++++++++++++++++++++++-
19
include/net/filter.h | 2 ++
15
qemu-options.hx | 4 ++--
20
net/filter.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++-
16
2 files changed, 38 insertions(+), 3 deletions(-)
21
qemu-options.hx | 31 +++++++++++++++---
17
22
3 files changed, 119 insertions(+), 6 deletions(-)
18
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
23
24
diff --git a/include/net/filter.h b/include/net/filter.h
19
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX 100644
20
--- a/net/filter-rewriter.c
26
--- a/include/net/filter.h
21
+++ b/net/filter-rewriter.c
27
+++ b/include/net/filter.h
22
@@ -XXX,XX +XXX,XX @@
28
@@ -XXX,XX +XXX,XX @@ struct NetFilterState {
23
#include "qemu-common.h"
29
NetClientState *netdev;
24
#include "qapi/error.h"
30
NetFilterDirection direction;
25
#include "qapi/qmp/qerror.h"
31
bool on;
26
+#include "qemu/error-report.h"
32
+ char *position;
27
#include "qapi-visit.h"
33
+ bool insert_before_flag;
28
#include "qom/object.h"
34
QTAILQ_ENTRY(NetFilterState) next;
29
#include "qemu/main-loop.h"
35
};
30
@@ -XXX,XX +XXX,XX @@ typedef struct RewriterState {
36
31
NetQueue *incoming_queue;
37
diff --git a/net/filter.c b/net/filter.c
32
/* hashtable to save connection */
38
index XXXXXXX..XXXXXXX 100644
33
GHashTable *connection_track_table;
39
--- a/net/filter.c
34
+ bool vnet_hdr;
40
+++ b/net/filter.c
35
} RewriterState;
41
@@ -XXX,XX +XXX,XX @@ static void netfilter_set_status(Object *obj, const char *str, Error **errp)
36
42
}
37
static void filter_rewriter_flush(NetFilterState *nf)
43
}
38
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
44
39
ConnectionKey key;
45
+static char *netfilter_get_position(Object *obj, Error **errp)
40
Packet *pkt;
46
+{
41
ssize_t size = iov_size(iov, iovcnt);
47
+ NetFilterState *nf = NETFILTER(obj);
42
+ ssize_t vnet_hdr_len = 0;
48
+
43
char *buf = g_malloc0(size);
49
+ return g_strdup(nf->position);
44
50
+}
45
iov_to_buf(iov, iovcnt, 0, buf, size);
51
+
46
- pkt = packet_new(buf, size, 0);
52
+static void netfilter_set_position(Object *obj, const char *str, Error **errp)
47
+
53
+{
48
+ if (s->vnet_hdr) {
54
+ NetFilterState *nf = NETFILTER(obj);
49
+ vnet_hdr_len = nf->netdev->vnet_hdr_len;
55
+
56
+ nf->position = g_strdup(str);
57
+}
58
+
59
+static char *netfilter_get_insert(Object *obj, Error **errp)
60
+{
61
+ NetFilterState *nf = NETFILTER(obj);
62
+
63
+ return nf->insert_before_flag ? g_strdup("before") : g_strdup("behind");
64
+}
65
+
66
+static void netfilter_set_insert(Object *obj, const char *str, Error **errp)
67
+{
68
+ NetFilterState *nf = NETFILTER(obj);
69
+
70
+ if (strcmp(str, "before") && strcmp(str, "behind")) {
71
+ error_setg(errp, "Invalid value for netfilter insert, "
72
+ "should be 'before' or 'behind'");
73
+ return;
50
+ }
74
+ }
51
+
75
+
52
+ pkt = packet_new(buf, size, vnet_hdr_len);
76
+ nf->insert_before_flag = !strcmp(str, "before");
53
g_free(buf);
77
+}
54
78
+
55
/*
79
static void netfilter_init(Object *obj)
56
@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
57
s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
58
}
59
60
+static bool filter_rewriter_get_vnet_hdr(Object *obj, Error **errp)
61
+{
62
+ RewriterState *s = FILTER_COLO_REWRITER(obj);
63
+
64
+ return s->vnet_hdr;
65
+}
66
+
67
+static void filter_rewriter_set_vnet_hdr(Object *obj,
68
+ bool value,
69
+ Error **errp)
70
+{
71
+ RewriterState *s = FILTER_COLO_REWRITER(obj);
72
+
73
+ s->vnet_hdr = value;
74
+}
75
+
76
+static void filter_rewriter_init(Object *obj)
77
+{
78
+ RewriterState *s = FILTER_COLO_REWRITER(obj);
79
+
80
+ s->vnet_hdr = false;
81
+ object_property_add_bool(obj, "vnet_hdr_support",
82
+ filter_rewriter_get_vnet_hdr,
83
+ filter_rewriter_set_vnet_hdr, NULL);
84
+}
85
+
86
static void colo_rewriter_class_init(ObjectClass *oc, void *data)
87
{
80
{
88
NetFilterClass *nfc = NETFILTER_CLASS(oc);
81
NetFilterState *nf = NETFILTER(obj);
89
@@ -XXX,XX +XXX,XX @@ static const TypeInfo colo_rewriter_info = {
82
90
.name = TYPE_FILTER_REWRITER,
83
nf->on = true;
91
.parent = TYPE_NETFILTER,
84
+ nf->insert_before_flag = false;
92
.class_init = colo_rewriter_class_init,
85
+ nf->position = g_strdup("tail");
93
+ .instance_init = filter_rewriter_init,
86
94
.instance_size = sizeof(RewriterState),
87
object_property_add_str(obj, "netdev",
95
};
88
netfilter_get_netdev_id, netfilter_set_netdev_id,
96
89
@@ -XXX,XX +XXX,XX @@ static void netfilter_init(Object *obj)
90
object_property_add_str(obj, "status",
91
netfilter_get_status, netfilter_set_status,
92
NULL);
93
+ object_property_add_str(obj, "position",
94
+ netfilter_get_position, netfilter_set_position,
95
+ NULL);
96
+ object_property_add_str(obj, "insert",
97
+ netfilter_get_insert, netfilter_set_insert,
98
+ NULL);
99
}
100
101
static void netfilter_complete(UserCreatable *uc, Error **errp)
102
{
103
NetFilterState *nf = NETFILTER(uc);
104
+ NetFilterState *position = NULL;
105
NetClientState *ncs[MAX_QUEUE_NUM];
106
NetFilterClass *nfc = NETFILTER_GET_CLASS(uc);
107
int queues;
108
@@ -XXX,XX +XXX,XX @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
109
return;
110
}
111
112
+ if (strcmp(nf->position, "head") && strcmp(nf->position, "tail")) {
113
+ Object *container;
114
+ Object *obj;
115
+ char *position_id;
116
+
117
+ if (!g_str_has_prefix(nf->position, "id=")) {
118
+ error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "position",
119
+ "'head', 'tail' or 'id=<id>'");
120
+ return;
121
+ }
122
+
123
+ /* get the id from the string */
124
+ position_id = g_strndup(nf->position + 3, strlen(nf->position) - 3);
125
+
126
+ /* Search for the position to insert before/behind */
127
+ container = object_get_objects_root();
128
+ obj = object_resolve_path_component(container, position_id);
129
+ if (!obj) {
130
+ error_setg(errp, "filter '%s' not found", position_id);
131
+ g_free(position_id);
132
+ return;
133
+ }
134
+
135
+ position = NETFILTER(obj);
136
+
137
+ if (position->netdev != ncs[0]) {
138
+ error_setg(errp, "filter '%s' belongs to a different netdev",
139
+ position_id);
140
+ g_free(position_id);
141
+ return;
142
+ }
143
+
144
+ g_free(position_id);
145
+ }
146
+
147
nf->netdev = ncs[0];
148
149
if (nfc->setup) {
150
@@ -XXX,XX +XXX,XX @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
151
return;
152
}
153
}
154
- QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
155
+
156
+ if (position) {
157
+ if (nf->insert_before_flag) {
158
+ QTAILQ_INSERT_BEFORE(position, nf, next);
159
+ } else {
160
+ QTAILQ_INSERT_AFTER(&nf->netdev->filters, position, nf, next);
161
+ }
162
+ } else if (!strcmp(nf->position, "head")) {
163
+ QTAILQ_INSERT_HEAD(&nf->netdev->filters, nf, next);
164
+ } else if (!strcmp(nf->position, "tail")) {
165
+ QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
166
+ }
167
}
168
169
static void netfilter_finalize(Object *obj)
170
@@ -XXX,XX +XXX,XX @@ static void netfilter_finalize(Object *obj)
171
QTAILQ_REMOVE(&nf->netdev->filters, nf, next);
172
}
173
g_free(nf->netdev_id);
174
+ g_free(nf->position);
175
}
176
177
static void default_handle_event(NetFilterState *nf, int event, Error **errp)
97
diff --git a/qemu-options.hx b/qemu-options.hx
178
diff --git a/qemu-options.hx b/qemu-options.hx
98
index XXXXXXX..XXXXXXX 100644
179
index XXXXXXX..XXXXXXX 100644
99
--- a/qemu-options.hx
180
--- a/qemu-options.hx
100
+++ b/qemu-options.hx
181
+++ b/qemu-options.hx
182
@@ -XXX,XX +XXX,XX @@ applications, they can do this through this parameter. Its format is
183
a gnutls priority string as described at
184
@url{https://gnutls.org/manual/html_node/Priority-Strings.html}.
185
186
-@item -object filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}][,status=@var{on|off}]
187
+@item -object filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}][,status=@var{on|off}][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]
188
189
Interval @var{t} can't be 0, this filter batches the packet delivery: all
190
packets arriving in a given interval on netdev @var{netdevid} are delayed
191
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
192
@option{tx}: the filter is attached to the transmit queue of the netdev,
193
where it will receive packets sent by the netdev.
194
195
-@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
196
+position @var{head|tail|id=<id>} is an option to specify where the
197
+filter should be inserted in the filter list. It can be applied to any
198
+netfilter.
199
+
200
+@option{head}: the filter is inserted at the head of the filter
201
+ list, before any existing filters.
202
+
203
+@option{tail}: the filter is inserted at the tail of the filter
204
+ list, behind any existing filters (default).
205
+
206
+@option{id=<id>}: the filter is inserted before or behind the filter
207
+ specified by <id>, see the insert option below.
208
+
209
+insert @var{behind|before} is an option to specify where to insert the
210
+new filter relative to the one specified with position=id=<id>. It can
211
+be applied to any netfilter.
212
+
213
+@option{before}: insert before the specified filter.
214
+
215
+@option{behind}: insert behind the specified filter (default).
216
+
217
+@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]
218
219
filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
220
221
-@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
222
+@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]
223
224
filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
225
@var{chardevid},and redirect indev's packet to filter.if it has the vnet_hdr_support flag,
101
@@ -XXX,XX +XXX,XX @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
226
@@ -XXX,XX +XXX,XX @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
102
be the same. we can just use indev or outdev, but at least one of indev or outdev
227
be the same. we can just use indev or outdev, but at least one of indev or outdev
103
need to be specified.
228
need to be specified.
104
229
105
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid}[,queue=@var{all|rx|tx}]
230
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support]
106
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support]
231
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]
107
232
108
Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
233
Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
109
secondary from primary to keep secondary tcp connection,and rewrite
234
secondary from primary to keep secondary tcp connection,and rewrite
110
tcp packet to primary from secondary make tcp packet can be handled by
235
@@ -XXX,XX +XXX,XX @@ colo secondary:
111
-client.
236
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
112
+client.if it has the vnet_hdr_support flag, we can parse packet with vnet header.
237
-object filter-rewriter,id=rew0,netdev=hn0,queue=all
113
238
114
usage:
239
-@item -object filter-dump,id=@var{id},netdev=@var{dev}[,file=@var{filename}][,maxlen=@var{len}]
115
colo secondary:
240
+@item -object filter-dump,id=@var{id},netdev=@var{dev}[,file=@var{filename}][,maxlen=@var{len}][,position=@var{head|tail|id=<id>}][,insert=@var{behind|before}]
241
242
Dump the network traffic on netdev @var{dev} to the file specified by
243
@var{filename}. At most @var{len} bytes (64k by default) per packet are stored.
116
--
244
--
117
2.7.4
245
2.5.0
118
246
119
247
diff view generated by jsdifflib
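The position and insert options documented in the qemu-options.hx hunk above can be illustrated with a toy model. This is only a sketch of the documented semantics, not QEMU's implementation: the function name insert_filter and the use of a plain Python list of filter ids are assumptions made for illustration.

```python
def insert_filter(filters, new_id, position="tail", insert="behind"):
    """Return a new list of filter ids with new_id placed as documented.

    filters  -- existing filter ids, head first (hypothetical representation)
    position -- "head", "tail" (default) or "id=<id>"
    insert   -- "behind" (default) or "before"; only used with position=id=<id>
    """
    if position == "head":
        return [new_id] + filters
    if position == "tail":
        return filters + [new_id]
    if position.startswith("id="):
        ref = position[len("id="):]
        if ref not in filters:
            raise ValueError(f"no filter with id {ref!r}")
        idx = filters.index(ref)
        if insert == "before":
            return filters[:idx] + [new_id] + filters[idx:]
        if insert == "behind":
            return filters[:idx + 1] + [new_id] + filters[idx + 1:]
        raise ValueError(f"unknown insert value {insert!r}")
    raise ValueError(f"unknown position value {position!r}")


if __name__ == "__main__":
    # A former Secondary already has f1, f2 and rew0; m0 must go before rew0.
    print(insert_filter(["f1", "f2", "rew0"], "m0",
                        position="id=rew0", insert="before"))
```

With the defaults the new filter is appended at the tail, which matches the behaviour before these options existed.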
New patch
1
1
From: Lukas Straub <lukasstraub2@web.de>
2
3
Document the qemu command-line and qmp commands for continuous replication
4
5
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
6
Signed-off-by: Jason Wang <jasowang@redhat.com>
7
---
8
docs/COLO-FT.txt | 224 +++++++++++++++++++++++++++++++++------------
9
docs/block-replication.txt | 28 ++++--
10
2 files changed, 184 insertions(+), 68 deletions(-)
11
12
diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
13
index XXXXXXX..XXXXXXX 100644
14
--- a/docs/COLO-FT.txt
15
+++ b/docs/COLO-FT.txt
16
@@ -XXX,XX +XXX,XX @@ The diagram just shows the main qmp command, you can get the detail
17
in test procedure.
18
19
== Test procedure ==
20
-1. Startup qemu
21
-Primary:
22
-# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name primary \
23
- -device piix3-usb-uhci -vnc :7 \
24
- -device usb-tablet -netdev tap,id=hn0,vhost=off \
25
- -device virtio-net-pci,id=net-pci0,netdev=hn0 \
26
- -drive if=virtio,id=primary-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
27
- children.0.file.filename=1.raw,\
28
- children.0.driver=raw -S
29
-Secondary:
30
-# qemu-system-x86_64 -accel kvm -m 2048 -smp 2 -qmp stdio -name secondary \
31
- -device piix3-usb-uhci -vnc :7 \
32
- -device usb-tablet -netdev tap,id=hn0,vhost=off \
33
- -device virtio-net-pci,id=net-pci0,netdev=hn0 \
34
- -drive if=none,id=secondary-disk0,file.filename=1.raw,driver=raw,node-name=node0 \
35
- -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
36
- file.driver=qcow2,top-id=active-disk0,\
37
- file.file.filename=/mnt/ramfs/active_disk.img,\
38
- file.backing.driver=qcow2,\
39
- file.backing.file.filename=/mnt/ramfs/hidden_disk.img,\
40
- file.backing.backing=secondary-disk0 \
41
- -incoming tcp:0:8888
42
-
43
-2. On Secondary VM's QEMU monitor, issue command
44
+Note: Here we are running both instances on the same host for testing,
45
+change the IP addresses if you want to run it on two hosts. Initially
46
+127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
47
+
48
+== Startup qemu ==
49
+1. Primary:
50
+Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
51
+You don't need to change any IPs here, because 0.0.0.0 listens on any
52
+interface. The chardevs with 127.0.0.1 IPs loop back to the local qemu
53
+instance.
54
+
55
+# imagefolder="/mnt/vms/colo-test-primary"
56
+
57
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
58
+ -device piix3-usb-uhci -device usb-tablet -name primary \
59
+ -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
60
+ -device rtl8139,id=e0,netdev=hn0 \
61
+ -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
62
+ -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \
63
+ -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
64
+ -chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
65
+ -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait \
66
+ -chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
67
+ -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
68
+ -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
69
+ -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
70
+ -object iothread,id=iothread1 \
71
+ -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,\
72
+outdev=compare_out0,iothread=iothread1 \
73
+ -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
74
+children.0.file.filename=$imagefolder/primary.qcow2,children.0.driver=qcow2 -S
75
+
76
+2. Secondary:
77
+Note: Active and hidden images need to be created only once and the
78
+size should be the same as primary.qcow2. Again, you don't need to change
79
+any IPs here, except for the $primary_ip variable.
80
+
81
+# imagefolder="/mnt/vms/colo-test-secondary"
82
+# primary_ip=127.0.0.1
83
+
84
+# qemu-img create -f qcow2 $imagefolder/secondary-active.qcow2 10G
85
+
86
+# qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
87
+
88
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp stdio \
89
+ -device piix3-usb-uhci -device usb-tablet -name secondary \
90
+ -netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
91
+ -device rtl8139,id=e0,netdev=hn0 \
92
+ -chardev socket,id=red0,host=$primary_ip,port=9003,reconnect=1 \
93
+ -chardev socket,id=red1,host=$primary_ip,port=9004,reconnect=1 \
94
+ -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
95
+ -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
96
+ -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
97
+ -drive if=none,id=parent0,file.filename=$imagefolder/primary.qcow2,driver=qcow2 \
98
+ -drive if=none,id=childs0,driver=replication,mode=secondary,file.driver=qcow2,\
99
+top-id=colo-disk0,file.file.filename=$imagefolder/secondary-active.qcow2,\
100
+file.backing.driver=qcow2,file.backing.file.filename=$imagefolder/secondary-hidden.qcow2,\
101
+file.backing.backing=parent0 \
102
+ -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,\
103
+children.0=childs0 \
104
+ -incoming tcp:0.0.0.0:9998
105
+
106
+
107
+3. On Secondary VM's QEMU monitor, issue command
108
{'execute':'qmp_capabilities'}
109
-{ 'execute': 'nbd-server-start',
110
- 'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 'port': '8889'} } }
111
-}
112
-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 'writable': true } }
113
+{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '0.0.0.0', 'port': '9999'} } } }
114
+{'execute': 'nbd-server-add', 'arguments': {'device': 'parent0', 'writable': true } }
115
116
Note:
117
a. The qmp command nbd-server-start and nbd-server-add must be run
118
before running the qmp command migrate on primary QEMU
119
b. Active disk, hidden disk and nbd target's length should be the
120
same.
121
- c. It is better to put active disk and hidden disk in ramdisk.
122
+ c. It is better to put active disk and hidden disk in ramdisk. They
123
+ will be merged into the parent disk on failover.
124
125
-3. On Primary VM's QEMU monitor, issue command:
126
+4. On Primary VM's QEMU monitor, issue command:
127
{'execute':'qmp_capabilities'}
128
-{ 'execute': 'human-monitor-command',
129
- 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=xx.xx.xx.xx,file.port=8889,file.export=secondary-disk0,node-name=nbd_client0'}}
130
-{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'primary-disk0', 'node': 'nbd_client0' } }
131
-{ 'execute': 'migrate-set-capabilities',
132
- 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
133
-{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:xx.xx.xx.xx:8888' } }
134
+{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0'}}
135
+{'execute': 'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'replication0' } }
136
+{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
137
+{'execute': 'migrate', 'arguments': {'uri': 'tcp:127.0.0.2:9998' } }
138
139
Note:
140
a. There should be only one NBD Client for each primary disk.
141
- b. xx.xx.xx.xx is the secondary physical machine's hostname or IP
142
- c. The qmp command line must be run after running qmp command line in
143
+ b. The qmp command line must be run after running qmp command line in
144
secondary qemu.
145
146
-4. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
147
+5. After the above steps, you will see, whenever you make changes to PVM, SVM will be synced.
148
You can issue command '{ "execute": "migrate-set-parameters" , "arguments":{ "x-checkpoint-delay": 2000 } }'
149
-to change the checkpoint period time
150
+to change the idle checkpoint period time
151
+
152
+6. Failover test
153
+You can kill one of the VMs and Failover on the surviving VM:
154
+
155
+If you killed the Secondary, then follow "Primary Failover". After that,
156
+if you want to resume the replication, follow "Primary resume replication"
157
+
158
+If you killed the Primary, then follow "Secondary Failover". After that,
159
+if you want to resume the replication, follow "Secondary resume replication"
160
+
161
+== Primary Failover ==
162
+The Secondary died, resume on the Primary
163
+
164
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'child': 'children.1'} }
165
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_del replication0' } }
166
+{'execute': 'object-del', 'arguments':{ 'id': 'comp0' } }
167
+{'execute': 'object-del', 'arguments':{ 'id': 'iothread1' } }
168
+{'execute': 'object-del', 'arguments':{ 'id': 'm0' } }
169
+{'execute': 'object-del', 'arguments':{ 'id': 'redire0' } }
170
+{'execute': 'object-del', 'arguments':{ 'id': 'redire1' } }
171
+{'execute': 'x-colo-lost-heartbeat' }
172
+
173
+== Secondary Failover ==
174
+The Primary died, resume on the Secondary and prepare to become the new Primary
175
+
176
+{'execute': 'nbd-server-stop'}
177
+{'execute': 'x-colo-lost-heartbeat'}
178
+
179
+{'execute': 'object-del', 'arguments':{ 'id': 'f2' } }
180
+{'execute': 'object-del', 'arguments':{ 'id': 'f1' } }
181
+{'execute': 'chardev-remove', 'arguments':{ 'id': 'red1' } }
182
+{'execute': 'chardev-remove', 'arguments':{ 'id': 'red0' } }
183
+
184
+{'execute': 'chardev-add', 'arguments':{ 'id': 'mirror0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '0.0.0.0', 'port': '9003' } }, 'server': true } } } }
185
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare1', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '0.0.0.0', 'port': '9004' } }, 'server': true } } } }
186
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9001' } }, 'server': true } } } }
187
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare0-0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9001' } }, 'server': false } } } }
188
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': true } } } }
189
+{'execute': 'chardev-add', 'arguments':{ 'id': 'compare_out0', 'backend': {'type': 'socket', 'data': {'addr': { 'type': 'inet', 'data': { 'host': '127.0.0.1', 'port': '9005' } }, 'server': false } } } }
190
+
191
+== Primary resume replication ==
192
+Resume replication after new Secondary is up.
193
+
194
+Start the new Secondary (Steps 2 and 3 above), then on the Primary:
195
+{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', 'job-id': 'resync', 'target': 'nbd://127.0.0.2:9999/parent0', 'mode': 'existing', 'format': 'raw', 'sync': 'full'} }
196
+
197
+Wait until disk is synced, then:
198
+{'execute': 'stop'}
199
+{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync'} }
200
+
201
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.2,file.port=9999,file.export=parent0,node-name=replication0'}}
202
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } }
203
+
204
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } }
205
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } }
206
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } }
207
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } }
208
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } }
209
+
210
+{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
211
+{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.2:9998' } }
212
+
213
+Note:
214
+If this Primary previously was a Secondary, then we need to insert the
215
+filters before the filter-rewriter by using the
216
+"'insert': 'before', 'position': 'id=rew0'" Options. See below.
217
+
218
+== Secondary resume replication ==
219
+Become Primary and resume replication after new Secondary is up. Note
220
+that now 127.0.0.1 is the Secondary and 127.0.0.2 is the Primary.
221
+
222
+Start the new Secondary (Steps 2 and 3 above, but with primary_ip=127.0.0.2),
223
+then on the old Secondary:
224
+{'execute': 'drive-mirror', 'arguments':{ 'device': 'colo-disk0', 'job-id': 'resync', 'target': 'nbd://127.0.0.1:9999/parent0', 'mode': 'existing', 'format': 'raw', 'sync': 'full'} }
225
+
226
+Wait until disk is synced, then:
227
+{'execute': 'stop'}
228
+{'execute': 'block-job-cancel', 'arguments':{ 'device': 'resync' } }
229
230
-5. Failover test
231
-You can kill Primary VM and run 'x_colo_lost_heartbeat' in Secondary VM's
232
-monitor at the same time, then SVM will failover and client will not detect this
233
-change.
234
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=127.0.0.1,file.port=9999,file.export=parent0,node-name=replication0'}}
235
+{'execute': 'x-blockdev-change', 'arguments':{ 'parent': 'colo-disk0', 'node': 'replication0' } }
236
237
-Before issuing '{ "execute": "x-colo-lost-heartbeat" }' command, we have to
238
-issue block related command to stop block replication.
239
-Primary:
240
- Remove the nbd child from the quorum:
241
- { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk0', 'child': 'children.1'}}
242
- { 'execute': 'human-monitor-command','arguments': {'command-line': 'drive_del blk-buddy0'}}
243
- Note: there is no qmp command to remove the blockdev now
244
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-mirror', 'id': 'm0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'tx', 'outdev': 'mirror0' } } }
245
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire0', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'indev': 'compare_out' } } }
246
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'filter-redirector', 'id': 'redire1', 'props': { 'insert': 'before', 'position': 'id=rew0', 'netdev': 'hn0', 'queue': 'rx', 'outdev': 'compare0' } } }
247
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'iothread', 'id': 'iothread1' } }
248
+{'execute': 'object-add', 'arguments':{ 'qom-type': 'colo-compare', 'id': 'comp0', 'props': { 'primary_in': 'compare0-0', 'secondary_in': 'compare1', 'outdev': 'compare_out0', 'iothread': 'iothread1' } } }
249
250
-Secondary:
251
- The primary host is down, so we should do the following thing:
252
- { 'execute': 'nbd-server-stop' }
253
+{'execute': 'migrate-set-capabilities', 'arguments':{ 'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
254
+{'execute': 'migrate', 'arguments':{ 'uri': 'tcp:127.0.0.1:9998' } }
255
256
== TODO ==
257
-1. Support continuous VM replication.
258
-2. Support shared storage.
259
-3. Develop the heartbeat part.
260
-4. Reduce checkpoint VM’s downtime while doing checkpoint.
261
+1. Support shared storage.
262
+2. Develop the heartbeat part.
263
+3. Reduce checkpoint VM’s downtime while doing checkpoint.
264
diff --git a/docs/block-replication.txt b/docs/block-replication.txt
265
index XXXXXXX..XXXXXXX 100644
266
--- a/docs/block-replication.txt
267
+++ b/docs/block-replication.txt
268
@@ -XXX,XX +XXX,XX @@ blocks that are already in QEMU.
269
^ || .----------
270
| || | Secondary
271
1 Quorum || '----------
272
- / \ ||
273
- / \ ||
274
- Primary 2 filter
275
- disk ^ virtio-blk
276
- | ^
277
- 3 NBD -------> 3 NBD |
278
+ / \ || virtio-blk
279
+ / \ || ^
280
+ Primary 2 filter |
281
+ disk ^ 7 Quorum
282
+ | /
283
+ 3 NBD -------> 3 NBD /
284
client || server 2 filter
285
|| ^ ^
286
--------. || | |
287
@@ -XXX,XX +XXX,XX @@ any state that would otherwise be lost by the speculative write-through
288
of the NBD server into the secondary disk. So before block replication,
289
the primary disk and secondary disk should contain the same data.
290
291
+7) The secondary also has a quorum node, so after secondary failover it
292
+can become the new primary and continue replication.
293
+
294
+
295
== Failure Handling ==
296
There are 7 internal errors when block replication is running:
297
1. I/O error on primary disk
298
@@ -XXX,XX +XXX,XX @@ Primary:
299
leading whitespace.
300
5. The qmp command line must be run after running qmp command line in
301
secondary qemu.
302
- 6. After failover we need remove children.1 (replication driver).
303
+ 6. After primary failover we need to remove children.1 (replication driver).
304
305
Secondary:
306
-drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
307
- -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\
308
+ -drive if=none,id=childs1,driver=replication,mode=secondary,top-id=childs1,\
309
file.file.filename=active_disk.qcow2,\
310
file.driver=qcow2,\
311
file.backing.file.filename=hidden_disk.qcow2,\
312
file.backing.driver=qcow2,\
313
file.backing.backing=colo1
314
+ -drive if=xxx,driver=quorum,read-pattern=fifo,id=top-disk1,\
315
+ vote-threshold=1,children.0=childs1
316
317
Then run qmp command in secondary qemu:
318
{ 'execute': 'nbd-server-start',
319
@@ -XXX,XX +XXX,XX @@ Secondary:
320
The primary host is down, so we should do the following thing:
321
{ 'execute': 'nbd-server-stop' }
322
323
+Promote Secondary to Primary:
324
+ see COLO-FT.txt
325
+
326
TODO:
327
-1. Continuous block replication
328
-2. Shared disk
329
+1. Shared disk
330
--
331
2.5.0
332
333
diff view generated by jsdifflib
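The object-add commands in the "Primary resume replication" and "Secondary resume replication" sections of the COLO-FT.txt patch above differ only in the 'insert' and 'position' props. A minimal sketch of generating both variants follows. The helper names object_add and resume_filter_commands are hypothetical; the ids, netdev and chardev names are taken from the patch. Sending the commands to a QMP socket is left out.

```python
import json


def object_add(qom_type, obj_id, before=None, **props):
    """Build an object-add command as a dict.

    before -- id of an existing filter to insert in front of (optional);
              adds the 'insert' and 'position' props shown in the patch.
    """
    if before is not None:
        props = {"insert": "before", "position": f"id={before}", **props}
    arguments = {"qom-type": qom_type, "id": obj_id}
    if props:
        arguments["props"] = props
    return {"execute": "object-add", "arguments": arguments}


def resume_filter_commands(before=None):
    """Commands that re-create the primary-side filters and colo-compare.

    Pass before="rew0" when this Primary previously was a Secondary.
    """
    return [
        object_add("filter-mirror", "m0", before,
                   netdev="hn0", queue="tx", outdev="mirror0"),
        object_add("filter-redirector", "redire0", before,
                   netdev="hn0", queue="rx", indev="compare_out"),
        object_add("filter-redirector", "redire1", before,
                   netdev="hn0", queue="rx", outdev="compare0"),
        # iothread and colo-compare are not netfilters, so no position props.
        object_add("iothread", "iothread1"),
        object_add("colo-compare", "comp0",
                   primary_in="compare0-0", secondary_in="compare1",
                   outdev="compare_out0", iothread="iothread1"),
    ]


if __name__ == "__main__":
    for cmd in resume_filter_commands(before="rew0"):
        print(json.dumps(cmd))
```

Calling resume_filter_commands() without an argument yields the plain variant used on a Primary that never was a Secondary.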
1
From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
1
From: Stefan Hajnoczi <stefanha@redhat.com>
2
2
3
We add the vnet_hdr_support option for filter-redirector, default is disabled.
3
The L2TPv3 RFC number is 3931:
4
If you use the virtio-net-pci net driver or another driver that needs vnet_hdr, please enable it.
4
https://tools.ietf.org/html/rfc3931
5
Because colo-compare and other modules need the vnet_hdr_len to parse
6
packets, we add this new option to send the length to them.
7
You can use it for example:
8
-object filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr_support
9
5
10
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
6
Reported-by: Henrik Johansson <henrikjohansson@rocketmail.com>
7
Reviewed-by: Stefan Weil <sw@weilnetz.de>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Jason Wang <jasowang@redhat.com>
9
Signed-off-by: Jason Wang <jasowang@redhat.com>
12
---
10
---
13
net/filter-mirror.c | 23 +++++++++++++++++++++++
11
qemu-options.hx | 4 ++--
14
qemu-options.hx | 6 +++---
12
1 file changed, 2 insertions(+), 2 deletions(-)
15
2 files changed, 26 insertions(+), 3 deletions(-)
16
13
17
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
18
index XXXXXXX..XXXXXXX 100644
19
--- a/net/filter-mirror.c
20
+++ b/net/filter-mirror.c
21
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_set_outdev(Object *obj,
22
s->outdev = g_strdup(value);
23
}
24
25
+static bool filter_redirector_get_vnet_hdr(Object *obj, Error **errp)
26
+{
27
+ MirrorState *s = FILTER_REDIRECTOR(obj);
28
+
29
+ return s->vnet_hdr;
30
+}
31
+
32
+static void filter_redirector_set_vnet_hdr(Object *obj,
33
+ bool value,
34
+ Error **errp)
35
+{
36
+ MirrorState *s = FILTER_REDIRECTOR(obj);
37
+
38
+ s->vnet_hdr = value;
39
+}
40
+
41
static void filter_mirror_init(Object *obj)
42
{
43
MirrorState *s = FILTER_MIRROR(obj);
44
@@ -XXX,XX +XXX,XX @@ static void filter_mirror_init(Object *obj)
45
46
static void filter_redirector_init(Object *obj)
47
{
48
+ MirrorState *s = FILTER_REDIRECTOR(obj);
49
+
50
object_property_add_str(obj, "indev", filter_redirector_get_indev,
51
filter_redirector_set_indev, NULL);
52
object_property_add_str(obj, "outdev", filter_redirector_get_outdev,
53
filter_redirector_set_outdev, NULL);
54
+
55
+ s->vnet_hdr = false;
56
+ object_property_add_bool(obj, "vnet_hdr_support",
57
+ filter_redirector_get_vnet_hdr,
58
+ filter_redirector_set_vnet_hdr, NULL);
59
}
60
61
static void filter_mirror_fini(Object *obj)
62
diff --git a/qemu-options.hx b/qemu-options.hx
14
diff --git a/qemu-options.hx b/qemu-options.hx
63
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
64
--- a/qemu-options.hx
16
--- a/qemu-options.hx
65
+++ b/qemu-options.hx
17
+++ b/qemu-options.hx
66
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
18
@@ -XXX,XX +XXX,XX @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
67
19
" Linux kernel 3.3+ as well as most routers can talk\n"
68
filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
20
" L2TPv3. This transport allows connecting a VM to a VM,\n"
69
21
" VM to a router and even VM to Host. It is a nearly-universal\n"
70
-@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
22
- " standard (RFC3391). Note - this implementation uses static\n"
71
-outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
23
+ " standard (RFC3931). Note - this implementation uses static\n"
72
+@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
24
" pre-configured tunnels (same as the Linux kernel).\n"
73
25
" use 'src=' to specify source address\n"
74
filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
26
" use 'dst=' to specify destination address\n"
75
-@var{chardevid},and redirect indev's packet to filter.
27
@@ -XXX,XX +XXX,XX @@ Example (send packets from host's 1.2.3.4):
76
+@var{chardevid},and redirect indev's packet to filter.if it has the vnet_hdr_support flag,
28
@end example
77
+filter-redirector will redirect packet with vnet_hdr_len.
29
78
Create a filter-redirector we need to differ outdev id from indev id, id can not
30
@item -netdev l2tpv3,id=@var{id},src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
79
be the same. we can just use indev or outdev, but at least one of indev or outdev
31
-Configure a L2TPv3 pseudowire host network backend. L2TPv3 (RFC3391) is a
80
need to be specified.
32
+Configure a L2TPv3 pseudowire host network backend. L2TPv3 (RFC3931) is a
33
popular protocol to transport Ethernet (and other Layer 2) data frames between
34
two systems. It is present in routers, firewalls and the Linux kernel
35
(from version 3.3 onwards).
81
--
36
--
82
2.7.4
37
2.5.0
83
38
84
39
diff view generated by jsdifflib