If a program in the host communicates with a vsock device in the guest
via 'vsock://', but the device is not ready, the 'connect' syscall will
block and then time out after 2 seconds by default (the timeout is
defined in the kernel: #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)).
We can avoid this case if QEMU reports an event when the vsock device is
ready and the program waits for the event before connecting.

Report a vsock running event so that the upper application can
control the boot sequence.
See https://github.com/kata-containers/runtime/pull/1918
Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>
---
v2: fix typo
---
hw/virtio/vhost-vsock.c | 3 +++
qapi/char.json | 22 ++++++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 0371493..a5920fd 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -22,6 +22,7 @@
#include "qemu/iov.h"
#include "qemu/module.h"
#include "monitor/monitor.h"
+#include "qapi/qapi-events-char.h"
enum {
VHOST_VSOCK_SAVEVM_VERSION = 0,
@@ -68,6 +69,8 @@ static int vhost_vsock_set_running(VHostVSock *vsock, int start)
if (ret < 0) {
return -errno;
}
+ qapi_event_send_vsock_running(vsock->conf.guest_cid, start != 0);
+
return 0;
}
diff --git a/qapi/char.json b/qapi/char.json
index a6e81ac..4cfbcf2 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -570,3 +570,25 @@
{ 'event': 'VSERPORT_CHANGE',
'data': { 'id': 'str',
'open': 'bool' } }
+
+##
+# @VSOCK_RUNNING:
+#
+# Emitted when the guest changes the vsock status.
+#
+# @cid: guest context ID
+#
+# @running: true if the vsock is running
+#
+# Since: 4.2
+#
+# Example:
+#
+# <- { "event": "VSOCK_RUNNING",
+#      "data": { "cid": 123456, "running": true },
+#      "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
+#
+##
+{ 'event': 'VSOCK_RUNNING',
+ 'data': { 'cid': 'uint64',
+ 'running': 'bool' } }
--
2.9.5
On Mon, Aug 05, 2019 at 11:32:31AM +0800, Ning Bo wrote:
> If a program in the host communicates with a vsock device in the guest
> via 'vsock://', but the device is not ready, the 'connect' syscall will
> block and then time out after 2 seconds by default (the timeout is
> defined in the kernel: #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)).
> We can avoid this case if QEMU reports an event when the vsock device is
> ready and the program waits for the event before connecting.
>
> Report a vsock running event so that the upper application can
> control the boot sequence.
> See https://github.com/kata-containers/runtime/pull/1918

Please describe the issue with connect(2) in detail. Are you observing
that connect(2) always times out when called before the guest driver
has set the virtio-vsock status register to VIRTIO_CONFIG_S_DRIVER_OK?

I think that adding a QMP event is working around the issue rather than
fixing the root cause. This is probably a vhost_vsock.ko problem and
should be fixed there.

Stefan
Let me describe the issue with an example via `nc-vsock`:

Let's assume the Guest cid is 3.
Execute 'rmmod vmw_vsock_virtio_transport' in the Guest,
then execute 'while true; do ./nc-vsock 3 1234; done' in the Host.

Host                                      Guest
                                          # rmmod vmw_vsock_virtio_transport
# while true; do ./nc-vsock 3 1234; done
(after 2 seconds)
connect: Connection timed out
(after 2 seconds)
connect: Connection timed out
...
                                          # modprobe vmw_vsock_virtio_transport
connect: Connection reset by peer
connect: Connection reset by peer
connect: Connection reset by peer
...
                                          # nc-vsock -l 1234
Connection from cid 2 port ***...
(stop printing)

The above process simulates the communication process between
the `kata-runtime` and `kata-agent` after starting the Guest.
In order to connect to `kata-agent` as soon as possible,
`kata-runtime` will continuously try to connect to `kata-agent` in a loop.
See https://github.com/kata-containers/runtime/blob/d054556f60f092335a22a288011fa29539ad4ccc/vendor/github.com/kata-containers/agent/protocols/client/client.go#L327
But when the vsock device in the Guest is not ready, the connection
will block for 2 seconds. This situation actually slows down
the entire startup time of `kata-runtime`.

> I think that adding a QMP event is working around the issue rather than
> fixing the root cause. This is probably a vhost_vsock.ko problem and
> should be fixed there.

After looking at the source code of vhost_vsock.ko,
I think it is possible to optimize the logic here too.
The simple patch is as follows. Do you think the modification is appropriate?
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 9f57736f..8fad67be 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -51,6 +51,7 @@ struct vhost_vsock {
 	atomic_t queued_replies;
 
 	u32 guest_cid;
+	u32 state;
 };
 
 static u32 vhost_transport_get_local_cid(void)
@@ -497,6 +541,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 
 		mutex_unlock(&vq->mutex);
 	}
+	vsock->state = 1;
 
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;
@@ -535,6 +580,7 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock)
 		vq->private_data = NULL;
 		mutex_unlock(&vq->mutex);
 	}
+	vsock->state = 0;
 
 err:
 	mutex_unlock(&vsock->dev.mutex);
@@ -786,6 +832,27 @@ static struct miscdevice vhost_vsock_misc = {
 	.fops = &vhost_vsock_fops,
 };
 
+int vhost_transport_connect(struct vsock_sock *vsk) {
+	struct vhost_vsock *vsock;
+
+	rcu_read_lock();
+
+	/* Find the vhost_vsock according to guest context id */
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	if (!vsock) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	rcu_read_unlock();
+
+	if (vsock->state == 1) {
+		return virtio_transport_connect(vsk);
+	} else {
+		return -ECONNRESET;
+	}
+}
+
 static struct virtio_transport vhost_transport = {
 	.transport = {
 		.get_local_cid = vhost_transport_get_local_cid,
@@ -793,7 +860,7 @@ static struct virtio_transport vhost_transport = {
 		.init = virtio_transport_do_socket_init,
 		.destruct = virtio_transport_destruct,
 		.release = virtio_transport_release,
-		.connect = virtio_transport_connect,
+		.connect = vhost_transport_connect,
 		.shutdown = virtio_transport_shutdown,
 		.cancel_pkt = vhost_transport_cancel_pkt,
On Thu, Nov 28, 2019 at 07:26:47PM +0800, ning.bo9@zte.com.cn wrote:
> Let me describe the issue with an example via `nc-vsock`:
>
> Let's assume the Guest cid is 3.
> Execute 'rmmod vmw_vsock_virtio_transport' in the Guest,
> then execute 'while true; do ./nc-vsock 3 1234; done' in the Host.
>
> Host                                      Guest
>                                           # rmmod vmw_vsock_virtio_transport
> # while true; do ./nc-vsock 3 1234; done
> (after 2 seconds)
> connect: Connection timed out
> (after 2 seconds)
> connect: Connection timed out
> ...
>                                           # modprobe vmw_vsock_virtio_transport
> connect: Connection reset by peer
> connect: Connection reset by peer
> connect: Connection reset by peer
> ...
>                                           # nc-vsock -l 1234
> Connection from cid 2 port ***...
> (stop printing)
>
> The above process simulates the communication process between
> the `kata-runtime` and `kata-agent` after starting the Guest.
> In order to connect to `kata-agent` as soon as possible,
> `kata-runtime` will continuously try to connect to `kata-agent` in a loop.
> See https://github.com/kata-containers/runtime/blob/d054556f60f092335a22a288011fa29539ad4ccc/vendor/github.com/kata-containers/agent/protocols/client/client.go#L327
> But when the vsock device in the Guest is not ready, the connection
> will block for 2 seconds. This situation actually slows down
> the entire startup time of `kata-runtime`.

This can be done efficiently as follows:
1. kata-runtime listens on a vsock port
2. kata-agent-port=PORT is added to the kernel command-line options
3. kata-agent parses the port number and connects to the host

This eliminates the reconnection attempts.

> > I think that adding a QMP event is working around the issue rather than
> > fixing the root cause. This is probably a vhost_vsock.ko problem and
> > should be fixed there.
>
> After looking at the source code of vhost_vsock.ko,
> I think it is possible to optimize the logic here too.
> The simple patch is as follows. Do you think the modification is appropriate?
>
> [...]

I'm not keen on adding a special case for vhost_vsock.ko connect.

Userspace APIs to avoid the 2 second wait already exist:

1. The SO_VM_SOCKETS_CONNECT_TIMEOUT socket option controls the connect
   timeout for this socket.

2. Non-blocking connect allows the userspace process to do other things
   while a connection attempt is being made.

But the best solution is the one I mentioned above.

Stefan
> This can be done efficiently as follows:
> 1. kata-runtime listens on a vsock port
> 2. kata-agent-port=PORT is added to the kernel command-line options
> 3. kata-agent parses the port number and connects to the host
>
> This eliminates the reconnection attempts.

There will be an additional problem if we do this:
Who decides which port the `runtime` should listen on?

Consider the worst case:
The ports selected by two `runtime` instances running in parallel always
conflict, and this case is unavoidable, even if we can reduce the
possibility of conflicts through algorithms. Because we don't have a
daemon that can allocate a unique port to each `runtime`.

> Userspace APIs to avoid the 2 second wait already exist:
>
> 1. The SO_VM_SOCKETS_CONNECT_TIMEOUT socket option controls the connect
>    timeout for this socket.

Yes, it has the same effect.

> 2. Non-blocking connect allows the userspace process to do other things
>    while a connection attempt is being made.

I don't think the `runtime` has anything to do except wait for the
response from the `agent` at that moment.

Now let me sort out the currently known methods:
1. `runtime` does not connect until it receives the QMP event reported
   by QEMU when the `agent` opens the vsock device.
   - The method looks inappropriate now.
2. Adding a special case for vhost_vsock.ko.
   - Also inappropriate.
3. Connect to `runtime` from `agent`.
   - `runtime` may not be able to choose the right port.
4. Use the `SO_VM_SOCKETS_CONNECT_TIMEOUT` option.
   - The effect is similar to method 2; no need to modify the kernel
     module code.

I have an additional question:
If using method 4, when `runtime` calls connect with the NONBLOCK option
and a very short timeout in an infinite loop, the kernel may frequently
create timers. Are there any other side effects?
On Fri, Dec 13, 2019 at 03:11:54PM +0800, ning.bo9@zte.com.cn wrote:
> > This can be done efficiently as follows:
> > 1. kata-runtime listens on a vsock port
> > 2. kata-agent-port=PORT is added to the kernel command-line options
> > 3. kata-agent parses the port number and connects to the host
> >
> > This eliminates the reconnection attempts.
>
> There will be an additional problem if we do this:
> Who decides which port the `runtime` should listen on?

Let the host kernel automatically assign a port using VMADDR_PORT_ANY.
It works like this:

  struct sockaddr_vm svm = {
      .svm_family = AF_VSOCK,
      .svm_port = VMADDR_PORT_ANY,
      .svm_cid = VMADDR_CID_ANY,
  };

  int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
  ...
  if (bind(fd, (const struct sockaddr *)&svm, sizeof(svm)) < 0) {
      ...
  }

  socklen_t socklen = sizeof(svm);
  if (getsockname(fd, (struct sockaddr *)&svm, &socklen) < 0) {
      ...
  }

  printf("cid %u port %u\n", svm.svm_cid, svm.svm_port);

> Consider the worst case:
> The ports selected by two `runtime` instances running in parallel always
> conflict, and this case is unavoidable, even if we can reduce the
> possibility of conflicts through algorithms. Because we don't have a
> daemon that can allocate a unique port to each `runtime`.

The kernel assigns unique ports and only fails if the entire port
namespace is exhausted. The port namespace is 32-bits so this is not a
real-world concern.

Does this information clarify how the runtime can connect to the guest
agent without loops or delays?

Stefan
> > There will be an additional problem if we do this:
> > Who decides which port the `runtime` should listen on?
>
> Let the host kernel automatically assign a port using VMADDR_PORT_ANY.
> It works like this:
>
> [...]
>
> The kernel assigns unique ports and only fails if the entire port
> namespace is exhausted. The port namespace is 32-bits so this is not a
> real-world concern.
>
> Does this information clarify how the runtime can connect to the guest
> agent without loops or delays?

Thank you very much. I will do as you instructed above.
On Thu, Dec 12, 2019 at 11:05:25AM +0000, Stefan Hajnoczi wrote:
> On Thu, Nov 28, 2019 at 07:26:47PM +0800, ning.bo9@zte.com.cn wrote:
> > Let me describe the issue with an example via `nc-vsock`:
> >
> > [...]
> >
> > But when the vsock device in the Guest is not ready, the connection
> > will block for 2 seconds. This situation actually slows down
> > the entire startup time of `kata-runtime`.
>
> This can be done efficiently as follows:
> 1. kata-runtime listens on a vsock port
> 2. kata-agent-port=PORT is added to the kernel command-line options
> 3. kata-agent parses the port number and connects to the host
>
> This eliminates the reconnection attempts.

Then we'll get the same problem in reverse, won't we?
Agent must now be running before guest can boot ...
Or did I miss anything?

> > > I think that adding a QMP event is working around the issue rather than
> > > fixing the root cause. This is probably a vhost_vsock.ko problem and
> > > should be fixed there.
> >
> > After looking at the source code of vhost_vsock.ko,
> > I think it is possible to optimize the logic here too.
> > The simple patch is as follows. Do you think the modification is appropriate?
> >
> > [...]
>
> I'm not keen on adding a special case for vhost_vsock.ko connect.
>
> Userspace APIs to avoid the 2 second wait already exist:
>
> 1. The SO_VM_SOCKETS_CONNECT_TIMEOUT socket option controls the connect
>    timeout for this socket.
>
> 2. Non-blocking connect allows the userspace process to do other things
>    while a connection attempt is being made.
>
> But the best solution is the one I mentioned above.
>
> Stefan
On Thu, Dec 12, 2019 at 06:24:55AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 12, 2019 at 11:05:25AM +0000, Stefan Hajnoczi wrote:
> > [...]
> >
> > This can be done efficiently as follows:
> > 1. kata-runtime listens on a vsock port
> > 2. kata-agent-port=PORT is added to the kernel command-line options
> > 3. kata-agent parses the port number and connects to the host
> >
> > This eliminates the reconnection attempts.
>
> Then we'll get the same problem in reverse, won't we?
> Agent must now be running before guest can boot ...
> Or did I miss anything?

kata-runtime launches QEMU. The QEMU guest runs kata-agent. Therefore
it is guaranteed that kata-runtime's listen socket will be set up before
the agent executes.

Stefan