From: Leonardo Bras
To: Daniel P. Berrangé, Juan Quintela, "Dr. David Alan Gilbert", Peter Xu, Jason Wang
Subject: [PATCH v2 2/3] QIOChannelSocket: Implement io_async_write & io_async_flush
Date: Wed, 22 Sep 2021 02:03:39 -0300
Message-Id: <20210922050340.614781-3-leobras@redhat.com>
In-Reply-To: <20210922050340.614781-1-leobras@redhat.com>
References: <20210922050340.614781-1-leobras@redhat.com>
Cc: Leonardo Bras, qemu-devel@nongnu.org

Implement the new optional callbacks io_async_write and io_async_flush on
QIOChannelSocket, but enable them only when the MSG_ZEROCOPY feature is
available in the host kernel and TCP sockets are used.

The contents of qio_channel_socket_writev() were moved to a helper function,
__qio_channel_socket_writev(), which accepts an extra 'flags' argument.
This helper is used to implement qio_channel_socket_writev() with
flags = 0, keeping its behavior unchanged, and
qio_channel_socket_async_writev() with flags = MSG_ZEROCOPY.

qio_channel_socket_async_flush() is implemented by reading the socket's
error queue, which holds information on MSG_ZEROCOPY send completion.
There is no need to worry about re-sending packets if an error happens, as
MSG_ZEROCOPY only works with TCP, which retransmits if any error occurs.

Notes on using async_write():
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid
  copying, some caution is necessary to avoid overwriting any buffer before
  it's sent. If that happens, a newer version of the buffer may be sent
  instead.
- If this is a problem, it's recommended to use async_flush() before freeing
  or re-using the buffer.
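
The completion accounting that async_flush() relies on can be sketched in
isolation. The snippet below is only an illustration and is not code from
this patch: struct zc_counters, struct zc_notification and the zc_* helpers
are hypothetical names. It assumes, as the flush loop in this patch does,
that each notification read from the error queue covers an inclusive range
of send calls (ee_info up to ee_data).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two fields the flush loop reads from
 * struct sock_extended_err: the first and last send ID covered by one
 * completion notification (inclusive range).
 */
struct zc_notification {
    uint32_t first_id;   /* plays the role of ee_info */
    uint32_t last_id;    /* plays the role of ee_data */
};

/* Hypothetical counterpart of async_queued / async_sent. */
struct zc_counters {
    uint64_t queued;
    uint64_t sent;
};

/* Called once per zero-copy send request. */
void zc_note_queued(struct zc_counters *c)
{
    assert(c != NULL);
    c->queued++;
}

/* Called once per notification taken from the error queue. */
void zc_note_completed(struct zc_counters *c, struct zc_notification n)
{
    assert(c != NULL);
    /* Unsigned 32-bit arithmetic keeps the count right if IDs wrap. */
    uint32_t count = n.last_id - n.first_id + 1;
    c->sent += count;
}

/* A flush loop would keep reading notifications until this is true. */
bool zc_all_sent(const struct zc_counters *c)
{
    assert(c != NULL);
    return c->sent >= c->queued;
}
```

With three queued sends, a notification for IDs 0..1 followed by one for
ID 2 brings the counters level, which is the exit condition of the flush
loop.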
Signed-off-by: Leonardo Bras
---
 include/io/channel-socket.h |   2 +
 io/channel-socket.c         | 145 ++++++++++++++++++++++++++++++++++--
 2 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..4d1be0637a 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t async_queued;
+    ssize_t async_sent;
 };
 
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 606ec97cf7..128fab4cd2 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,9 +26,23 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include
+#include
+#endif
 
 #define SOCKET_MAX_FDS 16
 
+static ssize_t qio_channel_socket_async_writev(QIOChannel *ioc,
+                                               const struct iovec *iov,
+                                               size_t niov,
+                                               int *fds,
+                                               size_t nfds,
+                                               Error **errp);
+
+static void qio_channel_socket_async_flush(QIOChannel *ioc,
+                                           Error **errp);
+
 SocketAddress *
 qio_channel_socket_get_local_address(QIOChannelSocket *ioc,
                                      Error **errp)
@@ -55,6 +69,8 @@ qio_channel_socket_new(void)
 
     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->async_queued = 0;
+    sioc->async_sent = 0;
 
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -140,6 +156,7 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
                                     Error **errp)
 {
     int fd;
+    int ret, v = 1;
 
     trace_qio_channel_socket_connect_sync(ioc, addr);
     fd = socket_connect(addr, errp);
@@ -154,6 +171,19 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }
 
+#ifdef CONFIG_LINUX
+    if (addr->type != SOCKET_ADDRESS_TYPE_INET) {
+        return 0;
+    }
+
+    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret >= 0) {
+        QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+        klass->io_async_writev = qio_channel_socket_async_writev;
+        klass->io_async_flush = qio_channel_socket_async_flush;
+    }
+#endif
+
     return 0;
 }
 
@@ -520,12 +550,13 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
     return ret;
 }
 
-static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
-                                         const struct iovec *iov,
-                                         size_t niov,
-                                         int *fds,
-                                         size_t nfds,
-                                         Error **errp)
+static ssize_t __qio_channel_socket_writev(QIOChannel *ioc,
+                                           const struct iovec *iov,
+                                           size_t niov,
+                                           int *fds,
+                                           size_t nfds,
+                                           int flags,
+                                           Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
     ssize_t ret;
@@ -558,7 +589,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     }
 
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, flags);
     if (ret <= 0) {
         if (errno == EAGAIN) {
             return QIO_CHANNEL_ERR_BLOCK;
@@ -572,6 +603,106 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     }
     return ret;
 }
+
+static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
+                                         const struct iovec *iov,
+                                         size_t niov,
+                                         int *fds,
+                                         size_t nfds,
+                                         Error **errp)
+{
+    return __qio_channel_socket_writev(ioc, iov, niov, fds, nfds, 0, errp);
+}
+
+static ssize_t qio_channel_socket_async_writev(QIOChannel *ioc,
+                                               const struct iovec *iov,
+                                               size_t niov,
+                                               int *fds,
+                                               size_t nfds,
+                                               Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+
+    sioc->async_queued++;
+
+    return __qio_channel_socket_writev(ioc, iov, niov, fds, nfds, MSG_ZEROCOPY,
+                                       errp);
+}
+
+
+static void qio_channel_socket_async_flush(QIOChannel *ioc,
+                                           Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct pollfd pfd;
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
+    int ret;
+
+    memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+
+    while (sioc->async_sent < sioc->async_queued) {
+        ret = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (ret < 0) {
+            if (errno == EAGAIN) {
+                /* Nothing on errqueue, wait */
+                pfd.fd = sioc->fd;
+                pfd.events = 0;
+                ret = poll(&pfd, 1, 250);
+                if (ret == 0) {
+                    /*
+                     * Timeout : After 250ms without receiving any zerocopy
+                     * notification, consider all data as sent.
+                     */
+                    break;
+                } else if (ret < 0 ||
+                           (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))) {
+                    error_setg_errno(errp, errno,
+                                     "Poll error");
+                    break;
+                } else {
+                    continue;
+                }
+            }
+            if (errno == EINTR) {
+                continue;
+            }
+
+            error_setg_errno(errp, errno,
+                             "Unable to read errqueue");
+            break;
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            break;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            break;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zerocopy");
+            break;
+        }
+
+        /* No errors, count sent ids*/
+        sioc->async_sent += serr->ee_data - serr->ee_info + 1;
+    }
+}
+
+
 #else /* WIN32 */
 static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         const struct iovec *iov,
-- 
2.33.0
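
A note on the cmsg check in qio_channel_socket_async_flush(): written with
'&&', the error path is taken only when both the level and the type
mismatch, so a cmsg with a matching level but another type is accepted,
and an IPv6 notification would appear to be rejected. One possible
stricter predicate is sketched below. It is an illustration, not part of
this patch: cmsg_is_ip_recverr(), patch_style_accepts(), fill_cmsg() and
check_cmsg_examples() are hypothetical helpers, accepting IPv6 is an
assumption about what callers want, and the fallback #defines carry the
Linux values for builds where the headers do not expose them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Fallbacks (Linux values) in case the headers do not expose these. */
#ifndef SOL_IP
#define SOL_IP 0
#endif
#ifndef SOL_IPV6
#define SOL_IPV6 41
#endif
#ifndef IP_RECVERR
#define IP_RECVERR 11
#endif
#ifndef IPV6_RECVERR
#define IPV6_RECVERR 25
#endif

/* Hypothetical helper: true only for an IPv4 or IPv6 RECVERR cmsg. */
bool cmsg_is_ip_recverr(const struct cmsghdr *cm)
{
    if (cm == NULL) {
        return false;   /* CMSG_FIRSTHDR() can return NULL */
    }
    return (cm->cmsg_level == SOL_IP && cm->cmsg_type == IP_RECVERR) ||
           (cm->cmsg_level == SOL_IPV6 && cm->cmsg_type == IPV6_RECVERR);
}

/* The patch's condition, inverted: accepted unless BOTH fields mismatch. */
bool patch_style_accepts(const struct cmsghdr *cm)
{
    return !(cm->cmsg_level != SOL_IP && cm->cmsg_type != IP_RECVERR);
}

/* Hypothetical helper that fills in a bare header for the checks below. */
void fill_cmsg(struct cmsghdr *cm, int level, int type)
{
    memset(cm, 0, sizeof(*cm));
    cm->cmsg_level = level;
    cm->cmsg_type = type;
}

/* Self-check contrasting the two predicates. */
void check_cmsg_examples(void)
{
    struct cmsghdr v4, v6, odd;

    fill_cmsg(&v4, SOL_IP, IP_RECVERR);
    fill_cmsg(&v6, SOL_IPV6, IPV6_RECVERR);
    fill_cmsg(&odd, SOL_IP, IP_RECVERR + 1);

    assert(cmsg_is_ip_recverr(&v4));
    assert(cmsg_is_ip_recverr(&v6));
    assert(!cmsg_is_ip_recverr(&odd));
    assert(!cmsg_is_ip_recverr(NULL));

    assert(patch_style_accepts(&v4));
    assert(patch_style_accepts(&odd));   /* level matches, type does not */
    assert(!patch_style_accepts(&v6));   /* IPv6 notification rejected */
}
```

Whether IPv6 should be accepted here depends on which socket families the
series intends to support; the sketch only shows how the two conditions
differ.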