From: Greg Kurz
To: qemu-devel@nongnu.org
Cc: Keno Fischer, Stefano Stabellini, Greg Kurz, "Michael S. Tsirkin"
Date: Tue, 23 Jan 2018 12:33:17 +0100
Message-ID: <151670719721.7533.6287116389556641300.stgit@bahia>
Subject: [Qemu-devel] [PATCH] 9pfs: Correctly handle cancelled requests

From: Keno Fischer

# Background

I was investigating spurious, non-deterministic EINTR returns from various 9p file system operations in a Linux guest served by the QEMU 9p server.

## EINTR, ERESTARTSYS and the Linux kernel

When a signal arrives that the Linux kernel needs to deliver to user space while a given thread is blocked (in the 9p case, waiting for a reply to its request in p9_client_rpc -> wait_event_interruptible), it asks whatever driver is currently running to abort its current operation (in the 9p case, by submitting a TFLUSH message) and return to user space. In these situations, the error code reported is generally ERESTARTSYS. If the user space process specified SA_RESTART, the system call gets restarted once the signal handler has run (assuming the signal handler doesn't modify the process state in complicated ways not relevant here). If SA_RESTART is not specified, ERESTARTSYS gets translated to EINTR and user space is expected to handle the restart itself.
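The difference between the two cases can be observed from plain user space. The sketch below is a minimal illustration, assuming a Linux/POSIX host; the helper names `on_alarm` and `read_across_signal` are hypothetical and have nothing to do with 9p. A thread blocks in read(2) on an empty pipe while a one-shot timer delivers SIGALRM. With SA_RESTART the kernel restarts the read transparently; without it, read() fails with EINTR and the caller's loop has to retry. ERESTARTSYS itself is kernel-internal and never reaches user space.

```c
#define _XOPEN_SOURCE 700  /* expose sigaction(), SA_RESTART and setitimer() */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int demo_pipe[2];

/* SIGALRM handler: writes one byte into the pipe so the blocked reader can finish. */
static void on_alarm(int sig)
{
    int saved_errno = errno;                 /* handlers must not clobber errno */
    char c = 'x';
    ssize_t r = write(demo_pipe[1], &c, 1);  /* write(2) is async-signal-safe */
    (void)r;
    (void)sig;
    errno = saved_errno;
}

/*
 * Blocks in read(2) on an empty pipe while a one-shot 50 ms timer raises
 * SIGALRM. Returns how many times read() failed with EINTR before it
 * succeeded, or -1 on any other error.
 */
static int read_across_signal(int use_sa_restart, char *out)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;

    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = 50000;  /* single expiry, no repeat interval */

    if (pipe(demo_pipe) != 0)
        return -1;

    int result = -1;
    if (sigaction(SIGALRM, &sa, NULL) == 0 &&
        setitimer(ITIMER_REAL, &it, NULL) == 0) {
        int eintr_count = 0;
        ssize_t n;
        /* Without SA_RESTART the kernel reports EINTR and user space retries. */
        while ((n = read(demo_pipe[0], out, 1)) < 0 && errno == EINTR)
            eintr_count++;
        if (n == 1)
            result = eintr_count;
    }

    close(demo_pipe[0]);
    close(demo_pipe[1]);
    return result;
}
```

Calling `read_across_signal(1, &c)` returns 0 because the kernel restarted the read; `read_across_signal(0, &c)` normally returns 1, i.e. the program had to restart the call itself, which is exactly the EINTR case that programs relying on SA_RESTART are not prepared for.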
## The 9p TFLUSH command

The 9p TFLUSH command requests that the server abort an ongoing operation. The man page [1] specifies:

```
If it recognizes oldtag as the tag of a pending transaction, it should abort any pending response and discard that tag.
[...]
When the client sends a Tflush, it must wait to receive the corresponding Rflush before reusing oldtag for subsequent messages. If a response to the flushed request is received before the Rflush, the client must honor the response as if it had not been flushed, since the completed request may signify a state change in the server
```

In particular, this means that the server must not send a reply with the original tag in response to the cancellation request, because the client is obligated to interpret such a reply as a coincidental reply to the original request.

# The bug

When QEMU receives a Tflush request, it sets the `cancelled` flag on the relevant pdu. This flag is periodically checked, e.g. in `v9fs_co_name_to_path`, and if it is set, the operation is aborted and the error is set to EINTR. However, the server then violates the spec by returning an Rerror response to the client, rather than discarding the message entirely. As a result, the client is required to assume that said Rerror response is the result of the original request, not of the cancellation, and thus passes the EINTR error back to user space. This is not the worst thing it could do; however, as discussed above, the correct error code would have been ERESTARTSYS, so that user space programs with SA_RESTART set get correctly restarted once the signal handler completes. Instead, such programs get spurious EINTR results that they were not expecting to handle.

It should be noted that there are plenty of user space programs that do not set SA_RESTART and do not correctly handle EINTR either. However, that is then a user space bug.
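To make the consequence of the flush(9P) rule concrete, here is a simplified, hypothetical model of the client's bookkeeping for one flushed request. It is not the Linux 9p client code: the type and function names are made up, and it assumes, as described in the Background section, that the client reports ERESTARTSYS (Linux-internal value 512) when the Rflush arrives with no prior reply for the old tag.

```c
#include <errno.h>
#include <stdbool.h>

/* Linux-internal errno value; never visible to user space. */
#define SKETCH_ERESTARTSYS 512

/* Which message arrived for a request whose tag has been flushed. */
enum flush_event {
    EV_REPLY_OLDTAG,  /* R-message carrying the original (old) tag */
    EV_RFLUSH         /* Rflush answering our Tflush */
};

/* Hypothetical client-side state for one flushed request. */
struct flushed_req {
    bool reply_seen;  /* a reply for oldtag was honored as the real result */
    bool tag_free;    /* Rflush received: oldtag may be reused */
    int result;       /* negative errno handed back to the syscall layer */
};

static void flushed_req_on_event(struct flushed_req *r, enum flush_event ev,
                                 int reply_err)
{
    if (r->tag_free)
        return;  /* oldtag already retired; nothing may refer to it */

    if (ev == EV_REPLY_OLDTAG) {
        /* Spec: a response arriving before Rflush must be honored
         * as if the request had not been flushed. */
        r->reply_seen = true;
        r->result = reply_err;
    } else {
        /* Rflush with no prior reply: the request really was cancelled,
         * so the interrupted syscall can report ERESTARTSYS. */
        if (!r->reply_seen)
            r->result = -SKETCH_ERESTARTSYS;
        r->tag_free = true;
    }
}
```

Feeding this model the pre-patch server behaviour (Rerror carrying EINTR, then Rflush) leaves `result == -EINTR`, which the client must pass on to user space. If the server sends only the Rflush, the result is ERESTARTSYS and SA_RESTART can take effect.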
It should also be noted that this bug has been mitigated by a recent commit to the Linux kernel [2], which essentially prevents the kernel from sending Tflush requests unless the process is about to die (in which case the process likely doesn't care about the response). Nevertheless, for older kernels and to comply with the spec, I believe this change is beneficial.

# Implementation

The fix is fairly simple: if the pdu was previously cancelled, we skip sending the reply. We do, however, still notify the transport layer that we're doing this, so it can clean up any resources it may be holding. I also added a new trace event to distinguish operations that caused an error reply from those that were cancelled.

One complication is that we only omit sending the message on EINTR errors, in order to avoid confusing the rest of the code (which may assume that a client knows about a fid if it successfully passed it off to pdu_complete without checking for cancellation status). This does mean that if the server acts upon the cancellation flag, it always needs to set err to EINTR. I believe this is true of the current code.

[1] https://9fans.github.io/plan9port/man/man9/flush.html
[2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc89103754de52

Signed-off-by: Keno Fischer
Reviewed-by: Greg Kurz
[groug, send a zero-sized reply instead of detaching the buffer]
Signed-off-by: Greg Kurz
Acked-by: Michael S. Tsirkin
Reviewed-by: Stefano Stabellini
---

To be effective, a patch is needed for the 9pnet_virtio driver in Linux as well:

https://sourceforge.net/p/v9fs/mailman/message/36200555/

Stefano,

As you suggested, the right thing to do is indeed to inform the transport layer that the request was consumed, even if we don't send a 9p reply to the client (MST suggested the same for the kernel-side patches I had sent a month ago on the v9fs-developer list).
So in the end, the following patch is not needed:

http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg00175.html

Zeroing the PDU size before pushing it back does the job for virtio, and it seems it will also work for Xen (at least that's my impression after reading the code in QEMU and Linux). But before merging that, I'd appreciate an ack from you.

Cheers,

--
Greg

---
 hw/9pfs/9p.c         | 18 ++++++++++++++++++
 hw/9pfs/trace-events |  1 +
 2 files changed, 19 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 909a61139405..73dafffe239f 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -630,6 +630,24 @@ static void coroutine_fn pdu_complete(V9fsPDU *pdu, ssize_t len)
     V9fsState *s = pdu->s;
     int ret;
 
+    /*
+     * The 9p spec requires that successfully cancelled pdus receive no reply.
+     * Sending a reply would confuse clients because they would
+     * assume that any EINTR is the actual result of the operation,
+     * rather than a consequence of the cancellation. However, if
+     * the operation completed (successfully or with an error other
+     * than caused by cancellation), we do send out that reply, both
+     * for efficiency and to avoid confusing the rest of the state machine
+     * that assumes passing a non-error here will mean a successful
+     * transmission of the reply.
+     */
+    bool discard = pdu->cancelled && len == -EINTR;
+    if (discard) {
+        trace_v9fs_rcancel(pdu->tag, pdu->id);
+        pdu->size = 0;
+        goto out_notify;
+    }
+
     if (len < 0) {
         int err = -len;
         len = 7;
diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events
index 08a4abf22ea4..1aee350c42f1 100644
--- a/hw/9pfs/trace-events
+++ b/hw/9pfs/trace-events
@@ -1,6 +1,7 @@
 # See docs/devel/tracing.txt for syntax documentation.
 
 # hw/9pfs/virtio-9p.c
+v9fs_rcancel(uint16_t tag, uint8_t id) "tag %d id %d"
 v9fs_rerror(uint16_t tag, uint8_t id, int err) "tag %d id %d err %d"
 v9fs_version(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s"
 v9fs_version_return(uint16_t tag, uint8_t id, int32_t msize, char* version) "tag %d id %d msize %d version %s"