From: Hanna Czenczek
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Hanna Czenczek, Kevin Wolf, qemu-stable@nongnu.org
Subject: [PATCH 01/15] fuse: Copy write buffer content before polling
Date: Tue, 25 Mar 2025 17:06:35 +0100
Message-ID: <20250325160635.118812-1-hreitz@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250325160529.117543-1-hreitz@redhat.com>
References: <20250325160529.117543-1-hreitz@redhat.com>
MIME-Version: 1.0

Polling in I/O functions can lead to nested read_from_fuse_export()
calls, overwriting the request buffer's content. The only function
affected by this is fuse_write(), which therefore must use a bounce
buffer or corruption may occur.

Note that in addition we do not know whether libfuse-internal structures
can cope with this nesting, and even if we did, we probably cannot rely
on it in the future. This is the main reason why we want to remove
libfuse from the I/O path.

I do not have a good reproducer for this other than:

$ dd if=/dev/urandom of=image bs=1M count=4096
$ dd if=/dev/zero of=copy bs=1M count=4096
$ touch fuse-export
$ qemu-storage-daemon \
    --blockdev file,node-name=file,filename=copy \
    --export \
    fuse,id=exp,node-name=file,mountpoint=fuse-export,writable=true \
    &

Other shell:
$ qemu-img convert -p -n -f raw -O raw -t none image fuse-export
$ killall -SIGINT qemu-storage-daemon
$ qemu-img compare image copy
Content mismatch at offset 0!

(The -t none in qemu-img convert is important.)
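
To illustrate the hazard described at the top of this message, here is
a minimal standalone sketch. It is not QEMU or libfuse code: every name
in it (request_buf, nested_read(), polling_write(), run_case()) is a
hypothetical stand-in. request_buf plays the role of exp->fuse_buf,
nested_read() the role of a nested read_from_fuse_export() call, and
polling_write() the role of a blk_*() I/O function that may poll before
it consumes its input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins only; none of this is QEMU or libfuse API. */
static char request_buf[16];   /* plays the role of exp->fuse_buf */
static int pending_nested;     /* non-zero: another request is waiting */

/* Plays the role of a nested read_from_fuse_export(): reuses request_buf. */
static void nested_read(void)
{
    if (pending_nested) {
        pending_nested = 0;
        strcpy(request_buf, "SECOND");
    }
}

/* Plays the role of a blk_*() I/O call: polls first, then reads src. */
static void polling_write(char *dst, size_t dst_len, const char *src)
{
    nested_read();                       /* polling may re-enter the reader */
    snprintf(dst, dst_len, "%s", src);   /* only now is the input consumed */
}

/* Unsafe variant: hands the shared request buffer straight to the I/O call. */
static void write_direct(char *out, size_t out_len)
{
    polling_write(out, out_len, request_buf);
}

/* Safe variant: copies the request into a bounce buffer before any polling. */
static void write_bounced(char *out, size_t out_len)
{
    size_t len = strlen(request_buf) + 1;
    char *copied = malloc(len);

    if (!copied) {
        abort();
    }
    memcpy(copied, request_buf, len);
    polling_write(out, out_len, copied);
    free(copied);
}

/* Sets up one "FIRST" request with a nested "SECOND" request pending. */
static void run_case(int use_bounce, char *out, size_t out_len)
{
    strcpy(request_buf, "FIRST");
    pending_nested = 1;
    if (use_bounce) {
        write_bounced(out, out_len);
    } else {
        write_direct(out, out_len);
    }
}
```

Calling run_case(0, out, sizeof(out)) leaves the second request's data
in out, which is the kind of corruption the reproducer above shows;
run_case(1, ...) leaves the first request's data intact. The patch
itself allocates the copy with blk_blockalign() and frees it with
qemu_vfree() rather than malloc()/free().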

I tried reproducing this with throttle and small aio_write requests from
another qemu-io instance, but for some reason all requests are perfectly
serialized then.

I think in theory we should get parallel writes only if we set
fi->parallel_direct_writes in fuse_open(). In fact, I can confirm that
if we do that, that throttle-based reproducer works (i.e. does get
parallel (nested) write requests). I have no idea why we still get
parallel requests with qemu-img convert anyway.

Also, a later patch in this series will set fi->parallel_direct_writes
and note that it makes basically no difference when running fio on the
current libfuse-based version of our code. It does make a difference
without libfuse. So something quite fishy is going on.

I will try to investigate further what the root cause is, but I think
for now let's assume that calling blk_pwrite() can invalidate the buffer
contents through nested polling.

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek
---
 block/export/fuse.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 465cc9891d..a12f479492 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -301,6 +301,12 @@ static void read_from_fuse_export(void *opaque)
         goto out;
     }
 
+    /*
+     * Note that polling in any request-processing function can lead to a nested
+     * read_from_fuse_export() call, which will overwrite the contents of
+     * exp->fuse_buf. Anything that takes a buffer needs to take care that the
+     * content is copied before potentially polling.
+     */
     fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
 
 out:
@@ -624,6 +630,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
                        size_t size, off_t offset, struct fuse_file_info *fi)
 {
     FuseExport *exp = fuse_req_userdata(req);
+    void *copied;
     int64_t length;
     int ret;
 
@@ -638,6 +645,14 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
         return;
     }
 
+    /*
+     * Heed the note on read_from_fuse_export(): If we poll (which any blk_*()
+     * I/O function may do), read_from_fuse_export() may be nested, overwriting
+     * the request buffer content. Therefore, we must copy it here.
+     */
+    copied = blk_blockalign(exp->common.blk, size);
+    memcpy(copied, buf, size);
+
     /**
      * Clients will expect short writes at EOF, so we have to limit
      * offset+size to the image length.
@@ -645,7 +660,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
     length = blk_getlength(exp->common.blk);
     if (length < 0) {
         fuse_reply_err(req, -length);
-        return;
+        goto free_buffer;
     }
 
     if (offset + size > length) {
@@ -653,19 +668,22 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
             ret = fuse_do_truncate(exp, offset + size, true, PREALLOC_MODE_OFF);
             if (ret < 0) {
                 fuse_reply_err(req, -ret);
-                return;
+                goto free_buffer;
             }
         } else {
             size = length - offset;
         }
     }
 
-    ret = blk_pwrite(exp->common.blk, offset, size, buf, 0);
+    ret = blk_pwrite(exp->common.blk, offset, size, copied, 0);
     if (ret >= 0) {
         fuse_reply_write(req, size);
     } else {
         fuse_reply_err(req, -ret);
     }
+
+free_buffer:
+    qemu_vfree(copied);
 }
 
 /**
-- 
2.48.1