From nobody Sun Feb 8 15:46:59 2026
From: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
To: qemu-devel@nongnu.org, vgoyal@redhat.com, stefanha@redhat.com, groug@kaod.org
Cc: virtio-fs@redhat.com
Subject: [PATCH v3 20/26] DAX/unmap virtiofsd: Parse unmappable elements
Date: Wed, 28 Apr 2021 12:00:54 +0100
Message-Id: <20210428110100.27757-21-dgilbert@redhat.com>
In-Reply-To: <20210428110100.27757-1-dgilbert@redhat.com>
References: <20210428110100.27757-1-dgilbert@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

For some reads/writes the virtio queue elements are unmappable by the
daemon; these are cases where the data is to be read/written from
non-RAM.
In virtiofs's case this is typically a direct read/write into an
mmap'd DAX file, also on virtiofs (possibly on another instance).

When we receive a virtio queue element, check that we have enough
mappable data to handle the headers.  Make a note of the number of
unmappable 'in' entries (i.e. for read data back to the VMM), and
flag the fuse_bufvec for 'out' entries with a new flag,
FUSE_BUF_PHYS_ADDR.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by:
Signed-off-by: Liu Bo
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/virtiofsd/buffer.c      |   4 +-
 tools/virtiofsd/fuse_common.h |   7 ++
 tools/virtiofsd/fuse_virtio.c | 230 ++++++++++++++++++++++++----------
 3 files changed, 173 insertions(+), 68 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 874f01c488..1a050aa441 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -77,6 +77,7 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
     ssize_t res = 0;
     size_t copied = 0;
 
+    assert(!(src->flags & FUSE_BUF_PHYS_ADDR));
     while (len) {
         if (dst->flags & FUSE_BUF_FD_SEEK) {
             res = pwrite(dst->fd, (char *)src->mem + src_off, len,
@@ -272,7 +273,8 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
      * process
      */
     for (i = 0; i < srcv->count; i++) {
-        if (srcv->buf[i].flags & FUSE_BUF_IS_FD) {
+        if ((srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR) ||
+            (srcv->buf[i].flags & FUSE_BUF_IS_FD)) {
             break;
         }
     }
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index fa9671872e..af43cf19f9 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -626,6 +626,13 @@ enum fuse_buf_flags {
      * detected.
      */
     FUSE_BUF_FD_RETRY = (1 << 3),
+
+    /**
+     * The addresses in the iovec represent guest physical addresses
+     * that can't be mapped by the daemon process.
+     * IO must be bounced back to the VMM to do it.
+     */
+    FUSE_BUF_PHYS_ADDR = (1 << 4),
 };
 
 /**
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 91317bade8..f8fd158bb2 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -49,6 +49,10 @@ typedef struct {
     VuVirtqElement elem;
     struct fuse_chan ch;
 
+    /* Number of unmappable iovecs */
+    unsigned bad_in_num;
+    unsigned bad_out_num;
+
     /* Used to complete requests that involve no reply */
     bool reply_sent;
 } FVRequest;
@@ -353,8 +357,10 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 
     /* The 'in' part of the elem is to qemu */
     unsigned int in_num = elem->in_num;
+    unsigned int bad_in_num = req->bad_in_num;
     struct iovec *in_sg = elem->in_sg;
     size_t in_len = iov_size(in_sg, in_num);
+    size_t in_len_writeable = iov_size(in_sg, in_num - bad_in_num);
     fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
              __func__, elem->index, in_num, in_len);
 
@@ -362,7 +368,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
      * The elem should have room for a 'fuse_out_header' (out from fuse)
      * plus the data based on the len in the header.
      */
-    if (in_len < sizeof(struct fuse_out_header)) {
+    if (in_len_writeable < sizeof(struct fuse_out_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
                  __func__, elem->index);
         ret = E2BIG;
@@ -389,7 +395,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
     memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
     /* These get updated as we skip */
     struct iovec *in_sg_ptr = in_sg_cpy;
-    int in_sg_cpy_count = in_num;
+    int in_sg_cpy_count = in_num - bad_in_num;
 
     /* skip over parts of in_sg that contained the header iov */
     size_t skip_size = iov_len;
@@ -523,17 +529,21 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
 
     /* The 'out' part of the elem is from qemu */
     unsigned int out_num = elem->out_num;
+    unsigned int out_num_readable = out_num - req->bad_out_num;
     struct iovec *out_sg = elem->out_sg;
     size_t out_len = iov_size(out_sg, out_num);
+    size_t out_len_readable = iov_size(out_sg, out_num_readable);
     fuse_log(FUSE_LOG_DEBUG,
-             "%s: elem %d: with %d out desc of length %zd\n",
-             __func__, elem->index, out_num, out_len);
+             "%s: elem %d: with %d out desc of length %zd"
+             " bad_in_num=%u bad_out_num=%u\n",
+             __func__, elem->index, out_num, out_len, req->bad_in_num,
+             req->bad_out_num);
 
     /*
      * The elem should contain a 'fuse_in_header' (in to fuse)
      * plus the data based on the len in the header.
      */
-    if (out_len < sizeof(struct fuse_in_header)) {
+    if (out_len_readable < sizeof(struct fuse_in_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
                  __func__, elem->index);
         assert(0); /* TODO */
@@ -544,80 +554,163 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
         assert(0); /* TODO */
     }
     /* Copy just the fuse_in_header and look at it */
-    copy_from_iov(&fbuf, out_num, out_sg,
+    copy_from_iov(&fbuf, out_num_readable, out_sg,
                   sizeof(struct fuse_in_header));
     memcpy(&inh, fbuf.mem, sizeof(struct fuse_in_header));
 
     pbufv = NULL; /* Compiler thinks an unitialised path */
-    if (inh.opcode == FUSE_WRITE &&
-        out_len >= (sizeof(struct fuse_in_header) +
-                    sizeof(struct fuse_write_in))) {
-        /*
-         * For a write we don't actually need to copy the
-         * data, we can just do it straight out of guest memory
-         * but we must still copy the headers in case the guest
-         * was nasty and changed them while we were using them.
-         */
-        fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
-
-        fbuf.size = copy_from_iov(&fbuf, out_num, out_sg,
-                                  sizeof(struct fuse_in_header) +
-                                  sizeof(struct fuse_write_in));
-        /* That copy reread the in_header, make sure we use the original */
-        memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
-
-        /* Allocate the bufv, with space for the rest of the iov */
-        pbufv = malloc(sizeof(struct fuse_bufvec) +
-                       sizeof(struct fuse_buf) * out_num);
-        if (!pbufv) {
-            fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
-                     __func__);
-            goto out;
-        }
+    if (req->bad_in_num || req->bad_out_num) {
+        bool handled_unmappable = false;
+
+        if (!req->bad_in_num &&
+            inh.opcode == FUSE_WRITE &&
+            out_len_readable >= (sizeof(struct fuse_in_header) +
+                                 sizeof(struct fuse_write_in))) {
+            handled_unmappable = true;
+
+            /* copy the fuse_write_in header after fuse_in_header */
+            fbuf.size = copy_from_iov(&fbuf, out_num_readable, out_sg,
+                                      sizeof(struct fuse_in_header) +
+                                      sizeof(struct fuse_write_in));
+            /* That copy reread the in_header, make sure we use the original */
+            memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * out_num);
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                         __func__);
+                goto out;
+            }
 
-        allocated_bufv = true;
-        pbufv->count = 1;
-        pbufv->buf[0] = fbuf;
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
 
-        size_t iovindex, pbufvindex, iov_bytes_skip;
-        pbufvindex = 1; /* 2 headers, 1 fusebuf */
+            size_t iovindex, pbufvindex, iov_bytes_skip;
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
-        if (!skip_iov(out_sg, out_num,
-                      sizeof(struct fuse_in_header) +
-                      sizeof(struct fuse_write_in),
-                      &iovindex, &iov_bytes_skip)) {
-            fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
-                     __func__);
-            goto out;
-        }
+            if (!skip_iov(out_sg, out_num,
+                          sizeof(struct fuse_in_header) +
+                          sizeof(struct fuse_write_in),
+                          &iovindex, &iov_bytes_skip)) {
+                fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
+                         __func__);
+                goto out;
+            }
 
-        for (; iovindex < out_num; iovindex++, pbufvindex++) {
-            pbufv->count++;
-            pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
-            pbufv->buf[pbufvindex].flags = 0;
-            pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
-            pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
-
-            if (iov_bytes_skip) {
-                pbufv->buf[pbufvindex].mem += iov_bytes_skip;
-                pbufv->buf[pbufvindex].size -= iov_bytes_skip;
-                iov_bytes_skip = 0;
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags =
+                    (iovindex < out_num_readable) ? 0 :
+                    FUSE_BUF_PHYS_ADDR;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+
+                if (iov_bytes_skip) {
+                    pbufv->buf[pbufvindex].mem += iov_bytes_skip;
+                    pbufv->buf[pbufvindex].size -= iov_bytes_skip;
+                    iov_bytes_skip = 0;
+                }
             }
         }
-    } else {
-        /* Normal (non fast write) path */
 
-        copy_from_iov(&fbuf, out_num, out_sg, se->bufsize);
-        /* That copy reread the in_header, make sure we use the original */
-        memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
-        fbuf.size = out_len;
+        if (req->bad_in_num &&
+            inh.opcode == FUSE_READ &&
+            out_len_readable >=
+                (sizeof(struct fuse_in_header) + sizeof(struct fuse_read_in))) {
+            fuse_log(FUSE_LOG_DEBUG,
+                     "Unmappable read case "
+                     "in_num=%d bad_in_num=%d\n",
+                     elem->in_num, req->bad_in_num);
+            handled_unmappable = true;
+        }
+
+        if (!handled_unmappable) {
+            fuse_log(FUSE_LOG_ERR,
+                     "Unhandled unmappable element: out: %d(b:%d) in: "
+                     "%d(b:%d)",
+                     out_num, req->bad_out_num, elem->in_num, req->bad_in_num);
+            fv_panic(dev, "Unhandled unmappable element");
+        }
+    }
+
+    if (!req->bad_out_num) {
+        if (inh.opcode == FUSE_WRITE &&
+            out_len_readable >= (sizeof(struct fuse_in_header) +
+                                 sizeof(struct fuse_write_in))) {
+            /*
+             * For a write we don't actually need to copy the
+             * data, we can just do it straight out of guest memory
+             * but we must still copy the headers in case the guest
+             * was nasty and changed them while we were using them.
+             */
+            fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n",
+                     __func__);
+
+            fbuf.size = copy_from_iov(&fbuf, out_num, out_sg,
+                                      sizeof(struct fuse_in_header) +
+                                      sizeof(struct fuse_write_in));
+            /* That copy reread the in_header, make sure we use the original */
+            memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * out_num);
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                         __func__);
+                goto out;
+            }
+
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
 
-        /* TODO! Endianness of header */
+            size_t iovindex, pbufvindex, iov_bytes_skip;
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
-        /* TODO: Add checks for fuse_session_exited */
-        bufv.buf[0] = fbuf;
-        bufv.count = 1;
-        pbufv = &bufv;
+            if (!skip_iov(out_sg, out_num,
+                          sizeof(struct fuse_in_header) +
+                          sizeof(struct fuse_write_in),
+                          &iovindex, &iov_bytes_skip)) {
+                fuse_log(FUSE_LOG_ERR, "%s: skip failed\n",
+                         __func__);
+                goto out;
+            }
+
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags = 0;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+
+                if (iov_bytes_skip) {
+                    pbufv->buf[pbufvindex].mem += iov_bytes_skip;
+                    pbufv->buf[pbufvindex].size -= iov_bytes_skip;
+                    iov_bytes_skip = 0;
+                }
+            }
+        } else {
+            /* Normal (non fast write) path */
+
+            /* Copy the rest of the buffer */
+            copy_from_iov(&fbuf, out_num, out_sg, se->bufsize);
+            /* That copy reread the in_header, make sure we use the original */
+            memcpy(fbuf.mem, &inh, sizeof(struct fuse_in_header));
+
+            fbuf.size = out_len;
+
+            /* TODO! Endianness of header */
+
+            /* TODO: Add checks for fuse_session_exited */
+            bufv.buf[0] = fbuf;
+            bufv.count = 1;
+            pbufv = &bufv;
+        }
     }
     pbufv->idx = 0;
     pbufv->off = 0;
@@ -732,13 +825,16 @@ static void *fv_queue_thread(void *opaque)
                  __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
         while (1) {
+            unsigned int bad_in_num = 0, bad_out_num = 0;
             FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest),
-                                          NULL, NULL);
+                                          &bad_in_num, &bad_out_num);
             if (!req) {
                 break;
            }
 
            req->reply_sent = false;
+            req->bad_in_num = bad_in_num;
+            req->bad_out_num = bad_out_num;
 
            if (!se->thread_pool_size) {
                req_list = g_list_prepend(req_list, req);
-- 
2.31.1