From: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
To: qemu-devel@nongnu.org, vgoyal@redhat.com, stefanha@redhat.com,
	virtio-fs@redhat.com, marcandre.lureau@redhat.com, mst@redhat.com
Subject: [PATCH 18/24] DAX/unmap virtiofsd: Parse unmappable elements
Date: Tue, 9 Feb 2021 19:02:18 +0000
Message-Id: <20210209190224.62827-19-dgilbert@redhat.com>
In-Reply-To: <20210209190224.62827-1-dgilbert@redhat.com>
References: <20210209190224.62827-1-dgilbert@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

For some read/writes the virtio queue elements are unmappable by
the daemon; these are cases where the data is to be read/written from
non-RAM.
In virtiofs's case this is typically a direct read/write into an mmap'd
DAX file also on virtiofs (possibly on another instance).

When we receive a virtio queue element, check that we have enough
mappable data to handle the headers.  Make a note of the number of
unmappable 'in' entries (i.e. for read data back to the VMM), and flag
the fuse_bufvec for 'out' entries with a new flag FUSE_BUF_PHYS_ADDR.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by:
Signed-off-by: Liu Bo
---
 tools/virtiofsd/buffer.c      |   4 +-
 tools/virtiofsd/fuse_common.h |   7 ++
 tools/virtiofsd/fuse_virtio.c | 191 ++++++++++++++++++++++++----------
 3 files changed, 145 insertions(+), 57 deletions(-)

diff --git a/tools/virtiofsd/buffer.c b/tools/virtiofsd/buffer.c
index 874f01c488..1a050aa441 100644
--- a/tools/virtiofsd/buffer.c
+++ b/tools/virtiofsd/buffer.c
@@ -77,6 +77,7 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
     ssize_t res = 0;
     size_t copied = 0;
 
+    assert(!(src->flags & FUSE_BUF_PHYS_ADDR));
     while (len) {
         if (dst->flags & FUSE_BUF_FD_SEEK) {
             res = pwrite(dst->fd, (char *)src->mem + src_off, len,
@@ -272,7 +273,8 @@ ssize_t fuse_buf_copy(struct fuse_bufvec *dstv, struct fuse_bufvec *srcv)
      * process
      */
     for (i = 0; i < srcv->count; i++) {
-        if (srcv->buf[i].flags & FUSE_BUF_IS_FD) {
+        if ((srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR) ||
+            (srcv->buf[i].flags & FUSE_BUF_IS_FD)) {
             break;
         }
     }
diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index a090040bb2..ed9280de91 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -611,6 +611,13 @@ enum fuse_buf_flags {
      * detected.
      */
     FUSE_BUF_FD_RETRY = (1 << 3),
+
+    /**
+     * The addresses in the iovec represent guest physical addresses
+     * that can't be mapped by the daemon process.
+     * IO must be bounced back to the VMM to do it.
+     */
+    FUSE_BUF_PHYS_ADDR = (1 << 4),
 };
 
 /**
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index 8feb3c0261..8fa438525f 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -49,6 +49,10 @@ typedef struct {
     VuVirtqElement elem;
     struct fuse_chan ch;
 
+    /* Number of unmappable iovecs */
+    unsigned bad_in_num;
+    unsigned bad_out_num;
+
     /* Used to complete requests that involve no reply */
     bool reply_sent;
 } FVRequest;
@@ -291,8 +295,10 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
 
     /* The 'in' part of the elem is to qemu */
     unsigned int in_num = elem->in_num;
+    unsigned int bad_in_num = req->bad_in_num;
     struct iovec *in_sg = elem->in_sg;
     size_t in_len = iov_size(in_sg, in_num);
+    size_t in_len_writeable = iov_size(in_sg, in_num - bad_in_num);
     fuse_log(FUSE_LOG_DEBUG, "%s: elem %d: with %d in desc of length %zd\n",
              __func__, elem->index, in_num, in_len);
 
@@ -300,7 +306,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
      * The elem should have room for a 'fuse_out_header' (out from fuse)
      * plus the data based on the len in the header.
      */
-    if (in_len < sizeof(struct fuse_out_header)) {
+    if (in_len_writeable < sizeof(struct fuse_out_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for out_header\n",
                  __func__, elem->index);
         ret = E2BIG;
@@ -327,7 +333,7 @@ int virtio_send_data_iov(struct fuse_session *se, struct fuse_chan *ch,
     memcpy(in_sg_cpy, in_sg, sizeof(struct iovec) * in_num);
     /* These get updated as we skip */
     struct iovec *in_sg_ptr = in_sg_cpy;
-    int in_sg_cpy_count = in_num;
+    int in_sg_cpy_count = in_num - bad_in_num;
 
     /* skip over parts of in_sg that contained the header iov */
     size_t skip_size = iov_len;
@@ -460,17 +466,21 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
 
     /* The 'out' part of the elem is from qemu */
     unsigned int out_num = elem->out_num;
+    unsigned int out_num_readable = out_num - req->bad_out_num;
     struct iovec *out_sg = elem->out_sg;
     size_t out_len = iov_size(out_sg, out_num);
+    size_t out_len_readable = iov_size(out_sg, out_num_readable);
     fuse_log(FUSE_LOG_DEBUG,
-             "%s: elem %d: with %d out desc of length %zd\n",
-             __func__, elem->index, out_num, out_len);
+             "%s: elem %d: with %d out desc of length %zd"
+             " bad_in_num=%u bad_out_num=%u\n",
+             __func__, elem->index, out_num, out_len, req->bad_in_num,
+             req->bad_out_num);
 
     /*
      * The elem should contain a 'fuse_in_header' (in to fuse)
      * plus the data based on the len in the header.
      */
-    if (out_len < sizeof(struct fuse_in_header)) {
+    if (out_len_readable < sizeof(struct fuse_in_header)) {
         fuse_log(FUSE_LOG_ERR, "%s: elem %d too short for in_header\n",
                  __func__, elem->index);
         assert(0); /* TODO */
@@ -484,63 +494,129 @@ static void fv_queue_worker(gpointer data, gpointer user_data)
     copy_from_iov(&fbuf, 1, out_sg);
 
     pbufv = NULL; /* Compiler thinks an unitialised path */
-    if (out_num > 2 &&
-        out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
-        ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
-        out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
-        /*
-         * For a write we don't actually need to copy the
-         * data, we can just do it straight out of guest memory
-         * but we must still copy the headers in case the guest
-         * was nasty and changed them while we were using them.
-         */
-        fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n", __func__);
-
-        /* copy the fuse_write_in header afte rthe fuse_in_header */
-        fbuf.mem += out_sg->iov_len;
-        copy_from_iov(&fbuf, 1, out_sg + 1);
-        fbuf.mem -= out_sg->iov_len;
-        fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
-
-        /* Allocate the bufv, with space for the rest of the iov */
-        pbufv = malloc(sizeof(struct fuse_bufvec) +
-                       sizeof(struct fuse_buf) * (out_num - 2));
-        if (!pbufv) {
-            fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
-                     __func__);
-            goto out;
-        }
+    if (req->bad_in_num || req->bad_out_num) {
+        bool handled_unmappable = false;
+
+        if (out_num > 2 && out_num_readable >= 2 && !req->bad_in_num &&
+            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+            handled_unmappable = true;
+
+            /* copy the fuse_write_in header after fuse_in_header */
+            fbuf.mem += out_sg->iov_len;
+            copy_from_iov(&fbuf, 1, out_sg + 1);
+            fbuf.mem -= out_sg->iov_len;
+            fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * (out_num - 2));
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                         __func__);
+                goto out;
+            }
 
-        allocated_bufv = true;
-        pbufv->count = 1;
-        pbufv->buf[0] = fbuf;
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
+
+            size_t iovindex, pbufvindex;
+            iovindex = 2; /* 2 headers, separate iovs */
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
+
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags =
+                    (iovindex < out_num_readable) ? 0 :
+                    FUSE_BUF_PHYS_ADDR;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+            }
+        }
 
-        size_t iovindex, pbufvindex;
-        iovindex = 2; /* 2 headers, separate iovs */
-        pbufvindex = 1; /* 2 headers, 1 fusebuf */
+        if (out_num == 2 && out_num_readable == 2 && req->bad_in_num &&
+            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_READ &&
+            out_sg[1].iov_len == sizeof(struct fuse_read_in)) {
+            fuse_log(FUSE_LOG_DEBUG,
+                     "Unmappable read case "
+                     "in_num=%d bad_in_num=%d\n",
+                     elem->in_num, req->bad_in_num);
+            handled_unmappable = true;
+        }
 
-        for (; iovindex < out_num; iovindex++, pbufvindex++) {
-            pbufv->count++;
-            pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
-            pbufv->buf[pbufvindex].flags = 0;
-            pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
-            pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+        if (!handled_unmappable) {
+            fuse_log(FUSE_LOG_ERR,
+                     "Unhandled unmappable element: out: %d(b:%d) in: "
+                     "%d(b:%d)",
+                     out_num, req->bad_out_num, elem->in_num, req->bad_in_num);
+            fv_panic(dev, "Unhandled unmappable element");
         }
-    } else {
-        /* Normal (non fast write) path */
+    }
+
+    if (!req->bad_out_num) {
+        if (out_num > 2 &&
+            out_sg[0].iov_len == sizeof(struct fuse_in_header) &&
+            ((struct fuse_in_header *)fbuf.mem)->opcode == FUSE_WRITE &&
+            out_sg[1].iov_len == sizeof(struct fuse_write_in)) {
+            /*
+             * For a write we don't actually need to copy the
+             * data, we can just do it straight out of guest memory
+             * but we must still copy the headers in case the guest
+             * was nasty and changed them while we were using them.
+             */
+            fuse_log(FUSE_LOG_DEBUG, "%s: Write special case\n",
+                     __func__);
+
+            /* copy the fuse_write_in header after fuse_in_header */
+            fbuf.mem += out_sg->iov_len;
+            copy_from_iov(&fbuf, 1, out_sg + 1);
+            fbuf.mem -= out_sg->iov_len;
+            fbuf.size = out_sg[0].iov_len + out_sg[1].iov_len;
+
+            /* Allocate the bufv, with space for the rest of the iov */
+            pbufv = malloc(sizeof(struct fuse_bufvec) +
+                           sizeof(struct fuse_buf) * (out_num - 2));
+            if (!pbufv) {
+                fuse_log(FUSE_LOG_ERR, "%s: pbufv malloc failed\n",
+                         __func__);
+                goto out;
+            }
 
-        /* Copy the rest of the buffer */
-        fbuf.mem += out_sg->iov_len;
-        copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
-        fbuf.mem -= out_sg->iov_len;
-        fbuf.size = out_len;
+            allocated_bufv = true;
+            pbufv->count = 1;
+            pbufv->buf[0] = fbuf;
 
-        /* TODO! Endianness of header */
+            size_t iovindex, pbufvindex;
+            iovindex = 2; /* 2 headers, separate iovs */
+            pbufvindex = 1; /* 2 headers, 1 fusebuf */
 
-        /* TODO: Add checks for fuse_session_exited */
-        bufv.buf[0] = fbuf;
-        bufv.count = 1;
-        pbufv = &bufv;
+            for (; iovindex < out_num; iovindex++, pbufvindex++) {
+                pbufv->count++;
+                pbufv->buf[pbufvindex].pos = ~0; /* Dummy */
+                pbufv->buf[pbufvindex].flags = 0;
+                pbufv->buf[pbufvindex].mem = out_sg[iovindex].iov_base;
+                pbufv->buf[pbufvindex].size = out_sg[iovindex].iov_len;
+            }
+        } else {
+            /* Normal (non fast write) path */
+
+            /* Copy the rest of the buffer */
+            fbuf.mem += out_sg->iov_len;
+            copy_from_iov(&fbuf, out_num - 1, out_sg + 1);
            fbuf.mem -= out_sg->iov_len;
+            fbuf.size = out_len;
+
+            /* TODO! Endianness of header */
+
+            /* TODO: Add checks for fuse_session_exited */
+            bufv.buf[0] = fbuf;
+            bufv.count = 1;
+            pbufv = &bufv;
+        }
     }
     pbufv->idx = 0;
     pbufv->off = 0;
@@ -657,13 +733,16 @@ static void *fv_queue_thread(void *opaque)
              __func__, qi->qidx, (size_t)evalue, in_bytes, out_bytes);
 
     while (1) {
+        unsigned int bad_in_num = 0, bad_out_num = 0;
         FVRequest *req = vu_queue_pop(dev, q, sizeof(FVRequest),
-                                      NULL, NULL);
+                                      &bad_in_num, &bad_out_num);
         if (!req) {
             break;
         }
 
         req->reply_sent = false;
+        req->bad_in_num = bad_in_num;
+        req->bad_out_num = bad_out_num;
 
         if (!se->thread_pool_size) {
            req_list = g_list_prepend(req_list, req);
-- 
2.29.2