From: Kevin Wolf
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, qemu-devel@nongnu.org, mreitz@redhat.com
Date: Thu, 24 Jan 2019 15:17:31 +0100
Message-Id: <20190124141731.21509-1-kwolf@redhat.com>
Subject: [Qemu-devel] [PATCH] file-posix: Cache lseek result for data regions

Depending on the exact image layout and the storage backend (tmpfs is
known to have very slow SEEK_HOLE/SEEK_DATA), caching lseek results can
save us a lot of time, e.g. during a mirror block job or qemu-img convert
with a fragmented source image (.bdrv_co_block_status on the protocol
layer can be called for every single cluster in the extreme case).

We may only cache data regions because of possible concurrent writers.
This means that we can later treat a recently punched hole as data, but
this is safe. We can't cache holes because then we might treat recently
written data as holes, which can cause corruption.

Signed-off-by: Kevin Wolf
Reviewed-by: Eric Blake
---
 block/file-posix.c | 51 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 8aee7a3fb8..7272c7c99d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -168,6 +168,12 @@ typedef struct BDRVRawState {
     bool needs_alignment;
     bool check_cache_dropped;
 
+    struct seek_data_cache {
+        bool valid;
+        uint64_t start;
+        uint64_t end;
+    } seek_data_cache;
+
     PRManager *pr_mgr;
 } BDRVRawState;
 
@@ -1555,8 +1561,17 @@ static int handle_aiocb_write_zeroes_unmap(void *opaque)
 {
     RawPosixAIOData *aiocb = opaque;
     BDRVRawState *s G_GNUC_UNUSED = aiocb->bs->opaque;
+    struct seek_data_cache *sdc;
     int ret;
 
+    /* Invalidate seek_data_cache if it overlaps */
+    sdc = &s->seek_data_cache;
+    if (sdc->valid && !(sdc->end < aiocb->aio_offset ||
+                        sdc->start > aiocb->aio_offset + aiocb->aio_nbytes))
+    {
+        sdc->valid = false;
+    }
+
     /* First try to write zeros and unmap at the same time */
 
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -1634,11 +1649,20 @@ static int handle_aiocb_discard(void *opaque)
     RawPosixAIOData *aiocb = opaque;
     int ret = -EOPNOTSUPP;
     BDRVRawState *s = aiocb->bs->opaque;
+    struct seek_data_cache *sdc;
 
     if (!s->has_discard) {
         return -ENOTSUP;
     }
 
+    /* Invalidate seek_data_cache if it overlaps */
+    sdc = &s->seek_data_cache;
+    if (sdc->valid && !(sdc->end < aiocb->aio_offset ||
+                        sdc->start > aiocb->aio_offset + aiocb->aio_nbytes))
+    {
+        sdc->valid = false;
+    }
+
     if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 #ifdef BLKDISCARD
         do {
@@ -2424,6 +2448,8 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
                                             int64_t *map,
                                             BlockDriverState **file)
 {
+    BDRVRawState *s = bs->opaque;
+    struct seek_data_cache *sdc;
     off_t data = 0, hole = 0;
     int ret;
 
@@ -2439,6 +2465,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }
 
+    sdc = &s->seek_data_cache;
+    if (sdc->valid && sdc->start <= offset && sdc->end > offset) {
+        *pnum = MIN(bytes, sdc->end - offset);
+        *map = offset;
+        *file = bs;
+        return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+    }
+
     ret = find_allocation(bs, offset, &data, &hole);
     if (ret == -ENXIO) {
         /* Trailing hole */
@@ -2451,14 +2485,27 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     } else if (data == offset) {
         /* On a data extent, compute bytes to the end of the extent,
          * possibly including a partial sector at EOF. */
-        *pnum = MIN(bytes, hole - offset);
+        *pnum = hole - offset;
         ret = BDRV_BLOCK_DATA;
     } else {
         /* On a hole, compute bytes to the beginning of the next extent.  */
         assert(hole == offset);
-        *pnum = MIN(bytes, data - offset);
+        *pnum = data - offset;
         ret = BDRV_BLOCK_ZERO;
     }
+
+    /* Caching allocated ranges is okay even if another process writes to the
+     * same file because we allow declaring things allocated even if there is a
+     * hole. However, we cannot cache holes without risking corruption. */
+    if (ret == BDRV_BLOCK_DATA) {
+        *sdc = (struct seek_data_cache) {
+            .valid = true,
+            .start = offset,
+            .end = offset + *pnum,
+        };
+    }
+
+    *pnum = MIN(*pnum, bytes);
     *map = offset;
     *file = bs;
     return ret | BDRV_BLOCK_OFFSET_VALID;
-- 
2.20.1
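
For readers who want to follow the cache rules outside of QEMU, here is a
minimal standalone sketch of the three operations the patch performs on
seek_data_cache: lookup, store, and invalidate. The helper names sdc_lookup,
sdc_store and sdc_invalidate are hypothetical and do not exist in the patch or
in QEMU. Only the struct fields and the comparisons are taken from the diff;
everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone copy of the fields the patch adds to BDRVRawState. */
struct seek_data_cache {
    bool valid;
    uint64_t start;   /* first byte known to be data */
    uint64_t end;     /* one past the last byte known to be data */
};

/* Hypothetical helper: return how many bytes starting at offset are known
 * to be data (at most bytes), or 0 on a cache miss. */
static uint64_t sdc_lookup(const struct seek_data_cache *sdc,
                           uint64_t offset, uint64_t bytes)
{
    if (sdc->valid && sdc->start <= offset && sdc->end > offset) {
        uint64_t left = sdc->end - offset;
        return left < bytes ? left : bytes;
    }
    return 0;
}

/* Hypothetical helper: remember a data extent. Holes are never stored. */
static void sdc_store(struct seek_data_cache *sdc,
                      uint64_t start, uint64_t end)
{
    assert(end > start);
    *sdc = (struct seek_data_cache) {
        .valid = true,
        .start = start,
        .end = end,
    };
}

/* Hypothetical helper: drop the cache if [offset, offset + bytes) may touch
 * it. This uses the same conservative comparison as the patch, so a request
 * that is merely adjacent to the cached range also invalidates it. */
static void sdc_invalidate(struct seek_data_cache *sdc,
                           uint64_t offset, uint64_t bytes)
{
    if (sdc->valid &&
        !(sdc->end < offset || sdc->start > offset + bytes)) {
        sdc->valid = false;
    }
}
```

In this sketch a caller would try sdc_lookup() first, fall back to the real
lseek-based query on a miss, call sdc_store() only when the result is a data
extent, and call sdc_invalidate() before any discard or write-zeroes-with-unmap
request. Erring on the side of invalidation is harmless here: a dropped cache
only costs one extra lseek, while a stale hole would risk corruption.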