From nobody Sun May 19 01:15:19 2024
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org;
Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org;
Return-Path:
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1488974956237410.26603244101636; Wed, 8 Mar 2017 04:09:16 -0800 (PST)
Received: from localhost ([::1]:55884 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1claOz-0002GC-LA for importer@patchew.org; Wed, 08 Mar 2017 07:09:09 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57277) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1claOS-00029V-DI for qemu-devel@nongnu.org; Wed, 08 Mar 2017 07:08:37 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1claOR-0007xb-57 for qemu-devel@nongnu.org; Wed, 08 Mar 2017 07:08:36 -0500
Received: from mx1.redhat.com ([209.132.183.28]:38078) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1claOG-0007vq-9o; Wed, 08 Mar 2017 07:08:24 -0500
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5E209C04BD23; Wed, 8 Mar 2017 12:08:24 +0000 (UTC)
Received: from lemon.redhat.com (ovpn-8-23.pek2.redhat.com [10.72.8.23]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v28C8Gem019540; Wed, 8 Mar 2017 07:08:17 -0500
From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Wed, 8 Mar 2017 20:08:14 +0800
Message-Id: <20170308120814.29967-1-famz@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 08 Mar 2017 12:08:24 +0000 (UTC)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 209.132.183.28
Subject: [Qemu-devel] [PATCH for-2.9 v3] file-posix: Consider max_segments for BlockLimits.max_transfer
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Kevin Wolf , pbonzini@redhat.com, qemu-block@nongnu.org, Max Reitz
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
X-ZohoMail: RSF_0 Z_629925259 SPT_0
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

BlockLimits.max_transfer can be too high without this fix; the guest will
encounter I/O errors or even get paused with werror=stop or rerror=stop.
The cause is explained below.

Linux has a separate limit, /sys/block/.../queue/max_segments, which in
the worst case can be more restrictive than the BLKSECTGET limit which we
already consider (note that they are two different things). So, the
failure scenario before this patch is:

1) host device has max_sectors_kb = 4096 and max_segments = 64;

2) guest learns max_sectors_kb limit from QEMU, but doesn't know
   max_segments;

3) guest issues e.g. a 512KB request thinking it's okay, but actually
   it's not, because it will be passed through to host device as an
   SG_IO req that has niov > 64;

4) host kernel doesn't like the segmenting of the request, and returns
   -EINVAL.

This patch checks the max_segments sysfs entry for the host device and
calculates a "conservative" bytes limit using the page size, which is
then merged into the existing max_transfer limit. Guest will discover
this from the usual virtual block device interfaces.
(In the case of scsi-generic, it will be done in the INQUIRY reply
interception in the device model.)

The other possibility is to actually propagate it as a separate limit,
but it's not better. On the one hand, there is a big complication: the
limit is per-LUN from QEMU's PoV (because we can attach LUNs from
different host HBAs to the same virtio-scsi bus), but the channel to
communicate it in a per-LUN manner is missing down the stack; on the
other hand, two limits versus one doesn't change much about the valid
size of I/O (because guest has no control over host segmenting).

Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
was explored. Unfortunately there is no neat way to ensure the bounce
buffer is less segmented (in terms of DMA addr) than the guest buffer.

Practically, this bug is not very common. It has only been reported on
an Emulex (lpfc), so it's okay to get it fixed in the easier way.

Reviewed-by: Paolo Bonzini
Signed-off-by: Fam Zheng
---
v3: Clearer commit message. [Kevin]
v2: Use /sys/dev/block/MAJOR:MINOR/queue/max_segments.
    [Paolo]
---
 block/file-posix.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4de1abd..c4c0663 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -668,6 +668,48 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
 #endif
 }
 
+static int hdev_get_max_segments(const struct stat *st)
+{
+#ifdef CONFIG_LINUX
+    char buf[32];
+    const char *end;
+    char *sysfspath;
+    int ret;
+    int fd = -1;
+    long max_segments;
+
+    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
+                                major(st->st_rdev), minor(st->st_rdev));
+    fd = open(sysfspath, O_RDONLY);
+    if (fd == -1) {
+        ret = -errno;
+        goto out;
+    }
+    do {
+        ret = read(fd, buf, sizeof(buf));
+    } while (ret == -1 && errno == EINTR);
+    if (ret < 0) {
+        ret = -errno;
+        goto out;
+    } else if (ret == 0) {
+        ret = -EIO;
+        goto out;
+    }
+    buf[ret] = 0;
+    /* The file is ended with '\n', pass 'end' to accept that. */
+    ret = qemu_strtol(buf, &end, 10, &max_segments);
+    if (ret == 0 && end && *end == '\n') {
+        ret = max_segments;
+    }
+
+out:
+    g_free(sysfspath);
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -679,6 +721,11 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
             if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
                 bs->bl.max_transfer = pow2floor(ret);
             }
+            ret = hdev_get_max_segments(&st);
+            if (ret > 0) {
+                bs->bl.max_transfer = MIN(bs->bl.max_transfer,
+                                          ret * getpagesize());
+            }
         }
     }
 
-- 
2.9.3
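
For illustration only, the limit arithmetic described in the commit message can be sketched as a standalone function. This is not code from the patch: merge_max_transfer() and its parameters are hypothetical names, and the sketch assumes (as the "conservative" estimate above does) that each segment covers at least one page, so max_segments * page size is a safe byte limit. A non-positive max_segments stands in for a failed sysfs lookup and leaves the existing limit alone.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: mirrors the MIN() merge
 * done in raw_refresh_limits(). Assumes every segment spans at least
 * one page, so max_segments * page_size is a conservative byte limit.
 */
static uint32_t merge_max_transfer(uint32_t max_transfer,
                                   long max_segments, long page_size)
{
    uint64_t seg_limit;

    if (max_segments <= 0 || page_size <= 0) {
        /* Limit unknown (e.g. a negative errno): keep the old value. */
        return max_transfer;
    }

    /* Widen before multiplying so the product cannot overflow. */
    seg_limit = (uint64_t)max_segments * (uint64_t)page_size;

    /* Take the smaller of the existing limit and the segment limit. */
    return seg_limit < max_transfer ? (uint32_t)seg_limit : max_transfer;
}
```

With the values from the failure scenario (max_sectors_kb = 4096, max_segments = 64) and an assumed 4 KiB page size, merge_max_transfer(4096 * 1024, 64, 4096) gives 262144 bytes (256 KiB), which is below the 512KB request that triggered -EINVAL.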