From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518056734753258.5583218958177; Wed, 7 Feb 2018 18:25:34 -0800 (PST) Received: from localhost ([::1]:59237 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbu1-0002E3-Ve for importer@patchew.org; Wed, 07 Feb 2018 21:25:34 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41569) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbor-000610-8T for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbon-0004Ik-2n for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:13 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:51466 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbom-0004Hb-Tx for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:08 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 01B4F40201A1; Thu, 8 Feb 2018 02:20:07 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4CDFBB0789; Thu, 8 Feb 2018 02:20:05 +0000 (UTC) From: Fam Zheng To: 
qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:38 +0800 Message-Id: <20180208021953.7354-2-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Thu, 08 Feb 2018 02:20:07 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Thu, 08 Feb 2018 02:20:07 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 01/16] docker: change Fedora base image to fedora:27 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Type: text/plain; charset="utf-8" From: Paolo Bonzini Using "fedora:latest" makes behavior different depending on when you actually pulled the image from the docker repository. In my case, the supposedly "latest" image was a Fedora 25 download from 8 months ago, and the new "test-debug" test was failing. Use "27" to improve reproducibility and make it clear when the image is obsolete. 
Cc: Fam Zheng
Cc: Marc-André Lureau
Signed-off-by: Paolo Bonzini
Message-Id: <1515755504-21341-1-git-send-email-pbonzini@redhat.com>
Reviewed-by: Marc-André Lureau
Signed-off-by: Fam Zheng
---
 tests/docker/dockerfiles/fedora.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
index 26ede4f1d6..994a35a332 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -1,4 +1,4 @@
-FROM fedora:latest
+FROM fedora:27
 ENV PACKAGES \
     ccache gettext git tar PyYAML sparse flex bison python3 bzip2 hostname \
     glib2-devel pixman-devel zlib-devel SDL-devel libfdt-devel \
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518056550796626.9693943535075; Wed, 7 Feb 2018 18:22:30 -0800 (PST) Received: from localhost ([::1]:59216 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbr0-0007dh-NT for importer@patchew.org; Wed, 07 Feb 2018 21:22:26 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41566) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbor-00060u-7Z for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbon-0004JK-9K for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:13 -0500 Received: 
from mx3-rdu2.redhat.com ([66.187.233.73]:49838 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbon-0004IR-3B for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:09 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id ACEE14075171; Thu, 8 Feb 2018 02:20:08 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 87364B0794; Thu, 8 Feb 2018 02:20:07 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:39 +0800 Message-Id: <20180208021953.7354-3-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Thu, 08 Feb 2018 02:20:08 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Thu, 08 Feb 2018 02:20:08 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 02/16] test-coroutine: add simple CoMutex test X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Paolo Bonzini In preparation 
for adding a similar test using QemuLockable, add a very simple testcase that has two interleaved calls to lock and unlock.

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Paolo Bonzini
Message-Id: <20180203153935.8056-2-pbonzini@redhat.com>
Reviewed-by: Richard Henderson
Reviewed-by: Fam Zheng
Signed-off-by: Fam Zheng
---
 tests/test-coroutine.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 76c646107e..ab8fdf701e 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -175,7 +175,7 @@ static void coroutine_fn c1_fn(void *opaque)
     qemu_coroutine_enter(c2);
 }
 
-static void test_co_queue(void)
+static void test_no_dangling_access(void)
 {
     Coroutine *c1;
     Coroutine *c2;
@@ -195,6 +195,51 @@ static void test_co_queue(void)
     *c1 = tmp;
 }
 
+static bool locked;
+static int done;
+
+static void coroutine_fn mutex_fn(void *opaque)
+{
+    CoMutex *m = opaque;
+    qemu_co_mutex_lock(m);
+    assert(!locked);
+    locked = true;
+    qemu_coroutine_yield();
+    locked = false;
+    qemu_co_mutex_unlock(m);
+    done++;
+}
+
+static void do_test_co_mutex(CoroutineEntry *entry, void *opaque)
+{
+    Coroutine *c1 = qemu_coroutine_create(entry, opaque);
+    Coroutine *c2 = qemu_coroutine_create(entry, opaque);
+
+    done = 0;
+    qemu_coroutine_enter(c1);
+    g_assert(locked);
+    qemu_coroutine_enter(c2);
+
+    /* Unlock queues c2. It is then started automatically when c1 yields or
+     * terminates.
+     */
+    qemu_coroutine_enter(c1);
+    g_assert_cmpint(done, ==, 1);
+    g_assert(locked);
+
+    qemu_coroutine_enter(c2);
+    g_assert_cmpint(done, ==, 2);
+    g_assert(!locked);
+}
+
+static void test_co_mutex(void)
+{
+    CoMutex m;
+
+    qemu_co_mutex_init(&m);
+    do_test_co_mutex(mutex_fn, &m);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -422,7 +467,7 @@ int main(int argc, char **argv)
      * crash, so skip it.
      */
     if (CONFIG_COROUTINE_POOL) {
-        g_test_add_func("/basic/co_queue", test_co_queue);
+        g_test_add_func("/basic/no-dangling-access", test_no_dangling_access);
     }
 
     g_test_add_func("/basic/lifecycle", test_lifecycle);
@@ -432,6 +477,7 @@ int main(int argc, char **argv)
     g_test_add_func("/basic/entered", test_entered);
     g_test_add_func("/basic/in_coroutine", test_in_coroutine);
     g_test_add_func("/basic/order", test_order);
+    g_test_add_func("/locking/co-mutex", test_co_mutex);
     if (g_test_perf()) {
         g_test_add_func("/perf/lifecycle", perf_lifecycle);
         g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518056734780724.5990248698874; Wed, 7 Feb 2018 18:25:34 -0800 (PST) Received: from localhost ([::1]:59238 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbu1-0002F5-Q1 for importer@patchew.org; Wed, 07 Feb 2018 21:25:33 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41573) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbor-000616-BJ for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbop-0004N5-8o for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:13 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60050 helo=mx1.redhat.com) by eggs.gnu.org with esmtps 
(TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbop-0004Mb-3g for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:11 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A9DD44085019; Thu, 8 Feb 2018 02:20:10 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3D0E6B0797; Thu, 8 Feb 2018 02:20:08 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:40 +0800 Message-Id: <20180208021953.7354-4-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 08 Feb 2018 02:20:10 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 08 Feb 2018 02:20:10 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 03/16] lockable: add QemuLockable X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Paolo Bonzini QemuLockable is a polymorphic lock type that takes an object and knows which function to use for locking and unlocking. 
The implementation could use C11 _Generic, but since the support is not very widespread I am instead using __builtin_choose_expr and __builtin_types_compatible_p, which are already used by include/qemu/atomic.h.

QemuLockable can be used to implement lock guards, or to pass around a lock in such a way that a function can release it and re-acquire it. The next patch will do this for CoQueue.

Signed-off-by: Paolo Bonzini
Message-Id: <20180203153935.8056-3-pbonzini@redhat.com>
Reviewed-by: Richard Henderson
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Fam Zheng
Signed-off-by: Fam Zheng
---
 include/qemu/compiler.h  | 39 ++++++++++++++++++++
 include/qemu/coroutine.h |  4 +-
 include/qemu/lockable.h  | 96 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/qemu/thread.h    |  5 +--
 include/qemu/typedefs.h  |  4 ++
 tests/test-coroutine.c   | 25 +++++++++++++
 6 files changed, 168 insertions(+), 5 deletions(-)
 create mode 100644 include/qemu/lockable.h

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 5fcc4f7ec7..2cbe6a4f16 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -114,5 +114,44 @@
 #ifndef __has_feature
 #define __has_feature(x) 0 /* compatibility with non-clang compilers */
 #endif
+/* Implement C11 _Generic via GCC builtins. Example:
+ *
+ *    QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
+ *
+ * The first argument is the discriminator. The last is the default value.
+ * The middle ones are tuples in "(type, expansion)" format.
+ */
+
+/* First, find out the number of generic cases. */
+#define QEMU_GENERIC(x, ...) \
+    QEMU_GENERIC_(typeof(x), __VA_ARGS__, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+/* There will be extra arguments, but they are not used. */
+#define QEMU_GENERIC_(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, count, ...) \
+    QEMU_GENERIC##count(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)
+
+/* Two more helper macros, this time to extract items from a parenthesized
+ * list.
+ */
+#define QEMU_FIRST_(a, b) a
+#define QEMU_SECOND_(a, b) b
+
+/* ... and a final one for the common part of the "recursion". */
+#define QEMU_GENERIC_IF(x, type_then, else_)                                   \
+    __builtin_choose_expr(__builtin_types_compatible_p(x,                      \
+                                                       QEMU_FIRST_ type_then), \
+                          QEMU_SECOND_ type_then, else_)
+
+/* CPP poor man's "recursion". */
+#define QEMU_GENERIC1(x, a0, ...) (a0)
+#define QEMU_GENERIC2(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC1(x, __VA_ARGS__))
+#define QEMU_GENERIC3(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC2(x, __VA_ARGS__))
+#define QEMU_GENERIC4(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC3(x, __VA_ARGS__))
+#define QEMU_GENERIC5(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC4(x, __VA_ARGS__))
+#define QEMU_GENERIC6(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC5(x, __VA_ARGS__))
+#define QEMU_GENERIC7(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC6(x, __VA_ARGS__))
+#define QEMU_GENERIC8(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC7(x, __VA_ARGS__))
+#define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, __VA_ARGS__))
+#define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, __VA_ARGS__))
 
 #endif /* COMPILER_H */
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index ce2eb73670..8a5129741c 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -121,7 +121,7 @@ bool qemu_coroutine_entered(Coroutine *co);
  * Provides a mutex that can be used to synchronise coroutines
  */
 struct CoWaitRecord;
-typedef struct CoMutex {
+struct CoMutex {
     /* Count of pending lockers; 0 for a free mutex, 1 for an
      * uncontended mutex.
      */
@@ -142,7 +142,7 @@ typedef struct CoMutex {
     unsigned handoff, sequence;
 
     Coroutine *holder;
-} CoMutex;
+};
 
 /**
  * Initialises a CoMutex. 
This must be called before any other operation is used
diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
new file mode 100644
index 0000000000..b6ed6c89ec
--- /dev/null
+++ b/include/qemu/lockable.h
@@ -0,0 +1,96 @@
+/*
+ * Polymorphic locking functions (aka poor man templates)
+ *
+ * Copyright Red Hat, Inc. 2017, 2018
+ *
+ * Author: Paolo Bonzini
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_LOCKABLE_H
+#define QEMU_LOCKABLE_H
+
+#include "qemu/coroutine.h"
+#include "qemu/thread.h"
+
+typedef void QemuLockUnlockFunc(void *);
+
+struct QemuLockable {
+    void *object;
+    QemuLockUnlockFunc *lock;
+    QemuLockUnlockFunc *unlock;
+};
+
+/* This function gives an error if an invalid, non-NULL pointer type is passed
+ * to QEMU_MAKE_LOCKABLE. For optimized builds, we can rely on dead-code elimination
+ * from the compiler, and give the errors already at link time.
+ */
+#ifdef __OPTIMIZE__
+void unknown_lock_type(void *);
+#else
+static inline void unknown_lock_type(void *unused)
+{
+    abort();
+}
+#endif
+
+static inline __attribute__((__always_inline__)) QemuLockable *
+qemu_make_lockable(void *x, QemuLockable *lockable)
+{
+    /* We cannot test this in a macro, otherwise we get compiler
+     * warnings like "the address of 'm' will always evaluate as 'true'".
+     */
+    return x ? lockable : NULL;
+}
+
+/* Auxiliary macros to simplify QEMU_MAKE_LOCKABLE. */
+#define QEMU_LOCK_FUNC(x) ((QemuLockUnlockFunc *)    \
+    QEMU_GENERIC(x,                                  \
+                 (QemuMutex *, qemu_mutex_lock),     \
+                 (CoMutex *, qemu_co_mutex_lock),    \
+                 (QemuSpin *, qemu_spin_lock),       \
+                 unknown_lock_type))
+
+#define QEMU_UNLOCK_FUNC(x) ((QemuLockUnlockFunc *)  \
+    QEMU_GENERIC(x,                                  \
+                 (QemuMutex *, qemu_mutex_unlock),   \
+                 (CoMutex *, qemu_co_mutex_unlock),  \
+                 (QemuSpin *, qemu_spin_unlock),     \
+                 unknown_lock_type))
+
+/* In C, compound literals have the lifetime of an automatic variable.
+ * In C++ it would be different, but then C++ wouldn't need QemuLockable
+ * either...
+ */
+#define QEMU_MAKE_LOCKABLE_(x) qemu_make_lockable((x), &(QemuLockable) { \
+        .object = (x),                                                   \
+        .lock = QEMU_LOCK_FUNC(x),                                       \
+        .unlock = QEMU_UNLOCK_FUNC(x),                                   \
+    })
+
+/* QEMU_MAKE_LOCKABLE - Make a polymorphic QemuLockable
+ *
+ * @x: a lock object (currently one of QemuMutex, CoMutex, QemuSpin).
+ *
+ * Returns a QemuLockable object that can be passed around
+ * to a function that can operate with locks of any kind.
+ */
+#define QEMU_MAKE_LOCKABLE(x)                        \
+    QEMU_GENERIC(x,                                  \
+                 (QemuLockable *, (x)),              \
+                 QEMU_MAKE_LOCKABLE_(x))
+
+static inline void qemu_lockable_lock(QemuLockable *x)
+{
+    x->lock(x->object);
+}
+
+static inline void qemu_lockable_unlock(QemuLockable *x)
+{
+    x->unlock(x->object);
+}
+
+#endif
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 9af4e945aa..ef7bd16123 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -4,7 +4,6 @@
 #include "qemu/processor.h"
 #include "qemu/atomic.h"
 
-typedef struct QemuMutex QemuMutex;
 typedef struct QemuCond QemuCond;
 typedef struct QemuSemaphore QemuSemaphore;
 typedef struct QemuEvent QemuEvent;
@@ -97,9 +96,9 @@ struct Notifier;
 void qemu_thread_atexit_add(struct Notifier *notifier);
 void qemu_thread_atexit_remove(struct Notifier *notifier);
 
-typedef struct QemuSpin {
+struct QemuSpin {
     int value;
-} QemuSpin;
+};
 
 static inline void qemu_spin_init(QemuSpin *spin)
 {
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 9bd7a834ba..5923849cdd 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -19,6 +19,7 @@ typedef struct BusClass BusClass;
 typedef struct BusState BusState;
 typedef struct Chardev Chardev;
 typedef struct CompatProperty CompatProperty;
+typedef struct CoMutex CoMutex;
 typedef struct CPUAddressSpace CPUAddressSpace;
 typedef struct CPUState CPUState;
 typedef struct DeviceListener DeviceListener;
@@ -86,9 +87,12 @@ typedef struct 
QEMUBH QEMUBH;
 typedef struct QemuConsole QemuConsole;
 typedef struct QemuDmaBuf QemuDmaBuf;
 typedef struct QEMUFile QEMUFile;
+typedef struct QemuLockable QemuLockable;
+typedef struct QemuMutex QemuMutex;
 typedef struct QemuOpt QemuOpt;
 typedef struct QemuOpts QemuOpts;
 typedef struct QemuOptsList QemuOptsList;
+typedef struct QemuSpin QemuSpin;
 typedef struct QEMUSGList QEMUSGList;
 typedef struct QEMUTimer QEMUTimer;
 typedef struct QEMUTimerListGroup QEMUTimerListGroup;
diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index ab8fdf701e..28e79b3210 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
+#include "qemu/lockable.h"
 
 /*
  * Check that qemu_in_coroutine() works
@@ -210,6 +211,18 @@ static void coroutine_fn mutex_fn(void *opaque)
     done++;
 }
 
+static void coroutine_fn lockable_fn(void *opaque)
+{
+    QemuLockable *x = opaque;
+    qemu_lockable_lock(x);
+    assert(!locked);
+    locked = true;
+    qemu_coroutine_yield();
+    locked = false;
+    qemu_lockable_unlock(x);
+    done++;
+}
+
 static void do_test_co_mutex(CoroutineEntry *entry, void *opaque)
 {
     Coroutine *c1 = qemu_coroutine_create(entry, opaque);
@@ -240,6 +253,17 @@ static void test_co_mutex(void)
     do_test_co_mutex(mutex_fn, &m);
 }
 
+static void test_co_mutex_lockable(void)
+{
+    CoMutex m;
+    CoMutex *null_pointer = NULL;
+
+    qemu_co_mutex_init(&m);
+    do_test_co_mutex(lockable_fn, QEMU_MAKE_LOCKABLE(&m));
+
+    g_assert(QEMU_MAKE_LOCKABLE(null_pointer) == NULL);
+}
+
 /*
  * Check that creation, enter, and return work
  */
@@ -478,6 +502,7 @@ int main(int argc, char **argv)
     g_test_add_func("/basic/in_coroutine", test_in_coroutine);
     g_test_add_func("/basic/order", test_order);
     g_test_add_func("/locking/co-mutex", test_co_mutex);
+    g_test_add_func("/locking/co-mutex/lockable", test_co_mutex_lockable);
     if (g_test_perf()) {
         g_test_add_func("/perf/lifecycle", perf_lifecycle);
         g_test_add_func("/perf/nesting", perf_nesting);
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 15180565508002.9681959396320963; Wed, 7 Feb 2018 18:22:30 -0800 (PST) Received: from localhost ([::1]:59214 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbr0-0007dJ-7E for importer@patchew.org; Wed, 07 Feb 2018 21:22:26 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41602) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbos-00063d-HU for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:16 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbor-0004Oe-9o for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:14 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:53800 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejboq-0004Nw-Oh for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:13 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 60984818532A; Thu, 8 Feb 2018 02:20:12 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3A56AB0794; Thu, 8 
Feb 2018 02:20:10 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:41 +0800 Message-Id: <20180208021953.7354-5-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Feb 2018 02:20:12 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Feb 2018 02:20:12 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 04/16] coroutine-lock: convert CoQueue to use QemuLockable X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

From: Paolo Bonzini

There are cases in which a queued coroutine must be restarted from non-coroutine context (with qemu_co_enter_next). In these cases, qemu_co_enter_next also needs to be thread-safe, but it cannot use a CoMutex, and so neither can qemu_co_queue_wait. Use QemuLockable so that the CoQueue can interchangeably use CoMutex or QemuMutex.
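As a rough illustration of the idea in this commit message (not the code from this series), the sketch below shows a wait helper that drops and re-takes a lock without knowing its concrete type. It uses C11 _Generic, which the QemuLockable patch names as an alternative to the GCC builtins it actually uses. Every Toy* and toy_* name is hypothetical and exists only for this example; the lock functions take void * so that storing them in a common function-pointer type needs no cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two unrelated lock types (not QEMU's). */
typedef struct { int held; int unlocks; } ToyMutex;
typedef struct { int held; int unlocks; } ToySpin;

static void toy_mutex_lock(void *p)   { ToyMutex *m = p; assert(!m->held); m->held = 1; }
static void toy_mutex_unlock(void *p) { ToyMutex *m = p; assert(m->held); m->held = 0; m->unlocks++; }
static void toy_spin_lock(void *p)    { ToySpin *s = p; assert(!s->held); s->held = 1; }
static void toy_spin_unlock(void *p)  { ToySpin *s = p; assert(s->held); s->held = 0; s->unlocks++; }

typedef void ToyLockFunc(void *);

/* An object plus the functions that lock and unlock it. */
typedef struct {
    void *object;
    ToyLockFunc *lock;
    ToyLockFunc *unlock;
} ToyLockable;

/* Pick the lock/unlock pair from the static type of x.  There is no
 * default association, so an unsupported type fails to compile.  The
 * compound literal lives until the end of the enclosing block.
 */
#define TOY_MAKE_LOCKABLE(x) (&(ToyLockable) {                     \
    .object = (x),                                                 \
    .lock = _Generic((x), ToyMutex *: toy_mutex_lock,              \
                          ToySpin *: toy_spin_lock),               \
    .unlock = _Generic((x), ToyMutex *: toy_mutex_unlock,          \
                            ToySpin *: toy_spin_unlock),           \
})

/* Same shape as a queue wait: release the caller's lock, "wait", then
 * take the lock again.  NULL means the caller holds no lock.
 */
static void toy_wait(ToyLockable *lock)
{
    if (lock) {
        lock->unlock(lock->object);
    }
    /* A real implementation would yield to another coroutine here. */
    if (lock) {
        lock->lock(lock->object);
    }
}

/* Returns the total number of unlock calls, or -1 if a lock was lost. */
static int toy_demo(void)
{
    ToyMutex m = { 0, 0 };
    ToySpin s = { 0, 0 };

    toy_mutex_lock(&m);
    toy_spin_lock(&s);

    toy_wait(TOY_MAKE_LOCKABLE(&m));   /* dispatches to the mutex pair */
    toy_wait(TOY_MAKE_LOCKABLE(&s));   /* dispatches to the spinlock pair */
    toy_wait(NULL);                    /* nothing to release */

    if (!m.held || !s.held) {
        return -1;
    }
    return m.unlocks + s.unlocks;
}
```

The point of the design is that toy_wait is written once and works for any lock type the macro knows about, which is the property the CoQueue conversion relies on.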
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Paolo Bonzini
Message-Id: <20180203153935.8056-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng
Signed-off-by: Fam Zheng
---
 include/qemu/coroutine.h   |  6 +++++-
 util/qemu-coroutine-lock.c | 12 +++++++-----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 8a5129741c..1e5f0957e6 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -183,7 +183,9 @@ void qemu_co_queue_init(CoQueue *queue);
  * caller of the coroutine. The mutex is unlocked during the wait and
  * locked again afterwards.
  */
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex);
+#define qemu_co_queue_wait(queue, lock) \
+    qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock))
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
  * Restarts the next coroutine in the CoQueue and removes it from the queue.
@@ -271,4 +273,6 @@ void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns);
  */
 void coroutine_fn yield_until_fd_readable(int fd);
 
+#include "qemu/lockable.h"
+
 #endif /* QEMU_COROUTINE_H */
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 846ff9167f..2a66fc1467 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -40,13 +40,13 @@ void qemu_co_queue_init(CoQueue *queue)
     QSIMPLEQ_INIT(&queue->entries);
 }
 
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock)
 {
     Coroutine *self = qemu_coroutine_self();
     QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
 
-    if (mutex) {
-        qemu_co_mutex_unlock(mutex);
+    if (lock) {
+        qemu_lockable_unlock(lock);
     }
 
     /* There is no race condition here. 
Other threads will call
@@ -60,9 +60,11 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
     /* TODO: OSv implements wait morphing here, where the wakeup
      * primitive automatically places the woken coroutine on the
      * mutex's queue. This avoids the thundering herd effect.
+     * This could be implemented for CoMutexes, but not really for
+     * other cases of QemuLockable.
      */
-    if (mutex) {
-        qemu_co_mutex_lock(mutex);
+    if (lock) {
+        qemu_lockable_lock(lock);
     }
 }
 
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518056922228320.49837589381275; Wed, 7 Feb 2018 18:28:42 -0800 (PST) Received: from localhost ([::1]:59307 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbwz-00053G-LI for importer@patchew.org; Wed, 07 Feb 2018 21:28:37 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41626) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbot-00064w-Pr for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbos-0004Pk-Je for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:15 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60052 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbos-0004PO-CO for qemu-devel@nongnu.org; Wed, 07 Feb 2018 
21:20:14 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 146B140363AD; Thu, 8 Feb 2018 02:20:14 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id E4A95B0789; Thu, 8 Feb 2018 02:20:12 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:42 +0800 Message-Id: <20180208021953.7354-6-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 08 Feb 2018 02:20:14 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 08 Feb 2018 02:20:14 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 05/16] coroutine-lock: make qemu_co_enter_next thread-safe X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Paolo Bonzini qemu_co_queue_next does not need to release and re-acquire the mutex, because the queued coroutine does not run immediately. However, this does not hold for qemu_co_enter_next. 
Now that qemu_co_queue_wait can synchronize (via QemuLockable) with code
that is not running in coroutine context, it's important that code using
qemu_co_enter_next can easily use a standardized locking idiom.

First of all, qemu_co_enter_next must use aio_co_wake to restart
the coroutine.  Second, the function gains a second argument,
a QemuLockable*, and the comments of qemu_co_queue_next and
qemu_co_queue_restart_all are adjusted to clarify the difference.

Signed-off-by: Paolo Bonzini
Message-Id: <20180203153935.8056-5-pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Fam Zheng
Signed-off-by: Fam Zheng
---
 fsdev/qemu-fsdev-throttle.c |  4 ++--
 include/qemu/coroutine.h    | 19 +++++++++++++------
 util/qemu-coroutine-lock.c  | 10 ++++++++--
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 49eebb5412..1dc07fbc12 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -20,13 +20,13 @@
 static void fsdev_throttle_read_timer_cb(void *opaque)
 {
     FsThrottle *fst = opaque;
-    qemu_co_enter_next(&fst->throttled_reqs[false]);
+    qemu_co_enter_next(&fst->throttled_reqs[false], NULL);
 }
 
 static void fsdev_throttle_write_timer_cb(void *opaque)
 {
     FsThrottle *fst = opaque;
-    qemu_co_enter_next(&fst->throttled_reqs[true]);
+    qemu_co_enter_next(&fst->throttled_reqs[true], NULL);
 }
 
 void fsdev_throttle_parse_opts(QemuOpts *opts, FsThrottle *fst, Error **errp)
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 1e5f0957e6..6f8a487041 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -188,21 +188,28 @@ void qemu_co_queue_init(CoQueue *queue);
 void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
- * Restarts the next coroutine in the CoQueue and removes it from the queue.
- *
- * Returns true if a coroutine was restarted, false if the queue is empty.
+ * Removes the next coroutine from the CoQueue, and wakes it up.
+ * Returns true if a coroutine was removed, false if the queue is empty.
  */
 bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
 
 /**
- * Restarts all coroutines in the CoQueue and leaves the queue empty.
+ * Empties the CoQueue; all coroutines are woken up.
  */
 void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
 
 /**
- * Enter the next coroutine in the queue
+ * Removes the next coroutine from the CoQueue, and wakes it up.  Unlike
+ * qemu_co_queue_next, this function releases the lock during aio_co_wake
+ * because it is meant to be used outside coroutine context; in that case, the
+ * coroutine is entered immediately, before qemu_co_enter_next returns.
+ *
+ * If used in coroutine context, qemu_co_enter_next is equivalent to
+ * qemu_co_queue_next.
  */
-bool qemu_co_enter_next(CoQueue *queue);
+#define qemu_co_enter_next(queue, lock) \
+    qemu_co_enter_next_impl(queue, QEMU_MAKE_LOCKABLE(lock))
+bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock);
 
 /**
  * Checks if the CoQueue is empty.
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 2a66fc1467..78fb79acf8 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -132,7 +132,7 @@ void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue)
     qemu_co_queue_do_restart(queue, false);
 }
 
-bool qemu_co_enter_next(CoQueue *queue)
+bool qemu_co_enter_next_impl(CoQueue *queue, QemuLockable *lock)
 {
     Coroutine *next;
 
@@ -142,7 +142,13 @@ bool qemu_co_enter_next(CoQueue *queue)
     }
 
     QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next);
-    qemu_coroutine_enter(next);
+    if (lock) {
+        qemu_lockable_unlock(lock);
+    }
+    aio_co_wake(next);
+    if (lock) {
+        qemu_lockable_lock(lock);
+    }
     return true;
 }
 
-- 
2.14.3

From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:43 +0800
Message-Id: <20180208021953.7354-7-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 06/16] curl: convert to CoQueue
Cc: Peter Maydell

From: Paolo Bonzini

Now that CoQueues can use a
QemuMutex for thread-safety, there is no need for curl to roll its own
coroutine queue.  Coroutines can be placed directly on the queue instead
of using a list of CURLAIOCBs.

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Paolo Bonzini
Message-Id: <20180203153935.8056-6-pbonzini@redhat.com>
Reviewed-by: Fam Zheng
Signed-off-by: Fam Zheng
---
 block/curl.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 35cf417f59..cd578d3d14 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -101,8 +101,6 @@ typedef struct CURLAIOCB {
 
     size_t start;
     size_t end;
-
-    QSIMPLEQ_ENTRY(CURLAIOCB) next;
 } CURLAIOCB;
 
 typedef struct CURLSocket {
@@ -138,7 +136,7 @@ typedef struct BDRVCURLState {
     bool accept_range;
     AioContext *aio_context;
     QemuMutex mutex;
-    QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq;
+    CoQueue free_state_waitq;
     char *username;
     char *password;
     char *proxyusername;
@@ -538,7 +536,6 @@ static int curl_init_state(BDRVCURLState *s, CURLState *state)
 /* Called with s->mutex held.
*/
 static void curl_clean_state(CURLState *s)
 {
-    CURLAIOCB *next;
     int j;
     for (j = 0; j < CURL_NUM_ACB; j++) {
         assert(!s->acb[j]);
@@ -556,13 +553,7 @@ static void curl_clean_state(CURLState *s)
 
     s->in_use = 0;
 
-    next = QSIMPLEQ_FIRST(&s->s->free_state_waitq);
-    if (next) {
-        QSIMPLEQ_REMOVE_HEAD(&s->s->free_state_waitq, next);
-        qemu_mutex_unlock(&s->s->mutex);
-        aio_co_wake(next->co);
-        qemu_mutex_lock(&s->s->mutex);
-    }
+    qemu_co_enter_next(&s->s->free_state_waitq, &s->s->mutex);
 }
 
 static void curl_parse_filename(const char *filename, QDict *options,
@@ -784,7 +775,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     DPRINTF("CURL: Opening %s\n", file);
-    QSIMPLEQ_INIT(&s->free_state_waitq);
+    qemu_co_queue_init(&s->free_state_waitq);
     s->aio_context = bdrv_get_aio_context(bs);
     s->url = g_strdup(file);
     qemu_mutex_lock(&s->mutex);
@@ -888,10 +879,7 @@ static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
         if (state) {
             break;
         }
-        QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next);
-        qemu_mutex_unlock(&s->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&s->mutex);
+        qemu_co_queue_wait(&s->free_state_waitq, &s->mutex);
     }
 
     if (curl_init_state(s, state) < 0) {
-- 
2.14.3

From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:44 +0800
Message-Id: <20180208021953.7354-8-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 07/16] stubs: Add stubs for ram block API
Cc: Peter Maydell

These functions will be wanted by block-obj-y but the actual definition
is in obj-y, so stub them to keep the linker happy.

Signed-off-by: Fam Zheng
Acked-by: Paolo Bonzini
Message-Id: <20180110091846.10699-2-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi
---
 stubs/Makefile.objs |  1 +
 stubs/ram-block.c   | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)
 create mode 100644 stubs/ram-block.c

diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 8cfe34328a..2d59d84091 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -42,3 +42,4 @@ stub-obj-y += vmgenid.o
 stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
 stub-obj-y += pci-host-piix.o
+stub-obj-y += ram-block.o
diff --git a/stubs/ram-block.c b/stubs/ram-block.c
new file mode 100644
index 0000000000..cfa5d8678f
--- /dev/null
+++ b/stubs/ram-block.c
@@ -0,0 +1,16 @@
+#include "qemu/osdep.h"
+#include "exec/ramlist.h"
+#include "exec/cpu-common.h"
+
+void ram_block_notifier_add(RAMBlockNotifier *n)
+{
+}
+
+void ram_block_notifier_remove(RAMBlockNotifier *n)
+{
+}
+
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+{
+    return 0;
+}
-- 
2.14.3

From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:45 +0800
Message-Id: <20180208021953.7354-9-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 08/16] util: Introduce vfio helpers
Cc: Peter Maydell

This is a library to manage the host vfio interface, which could be used
to implement userspace device driver code in QEMU such as NVMe or net
controllers.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Message-Id: <20180116060901.17413-3-famz@redhat.com>
Signed-off-by: Fam Zheng
---
 include/qemu/vfio-helpers.h |  33 ++
 util/Makefile.objs          |   1 +
 util/trace-events           |  11 +
 util/vfio-helpers.c         | 727 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 772 insertions(+)
 create mode 100644 include/qemu/vfio-helpers.h
 create mode 100644 util/vfio-helpers.c

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
new file mode 100644
index 0000000000..ce7e7b057f
--- /dev/null
+++ b/include/qemu/vfio-helpers.h
@@ -0,0 +1,33 @@
+/*
+ * QEMU VFIO helpers
+ *
+ * Copyright 2016 - 2018 Red Hat, Inc.
+ *
+ * Authors:
+ *   Fam Zheng
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_VFIO_HELPERS_H
+#define QEMU_VFIO_HELPERS_H
+#include "qemu/typedefs.h"
+
+typedef struct QEMUVFIOState QEMUVFIOState;
+
+QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
+void qemu_vfio_close(QEMUVFIOState *s);
+int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
+                      bool temporary, uint64_t *iova_list);
+int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
+void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
+void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
+                            uint64_t offset, uint64_t size,
+                            Error **errp);
+void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
+                             uint64_t offset, uint64_t size);
+int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
+                           int irq_type, Error **errp);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 2973b0a323..3fb611631f 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -46,3 +46,4 @@ util-obj-y += qht.o
 util-obj-y += range.o
 util-obj-y += stats64.o
 util-obj-y += systemd.o
+util-obj-$(CONFIG_LINUX) += vfio-helpers.o
diff --git a/util/trace-events b/util/trace-events
index 515e6257fb..4822434c89 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -60,3 +60,14 @@ lockcnt_futex_wake(const void *lockcnt) "lockcnt %p waking up one waiter"
 qemu_mutex_lock(void *mutex, const char *file, const int line) "waiting on mutex %p (%s:%d)"
 qemu_mutex_locked(void *mutex, const char *file, const int line) "taken mutex %p (%s:%d)"
 qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex %p (%s:%d)"
+
+# util/vfio-helpers.c
+qemu_vfio_dma_reset_temporary(void *s) "s %p"
+qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
+qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host
%p size %zu index %d iova 0x%"PRIx64
+qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size %zu iova 0x%"PRIx64
+qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t *iova) "s %p host %p size %zu temporary %d iova %p"
+qemu_vfio_dma_map_invalid(void *s, void *mapping_host, size_t mapping_size, void *host, size_t size) "s %p mapping %p %zu requested %p %zu"
+qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
new file mode 100644
index 0000000000..f478b68400
--- /dev/null
+++ b/util/vfio-helpers.c
@@ -0,0 +1,727 @@
+/*
+ * VFIO utility
+ *
+ * Copyright 2016 - 2018 Red Hat, Inc.
+ *
+ * Authors:
+ *   Fam Zheng
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/cpu-common.h"
+#include "trace.h"
+#include "qemu/queue.h"
+#include "qemu/error-report.h"
+#include "standard-headers/linux/pci_regs.h"
+#include "qemu/event_notifier.h"
+#include "qemu/vfio-helpers.h"
+#include "trace.h"
+
+#define QEMU_VFIO_DEBUG 0
+
+#define QEMU_VFIO_IOVA_MIN 0x10000ULL
+/* XXX: Once VFIO exposes the iova bit width in the IOMMU capability interface,
+ * we can use a runtime limit; alternatively it's also possible to do platform
+ * specific detection by reading sysfs entries. Until then, 39 is a safe bet.
+ **/
+#define QEMU_VFIO_IOVA_MAX (1ULL << 39)
+
+typedef struct {
+    /* Page aligned addr.
*/
+    void *host;
+    size_t size;
+    uint64_t iova;
+} IOVAMapping;
+
+struct QEMUVFIOState {
+    QemuMutex lock;
+
+    /* These fields are protected by BQL */
+    int container;
+    int group;
+    int device;
+    RAMBlockNotifier ram_notifier;
+    struct vfio_region_info config_region_info, bar_region_info[6];
+
+    /* These fields are protected by @lock */
+    /* VFIO's IO virtual address space is managed by splitting into a few
+     * sections:
+     *
+     * ---------------       <= 0
+     * |xxxxxxxxxxxxx|
+     * |-------------|       <= QEMU_VFIO_IOVA_MIN
+     * |             |
+     * |    Fixed    |
+     * |             |
+     * |-------------|       <= low_water_mark
+     * |             |
+     * |    Free     |
+     * |             |
+     * |-------------|       <= high_water_mark
+     * |             |
+     * |    Temp     |
+     * |             |
+     * |-------------|       <= QEMU_VFIO_IOVA_MAX
+     * |xxxxxxxxxxxxx|
+     * |xxxxxxxxxxxxx|
+     * ---------------
+     *
+     * - Addresses lower than QEMU_VFIO_IOVA_MIN are reserved as invalid;
+     *
+     * - Fixed mappings of HVAs are assigned "low" IOVAs in the range of
+     *   [QEMU_VFIO_IOVA_MIN, low_water_mark).  Once allocated they will not be
+     *   reclaimed - low_water_mark never shrinks;
+     *
+     * - IOVAs in range [low_water_mark, high_water_mark) are free;
+     *
+     * - IOVAs in range [high_water_mark, QEMU_VFIO_IOVA_MAX) are volatile
+     *   mappings. At each qemu_vfio_dma_reset_temporary() call, the whole area
+     *   is recycled. The caller should make sure I/O's depending on these
+     *   mappings are completed before calling.
+     **/
+    uint64_t low_water_mark;
+    uint64_t high_water_mark;
+    IOVAMapping *mappings;
+    int nr_mappings;
+};
+
+/**
+ * Find group file by PCI device address as specified by @device, and return the
+ * path. The returned string is owned by caller and should be g_free'ed later.
+ */
+static char *sysfs_find_group_file(const char *device, Error **errp)
+{
+    char *sysfs_link;
+    char *sysfs_group;
+    char *p;
+    char *path = NULL;
+
+    sysfs_link = g_strdup_printf("/sys/bus/pci/devices/%s/iommu_group", device);
+    sysfs_group = g_malloc(PATH_MAX);
+    if (readlink(sysfs_link, sysfs_group, PATH_MAX - 1) == -1) {
+        error_setg_errno(errp, errno, "Failed to find iommu group sysfs path");
+        goto out;
+    }
+    p = strrchr(sysfs_group, '/');
+    if (!p) {
+        error_setg(errp, "Failed to find iommu group number");
+        goto out;
+    }
+
+    path = g_strdup_printf("/dev/vfio/%s", p + 1);
+out:
+    g_free(sysfs_link);
+    g_free(sysfs_group);
+    return path;
+}
+
+static inline void assert_bar_index_valid(QEMUVFIOState *s, int index)
+{
+    assert(index >= 0 && index < ARRAY_SIZE(s->bar_region_info));
+}
+
+static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
+{
+    assert_bar_index_valid(s, index);
+    s->bar_region_info[index] = (struct vfio_region_info) {
+        .index = VFIO_PCI_BAR0_REGION_INDEX + index,
+        .argsz = sizeof(struct vfio_region_info),
+    };
+    if (ioctl(s->device, VFIO_DEVICE_GET_REGION_INFO, &s->bar_region_info[index])) {
+        error_setg_errno(errp, errno, "Failed to get BAR region info");
+        return -errno;
+    }
+
+    return 0;
+}
+
+/**
+ * Map a PCI bar area.
+ */
+void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
+                            uint64_t offset, uint64_t size,
+                            Error **errp)
+{
+    void *p;
+    assert_bar_index_valid(s, index);
+    p = mmap(NULL, MIN(size, s->bar_region_info[index].size - offset),
+             PROT_READ | PROT_WRITE, MAP_SHARED,
+             s->device, s->bar_region_info[index].offset + offset);
+    if (p == MAP_FAILED) {
+        error_setg_errno(errp, errno, "Failed to map BAR region");
+        p = NULL;
+    }
+    return p;
+}
+
+/**
+ * Unmap a PCI bar area.
+ */
+void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
+                             uint64_t offset, uint64_t size)
+{
+    if (bar) {
+        munmap(bar, MIN(size, s->bar_region_info[index].size - offset));
+    }
+}
+
+/**
+ * Initialize device IRQ with @irq_type and register an event notifier.
+ */
+int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
+                           int irq_type, Error **errp)
+{
+    int r;
+    struct vfio_irq_set *irq_set;
+    size_t irq_set_size;
+    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+
+    irq_info.index = irq_type;
+    if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
+        error_setg_errno(errp, errno, "Failed to get device interrupt info");
+        return -errno;
+    }
+    if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
+        error_setg(errp, "Device interrupt doesn't support eventfd");
+        return -EINVAL;
+    }
+
+    irq_set_size = sizeof(*irq_set) + sizeof(int);
+    irq_set = g_malloc0(irq_set_size);
+
+    /* Get to a known IRQ state */
+    *irq_set = (struct vfio_irq_set) {
+        .argsz = irq_set_size,
+        .flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
+        .index = irq_info.index,
+        .start = 0,
+        .count = 1,
+    };
+
+    *(int *)&irq_set->data = event_notifier_get_fd(e);
+    r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (r) {
+        error_setg_errno(errp, errno, "Failed to setup device interrupt");
+        return -errno;
+    }
+    return 0;
+}
+
+static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
+                                     int size, int ofs)
+{
+    int ret;
+
+    do {
+        ret = pread(s->device, buf, size, s->config_region_info.offset + ofs);
+    } while (ret == -1 && errno == EINTR);
+    return ret == size ? 0 : -errno;
+}
+
+static int qemu_vfio_pci_write_config(QEMUVFIOState *s, void *buf, int size, int ofs)
+{
+    int ret;
+
+    do {
+        ret = pwrite(s->device, buf, size, s->config_region_info.offset + ofs);
+    } while (ret == -1 && errno == EINTR);
+    return ret == size ?
0 : -errno;
+}
+
+static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
+                              Error **errp)
+{
+    int ret;
+    int i;
+    uint16_t pci_cmd;
+    struct vfio_group_status group_status = { .argsz = sizeof(group_status) };
+    struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
+    struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+    char *group_file = NULL;
+
+    /* Create a new container */
+    s->container = open("/dev/vfio/vfio", O_RDWR);
+
+    if (s->container == -1) {
+        error_setg_errno(errp, errno, "Failed to open /dev/vfio/vfio");
+        return -errno;
+    }
+    if (ioctl(s->container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
+        error_setg(errp, "Invalid VFIO version");
+        ret = -EINVAL;
+        goto fail_container;
+    }
+
+    if (!ioctl(s->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
+        error_setg_errno(errp, errno, "VFIO IOMMU check failed");
+        ret = -EINVAL;
+        goto fail_container;
+    }
+
+    /* Open the group */
+    group_file = sysfs_find_group_file(device, errp);
+    if (!group_file) {
+        ret = -EINVAL;
+        goto fail_container;
+    }
+
+    s->group = open(group_file, O_RDWR);
+    if (s->group == -1) {
+        error_setg_errno(errp, errno, "Failed to open VFIO group file: %s",
+                         group_file);
+        g_free(group_file);
+        ret = -errno;
+        goto fail_container;
+    }
+    g_free(group_file);
+
+    /* Test the group is viable and available */
+    if (ioctl(s->group, VFIO_GROUP_GET_STATUS, &group_status)) {
+        error_setg_errno(errp, errno, "Failed to get VFIO group status");
+        ret = -errno;
+        goto fail;
+    }
+
+    if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+        error_setg(errp, "VFIO group is not viable");
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    /* Add the group to the container */
+    if (ioctl(s->group, VFIO_GROUP_SET_CONTAINER, &s->container)) {
+        error_setg_errno(errp, errno, "Failed to add group to VFIO container");
+        ret = -errno;
+        goto fail;
+    }
+
+    /* Enable the IOMMU model we want */
+    if (ioctl(s->container,
VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)) {
+        error_setg_errno(errp, errno, "Failed to set VFIO IOMMU type");
+        ret = -errno;
+        goto fail;
+    }
+
+    /* Get additional IOMMU info */
+    if (ioctl(s->container, VFIO_IOMMU_GET_INFO, &iommu_info)) {
+        error_setg_errno(errp, errno, "Failed to get IOMMU info");
+        ret = -errno;
+        goto fail;
+    }
+
+    s->device = ioctl(s->group, VFIO_GROUP_GET_DEVICE_FD, device);
+
+    if (s->device < 0) {
+        error_setg_errno(errp, errno, "Failed to get device fd");
+        ret = -errno;
+        goto fail;
+    }
+
+    /* Test and setup the device */
+    if (ioctl(s->device, VFIO_DEVICE_GET_INFO, &device_info)) {
+        error_setg_errno(errp, errno, "Failed to get device info");
+        ret = -errno;
+        goto fail;
+    }
+
+    if (device_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX) {
+        error_setg(errp, "Invalid device regions");
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->config_region_info = (struct vfio_region_info) {
+        .index = VFIO_PCI_CONFIG_REGION_INDEX,
+        .argsz = sizeof(struct vfio_region_info),
+    };
+    if (ioctl(s->device, VFIO_DEVICE_GET_REGION_INFO, &s->config_region_info)) {
+        error_setg_errno(errp, errno, "Failed to get config region info");
+        ret = -errno;
+        goto fail;
+    }
+
+    for (i = 0; i < 6; i++) {
+        ret = qemu_vfio_pci_init_bar(s, i, errp);
+        if (ret) {
+            goto fail;
+        }
+    }
+
+    /* Enable bus master */
+    ret = qemu_vfio_pci_read_config(s, &pci_cmd, sizeof(pci_cmd), PCI_COMMAND);
+    if (ret) {
+        goto fail;
+    }
+    pci_cmd |= PCI_COMMAND_MASTER;
+    ret = qemu_vfio_pci_write_config(s, &pci_cmd, sizeof(pci_cmd), PCI_COMMAND);
+    if (ret) {
+        goto fail;
+    }
+    return 0;
+fail:
+    close(s->group);
+fail_container:
+    close(s->container);
+    return ret;
+}
+
+static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
+                                      void *host, size_t size)
+{
+    QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+    trace_qemu_vfio_ram_block_added(s, host, size);
+    qemu_vfio_dma_map(s, host, size, false, NULL);
+}
+
+static void
qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
+                                        void *host, size_t size)
+{
+    QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+    if (host) {
+        trace_qemu_vfio_ram_block_removed(s, host, size);
+        qemu_vfio_dma_unmap(s, host);
+    }
+}
+
+static int qemu_vfio_init_ramblock(const char *block_name, void *host_addr,
+                                   ram_addr_t offset, ram_addr_t length,
+                                   void *opaque)
+{
+    int ret;
+    QEMUVFIOState *s = opaque;
+
+    if (!host_addr) {
+        return 0;
+    }
+    ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
+    if (ret) {
+        fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
+                host_addr, (uint64_t)length);
+    }
+    return 0;
+}
+
+static void qemu_vfio_open_common(QEMUVFIOState *s)
+{
+    s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
+    s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
+    ram_block_notifier_add(&s->ram_notifier);
+    s->low_water_mark = QEMU_VFIO_IOVA_MIN;
+    s->high_water_mark = QEMU_VFIO_IOVA_MAX;
+    qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
+    qemu_mutex_init(&s->lock);
+}
+
+/**
+ * Open a PCI device, e.g. "0000:00:01.0".
+ */
+QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp)
+{
+    int r;
+    QEMUVFIOState *s = g_new0(QEMUVFIOState, 1);
+
+    r = qemu_vfio_init_pci(s, device, errp);
+    if (r) {
+        g_free(s);
+        return NULL;
+    }
+    qemu_vfio_open_common(s);
+    return s;
+}
+
+static void qemu_vfio_dump_mapping(IOVAMapping *m)
+{
+    if (QEMU_VFIO_DEBUG) {
+        printf(" vfio mapping %p %" PRIx64 " to %" PRIx64 "\n", m->host,
+               (uint64_t)m->size, (uint64_t)m->iova);
+    }
+}
+
+static void qemu_vfio_dump_mappings(QEMUVFIOState *s)
+{
+    int i;
+
+    if (QEMU_VFIO_DEBUG) {
+        printf("vfio mappings\n");
+        for (i = 0; i < s->nr_mappings; ++i) {
+            qemu_vfio_dump_mapping(&s->mappings[i]);
+        }
+    }
+}
+
+/**
+ * Find the mapping entry that contains [host, host + size) and set @index to
+ * the position.
If no entry contains it, @index is the position _after_ w= hich + * to insert the new mapping. IOW, it is the index of the largest element = that + * is smaller than @host, or -1 if no entry is. + */ +static IOVAMapping *qemu_vfio_find_mapping(QEMUVFIOState *s, void *host, + int *index) +{ + IOVAMapping *p =3D s->mappings; + IOVAMapping *q =3D p ? p + s->nr_mappings - 1 : NULL; + IOVAMapping *mid; + trace_qemu_vfio_find_mapping(s, host); + if (!p) { + *index =3D -1; + return NULL; + } + while (true) { + mid =3D p + (q - p) / 2; + if (mid =3D=3D p) { + break; + } + if (mid->host > host) { + q =3D mid; + } else if (mid->host < host) { + p =3D mid; + } else { + break; + } + } + if (mid->host > host) { + mid--; + } else if (mid < &s->mappings[s->nr_mappings - 1] + && (mid + 1)->host <=3D host) { + mid++; + } + *index =3D mid - &s->mappings[0]; + if (mid >=3D &s->mappings[0] && + mid->host <=3D host && mid->host + mid->size > host) { + assert(mid < &s->mappings[s->nr_mappings]); + return mid; + } + /* At this point *index + 1 is the right position to insert the new + * mapping.*/ + return NULL; +} + +/** + * Allocate IOVA and and create a new mapping record and insert it in @s. 
+ */ +static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s, + void *host, size_t size, + int index, uint64_t iova) +{ + int shift; + IOVAMapping m =3D {.host =3D host, .size =3D size, .iova =3D iova}; + IOVAMapping *insert; + + assert(QEMU_IS_ALIGNED(size, getpagesize())); + assert(QEMU_IS_ALIGNED(s->low_water_mark, getpagesize())); + assert(QEMU_IS_ALIGNED(s->high_water_mark, getpagesize())); + trace_qemu_vfio_new_mapping(s, host, size, index, iova); + + assert(index >=3D 0); + s->nr_mappings++; + s->mappings =3D g_realloc_n(s->mappings, sizeof(s->mappings[0]), + s->nr_mappings); + insert =3D &s->mappings[index]; + shift =3D s->nr_mappings - index - 1; + if (shift) { + memmove(insert + 1, insert, shift * sizeof(s->mappings[0])); + } + *insert =3D m; + return insert; +} + +/* Do the DMA mapping with VFIO. */ +static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size, + uint64_t iova) +{ + struct vfio_iommu_type1_dma_map dma_map =3D { + .argsz =3D sizeof(dma_map), + .flags =3D VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE, + .iova =3D iova, + .vaddr =3D (uintptr_t)host, + .size =3D size, + }; + trace_qemu_vfio_do_mapping(s, host, size, iova); + + if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) { + error_report("VFIO_MAP_DMA: %d", -errno); + return -errno; + } + return 0; +} + +/** + * Undo the DMA mapping from @s with VFIO, and remove from mapping list. 
+ */ +static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping, + Error **errp) +{ + int index; + struct vfio_iommu_type1_dma_unmap unmap =3D { + .argsz =3D sizeof(unmap), + .flags =3D 0, + .iova =3D mapping->iova, + .size =3D mapping->size, + }; + + index =3D mapping - s->mappings; + assert(mapping->size > 0); + assert(QEMU_IS_ALIGNED(mapping->size, getpagesize())); + assert(index >=3D 0 && index < s->nr_mappings); + if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) { + error_setg(errp, "VFIO_UNMAP_DMA failed: %d", -errno); + } + memmove(mapping, &s->mappings[index + 1], + sizeof(s->mappings[0]) * (s->nr_mappings - index - 1)); + s->nr_mappings--; + s->mappings =3D g_realloc_n(s->mappings, sizeof(s->mappings[0]), + s->nr_mappings); +} + +/* Check if the mapping list is (ascending) ordered. */ +static bool qemu_vfio_verify_mappings(QEMUVFIOState *s) +{ + int i; + if (QEMU_VFIO_DEBUG) { + for (i =3D 0; i < s->nr_mappings - 1; ++i) { + if (!(s->mappings[i].host < s->mappings[i + 1].host)) { + fprintf(stderr, "item %d not sorted!\n", i); + qemu_vfio_dump_mappings(s); + return false; + } + if (!(s->mappings[i].host + s->mappings[i].size <=3D + s->mappings[i + 1].host)) { + fprintf(stderr, "item %d overlap with next!\n", i); + qemu_vfio_dump_mappings(s); + return false; + } + } + } + return true; +} + +/* Map [host, host + size) area into a contiguous IOVA address space, and = store + * the result in @iova if not NULL. The caller need to make sure the area = is + * aligned to page size, and mustn't overlap with existing mapping areas (= split + * mapping status within this area is not allowed). 
+ */ +int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size, + bool temporary, uint64_t *iova) +{ + int ret =3D 0; + int index; + IOVAMapping *mapping; + uint64_t iova0; + + assert(QEMU_PTR_IS_ALIGNED(host, getpagesize())); + assert(QEMU_IS_ALIGNED(size, getpagesize())); + trace_qemu_vfio_dma_map(s, host, size, temporary, iova); + qemu_mutex_lock(&s->lock); + mapping =3D qemu_vfio_find_mapping(s, host, &index); + if (mapping) { + iova0 =3D mapping->iova + ((uint8_t *)host - (uint8_t *)mapping->h= ost); + } else { + if (s->high_water_mark - s->low_water_mark + 1 < size) { + ret =3D -ENOMEM; + goto out; + } + if (!temporary) { + iova0 =3D s->low_water_mark; + mapping =3D qemu_vfio_add_mapping(s, host, size, index + 1, io= va0); + if (!mapping) { + ret =3D -ENOMEM; + goto out; + } + assert(qemu_vfio_verify_mappings(s)); + ret =3D qemu_vfio_do_mapping(s, host, size, iova0); + if (ret) { + qemu_vfio_undo_mapping(s, mapping, NULL); + goto out; + } + s->low_water_mark +=3D size; + qemu_vfio_dump_mappings(s); + } else { + iova0 =3D s->high_water_mark - size; + ret =3D qemu_vfio_do_mapping(s, host, size, iova0); + if (ret) { + goto out; + } + s->high_water_mark -=3D size; + } + } + if (iova) { + *iova =3D iova0; + } +out: + qemu_mutex_unlock(&s->lock); + return ret; +} + +/* Reset the high watermark and free all "temporary" mappings. 
*/ +int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s) +{ + struct vfio_iommu_type1_dma_unmap unmap =3D { + .argsz =3D sizeof(unmap), + .flags =3D 0, + .iova =3D s->high_water_mark, + .size =3D QEMU_VFIO_IOVA_MAX - s->high_water_mark, + }; + trace_qemu_vfio_dma_reset_temporary(s); + qemu_mutex_lock(&s->lock); + if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) { + error_report("VFIO_UNMAP_DMA: %d", -errno); + qemu_mutex_unlock(&s->lock); + return -errno; + } + s->high_water_mark =3D QEMU_VFIO_IOVA_MAX; + qemu_mutex_unlock(&s->lock); + return 0; +} + +/* Unmapping the whole area that was previously mapped with + * qemu_vfio_dma_map(). */ +void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host) +{ + int index =3D 0; + IOVAMapping *m; + + if (!host) { + return; + } + + trace_qemu_vfio_dma_unmap(s, host); + qemu_mutex_lock(&s->lock); + m =3D qemu_vfio_find_mapping(s, host, &index); + if (!m) { + goto out; + } + qemu_vfio_undo_mapping(s, m, NULL); +out: + qemu_mutex_unlock(&s->lock); +} + +static void qemu_vfio_reset(QEMUVFIOState *s) +{ + ioctl(s->device, VFIO_DEVICE_RESET); +} + +/* Close and free the VFIO resources. 
*/ +void qemu_vfio_close(QEMUVFIOState *s) +{ + int i; + + if (!s) { + return; + } + for (i =3D 0; i < s->nr_mappings; ++i) { + qemu_vfio_undo_mapping(s, &s->mappings[i], NULL); + } + ram_block_notifier_remove(&s->ram_notifier); + qemu_vfio_reset(s); + close(s->device); + close(s->group); + close(s->container); +} --=20 2.14.3 From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518057305664441.0358391681599; Wed, 7 Feb 2018 18:35:05 -0800 (PST) Received: from localhost ([::1]:59847 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejc3E-0002MX-Nt for importer@patchew.org; Wed, 07 Feb 2018 21:35:04 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41747) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbp5-0006GZ-Op for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:31 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbp1-0004TO-CR for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:27 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:53804 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbp0-0004Sv-Tt for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:23 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client 
certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 911EA818532A; Thu, 8 Feb 2018 02:20:22 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 884D8B0789; Thu, 8 Feb 2018 02:20:20 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:46 +0800 Message-Id: <20180208021953.7354-10-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Feb 2018 02:20:22 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Feb 2018 02:20:22 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 09/16] block: Add VFIO based NVMe driver X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This is a new protocol driver that exclusively opens a host NVMe controller through VFIO. It achieves better latency than linux-aio by completely bypassing host kernel vfs/block layer. $rw-$bs-$iodepth linux-aio nvme:// ---------------------------------------- randread-4k-1 10.5k 21.6k randread-512k-1 745 1591 randwrite-4k-1 30.7k 37.0k randwrite-512k-1 1945 1980 (unit: IOPS) The driver also integrates with the polling mechanism of iothread. 
This patch is co-authored by Paolo and me. Signed-off-by: Paolo Bonzini Signed-off-by: Fam Zheng Message-Id: <20180116060901.17413-4-famz@redhat.com> Reviewed-by: Stefan Hajnoczi Signed-off-by: Fam Zheng --- MAINTAINERS | 6 + block/Makefile.objs | 1 + block/nvme.c | 1182 +++++++++++++++++++++++++++++++++++++++++++++++= ++++ block/trace-events | 21 + 4 files changed, 1210 insertions(+) create mode 100644 block/nvme.c diff --git a/MAINTAINERS b/MAINTAINERS index bbc3a617c2..301b6996e1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1888,6 +1888,12 @@ L: qemu-block@nongnu.org S: Supported F: block/null.c =20 +NVMe Block Driver +M: Fam Zheng +L: qemu-block@nongnu.org +S: Supported +F: block/nvme* + Bootdevice M: Gonglei S: Maintained diff --git a/block/Makefile.objs b/block/Makefile.objs index a73387f1bf..aede94f105 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -11,6 +11,7 @@ block-obj-$(CONFIG_POSIX) +=3D file-posix.o block-obj-$(CONFIG_LINUX_AIO) +=3D linux-aio.o block-obj-y +=3D null.o mirror.o commit.o io.o block-obj-y +=3D throttle-groups.o +block-obj-$(CONFIG_LINUX) +=3D nvme.o =20 block-obj-y +=3D nbd.o nbd-client.o sheepdog.o block-obj-$(CONFIG_LIBISCSI) +=3D iscsi.o diff --git a/block/nvme.c b/block/nvme.c new file mode 100644 index 0000000000..0bae185b88 --- /dev/null +++ b/block/nvme.c @@ -0,0 +1,1182 @@ +/* + * NVMe block driver based on vfio + * + * Copyright 2016 - 2018 Red Hat, Inc. + * + * Authors: + * Fam Zheng + * Paolo Bonzini + * + * This work is licensed under the terms of the GNU GPL, version 2 or late= r. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include +#include "qapi/error.h" +#include "qapi/qmp/qdict.h" +#include "qapi/qmp/qstring.h" +#include "qemu/error-report.h" +#include "qemu/cutils.h" +#include "qemu/vfio-helpers.h" +#include "block/block_int.h" +#include "trace.h" + +/* TODO: Move nvme spec definitions from hw/block/nvme.h into a separate f= ile + * that doesn't depend on dma/pci headers. */ +#include "sysemu/dma.h" +#include "hw/pci/pci.h" +#include "hw/block/block.h" +#include "hw/block/nvme.h" + +#define NVME_SQ_ENTRY_BYTES 64 +#define NVME_CQ_ENTRY_BYTES 16 +#define NVME_QUEUE_SIZE 128 +#define NVME_BAR_SIZE 8192 + +typedef struct { + int32_t head, tail; + uint8_t *queue; + uint64_t iova; + /* Hardware MMIO register */ + volatile uint32_t *doorbell; +} NVMeQueue; + +typedef struct { + BlockCompletionFunc *cb; + void *opaque; + int cid; + void *prp_list_page; + uint64_t prp_list_iova; + bool busy; +} NVMeRequest; + +typedef struct { + CoQueue free_req_queue; + QemuMutex lock; + + /* Fields protected by BQL */ + int index; + uint8_t *prp_list_pages; + + /* Fields protected by @lock */ + NVMeQueue sq, cq; + int cq_phase; + NVMeRequest reqs[NVME_QUEUE_SIZE]; + bool busy; + int need_kick; + int inflight; +} NVMeQueuePair; + +/* Memory mapped registers */ +typedef volatile struct { + uint64_t cap; + uint32_t vs; + uint32_t intms; + uint32_t intmc; + uint32_t cc; + uint32_t reserved0; + uint32_t csts; + uint32_t nssr; + uint32_t aqa; + uint64_t asq; + uint64_t acq; + uint32_t cmbloc; + uint32_t cmbsz; + uint8_t reserved1[0xec0]; + uint8_t cmd_set_specfic[0x100]; + uint32_t doorbells[]; +} QEMU_PACKED NVMeRegs; + +QEMU_BUILD_BUG_ON(offsetof(NVMeRegs, doorbells) !=3D 0x1000); + +typedef struct { + AioContext *aio_context; + QEMUVFIOState *vfio; + NVMeRegs *regs; + /* The submission/completion queue pairs. + * [0]: admin queue. + * [1..]: io queues. 
+ */ + NVMeQueuePair **queues; + int nr_queues; + size_t page_size; + /* How many uint32_t elements does each doorbell entry take. */ + size_t doorbell_scale; + bool write_cache_supported; + EventNotifier irq_notifier; + uint64_t nsze; /* Namespace size reported by identify command */ + int nsid; /* The namespace id to read/write data. */ + uint64_t max_transfer; + int plugged; + + CoMutex dma_map_lock; + CoQueue dma_flush_queue; + + /* Total size of mapped qiov, accessed under dma_map_lock */ + int dma_map_count; +} BDRVNVMeState; + +#define NVME_BLOCK_OPT_DEVICE "device" +#define NVME_BLOCK_OPT_NAMESPACE "namespace" + +static QemuOptsList runtime_opts =3D { + .name =3D "nvme", + .head =3D QTAILQ_HEAD_INITIALIZER(runtime_opts.head), + .desc =3D { + { + .name =3D NVME_BLOCK_OPT_DEVICE, + .type =3D QEMU_OPT_STRING, + .help =3D "NVMe PCI device address", + }, + { + .name =3D NVME_BLOCK_OPT_NAMESPACE, + .type =3D QEMU_OPT_NUMBER, + .help =3D "NVMe namespace", + }, + { /* end of list */ } + }, +}; + +static void nvme_init_queue(BlockDriverState *bs, NVMeQueue *q, + int nentries, int entry_bytes, Error **errp) +{ + BDRVNVMeState *s =3D bs->opaque; + size_t bytes; + int r; + + bytes =3D ROUND_UP(nentries * entry_bytes, s->page_size); + q->head =3D q->tail =3D 0; + q->queue =3D qemu_try_blockalign0(bs, bytes); + + if (!q->queue) { + error_setg(errp, "Cannot allocate queue"); + return; + } + r =3D qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova); + if (r) { + error_setg(errp, "Cannot map queue"); + } +} + +static void nvme_free_queue_pair(BlockDriverState *bs, NVMeQueuePair *q) +{ + qemu_vfree(q->prp_list_pages); + qemu_vfree(q->sq.queue); + qemu_vfree(q->cq.queue); + qemu_mutex_destroy(&q->lock); + g_free(q); +} + +static void nvme_free_req_queue_cb(void *opaque) +{ + NVMeQueuePair *q =3D opaque; + + qemu_mutex_lock(&q->lock); + while (qemu_co_enter_next(&q->free_req_queue, &q->lock)) { + /* Retry all pending requests */ + } + qemu_mutex_unlock(&q->lock); +} + 
+static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs, + int idx, int size, + Error **errp) +{ + int i, r; + BDRVNVMeState *s =3D bs->opaque; + Error *local_err =3D NULL; + NVMeQueuePair *q =3D g_new0(NVMeQueuePair, 1); + uint64_t prp_list_iova; + + qemu_mutex_init(&q->lock); + q->index =3D idx; + qemu_co_queue_init(&q->free_req_queue); + q->prp_list_pages =3D qemu_blockalign0(bs, s->page_size * NVME_QUEUE_S= IZE); + r =3D qemu_vfio_dma_map(s->vfio, q->prp_list_pages, + s->page_size * NVME_QUEUE_SIZE, + false, &prp_list_iova); + if (r) { + goto fail; + } + for (i =3D 0; i < NVME_QUEUE_SIZE; i++) { + NVMeRequest *req =3D &q->reqs[i]; + req->cid =3D i + 1; + req->prp_list_page =3D q->prp_list_pages + i * s->page_size; + req->prp_list_iova =3D prp_list_iova + i * s->page_size; + } + nvme_init_queue(bs, &q->sq, size, NVME_SQ_ENTRY_BYTES, &local_err); + if (local_err) { + error_propagate(errp, local_err); + goto fail; + } + q->sq.doorbell =3D &s->regs->doorbells[idx * 2 * s->doorbell_scale]; + + nvme_init_queue(bs, &q->cq, size, NVME_CQ_ENTRY_BYTES, &local_err); + if (local_err) { + error_propagate(errp, local_err); + goto fail; + } + q->cq.doorbell =3D &s->regs->doorbells[idx * 2 * s->doorbell_scale + 1= ]; + + return q; +fail: + nvme_free_queue_pair(bs, q); + return NULL; +} + +/* With q->lock */ +static void nvme_kick(BDRVNVMeState *s, NVMeQueuePair *q) +{ + if (s->plugged || !q->need_kick) { + return; + } + trace_nvme_kick(s, q->index); + assert(!(q->sq.tail & 0xFF00)); + /* Fence the write to submission queue entry before notifying the devi= ce. 
*/ + smp_wmb(); + *q->sq.doorbell =3D cpu_to_le32(q->sq.tail); + q->inflight +=3D q->need_kick; + q->need_kick =3D 0; +} + +/* Find a free request element if any, otherwise: + * a) if in coroutine context, try to wait for one to become available; + * b) if not in coroutine, return NULL; + */ +static NVMeRequest *nvme_get_free_req(NVMeQueuePair *q) +{ + int i; + NVMeRequest *req =3D NULL; + + qemu_mutex_lock(&q->lock); + while (q->inflight + q->need_kick > NVME_QUEUE_SIZE - 2) { + /* We have to leave one slot empty as that is the full queue case = (head + * =3D=3D tail + 1). */ + if (qemu_in_coroutine()) { + trace_nvme_free_req_queue_wait(q); + qemu_co_queue_wait(&q->free_req_queue, &q->lock); + } else { + qemu_mutex_unlock(&q->lock); + return NULL; + } + } + for (i =3D 0; i < NVME_QUEUE_SIZE; i++) { + if (!q->reqs[i].busy) { + q->reqs[i].busy =3D true; + req =3D &q->reqs[i]; + break; + } + } + /* We have checked inflight and need_kick while holding q->lock, so one + * free req must be available. 
*/ + assert(req); + qemu_mutex_unlock(&q->lock); + return req; +} + +static inline int nvme_translate_error(const NvmeCqe *c) +{ + uint16_t status =3D (le16_to_cpu(c->status) >> 1) & 0xFF; + if (status) { + trace_nvme_error(le32_to_cpu(c->result), + le16_to_cpu(c->sq_head), + le16_to_cpu(c->sq_id), + le16_to_cpu(c->cid), + le16_to_cpu(status)); + } + switch (status) { + case 0: + return 0; + case 1: + return -ENOSYS; + case 2: + return -EINVAL; + default: + return -EIO; + } +} + +/* With q->lock */ +static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q) +{ + bool progress =3D false; + NVMeRequest *preq; + NVMeRequest req; + NvmeCqe *c; + + trace_nvme_process_completion(s, q->index, q->inflight); + if (q->busy || s->plugged) { + trace_nvme_process_completion_queue_busy(s, q->index); + return false; + } + q->busy =3D true; + assert(q->inflight >=3D 0); + while (q->inflight) { + int16_t cid; + c =3D (NvmeCqe *)&q->cq.queue[q->cq.head * NVME_CQ_ENTRY_BYTES]; + if (!c->cid || (le16_to_cpu(c->status) & 0x1) =3D=3D q->cq_phase) { + break; + } + q->cq.head =3D (q->cq.head + 1) % NVME_QUEUE_SIZE; + if (!q->cq.head) { + q->cq_phase =3D !q->cq_phase; + } + cid =3D le16_to_cpu(c->cid); + if (cid =3D=3D 0 || cid > NVME_QUEUE_SIZE) { + fprintf(stderr, "Unexpected CID in completion queue: %" PRIu32= "\n", + cid); + continue; + } + assert(cid <=3D NVME_QUEUE_SIZE); + trace_nvme_complete_command(s, q->index, cid); + preq =3D &q->reqs[cid - 1]; + req =3D *preq; + assert(req.cid =3D=3D cid); + assert(req.cb); + preq->busy =3D false; + preq->cb =3D preq->opaque =3D NULL; + qemu_mutex_unlock(&q->lock); + req.cb(req.opaque, nvme_translate_error(c)); + qemu_mutex_lock(&q->lock); + c->cid =3D cpu_to_le16(0); + q->inflight--; + /* Flip Phase Tag bit. */ + c->status =3D cpu_to_le16(le16_to_cpu(c->status) ^ 0x1); + progress =3D true; + } + if (progress) { + /* Notify the device so it can post more completions. 
*/ + smp_mb_release(); + *q->cq.doorbell =3D cpu_to_le32(q->cq.head); + if (!qemu_co_queue_empty(&q->free_req_queue)) { + aio_bh_schedule_oneshot(s->aio_context, nvme_free_req_queue_cb= , q); + } + } + q->busy =3D false; + return progress; +} + +static void nvme_trace_command(const NvmeCmd *cmd) +{ + int i; + + for (i =3D 0; i < 8; ++i) { + uint8_t *cmdp =3D (uint8_t *)cmd + i * 8; + trace_nvme_submit_command_raw(cmdp[0], cmdp[1], cmdp[2], cmdp[3], + cmdp[4], cmdp[5], cmdp[6], cmdp[7]); + } +} + +static void nvme_submit_command(BDRVNVMeState *s, NVMeQueuePair *q, + NVMeRequest *req, + NvmeCmd *cmd, BlockCompletionFunc cb, + void *opaque) +{ + assert(!req->cb); + req->cb =3D cb; + req->opaque =3D opaque; + cmd->cid =3D cpu_to_le32(req->cid); + + trace_nvme_submit_command(s, q->index, req->cid); + nvme_trace_command(cmd); + qemu_mutex_lock(&q->lock); + memcpy((uint8_t *)q->sq.queue + + q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd)); + q->sq.tail =3D (q->sq.tail + 1) % NVME_QUEUE_SIZE; + q->need_kick++; + nvme_kick(s, q); + nvme_process_completion(s, q); + qemu_mutex_unlock(&q->lock); +} + +static void nvme_cmd_sync_cb(void *opaque, int ret) +{ + int *pret =3D opaque; + *pret =3D ret; +} + +static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q, + NvmeCmd *cmd) +{ + NVMeRequest *req; + BDRVNVMeState *s =3D bs->opaque; + int ret =3D -EINPROGRESS; + req =3D nvme_get_free_req(q); + if (!req) { + return -EBUSY; + } + nvme_submit_command(s, q, req, cmd, nvme_cmd_sync_cb, &ret); + + BDRV_POLL_WHILE(bs, ret =3D=3D -EINPROGRESS); + return ret; +} + +static void nvme_identify(BlockDriverState *bs, int namespace, Error **err= p) +{ + BDRVNVMeState *s =3D bs->opaque; + NvmeIdCtrl *idctrl; + NvmeIdNs *idns; + uint8_t *resp; + int r; + uint64_t iova; + NvmeCmd cmd =3D { + .opcode =3D NVME_ADM_CMD_IDENTIFY, + .cdw10 =3D cpu_to_le32(0x1), + }; + + resp =3D qemu_try_blockalign0(bs, sizeof(NvmeIdCtrl)); + if (!resp) { + error_setg(errp, "Cannot allocate buffer for 
identify response"); + goto out; + } + idctrl =3D (NvmeIdCtrl *)resp; + idns =3D (NvmeIdNs *)resp; + r =3D qemu_vfio_dma_map(s->vfio, resp, sizeof(NvmeIdCtrl), true, &iova= ); + if (r) { + error_setg(errp, "Cannot map buffer for DMA"); + goto out; + } + cmd.prp1 =3D cpu_to_le64(iova); + + if (nvme_cmd_sync(bs, s->queues[0], &cmd)) { + error_setg(errp, "Failed to identify controller"); + goto out; + } + + if (le32_to_cpu(idctrl->nn) < namespace) { + error_setg(errp, "Invalid namespace"); + goto out; + } + s->write_cache_supported =3D le32_to_cpu(idctrl->vwc) & 0x1; + s->max_transfer =3D (idctrl->mdts ? 1 << idctrl->mdts : 0) * s->page_s= ize; + /* For now the page list buffer per command is one page, to hold at mo= st + * s->page_size / sizeof(uint64_t) entries. */ + s->max_transfer =3D MIN_NON_ZERO(s->max_transfer, + s->page_size / sizeof(uint64_t) * s->page_size); + + memset(resp, 0, 4096); + + cmd.cdw10 =3D 0; + cmd.nsid =3D cpu_to_le32(namespace); + if (nvme_cmd_sync(bs, s->queues[0], &cmd)) { + error_setg(errp, "Failed to identify namespace"); + goto out; + } + + s->nsze =3D le64_to_cpu(idns->nsze); + +out: + qemu_vfio_dma_unmap(s->vfio, resp); + qemu_vfree(resp); +} + +static bool nvme_poll_queues(BDRVNVMeState *s) +{ + bool progress =3D false; + int i; + + for (i =3D 0; i < s->nr_queues; i++) { + NVMeQueuePair *q =3D s->queues[i]; + qemu_mutex_lock(&q->lock); + while (nvme_process_completion(s, q)) { + /* Keep polling */ + progress =3D true; + } + qemu_mutex_unlock(&q->lock); + } + return progress; +} + +static void nvme_handle_event(EventNotifier *n) +{ + BDRVNVMeState *s =3D container_of(n, BDRVNVMeState, irq_notifier); + + trace_nvme_handle_event(s); + aio_context_acquire(s->aio_context); + event_notifier_test_and_clear(n); + nvme_poll_queues(s); + aio_context_release(s->aio_context); +} + +static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp) +{ + BDRVNVMeState *s =3D bs->opaque; + int n =3D s->nr_queues; + NVMeQueuePair *q; + NvmeCmd cmd; + 
int queue_size =3D NVME_QUEUE_SIZE; + + q =3D nvme_create_queue_pair(bs, n, queue_size, errp); + if (!q) { + return false; + } + cmd =3D (NvmeCmd) { + .opcode =3D NVME_ADM_CMD_CREATE_CQ, + .prp1 =3D cpu_to_le64(q->cq.iova), + .cdw10 =3D cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)), + .cdw11 =3D cpu_to_le32(0x3), + }; + if (nvme_cmd_sync(bs, s->queues[0], &cmd)) { + error_setg(errp, "Failed to create io queue [%d]", n); + nvme_free_queue_pair(bs, q); + return false; + } + cmd =3D (NvmeCmd) { + .opcode =3D NVME_ADM_CMD_CREATE_SQ, + .prp1 =3D cpu_to_le64(q->sq.iova), + .cdw10 =3D cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)), + .cdw11 =3D cpu_to_le32(0x1 | (n << 16)), + }; + if (nvme_cmd_sync(bs, s->queues[0], &cmd)) { + error_setg(errp, "Failed to create io queue [%d]", n); + nvme_free_queue_pair(bs, q); + return false; + } + s->queues =3D g_renew(NVMeQueuePair *, s->queues, n + 1); + s->queues[n] =3D q; + s->nr_queues++; + return true; +} + +static bool nvme_poll_cb(void *opaque) +{ + EventNotifier *e =3D opaque; + BDRVNVMeState *s =3D container_of(e, BDRVNVMeState, irq_notifier); + bool progress =3D false; + + trace_nvme_poll_cb(s); + progress =3D nvme_poll_queues(s); + return progress; +} + +static int nvme_init(BlockDriverState *bs, const char *device, int namespa= ce, + Error **errp) +{ + BDRVNVMeState *s =3D bs->opaque; + int ret; + uint64_t cap; + uint64_t timeout_ms; + uint64_t deadline, now; + Error *local_err =3D NULL; + + qemu_co_mutex_init(&s->dma_map_lock); + qemu_co_queue_init(&s->dma_flush_queue); + s->nsid =3D namespace; + s->aio_context =3D bdrv_get_aio_context(bs); + ret =3D event_notifier_init(&s->irq_notifier, 0); + if (ret) { + error_setg(errp, "Failed to init event notifier"); + return ret; + } + + s->vfio =3D qemu_vfio_open_pci(device, errp); + if (!s->vfio) { + ret =3D -EINVAL; + goto fail; + } + + s->regs =3D qemu_vfio_pci_map_bar(s->vfio, 0, 0, NVME_BAR_SIZE, errp); + if (!s->regs) { + ret =3D -EINVAL; + goto fail; + } + + /* 
Perform initialize sequence as described in NVMe spec "7.6.1 + * Initialization". */ + + cap =3D le64_to_cpu(s->regs->cap); + if (!(cap & (1ULL << 37))) { + error_setg(errp, "Device doesn't support NVMe command set"); + ret =3D -EINVAL; + goto fail; + } + + s->page_size =3D MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); + s->doorbell_scale =3D (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); + bs->bl.opt_mem_alignment =3D s->page_size; + timeout_ms =3D MIN(500 * ((cap >> 24) & 0xFF), 30000); + + /* Reset device to get a clean state. */ + s->regs->cc =3D cpu_to_le32(le32_to_cpu(s->regs->cc) & 0xFE); + /* Wait for CSTS.RDY =3D 0. */ + deadline =3D qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * 100= 0000ULL; + while (le32_to_cpu(s->regs->csts) & 0x1) { + if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) { + error_setg(errp, "Timeout while waiting for device to reset (%" + PRId64 " ms)", + timeout_ms); + ret =3D -ETIMEDOUT; + goto fail; + } + } + + /* Set up admin queue. */ + s->queues =3D g_new(NVMeQueuePair *, 1); + s->nr_queues =3D 1; + s->queues[0] =3D nvme_create_queue_pair(bs, 0, NVME_QUEUE_SIZE, errp); + if (!s->queues[0]) { + ret =3D -EINVAL; + goto fail; + } + QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000); + s->regs->aqa =3D cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE= ); + s->regs->asq =3D cpu_to_le64(s->queues[0]->sq.iova); + s->regs->acq =3D cpu_to_le64(s->queues[0]->cq.iova); + + /* After setting up all control registers we can enable device now. */ + s->regs->cc =3D cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) | + (ctz32(NVME_SQ_ENTRY_BYTES) << 16) | + 0x1); + /* Wait for CSTS.RDY =3D 1. 
*/ + now =3D qemu_clock_get_ns(QEMU_CLOCK_REALTIME); + deadline =3D now + timeout_ms * 1000000; + while (!(le32_to_cpu(s->regs->csts) & 0x1)) { + if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) { + error_setg(errp, "Timeout while waiting for device to start (%" + PRId64 " ms)", + timeout_ms); + ret =3D -ETIMEDOUT; + goto fail_queue; + } + } + + ret =3D qemu_vfio_pci_init_irq(s->vfio, &s->irq_notifier, + VFIO_PCI_MSIX_IRQ_INDEX, errp); + if (ret) { + goto fail_queue; + } + aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier, + false, nvme_handle_event, nvme_poll_cb); + + nvme_identify(bs, namespace, errp); + if (local_err) { + error_propagate(errp, local_err); + ret =3D -EIO; + goto fail_handler; + } + + /* Set up command queues. */ + if (!nvme_add_io_queue(bs, errp)) { + ret =3D -EIO; + goto fail_handler; + } + return 0; + +fail_handler: + aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier, + false, NULL, NULL); +fail_queue: + nvme_free_queue_pair(bs, s->queues[0]); +fail: + g_free(s->queues); + qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE); + qemu_vfio_close(s->vfio); + event_notifier_cleanup(&s->irq_notifier); + return ret; +} + +/* Parse a filename in the format of nvme://XXXX:XX:XX.X/X. Example: + * + * nvme://0000:44:00.0/1 + * + * where the "nvme://" is a fixed form of the protocol prefix, the middle = part + * is the PCI address, and the last part is the namespace number starting = from + * 1 according to the NVMe spec. 
*/ +static void nvme_parse_filename(const char *filename, QDict *options, + Error **errp) +{ + int pref =3D strlen("nvme://"); + + if (strlen(filename) > pref && !strncmp(filename, "nvme://", pref)) { + const char *tmp =3D filename + pref; + char *device; + const char *namespace; + unsigned long ns; + const char *slash =3D strchr(tmp, '/'); + if (!slash) { + qdict_put(options, NVME_BLOCK_OPT_DEVICE, + qstring_from_str(tmp)); + return; + } + device =3D g_strndup(tmp, slash - tmp); + qdict_put(options, NVME_BLOCK_OPT_DEVICE, qstring_from_str(device)= ); + g_free(device); + namespace =3D slash + 1; + if (*namespace && qemu_strtoul(namespace, NULL, 10, &ns)) { + error_setg(errp, "Invalid namespace '%s', positive number expe= cted", + namespace); + return; + } + qdict_put(options, NVME_BLOCK_OPT_NAMESPACE, + qstring_from_str(*namespace ? namespace : "1")); + } +} + +static int nvme_enable_disable_write_cache(BlockDriverState *bs, bool enab= le, + Error **errp) +{ + int ret; + BDRVNVMeState *s =3D bs->opaque; + NvmeCmd cmd =3D { + .opcode =3D NVME_ADM_CMD_SET_FEATURES, + .nsid =3D cpu_to_le32(s->nsid), + .cdw10 =3D cpu_to_le32(0x06), + .cdw11 =3D cpu_to_le32(enable ? 
0x01 : 0x00), + }; + + ret =3D nvme_cmd_sync(bs, s->queues[0], &cmd); + if (ret) { + error_setg(errp, "Failed to configure NVMe write cache"); + } + return ret; +} + +static void nvme_close(BlockDriverState *bs) +{ + int i; + BDRVNVMeState *s =3D bs->opaque; + + for (i =3D 0; i < s->nr_queues; ++i) { + nvme_free_queue_pair(bs, s->queues[i]); + } + aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier, + false, NULL, NULL); + qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE); + qemu_vfio_close(s->vfio); +} + +static int nvme_file_open(BlockDriverState *bs, QDict *options, int flags, + Error **errp) +{ + const char *device; + QemuOpts *opts; + int namespace; + int ret; + BDRVNVMeState *s =3D bs->opaque; + + opts =3D qemu_opts_create(&runtime_opts, NULL, 0, &error_abort); + qemu_opts_absorb_qdict(opts, options, &error_abort); + device =3D qemu_opt_get(opts, NVME_BLOCK_OPT_DEVICE); + if (!device) { + error_setg(errp, "'" NVME_BLOCK_OPT_DEVICE "' option is required"); + qemu_opts_del(opts); + return -EINVAL; + } + + namespace =3D qemu_opt_get_number(opts, NVME_BLOCK_OPT_NAMESPACE, 1); + ret =3D nvme_init(bs, device, namespace, errp); + qemu_opts_del(opts); + if (ret) { + goto fail; + } + if (flags & BDRV_O_NOCACHE) { + if (!s->write_cache_supported) { + error_setg(errp, + "NVMe controller doesn't support write cache config= uration"); + ret =3D -EINVAL; + } else { + ret =3D nvme_enable_disable_write_cache(bs, !(flags & BDRV_O_N= OCACHE), + errp); + } + if (ret) { + goto fail; + } + } + bs->supported_write_flags =3D BDRV_REQ_FUA; + return 0; +fail: + nvme_close(bs); + return ret; +} + +static int64_t nvme_getlength(BlockDriverState *bs) +{ + BDRVNVMeState *s =3D bs->opaque; + + return s->nsze << BDRV_SECTOR_BITS; +} + +/* Called with s->dma_map_lock */ +static coroutine_fn int nvme_cmd_unmap_qiov(BlockDriverState *bs, + QEMUIOVector *qiov) +{ + int r =3D 0; + BDRVNVMeState *s =3D bs->opaque; + + s->dma_map_count -=3D qiov->size; + if 
(!s->dma_map_count && !qemu_co_queue_empty(&s->dma_flush_queue)) {
+        r = qemu_vfio_dma_reset_temporary(s->vfio);
+        if (!r) {
+            qemu_co_queue_restart_all(&s->dma_flush_queue);
+        }
+    }
+    return r;
+}
+
+/* Called with s->dma_map_lock */
+static coroutine_fn int nvme_cmd_map_qiov(BlockDriverState *bs, NvmeCmd *cmd,
+                                          NVMeRequest *req, QEMUIOVector *qiov)
+{
+    BDRVNVMeState *s = bs->opaque;
+    uint64_t *pagelist = req->prp_list_page;
+    int i, j, r;
+    int entries = 0;
+
+    assert(qiov->size);
+    assert(QEMU_IS_ALIGNED(qiov->size, s->page_size));
+    assert(qiov->size / s->page_size <= s->page_size / sizeof(uint64_t));
+    for (i = 0; i < qiov->niov; ++i) {
+        bool retry = true;
+        uint64_t iova;
+try_map:
+        r = qemu_vfio_dma_map(s->vfio,
+                              qiov->iov[i].iov_base,
+                              qiov->iov[i].iov_len,
+                              true, &iova);
+        if (r == -ENOMEM && retry) {
+            retry = false;
+            trace_nvme_dma_flush_queue_wait(s);
+            if (s->dma_map_count) {
+                trace_nvme_dma_map_flush(s);
+                qemu_co_queue_wait(&s->dma_flush_queue, &s->dma_map_lock);
+            } else {
+                r = qemu_vfio_dma_reset_temporary(s->vfio);
+                if (r) {
+                    goto fail;
+                }
+            }
+            goto try_map;
+        }
+        if (r) {
+            goto fail;
+        }
+
+        for (j = 0; j < qiov->iov[i].iov_len / s->page_size; j++) {
+            pagelist[entries++] = iova + j * s->page_size;
+        }
+        trace_nvme_cmd_map_qiov_iov(s, i, qiov->iov[i].iov_base,
+                                    qiov->iov[i].iov_len / s->page_size);
+    }
+
+    s->dma_map_count += qiov->size;
+
+    assert(entries <= s->page_size / sizeof(uint64_t));
+    switch (entries) {
+    case 0:
+        abort();
+    case 1:
+        cmd->prp1 = cpu_to_le64(pagelist[0]);
+        cmd->prp2 = 0;
+        break;
+    case 2:
+        cmd->prp1 = cpu_to_le64(pagelist[0]);
+        cmd->prp2 = cpu_to_le64(pagelist[1]);;
+        break;
+    default:
+        cmd->prp1 = cpu_to_le64(pagelist[0]);
+        cmd->prp2 = cpu_to_le64(req->prp_list_iova);
+        for (i = 0; i < entries - 1; ++i) {
+            pagelist[i] = cpu_to_le64(pagelist[i + 1]);
+        }
+        pagelist[entries - 1] = 0;
+        break;
+    }
+    trace_nvme_cmd_map_qiov(s, cmd, req, qiov,
entries);
+    for (i = 0; i < entries; ++i) {
+        trace_nvme_cmd_map_qiov_pages(s, i, pagelist[i]);
+    }
+    return 0;
+fail:
+    /* No need to unmap [0 - i) iovs even if we've failed, since we don't
+     * increment s->dma_map_count. This is okay for fixed mapping memory areas
+     * because they are already mapped before calling this function; for
+     * temporary mappings, a later nvme_cmd_(un)map_qiov will reclaim by
+     * calling qemu_vfio_dma_reset_temporary when necessary. */
+    return r;
+}
+
+typedef struct {
+    Coroutine *co;
+    int ret;
+    AioContext *ctx;
+} NVMeCoData;
+
+static void nvme_rw_cb_bh(void *opaque)
+{
+    NVMeCoData *data = opaque;
+    qemu_coroutine_enter(data->co);
+}
+
+static void nvme_rw_cb(void *opaque, int ret)
+{
+    NVMeCoData *data = opaque;
+    data->ret = ret;
+    if (!data->co) {
+        /* The rw coroutine hasn't yielded, don't try to enter. */
+        return;
+    }
+    aio_bh_schedule_oneshot(data->ctx, nvme_rw_cb_bh, data);
+}
+
+static coroutine_fn int nvme_co_prw_aligned(BlockDriverState *bs,
+                                            uint64_t offset, uint64_t bytes,
+                                            QEMUIOVector *qiov,
+                                            bool is_write,
+                                            int flags)
+{
+    int r;
+    BDRVNVMeState *s = bs->opaque;
+    NVMeQueuePair *ioq = s->queues[1];
+    NVMeRequest *req;
+    uint32_t cdw12 = (((bytes >> BDRV_SECTOR_BITS) - 1) & 0xFFFF) |
+                       (flags & BDRV_REQ_FUA ? 1 << 30 : 0);
+    NvmeCmd cmd = {
+        .opcode = is_write ?
NVME_CMD_WRITE : NVME_CMD_READ,
+        .nsid = cpu_to_le32(s->nsid),
+        .cdw10 = cpu_to_le32((offset >> BDRV_SECTOR_BITS) & 0xFFFFFFFF),
+        .cdw11 = cpu_to_le32(((offset >> BDRV_SECTOR_BITS) >> 32) & 0xFFFFFFFF),
+        .cdw12 = cpu_to_le32(cdw12),
+    };
+    NVMeCoData data = {
+        .ctx = bdrv_get_aio_context(bs),
+        .ret = -EINPROGRESS,
+    };
+
+    trace_nvme_prw_aligned(s, is_write, offset, bytes, flags, qiov->niov);
+    assert(s->nr_queues > 1);
+    req = nvme_get_free_req(ioq);
+    assert(req);
+
+    qemu_co_mutex_lock(&s->dma_map_lock);
+    r = nvme_cmd_map_qiov(bs, &cmd, req, qiov);
+    qemu_co_mutex_unlock(&s->dma_map_lock);
+    if (r) {
+        req->busy = false;
+        return r;
+    }
+    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
+
+    data.co = qemu_coroutine_self();
+    while (data.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
+
+    qemu_co_mutex_lock(&s->dma_map_lock);
+    r = nvme_cmd_unmap_qiov(bs, qiov);
+    qemu_co_mutex_unlock(&s->dma_map_lock);
+    if (r) {
+        return r;
+    }
+
+    trace_nvme_rw_done(s, is_write, offset, bytes, data.ret);
+    return data.ret;
+}
+
+static inline bool nvme_qiov_aligned(BlockDriverState *bs,
+                                     const QEMUIOVector *qiov)
+{
+    int i;
+    BDRVNVMeState *s = bs->opaque;
+
+    for (i = 0; i < qiov->niov; ++i) {
+        if (!QEMU_PTR_IS_ALIGNED(qiov->iov[i].iov_base, s->page_size) ||
+            !QEMU_IS_ALIGNED(qiov->iov[i].iov_len, s->page_size)) {
+            trace_nvme_qiov_unaligned(qiov, i, qiov->iov[i].iov_base,
+                                      qiov->iov[i].iov_len, s->page_size);
+            return false;
+        }
+    }
+    return true;
+}
+
+static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+                       QEMUIOVector *qiov, bool is_write, int flags)
+{
+    BDRVNVMeState *s = bs->opaque;
+    int r;
+    uint8_t *buf = NULL;
+    QEMUIOVector local_qiov;
+
+    assert(QEMU_IS_ALIGNED(offset, s->page_size));
+    assert(QEMU_IS_ALIGNED(bytes, s->page_size));
+    assert(bytes <= s->max_transfer);
+    if (nvme_qiov_aligned(bs, qiov)) {
+        return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
+    }
+    trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
+    buf = qemu_try_blockalign(bs, bytes);
+
+    if (!buf) {
+        return -ENOMEM;
+    }
+    qemu_iovec_init(&local_qiov, 1);
+    if (is_write) {
+        qemu_iovec_to_buf(qiov, 0, buf, bytes);
+    }
+    qemu_iovec_add(&local_qiov, buf, bytes);
+    r = nvme_co_prw_aligned(bs, offset, bytes, &local_qiov, is_write, flags);
+    qemu_iovec_destroy(&local_qiov);
+    if (!r && !is_write) {
+        qemu_iovec_from_buf(qiov, 0, buf, bytes);
+    }
+    qemu_vfree(buf);
+    return r;
+}
+
+static coroutine_fn int nvme_co_preadv(BlockDriverState *bs,
+                                       uint64_t offset, uint64_t bytes,
+                                       QEMUIOVector *qiov, int flags)
+{
+    return nvme_co_prw(bs, offset, bytes, qiov, false, flags);
+}
+
+static coroutine_fn int nvme_co_pwritev(BlockDriverState *bs,
+                                        uint64_t offset, uint64_t bytes,
+                                        QEMUIOVector *qiov, int flags)
+{
+    return nvme_co_prw(bs, offset, bytes, qiov, true, flags);
+}
+
+static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
+{
+    BDRVNVMeState *s = bs->opaque;
+    NVMeQueuePair *ioq = s->queues[1];
+    NVMeRequest *req;
+    NvmeCmd cmd = {
+        .opcode = NVME_CMD_FLUSH,
+        .nsid = cpu_to_le32(s->nsid),
+    };
+    NVMeCoData data = {
+        .ctx = bdrv_get_aio_context(bs),
+        .ret = -EINPROGRESS,
+    };
+
+    assert(s->nr_queues > 1);
+    req = nvme_get_free_req(ioq);
+    assert(req);
+    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
+
+    data.co = qemu_coroutine_self();
+    if (data.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
+
+    return data.ret;
+}
+
+
+static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
+                               BlockReopenQueue *queue, Error **errp)
+{
+    return 0;
+}
+
+static int64_t coroutine_fn nvme_co_get_block_status(BlockDriverState *bs,
+                                                     int64_t sector_num,
+                                                     int nb_sectors, int *pnum,
+                                                     BlockDriverState **file)
+{
+    *pnum = nb_sectors;
+    *file = bs;
+
+    return BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
+           (sector_num << BDRV_SECTOR_BITS);
+}
+
+static void
nvme_refresh_filename(BlockDriverState *bs, QDict *opts)
+{
+    QINCREF(opts);
+    qdict_del(opts, "filename");
+
+    if (!qdict_size(opts)) {
+        snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s://",
+                 bs->drv->format_name);
+    }
+
+    qdict_put(opts, "driver", qstring_from_str(bs->drv->format_name));
+    bs->full_open_options = opts;
+}
+
+static void nvme_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BDRVNVMeState *s = bs->opaque;
+
+    bs->bl.opt_mem_alignment = s->page_size;
+    bs->bl.request_alignment = s->page_size;
+    bs->bl.max_transfer = s->max_transfer;
+}
+
+static void nvme_detach_aio_context(BlockDriverState *bs)
+{
+    BDRVNVMeState *s = bs->opaque;
+
+    aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
+                           false, NULL, NULL);
+}
+
+static void nvme_attach_aio_context(BlockDriverState *bs,
+                                    AioContext *new_context)
+{
+    BDRVNVMeState *s = bs->opaque;
+
+    s->aio_context = new_context;
+    aio_set_event_notifier(new_context, &s->irq_notifier,
+                           false, nvme_handle_event, nvme_poll_cb);
+}
+
+static void nvme_aio_plug(BlockDriverState *bs)
+{
+    BDRVNVMeState *s = bs->opaque;
+    s->plugged++;
+}
+
+static void nvme_aio_unplug(BlockDriverState *bs)
+{
+    int i;
+    BDRVNVMeState *s = bs->opaque;
+    assert(s->plugged);
+    if (!--s->plugged) {
+        for (i = 1; i < s->nr_queues; i++) {
+            NVMeQueuePair *q = s->queues[i];
+            qemu_mutex_lock(&q->lock);
+            nvme_kick(s, q);
+            nvme_process_completion(s, q);
+            qemu_mutex_unlock(&q->lock);
+        }
+    }
+}
+
+static BlockDriver bdrv_nvme = {
+    .format_name = "nvme",
+    .protocol_name = "nvme",
+    .instance_size = sizeof(BDRVNVMeState),
+
+    .bdrv_parse_filename = nvme_parse_filename,
+    .bdrv_file_open = nvme_file_open,
+    .bdrv_close = nvme_close,
+    .bdrv_getlength = nvme_getlength,
+
+    .bdrv_co_preadv = nvme_co_preadv,
+    .bdrv_co_pwritev = nvme_co_pwritev,
+    .bdrv_co_flush_to_disk = nvme_co_flush,
+    .bdrv_reopen_prepare = nvme_reopen_prepare,
+
+    .bdrv_co_get_block_status
= nvme_co_get_block_status,
+
+    .bdrv_refresh_filename = nvme_refresh_filename,
+    .bdrv_refresh_limits = nvme_refresh_limits,
+
+    .bdrv_detach_aio_context = nvme_detach_aio_context,
+    .bdrv_attach_aio_context = nvme_attach_aio_context,
+
+    .bdrv_io_plug = nvme_aio_plug,
+    .bdrv_io_unplug = nvme_aio_unplug,
+};
+
+static void bdrv_nvme_init(void)
+{
+    bdrv_register(&bdrv_nvme);
+}
+
+block_init(bdrv_nvme_init);
diff --git a/block/trace-events b/block/trace-events
index 11c8d5f590..02dd80ff0c 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -124,3 +124,24 @@ vxhs_open_iio_open(const char *host) "Failed to connect to storage agent on host
 vxhs_parse_uri_hostinfo(char *host, int port) "Host: IP %s, Port %d"
 vxhs_close(char *vdisk_guid) "Closing vdisk %s"
 vxhs_get_creds(const char *cacert, const char *client_key, const char *client_cert) "cacert %s, client_key %s, client_cert %s"
+
+# block/nvme.c
+nvme_kick(void *s, int queue) "s %p queue %d"
+nvme_dma_flush_queue_wait(void *s) "s %p"
+nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
+nvme_process_completion(void *s, int index, int inflight) "s %p queue %d inflight %d"
+nvme_process_completion_queue_busy(void *s, int index) "s %p queue %d"
+nvme_complete_command(void *s, int index, int cid) "s %p queue %d cid %d"
+nvme_submit_command(void *s, int index, int cid) "s %p queue %d cid %d"
+nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
+nvme_handle_event(void *s) "s %p"
+nvme_poll_cb(void *s) "s %p"
+nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int flags, int niov) "s %p is_write %d offset %"PRId64" bytes %"PRId64" flags %d niov %d"
+nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
+nvme_prw_buffered(void *s, uint64_t
offset, uint64_t bytes, int niov, int is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
+nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
+nvme_dma_map_flush(void *s) "s %p"
+nvme_free_req_queue_wait(void *q) "q %p"
+nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d"
+nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 0x%"PRIx64
+nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] %p pages %d"
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024
From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:47 +0800
Message-Id: <20180208021953.7354-11-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 10/16] block: Introduce buf register API
Cc: Peter Maydell

Allow block driver to map and unmap a buffer for later I/O, as a performance
hint.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Message-Id: <20180116060901.17413-5-famz@redhat.com>
Signed-off-by: Fam Zheng
---
 block/block-backend.c          | 10 ++++++++++
 block/io.c                     | 24 ++++++++++++++++++++++++
 include/block/block.h          | 11 ++++++++++-
 include/block/block_int.h      |  9 +++++++++
 include/sysemu/block-backend.h |  3 +++
 5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index baef8e7abc..f66349c2c9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2096,3 +2096,13 @@ static void blk_root_drained_end(BdrvChild *child)
         }
     }
 }
+
+void blk_register_buf(BlockBackend *blk, void *host, size_t size)
+{
+    bdrv_register_buf(blk_bs(blk), host, size);
+}
+
+void blk_unregister_buf(BlockBackend *blk, void *host)
+{
+    bdrv_unregister_buf(blk_bs(blk), host);
+}
diff --git a/block/io.c b/block/io.c
index 7ea402352e..89d0745e95 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2825,3 +2825,27 @@ void bdrv_io_unplug(BlockDriverState *bs)
         bdrv_io_unplug(child->bs);
     }
 }
+
+void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size)
+{
+    BdrvChild *child;
+
+    if (bs->drv && bs->drv->bdrv_register_buf) {
+        bs->drv->bdrv_register_buf(bs, host, size);
+    }
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_register_buf(child->bs, host, size);
+    }
+}
+
+void bdrv_unregister_buf(BlockDriverState *bs, void *host)
+{
+    BdrvChild *child;
+
+    if (bs->drv && bs->drv->bdrv_unregister_buf) {
+        bs->drv->bdrv_unregister_buf(bs, host);
+    }
+    QLIST_FOREACH(child, &bs->children, next) {
+        bdrv_unregister_buf(child->bs, host);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 9b12774ddf..2025d7ed19 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -631,5 +631,14 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
 
 bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
                                      uint32_t granularity, Error **errp);
-
+/**
+ *
+ *
bdrv_register_buf/bdrv_unregister_buf:
+ *
+ * Register/unregister a buffer for I/O. For example, VFIO drivers are
+ * interested to know the memory areas that would later be used for I/O, so
+ * that they can prepare IOMMU mapping etc., to get better performance.
+ */
+void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
+void bdrv_unregister_buf(BlockDriverState *bs, void *host);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 29cafa4236..99b9190627 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -446,6 +446,15 @@ struct BlockDriver {
                                                 const char *name,
                                                 Error **errp);
 
+    /**
+     * Register/unregister a buffer for I/O. For example, when the driver is
+     * interested to know the memory areas that will later be used in iovs, so
+     * that it can do IOMMU mapping with VFIO etc., in order to get better
+     * performance. In the case of VFIO drivers, this callback is used to do
+     * DMA mapping for hot buffers.
+     */
+    void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
+    void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c4e52a5fa3..92ab624fac 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -229,4 +229,7 @@ void blk_io_limits_enable(BlockBackend *blk, const char *group);
 void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
+void blk_register_buf(BlockBackend *blk, void *host, size_t size);
+void blk_unregister_buf(BlockBackend *blk, void *host);
+
 #endif
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024
From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:48 +0800
Message-Id: <20180208021953.7354-12-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 11/16] block/nvme: Implement .bdrv_(un)register_buf
Cc: Peter Maydell

Forward these two calls to the IOVA manager.

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Message-Id: <20180116060901.17413-6-famz@redhat.com>
Signed-off-by: Fam Zheng
---
 block/nvme.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index 0bae185b88..a487b4d381 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1147,6 +1147,27 @@ static void nvme_aio_unplug(BlockDriverState *bs)
     }
 }
 
+static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size)
+{
+    int ret;
+    BDRVNVMeState *s = bs->opaque;
+
+    ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL);
+    if (ret) {
+        /* FIXME: we may run out of IOVA addresses after repeated
+         * bdrv_register_buf/bdrv_unregister_buf, because nvme_vfio_dma_unmap
+         * doesn't reclaim addresses for fixed mappings.
*/
+        error_report("nvme_register_buf failed: %s", strerror(-ret));
+    }
+}
+
+static void nvme_unregister_buf(BlockDriverState *bs, void *host)
+{
+    BDRVNVMeState *s = bs->opaque;
+
+    qemu_vfio_dma_unmap(s->vfio, host);
+}
+
 static BlockDriver bdrv_nvme = {
     .format_name = "nvme",
     .protocol_name = "nvme",
@@ -1172,6 +1193,9 @@ static BlockDriver bdrv_nvme = {
 
     .bdrv_io_plug = nvme_aio_plug,
     .bdrv_io_unplug = nvme_aio_unplug,
+
+    .bdrv_register_buf = nvme_register_buf,
+    .bdrv_unregister_buf = nvme_unregister_buf,
 };
 
 static void bdrv_nvme_init(void)
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024
From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:49 +0800
Message-Id: <20180208021953.7354-13-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 12/16] qemu-img: Map bench buffer
Cc: Peter Maydell

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Message-Id: <20180116060901.17413-7-famz@redhat.com>
Signed-off-by: Fam Zheng
---
 qemu-img.c | 9 ++++++++-
 1 file changed, 8
insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 68b375f998..28d0e4e9f8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3862,6 +3862,7 @@ static int img_bench(int argc, char **argv)
     struct timeval t1, t2;
     int i;
     bool force_share = false;
+    size_t buf_size;
 
     for (;;) {
         static const struct option long_options[] = {
@@ -4050,9 +4051,12 @@ static int img_bench(int argc, char **argv)
         printf("Sending flush every %d requests\n", flush_interval);
     }
 
-    data.buf = blk_blockalign(blk, data.nrreq * data.bufsize);
+    buf_size = data.nrreq * data.bufsize;
+    data.buf = blk_blockalign(blk, buf_size);
     memset(data.buf, pattern, data.nrreq * data.bufsize);
 
+    blk_register_buf(blk, data.buf, buf_size);
+
     data.qiov = g_new(QEMUIOVector, data.nrreq);
     for (i = 0; i < data.nrreq; i++) {
         qemu_iovec_init(&data.qiov[i], 1);
@@ -4073,6 +4077,9 @@ static int img_bench(int argc, char **argv)
            + ((double)(t2.tv_usec - t1.tv_usec) / 1000000));
 
 out:
+    if (data.buf) {
+        blk_unregister_buf(blk, data.buf);
+    }
     qemu_vfree(data.buf);
     blk_unref(blk);
 
-- 
2.14.3

From nobody Mon Apr 29 10:10:24 2024
From: Fam Zheng
To: qemu-devel@nongnu.org
Date: Thu, 8 Feb 2018 10:19:50 +0800
Message-Id: <20180208021953.7354-14-famz@redhat.com>
In-Reply-To: <20180208021953.7354-1-famz@redhat.com>
References: <20180208021953.7354-1-famz@redhat.com>
Subject: [Qemu-devel] [PULL 13/16] block: Move NVMe constants to a separate header
Cc: Peter Maydell

Signed-off-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Message-Id: <20180116060901.17413-8-famz@redhat.com>
Signed-off-by: Fam Zheng
---
 block/nvme.c         |   7 +-
 hw/block/nvme.h      | 698 +-------------------------------------------------
 include/block/nvme.h | 700 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 702 insertions(+), 703 deletions(-)
 create mode 100644 include/block/nvme.h

diff --git a/block/nvme.c b/block/nvme.c
index a487b4d381..e9d0e218fc 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -22,12 +22,7 @@
 #include "block/block_int.h"
 #include "trace.h"
 
-/* TODO: Move nvme spec definitions from hw/block/nvme.h into a separate file
- * that doesn't depend on dma/pci headers.
*/ -#include "sysemu/dma.h" -#include "hw/pci/pci.h" -#include "hw/block/block.h" -#include "hw/block/nvme.h" +#include "block/nvme.h" =20 #define NVME_SQ_ENTRY_BYTES 64 #define NVME_CQ_ENTRY_BYTES 16 diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 7b62dad072..8f3981121d 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -1,703 +1,7 @@ #ifndef HW_NVME_H #define HW_NVME_H #include "qemu/cutils.h" - -typedef struct NvmeBar { - uint64_t cap; - uint32_t vs; - uint32_t intms; - uint32_t intmc; - uint32_t cc; - uint32_t rsvd1; - uint32_t csts; - uint32_t nssrc; - uint32_t aqa; - uint64_t asq; - uint64_t acq; - uint32_t cmbloc; - uint32_t cmbsz; -} NvmeBar; - -enum NvmeCapShift { - CAP_MQES_SHIFT =3D 0, - CAP_CQR_SHIFT =3D 16, - CAP_AMS_SHIFT =3D 17, - CAP_TO_SHIFT =3D 24, - CAP_DSTRD_SHIFT =3D 32, - CAP_NSSRS_SHIFT =3D 33, - CAP_CSS_SHIFT =3D 37, - CAP_MPSMIN_SHIFT =3D 48, - CAP_MPSMAX_SHIFT =3D 52, -}; - -enum NvmeCapMask { - CAP_MQES_MASK =3D 0xffff, - CAP_CQR_MASK =3D 0x1, - CAP_AMS_MASK =3D 0x3, - CAP_TO_MASK =3D 0xff, - CAP_DSTRD_MASK =3D 0xf, - CAP_NSSRS_MASK =3D 0x1, - CAP_CSS_MASK =3D 0xff, - CAP_MPSMIN_MASK =3D 0xf, - CAP_MPSMAX_MASK =3D 0xf, -}; - -#define NVME_CAP_MQES(cap) (((cap) >> CAP_MQES_SHIFT) & CAP_MQES_MASK) -#define NVME_CAP_CQR(cap) (((cap) >> CAP_CQR_SHIFT) & CAP_CQR_MASK) -#define NVME_CAP_AMS(cap) (((cap) >> CAP_AMS_SHIFT) & CAP_AMS_MASK) -#define NVME_CAP_TO(cap) (((cap) >> CAP_TO_SHIFT) & CAP_TO_MASK) -#define NVME_CAP_DSTRD(cap) (((cap) >> CAP_DSTRD_SHIFT) & CAP_DSTRD_MASK) -#define NVME_CAP_NSSRS(cap) (((cap) >> CAP_NSSRS_SHIFT) & CAP_NSSRS_MASK) -#define NVME_CAP_CSS(cap) (((cap) >> CAP_CSS_SHIFT) & CAP_CSS_MASK) -#define NVME_CAP_MPSMIN(cap)(((cap) >> CAP_MPSMIN_SHIFT) & CAP_MPSMIN_MASK) -#define NVME_CAP_MPSMAX(cap)(((cap) >> CAP_MPSMAX_SHIFT) & CAP_MPSMAX_MASK) - -#define NVME_CAP_SET_MQES(cap, val) (cap |=3D (uint64_t)(val & CAP_MQES_= MASK) \ - << CAP_MQES_SHI= FT) -#define NVME_CAP_SET_CQR(cap, val) (cap |=3D (uint64_t)(val & 
CAP_CQR_M= ASK) \ - << CAP_CQR_SHIF= T) -#define NVME_CAP_SET_AMS(cap, val) (cap |=3D (uint64_t)(val & CAP_AMS_M= ASK) \ - << CAP_AMS_SHIF= T) -#define NVME_CAP_SET_TO(cap, val) (cap |=3D (uint64_t)(val & CAP_TO_MA= SK) \ - << CAP_TO_SHIFT) -#define NVME_CAP_SET_DSTRD(cap, val) (cap |=3D (uint64_t)(val & CAP_DSTRD= _MASK) \ - << CAP_DSTRD_SH= IFT) -#define NVME_CAP_SET_NSSRS(cap, val) (cap |=3D (uint64_t)(val & CAP_NSSRS= _MASK) \ - << CAP_NSSRS_SH= IFT) -#define NVME_CAP_SET_CSS(cap, val) (cap |=3D (uint64_t)(val & CAP_CSS_M= ASK) \ - << CAP_CSS_SHIF= T) -#define NVME_CAP_SET_MPSMIN(cap, val) (cap |=3D (uint64_t)(val & CAP_MPSMI= N_MASK)\ - << CAP_MPSMIN_S= HIFT) -#define NVME_CAP_SET_MPSMAX(cap, val) (cap |=3D (uint64_t)(val & CAP_MPSMA= X_MASK)\ - << CAP_MPSMAX_= SHIFT) - -enum NvmeCcShift { - CC_EN_SHIFT =3D 0, - CC_CSS_SHIFT =3D 4, - CC_MPS_SHIFT =3D 7, - CC_AMS_SHIFT =3D 11, - CC_SHN_SHIFT =3D 14, - CC_IOSQES_SHIFT =3D 16, - CC_IOCQES_SHIFT =3D 20, -}; - -enum NvmeCcMask { - CC_EN_MASK =3D 0x1, - CC_CSS_MASK =3D 0x7, - CC_MPS_MASK =3D 0xf, - CC_AMS_MASK =3D 0x7, - CC_SHN_MASK =3D 0x3, - CC_IOSQES_MASK =3D 0xf, - CC_IOCQES_MASK =3D 0xf, -}; - -#define NVME_CC_EN(cc) ((cc >> CC_EN_SHIFT) & CC_EN_MASK) -#define NVME_CC_CSS(cc) ((cc >> CC_CSS_SHIFT) & CC_CSS_MASK) -#define NVME_CC_MPS(cc) ((cc >> CC_MPS_SHIFT) & CC_MPS_MASK) -#define NVME_CC_AMS(cc) ((cc >> CC_AMS_SHIFT) & CC_AMS_MASK) -#define NVME_CC_SHN(cc) ((cc >> CC_SHN_SHIFT) & CC_SHN_MASK) -#define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK) -#define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK) - -enum NvmeCstsShift { - CSTS_RDY_SHIFT =3D 0, - CSTS_CFS_SHIFT =3D 1, - CSTS_SHST_SHIFT =3D 2, - CSTS_NSSRO_SHIFT =3D 4, -}; - -enum NvmeCstsMask { - CSTS_RDY_MASK =3D 0x1, - CSTS_CFS_MASK =3D 0x1, - CSTS_SHST_MASK =3D 0x3, - CSTS_NSSRO_MASK =3D 0x1, -}; - -enum NvmeCsts { - NVME_CSTS_READY =3D 1 << CSTS_RDY_SHIFT, - NVME_CSTS_FAILED =3D 1 << CSTS_CFS_SHIFT, - 
NVME_CSTS_SHST_NORMAL =3D 0 << CSTS_SHST_SHIFT, - NVME_CSTS_SHST_PROGRESS =3D 1 << CSTS_SHST_SHIFT, - NVME_CSTS_SHST_COMPLETE =3D 2 << CSTS_SHST_SHIFT, - NVME_CSTS_NSSRO =3D 1 << CSTS_NSSRO_SHIFT, -}; - -#define NVME_CSTS_RDY(csts) ((csts >> CSTS_RDY_SHIFT) & CSTS_RDY_MAS= K) -#define NVME_CSTS_CFS(csts) ((csts >> CSTS_CFS_SHIFT) & CSTS_CFS_MAS= K) -#define NVME_CSTS_SHST(csts) ((csts >> CSTS_SHST_SHIFT) & CSTS_SHST_MA= SK) -#define NVME_CSTS_NSSRO(csts) ((csts >> CSTS_NSSRO_SHIFT) & CSTS_NSSRO_M= ASK) - -enum NvmeAqaShift { - AQA_ASQS_SHIFT =3D 0, - AQA_ACQS_SHIFT =3D 16, -}; - -enum NvmeAqaMask { - AQA_ASQS_MASK =3D 0xfff, - AQA_ACQS_MASK =3D 0xfff, -}; - -#define NVME_AQA_ASQS(aqa) ((aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) -#define NVME_AQA_ACQS(aqa) ((aqa >> AQA_ACQS_SHIFT) & AQA_ACQS_MASK) - -enum NvmeCmblocShift { - CMBLOC_BIR_SHIFT =3D 0, - CMBLOC_OFST_SHIFT =3D 12, -}; - -enum NvmeCmblocMask { - CMBLOC_BIR_MASK =3D 0x7, - CMBLOC_OFST_MASK =3D 0xfffff, -}; - -#define NVME_CMBLOC_BIR(cmbloc) ((cmbloc >> CMBLOC_BIR_SHIFT) & \ - CMBLOC_BIR_MASK) -#define NVME_CMBLOC_OFST(cmbloc)((cmbloc >> CMBLOC_OFST_SHIFT) & \ - CMBLOC_OFST_MASK) - -#define NVME_CMBLOC_SET_BIR(cmbloc, val) \ - (cmbloc |=3D (uint64_t)(val & CMBLOC_BIR_MASK) << CMBLOC_BIR_SHIFT) -#define NVME_CMBLOC_SET_OFST(cmbloc, val) \ - (cmbloc |=3D (uint64_t)(val & CMBLOC_OFST_MASK) << CMBLOC_OFST_SHIFT) - -enum NvmeCmbszShift { - CMBSZ_SQS_SHIFT =3D 0, - CMBSZ_CQS_SHIFT =3D 1, - CMBSZ_LISTS_SHIFT =3D 2, - CMBSZ_RDS_SHIFT =3D 3, - CMBSZ_WDS_SHIFT =3D 4, - CMBSZ_SZU_SHIFT =3D 8, - CMBSZ_SZ_SHIFT =3D 12, -}; - -enum NvmeCmbszMask { - CMBSZ_SQS_MASK =3D 0x1, - CMBSZ_CQS_MASK =3D 0x1, - CMBSZ_LISTS_MASK =3D 0x1, - CMBSZ_RDS_MASK =3D 0x1, - CMBSZ_WDS_MASK =3D 0x1, - CMBSZ_SZU_MASK =3D 0xf, - CMBSZ_SZ_MASK =3D 0xfffff, -}; - -#define NVME_CMBSZ_SQS(cmbsz) ((cmbsz >> CMBSZ_SQS_SHIFT) & CMBSZ_SQS_M= ASK) -#define NVME_CMBSZ_CQS(cmbsz) ((cmbsz >> CMBSZ_CQS_SHIFT) & CMBSZ_CQS_M= ASK) -#define 
NVME_CMBSZ_LISTS(cmbsz)((cmbsz >> CMBSZ_LISTS_SHIFT) & CMBSZ_LISTS= _MASK) -#define NVME_CMBSZ_RDS(cmbsz) ((cmbsz >> CMBSZ_RDS_SHIFT) & CMBSZ_RDS_M= ASK) -#define NVME_CMBSZ_WDS(cmbsz) ((cmbsz >> CMBSZ_WDS_SHIFT) & CMBSZ_WDS_M= ASK) -#define NVME_CMBSZ_SZU(cmbsz) ((cmbsz >> CMBSZ_SZU_SHIFT) & CMBSZ_SZU_M= ASK) -#define NVME_CMBSZ_SZ(cmbsz) ((cmbsz >> CMBSZ_SZ_SHIFT) & CMBSZ_SZ_MA= SK) - -#define NVME_CMBSZ_SET_SQS(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_SQS_MASK) << CMBSZ_SQS_SHIFT) -#define NVME_CMBSZ_SET_CQS(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_CQS_MASK) << CMBSZ_CQS_SHIFT) -#define NVME_CMBSZ_SET_LISTS(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_LISTS_MASK) << CMBSZ_LISTS_SHIFT) -#define NVME_CMBSZ_SET_RDS(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_RDS_MASK) << CMBSZ_RDS_SHIFT) -#define NVME_CMBSZ_SET_WDS(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_WDS_MASK) << CMBSZ_WDS_SHIFT) -#define NVME_CMBSZ_SET_SZU(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_SZU_MASK) << CMBSZ_SZU_SHIFT) -#define NVME_CMBSZ_SET_SZ(cmbsz, val) \ - (cmbsz |=3D (uint64_t)(val & CMBSZ_SZ_MASK) << CMBSZ_SZ_SHIFT) - -#define NVME_CMBSZ_GETSIZE(cmbsz) \ - (NVME_CMBSZ_SZ(cmbsz) * (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz)))) - -typedef struct NvmeCmd { - uint8_t opcode; - uint8_t fuse; - uint16_t cid; - uint32_t nsid; - uint64_t res1; - uint64_t mptr; - uint64_t prp1; - uint64_t prp2; - uint32_t cdw10; - uint32_t cdw11; - uint32_t cdw12; - uint32_t cdw13; - uint32_t cdw14; - uint32_t cdw15; -} NvmeCmd; - -enum NvmeAdminCommands { - NVME_ADM_CMD_DELETE_SQ =3D 0x00, - NVME_ADM_CMD_CREATE_SQ =3D 0x01, - NVME_ADM_CMD_GET_LOG_PAGE =3D 0x02, - NVME_ADM_CMD_DELETE_CQ =3D 0x04, - NVME_ADM_CMD_CREATE_CQ =3D 0x05, - NVME_ADM_CMD_IDENTIFY =3D 0x06, - NVME_ADM_CMD_ABORT =3D 0x08, - NVME_ADM_CMD_SET_FEATURES =3D 0x09, - NVME_ADM_CMD_GET_FEATURES =3D 0x0a, - NVME_ADM_CMD_ASYNC_EV_REQ =3D 0x0c, - NVME_ADM_CMD_ACTIVATE_FW =3D 0x10, - NVME_ADM_CMD_DOWNLOAD_FW =3D 
0x11, - NVME_ADM_CMD_FORMAT_NVM =3D 0x80, - NVME_ADM_CMD_SECURITY_SEND =3D 0x81, - NVME_ADM_CMD_SECURITY_RECV =3D 0x82, -}; - -enum NvmeIoCommands { - NVME_CMD_FLUSH =3D 0x00, - NVME_CMD_WRITE =3D 0x01, - NVME_CMD_READ =3D 0x02, - NVME_CMD_WRITE_UNCOR =3D 0x04, - NVME_CMD_COMPARE =3D 0x05, - NVME_CMD_WRITE_ZEROS =3D 0x08, - NVME_CMD_DSM =3D 0x09, -}; - -typedef struct NvmeDeleteQ { - uint8_t opcode; - uint8_t flags; - uint16_t cid; - uint32_t rsvd1[9]; - uint16_t qid; - uint16_t rsvd10; - uint32_t rsvd11[5]; -} NvmeDeleteQ; - -typedef struct NvmeCreateCq { - uint8_t opcode; - uint8_t flags; - uint16_t cid; - uint32_t rsvd1[5]; - uint64_t prp1; - uint64_t rsvd8; - uint16_t cqid; - uint16_t qsize; - uint16_t cq_flags; - uint16_t irq_vector; - uint32_t rsvd12[4]; -} NvmeCreateCq; - -#define NVME_CQ_FLAGS_PC(cq_flags) (cq_flags & 0x1) -#define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1) - -typedef struct NvmeCreateSq { - uint8_t opcode; - uint8_t flags; - uint16_t cid; - uint32_t rsvd1[5]; - uint64_t prp1; - uint64_t rsvd8; - uint16_t sqid; - uint16_t qsize; - uint16_t sq_flags; - uint16_t cqid; - uint32_t rsvd12[4]; -} NvmeCreateSq; - -#define NVME_SQ_FLAGS_PC(sq_flags) (sq_flags & 0x1) -#define NVME_SQ_FLAGS_QPRIO(sq_flags) ((sq_flags >> 1) & 0x3) - -enum NvmeQueueFlags { - NVME_Q_PC =3D 1, - NVME_Q_PRIO_URGENT =3D 0, - NVME_Q_PRIO_HIGH =3D 1, - NVME_Q_PRIO_NORMAL =3D 2, - NVME_Q_PRIO_LOW =3D 3, -}; - -typedef struct NvmeIdentify { - uint8_t opcode; - uint8_t flags; - uint16_t cid; - uint32_t nsid; - uint64_t rsvd2[2]; - uint64_t prp1; - uint64_t prp2; - uint32_t cns; - uint32_t rsvd11[5]; -} NvmeIdentify; - -typedef struct NvmeRwCmd { - uint8_t opcode; - uint8_t flags; - uint16_t cid; - uint32_t nsid; - uint64_t rsvd2; - uint64_t mptr; - uint64_t prp1; - uint64_t prp2; - uint64_t slba; - uint16_t nlb; - uint16_t control; - uint32_t dsmgmt; - uint32_t reftag; - uint16_t apptag; - uint16_t appmask; -} NvmeRwCmd; - -enum { - NVME_RW_LR =3D 1 << 15, - 
NVME_RW_FUA =3D 1 << 14, - NVME_RW_DSM_FREQ_UNSPEC =3D 0, - NVME_RW_DSM_FREQ_TYPICAL =3D 1, - NVME_RW_DSM_FREQ_RARE =3D 2, - NVME_RW_DSM_FREQ_READS =3D 3, - NVME_RW_DSM_FREQ_WRITES =3D 4, - NVME_RW_DSM_FREQ_RW =3D 5, - NVME_RW_DSM_FREQ_ONCE =3D 6, - NVME_RW_DSM_FREQ_PREFETCH =3D 7, - NVME_RW_DSM_FREQ_TEMP =3D 8, - NVME_RW_DSM_LATENCY_NONE =3D 0 << 4, - NVME_RW_DSM_LATENCY_IDLE =3D 1 << 4, - NVME_RW_DSM_LATENCY_NORM =3D 2 << 4, - NVME_RW_DSM_LATENCY_LOW =3D 3 << 4, - NVME_RW_DSM_SEQ_REQ =3D 1 << 6, - NVME_RW_DSM_COMPRESSED =3D 1 << 7, - NVME_RW_PRINFO_PRACT =3D 1 << 13, - NVME_RW_PRINFO_PRCHK_GUARD =3D 1 << 12, - NVME_RW_PRINFO_PRCHK_APP =3D 1 << 11, - NVME_RW_PRINFO_PRCHK_REF =3D 1 << 10, -}; - -typedef struct NvmeDsmCmd { - uint8_t opcode; - uint8_t flags; - uint16_t cid; - uint32_t nsid; - uint64_t rsvd2[2]; - uint64_t prp1; - uint64_t prp2; - uint32_t nr; - uint32_t attributes; - uint32_t rsvd12[4]; -} NvmeDsmCmd; - -enum { - NVME_DSMGMT_IDR =3D 1 << 0, - NVME_DSMGMT_IDW =3D 1 << 1, - NVME_DSMGMT_AD =3D 1 << 2, -}; - -typedef struct NvmeDsmRange { - uint32_t cattr; - uint32_t nlb; - uint64_t slba; -} NvmeDsmRange; - -enum NvmeAsyncEventRequest { - NVME_AER_TYPE_ERROR =3D 0, - NVME_AER_TYPE_SMART =3D 1, - NVME_AER_TYPE_IO_SPECIFIC =3D 6, - NVME_AER_TYPE_VENDOR_SPECIFIC =3D 7, - NVME_AER_INFO_ERR_INVALID_SQ =3D 0, - NVME_AER_INFO_ERR_INVALID_DB =3D 1, - NVME_AER_INFO_ERR_DIAG_FAIL =3D 2, - NVME_AER_INFO_ERR_PERS_INTERNAL_ERR =3D 3, - NVME_AER_INFO_ERR_TRANS_INTERNAL_ERR =3D 4, - NVME_AER_INFO_ERR_FW_IMG_LOAD_ERR =3D 5, - NVME_AER_INFO_SMART_RELIABILITY =3D 0, - NVME_AER_INFO_SMART_TEMP_THRESH =3D 1, - NVME_AER_INFO_SMART_SPARE_THRESH =3D 2, -}; - -typedef struct NvmeAerResult { - uint8_t event_type; - uint8_t event_info; - uint8_t log_page; - uint8_t resv; -} NvmeAerResult; - -typedef struct NvmeCqe { - uint32_t result; - uint32_t rsvd; - uint16_t sq_head; - uint16_t sq_id; - uint16_t cid; - uint16_t status; -} NvmeCqe; - -enum NvmeStatusCodes { - NVME_SUCCESS =3D 
0x0000, - NVME_INVALID_OPCODE =3D 0x0001, - NVME_INVALID_FIELD =3D 0x0002, - NVME_CID_CONFLICT =3D 0x0003, - NVME_DATA_TRAS_ERROR =3D 0x0004, - NVME_POWER_LOSS_ABORT =3D 0x0005, - NVME_INTERNAL_DEV_ERROR =3D 0x0006, - NVME_CMD_ABORT_REQ =3D 0x0007, - NVME_CMD_ABORT_SQ_DEL =3D 0x0008, - NVME_CMD_ABORT_FAILED_FUSE =3D 0x0009, - NVME_CMD_ABORT_MISSING_FUSE =3D 0x000a, - NVME_INVALID_NSID =3D 0x000b, - NVME_CMD_SEQ_ERROR =3D 0x000c, - NVME_LBA_RANGE =3D 0x0080, - NVME_CAP_EXCEEDED =3D 0x0081, - NVME_NS_NOT_READY =3D 0x0082, - NVME_NS_RESV_CONFLICT =3D 0x0083, - NVME_INVALID_CQID =3D 0x0100, - NVME_INVALID_QID =3D 0x0101, - NVME_MAX_QSIZE_EXCEEDED =3D 0x0102, - NVME_ACL_EXCEEDED =3D 0x0103, - NVME_RESERVED =3D 0x0104, - NVME_AER_LIMIT_EXCEEDED =3D 0x0105, - NVME_INVALID_FW_SLOT =3D 0x0106, - NVME_INVALID_FW_IMAGE =3D 0x0107, - NVME_INVALID_IRQ_VECTOR =3D 0x0108, - NVME_INVALID_LOG_ID =3D 0x0109, - NVME_INVALID_FORMAT =3D 0x010a, - NVME_FW_REQ_RESET =3D 0x010b, - NVME_INVALID_QUEUE_DEL =3D 0x010c, - NVME_FID_NOT_SAVEABLE =3D 0x010d, - NVME_FID_NOT_NSID_SPEC =3D 0x010f, - NVME_FW_REQ_SUSYSTEM_RESET =3D 0x0110, - NVME_CONFLICTING_ATTRS =3D 0x0180, - NVME_INVALID_PROT_INFO =3D 0x0181, - NVME_WRITE_TO_RO =3D 0x0182, - NVME_WRITE_FAULT =3D 0x0280, - NVME_UNRECOVERED_READ =3D 0x0281, - NVME_E2E_GUARD_ERROR =3D 0x0282, - NVME_E2E_APP_ERROR =3D 0x0283, - NVME_E2E_REF_ERROR =3D 0x0284, - NVME_CMP_FAILURE =3D 0x0285, - NVME_ACCESS_DENIED =3D 0x0286, - NVME_MORE =3D 0x2000, - NVME_DNR =3D 0x4000, - NVME_NO_COMPLETE =3D 0xffff, -}; - -typedef struct NvmeFwSlotInfoLog { - uint8_t afi; - uint8_t reserved1[7]; - uint8_t frs1[8]; - uint8_t frs2[8]; - uint8_t frs3[8]; - uint8_t frs4[8]; - uint8_t frs5[8]; - uint8_t frs6[8]; - uint8_t frs7[8]; - uint8_t reserved2[448]; -} NvmeFwSlotInfoLog; - -typedef struct NvmeErrorLog { - uint64_t error_count; - uint16_t sqid; - uint16_t cid; - uint16_t status_field; - uint16_t param_error_location; - uint64_t lba; - uint32_t nsid; - uint8_t vs; - 
uint8_t resv[35]; -} NvmeErrorLog; - -typedef struct NvmeSmartLog { - uint8_t critical_warning; - uint8_t temperature[2]; - uint8_t available_spare; - uint8_t available_spare_threshold; - uint8_t percentage_used; - uint8_t reserved1[26]; - uint64_t data_units_read[2]; - uint64_t data_units_written[2]; - uint64_t host_read_commands[2]; - uint64_t host_write_commands[2]; - uint64_t controller_busy_time[2]; - uint64_t power_cycles[2]; - uint64_t power_on_hours[2]; - uint64_t unsafe_shutdowns[2]; - uint64_t media_errors[2]; - uint64_t number_of_error_log_entries[2]; - uint8_t reserved2[320]; -} NvmeSmartLog; - -enum NvmeSmartWarn { - NVME_SMART_SPARE =3D 1 << 0, - NVME_SMART_TEMPERATURE =3D 1 << 1, - NVME_SMART_RELIABILITY =3D 1 << 2, - NVME_SMART_MEDIA_READ_ONLY =3D 1 << 3, - NVME_SMART_FAILED_VOLATILE_MEDIA =3D 1 << 4, -}; - -enum LogIdentifier { - NVME_LOG_ERROR_INFO =3D 0x01, - NVME_LOG_SMART_INFO =3D 0x02, - NVME_LOG_FW_SLOT_INFO =3D 0x03, -}; - -typedef struct NvmePSD { - uint16_t mp; - uint16_t reserved; - uint32_t enlat; - uint32_t exlat; - uint8_t rrt; - uint8_t rrl; - uint8_t rwt; - uint8_t rwl; - uint8_t resv[16]; -} NvmePSD; - -typedef struct NvmeIdCtrl { - uint16_t vid; - uint16_t ssvid; - uint8_t sn[20]; - uint8_t mn[40]; - uint8_t fr[8]; - uint8_t rab; - uint8_t ieee[3]; - uint8_t cmic; - uint8_t mdts; - uint8_t rsvd255[178]; - uint16_t oacs; - uint8_t acl; - uint8_t aerl; - uint8_t frmw; - uint8_t lpa; - uint8_t elpe; - uint8_t npss; - uint8_t rsvd511[248]; - uint8_t sqes; - uint8_t cqes; - uint16_t rsvd515; - uint32_t nn; - uint16_t oncs; - uint16_t fuses; - uint8_t fna; - uint8_t vwc; - uint16_t awun; - uint16_t awupf; - uint8_t rsvd703[174]; - uint8_t rsvd2047[1344]; - NvmePSD psd[32]; - uint8_t vs[1024]; -} NvmeIdCtrl; - -enum NvmeIdCtrlOacs { - NVME_OACS_SECURITY =3D 1 << 0, - NVME_OACS_FORMAT =3D 1 << 1, - NVME_OACS_FW =3D 1 << 2, -}; - -enum NvmeIdCtrlOncs { - NVME_ONCS_COMPARE =3D 1 << 0, - NVME_ONCS_WRITE_UNCORR =3D 1 << 1, - NVME_ONCS_DSM =3D 
1 << 2, - NVME_ONCS_WRITE_ZEROS =3D 1 << 3, - NVME_ONCS_FEATURES =3D 1 << 4, - NVME_ONCS_RESRVATIONS =3D 1 << 5, -}; - -#define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf) -#define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf) -#define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf) -#define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf) - -typedef struct NvmeFeatureVal { - uint32_t arbitration; - uint32_t power_mgmt; - uint32_t temp_thresh; - uint32_t err_rec; - uint32_t volatile_wc; - uint32_t num_queues; - uint32_t int_coalescing; - uint32_t *int_vector_config; - uint32_t write_atomicity; - uint32_t async_config; - uint32_t sw_prog_marker; -} NvmeFeatureVal; - -#define NVME_ARB_AB(arb) (arb & 0x7) -#define NVME_ARB_LPW(arb) ((arb >> 8) & 0xff) -#define NVME_ARB_MPW(arb) ((arb >> 16) & 0xff) -#define NVME_ARB_HPW(arb) ((arb >> 24) & 0xff) - -#define NVME_INTC_THR(intc) (intc & 0xff) -#define NVME_INTC_TIME(intc) ((intc >> 8) & 0xff) - -enum NvmeFeatureIds { - NVME_ARBITRATION =3D 0x1, - NVME_POWER_MANAGEMENT =3D 0x2, - NVME_LBA_RANGE_TYPE =3D 0x3, - NVME_TEMPERATURE_THRESHOLD =3D 0x4, - NVME_ERROR_RECOVERY =3D 0x5, - NVME_VOLATILE_WRITE_CACHE =3D 0x6, - NVME_NUMBER_OF_QUEUES =3D 0x7, - NVME_INTERRUPT_COALESCING =3D 0x8, - NVME_INTERRUPT_VECTOR_CONF =3D 0x9, - NVME_WRITE_ATOMICITY =3D 0xa, - NVME_ASYNCHRONOUS_EVENT_CONF =3D 0xb, - NVME_SOFTWARE_PROGRESS_MARKER =3D 0x80 -}; - -typedef struct NvmeRangeType { - uint8_t type; - uint8_t attributes; - uint8_t rsvd2[14]; - uint64_t slba; - uint64_t nlb; - uint8_t guid[16]; - uint8_t rsvd48[16]; -} NvmeRangeType; - -typedef struct NvmeLBAF { - uint16_t ms; - uint8_t ds; - uint8_t rp; -} NvmeLBAF; - -typedef struct NvmeIdNs { - uint64_t nsze; - uint64_t ncap; - uint64_t nuse; - uint8_t nsfeat; - uint8_t nlbaf; - uint8_t flbas; - uint8_t mc; - uint8_t dpc; - uint8_t dps; - uint8_t res30[98]; - NvmeLBAF lbaf[16]; - uint8_t res192[192]; - uint8_t vs[3712]; -} NvmeIdNs; - -#define NVME_ID_NS_NSFEAT_THIN(nsfeat) ((nsfeat & 0x1)) -#define 
NVME_ID_NS_FLBAS_EXTENDED(flbas) ((flbas >> 4) & 0x1) -#define NVME_ID_NS_FLBAS_INDEX(flbas) ((flbas & 0xf)) -#define NVME_ID_NS_MC_SEPARATE(mc) ((mc >> 1) & 0x1) -#define NVME_ID_NS_MC_EXTENDED(mc) ((mc & 0x1)) -#define NVME_ID_NS_DPC_LAST_EIGHT(dpc) ((dpc >> 4) & 0x1) -#define NVME_ID_NS_DPC_FIRST_EIGHT(dpc) ((dpc >> 3) & 0x1) -#define NVME_ID_NS_DPC_TYPE_3(dpc) ((dpc >> 2) & 0x1) -#define NVME_ID_NS_DPC_TYPE_2(dpc) ((dpc >> 1) & 0x1) -#define NVME_ID_NS_DPC_TYPE_1(dpc) ((dpc & 0x1)) -#define NVME_ID_NS_DPC_TYPE_MASK 0x7 - -enum NvmeIdNsDps { - DPS_TYPE_NONE =3D 0, - DPS_TYPE_1 =3D 1, - DPS_TYPE_2 =3D 2, - DPS_TYPE_3 =3D 3, - DPS_TYPE_MASK =3D 0x7, - DPS_FIRST_EIGHT =3D 8, -}; - -static inline void _nvme_check_size(void) -{ - QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) !=3D 4); - QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) !=3D 16); - QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) !=3D 16); - QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeDeleteQ) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeCreateCq) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeCreateSq) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) !=3D 64); - QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) !=3D 512); - QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) !=3D 512); - QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) !=3D 4096); - QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) !=3D 4096); -} +#include "block/nvme.h" =20 typedef struct NvmeAsyncEvent { QSIMPLEQ_ENTRY(NvmeAsyncEvent) entry; diff --git a/include/block/nvme.h b/include/block/nvme.h new file mode 100644 index 0000000000..849a6f3fa3 --- /dev/null +++ b/include/block/nvme.h @@ -0,0 +1,700 @@ +#ifndef BLOCK_NVME_H +#define BLOCK_NVME_H + +typedef struct NvmeBar { + uint64_t cap; + uint32_t vs; + uint32_t intms; + uint32_t intmc; + uint32_t cc; + uint32_t rsvd1; + uint32_t csts; 
+ uint32_t nssrc; + uint32_t aqa; + uint64_t asq; + uint64_t acq; + uint32_t cmbloc; + uint32_t cmbsz; +} NvmeBar; + +enum NvmeCapShift { + CAP_MQES_SHIFT =3D 0, + CAP_CQR_SHIFT =3D 16, + CAP_AMS_SHIFT =3D 17, + CAP_TO_SHIFT =3D 24, + CAP_DSTRD_SHIFT =3D 32, + CAP_NSSRS_SHIFT =3D 33, + CAP_CSS_SHIFT =3D 37, + CAP_MPSMIN_SHIFT =3D 48, + CAP_MPSMAX_SHIFT =3D 52, +}; + +enum NvmeCapMask { + CAP_MQES_MASK =3D 0xffff, + CAP_CQR_MASK =3D 0x1, + CAP_AMS_MASK =3D 0x3, + CAP_TO_MASK =3D 0xff, + CAP_DSTRD_MASK =3D 0xf, + CAP_NSSRS_MASK =3D 0x1, + CAP_CSS_MASK =3D 0xff, + CAP_MPSMIN_MASK =3D 0xf, + CAP_MPSMAX_MASK =3D 0xf, +}; + +#define NVME_CAP_MQES(cap) (((cap) >> CAP_MQES_SHIFT) & CAP_MQES_MASK) +#define NVME_CAP_CQR(cap) (((cap) >> CAP_CQR_SHIFT) & CAP_CQR_MASK) +#define NVME_CAP_AMS(cap) (((cap) >> CAP_AMS_SHIFT) & CAP_AMS_MASK) +#define NVME_CAP_TO(cap) (((cap) >> CAP_TO_SHIFT) & CAP_TO_MASK) +#define NVME_CAP_DSTRD(cap) (((cap) >> CAP_DSTRD_SHIFT) & CAP_DSTRD_MASK) +#define NVME_CAP_NSSRS(cap) (((cap) >> CAP_NSSRS_SHIFT) & CAP_NSSRS_MASK) +#define NVME_CAP_CSS(cap) (((cap) >> CAP_CSS_SHIFT) & CAP_CSS_MASK) +#define NVME_CAP_MPSMIN(cap)(((cap) >> CAP_MPSMIN_SHIFT) & CAP_MPSMIN_MASK) +#define NVME_CAP_MPSMAX(cap)(((cap) >> CAP_MPSMAX_SHIFT) & CAP_MPSMAX_MASK) + +#define NVME_CAP_SET_MQES(cap, val) (cap |=3D (uint64_t)(val & CAP_MQES_= MASK) \ + << CAP_MQES_SHI= FT) +#define NVME_CAP_SET_CQR(cap, val) (cap |=3D (uint64_t)(val & CAP_CQR_M= ASK) \ + << CAP_CQR_SHIF= T) +#define NVME_CAP_SET_AMS(cap, val) (cap |=3D (uint64_t)(val & CAP_AMS_M= ASK) \ + << CAP_AMS_SHIF= T) +#define NVME_CAP_SET_TO(cap, val) (cap |=3D (uint64_t)(val & CAP_TO_MA= SK) \ + << CAP_TO_SHIFT) +#define NVME_CAP_SET_DSTRD(cap, val) (cap |=3D (uint64_t)(val & CAP_DSTRD= _MASK) \ + << CAP_DSTRD_SH= IFT) +#define NVME_CAP_SET_NSSRS(cap, val) (cap |=3D (uint64_t)(val & CAP_NSSRS= _MASK) \ + << CAP_NSSRS_SH= IFT) +#define NVME_CAP_SET_CSS(cap, val) (cap |=3D (uint64_t)(val & CAP_CSS_M= ASK) \ + << 
CAP_CSS_SHIF= T) +#define NVME_CAP_SET_MPSMIN(cap, val) (cap |=3D (uint64_t)(val & CAP_MPSMI= N_MASK)\ + << CAP_MPSMIN_S= HIFT) +#define NVME_CAP_SET_MPSMAX(cap, val) (cap |=3D (uint64_t)(val & CAP_MPSMA= X_MASK)\ + << CAP_MPSMAX_= SHIFT) + +enum NvmeCcShift { + CC_EN_SHIFT =3D 0, + CC_CSS_SHIFT =3D 4, + CC_MPS_SHIFT =3D 7, + CC_AMS_SHIFT =3D 11, + CC_SHN_SHIFT =3D 14, + CC_IOSQES_SHIFT =3D 16, + CC_IOCQES_SHIFT =3D 20, +}; + +enum NvmeCcMask { + CC_EN_MASK =3D 0x1, + CC_CSS_MASK =3D 0x7, + CC_MPS_MASK =3D 0xf, + CC_AMS_MASK =3D 0x7, + CC_SHN_MASK =3D 0x3, + CC_IOSQES_MASK =3D 0xf, + CC_IOCQES_MASK =3D 0xf, +}; + +#define NVME_CC_EN(cc) ((cc >> CC_EN_SHIFT) & CC_EN_MASK) +#define NVME_CC_CSS(cc) ((cc >> CC_CSS_SHIFT) & CC_CSS_MASK) +#define NVME_CC_MPS(cc) ((cc >> CC_MPS_SHIFT) & CC_MPS_MASK) +#define NVME_CC_AMS(cc) ((cc >> CC_AMS_SHIFT) & CC_AMS_MASK) +#define NVME_CC_SHN(cc) ((cc >> CC_SHN_SHIFT) & CC_SHN_MASK) +#define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK) +#define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK) + +enum NvmeCstsShift { + CSTS_RDY_SHIFT =3D 0, + CSTS_CFS_SHIFT =3D 1, + CSTS_SHST_SHIFT =3D 2, + CSTS_NSSRO_SHIFT =3D 4, +}; + +enum NvmeCstsMask { + CSTS_RDY_MASK =3D 0x1, + CSTS_CFS_MASK =3D 0x1, + CSTS_SHST_MASK =3D 0x3, + CSTS_NSSRO_MASK =3D 0x1, +}; + +enum NvmeCsts { + NVME_CSTS_READY =3D 1 << CSTS_RDY_SHIFT, + NVME_CSTS_FAILED =3D 1 << CSTS_CFS_SHIFT, + NVME_CSTS_SHST_NORMAL =3D 0 << CSTS_SHST_SHIFT, + NVME_CSTS_SHST_PROGRESS =3D 1 << CSTS_SHST_SHIFT, + NVME_CSTS_SHST_COMPLETE =3D 2 << CSTS_SHST_SHIFT, + NVME_CSTS_NSSRO =3D 1 << CSTS_NSSRO_SHIFT, +}; + +#define NVME_CSTS_RDY(csts) ((csts >> CSTS_RDY_SHIFT) & CSTS_RDY_MAS= K) +#define NVME_CSTS_CFS(csts) ((csts >> CSTS_CFS_SHIFT) & CSTS_CFS_MAS= K) +#define NVME_CSTS_SHST(csts) ((csts >> CSTS_SHST_SHIFT) & CSTS_SHST_MA= SK) +#define NVME_CSTS_NSSRO(csts) ((csts >> CSTS_NSSRO_SHIFT) & CSTS_NSSRO_M= ASK) + +enum NvmeAqaShift { + AQA_ASQS_SHIFT =3D 0, + 
AQA_ACQS_SHIFT =3D 16, +}; + +enum NvmeAqaMask { + AQA_ASQS_MASK =3D 0xfff, + AQA_ACQS_MASK =3D 0xfff, +}; + +#define NVME_AQA_ASQS(aqa) ((aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) +#define NVME_AQA_ACQS(aqa) ((aqa >> AQA_ACQS_SHIFT) & AQA_ACQS_MASK) + +enum NvmeCmblocShift { + CMBLOC_BIR_SHIFT =3D 0, + CMBLOC_OFST_SHIFT =3D 12, +}; + +enum NvmeCmblocMask { + CMBLOC_BIR_MASK =3D 0x7, + CMBLOC_OFST_MASK =3D 0xfffff, +}; + +#define NVME_CMBLOC_BIR(cmbloc) ((cmbloc >> CMBLOC_BIR_SHIFT) & \ + CMBLOC_BIR_MASK) +#define NVME_CMBLOC_OFST(cmbloc)((cmbloc >> CMBLOC_OFST_SHIFT) & \ + CMBLOC_OFST_MASK) + +#define NVME_CMBLOC_SET_BIR(cmbloc, val) \ + (cmbloc |=3D (uint64_t)(val & CMBLOC_BIR_MASK) << CMBLOC_BIR_SHIFT) +#define NVME_CMBLOC_SET_OFST(cmbloc, val) \ + (cmbloc |=3D (uint64_t)(val & CMBLOC_OFST_MASK) << CMBLOC_OFST_SHIFT) + +enum NvmeCmbszShift { + CMBSZ_SQS_SHIFT =3D 0, + CMBSZ_CQS_SHIFT =3D 1, + CMBSZ_LISTS_SHIFT =3D 2, + CMBSZ_RDS_SHIFT =3D 3, + CMBSZ_WDS_SHIFT =3D 4, + CMBSZ_SZU_SHIFT =3D 8, + CMBSZ_SZ_SHIFT =3D 12, +}; + +enum NvmeCmbszMask { + CMBSZ_SQS_MASK =3D 0x1, + CMBSZ_CQS_MASK =3D 0x1, + CMBSZ_LISTS_MASK =3D 0x1, + CMBSZ_RDS_MASK =3D 0x1, + CMBSZ_WDS_MASK =3D 0x1, + CMBSZ_SZU_MASK =3D 0xf, + CMBSZ_SZ_MASK =3D 0xfffff, +}; + +#define NVME_CMBSZ_SQS(cmbsz) ((cmbsz >> CMBSZ_SQS_SHIFT) & CMBSZ_SQS_M= ASK) +#define NVME_CMBSZ_CQS(cmbsz) ((cmbsz >> CMBSZ_CQS_SHIFT) & CMBSZ_CQS_M= ASK) +#define NVME_CMBSZ_LISTS(cmbsz)((cmbsz >> CMBSZ_LISTS_SHIFT) & CMBSZ_LISTS= _MASK) +#define NVME_CMBSZ_RDS(cmbsz) ((cmbsz >> CMBSZ_RDS_SHIFT) & CMBSZ_RDS_M= ASK) +#define NVME_CMBSZ_WDS(cmbsz) ((cmbsz >> CMBSZ_WDS_SHIFT) & CMBSZ_WDS_M= ASK) +#define NVME_CMBSZ_SZU(cmbsz) ((cmbsz >> CMBSZ_SZU_SHIFT) & CMBSZ_SZU_M= ASK) +#define NVME_CMBSZ_SZ(cmbsz) ((cmbsz >> CMBSZ_SZ_SHIFT) & CMBSZ_SZ_MA= SK) + +#define NVME_CMBSZ_SET_SQS(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & CMBSZ_SQS_MASK) << CMBSZ_SQS_SHIFT) +#define NVME_CMBSZ_SET_CQS(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & 
CMBSZ_CQS_MASK) << CMBSZ_CQS_SHIFT) +#define NVME_CMBSZ_SET_LISTS(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & CMBSZ_LISTS_MASK) << CMBSZ_LISTS_SHIFT) +#define NVME_CMBSZ_SET_RDS(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & CMBSZ_RDS_MASK) << CMBSZ_RDS_SHIFT) +#define NVME_CMBSZ_SET_WDS(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & CMBSZ_WDS_MASK) << CMBSZ_WDS_SHIFT) +#define NVME_CMBSZ_SET_SZU(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & CMBSZ_SZU_MASK) << CMBSZ_SZU_SHIFT) +#define NVME_CMBSZ_SET_SZ(cmbsz, val) \ + (cmbsz |=3D (uint64_t)(val & CMBSZ_SZ_MASK) << CMBSZ_SZ_SHIFT) + +#define NVME_CMBSZ_GETSIZE(cmbsz) \ + (NVME_CMBSZ_SZ(cmbsz) * (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz)))) + +typedef struct NvmeCmd { + uint8_t opcode; + uint8_t fuse; + uint16_t cid; + uint32_t nsid; + uint64_t res1; + uint64_t mptr; + uint64_t prp1; + uint64_t prp2; + uint32_t cdw10; + uint32_t cdw11; + uint32_t cdw12; + uint32_t cdw13; + uint32_t cdw14; + uint32_t cdw15; +} NvmeCmd; + +enum NvmeAdminCommands { + NVME_ADM_CMD_DELETE_SQ =3D 0x00, + NVME_ADM_CMD_CREATE_SQ =3D 0x01, + NVME_ADM_CMD_GET_LOG_PAGE =3D 0x02, + NVME_ADM_CMD_DELETE_CQ =3D 0x04, + NVME_ADM_CMD_CREATE_CQ =3D 0x05, + NVME_ADM_CMD_IDENTIFY =3D 0x06, + NVME_ADM_CMD_ABORT =3D 0x08, + NVME_ADM_CMD_SET_FEATURES =3D 0x09, + NVME_ADM_CMD_GET_FEATURES =3D 0x0a, + NVME_ADM_CMD_ASYNC_EV_REQ =3D 0x0c, + NVME_ADM_CMD_ACTIVATE_FW =3D 0x10, + NVME_ADM_CMD_DOWNLOAD_FW =3D 0x11, + NVME_ADM_CMD_FORMAT_NVM =3D 0x80, + NVME_ADM_CMD_SECURITY_SEND =3D 0x81, + NVME_ADM_CMD_SECURITY_RECV =3D 0x82, +}; + +enum NvmeIoCommands { + NVME_CMD_FLUSH =3D 0x00, + NVME_CMD_WRITE =3D 0x01, + NVME_CMD_READ =3D 0x02, + NVME_CMD_WRITE_UNCOR =3D 0x04, + NVME_CMD_COMPARE =3D 0x05, + NVME_CMD_WRITE_ZEROS =3D 0x08, + NVME_CMD_DSM =3D 0x09, +}; + +typedef struct NvmeDeleteQ { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t rsvd1[9]; + uint16_t qid; + uint16_t rsvd10; + uint32_t rsvd11[5]; +} NvmeDeleteQ; + +typedef struct NvmeCreateCq { + uint8_t 
opcode; + uint8_t flags; + uint16_t cid; + uint32_t rsvd1[5]; + uint64_t prp1; + uint64_t rsvd8; + uint16_t cqid; + uint16_t qsize; + uint16_t cq_flags; + uint16_t irq_vector; + uint32_t rsvd12[4]; +} NvmeCreateCq; + +#define NVME_CQ_FLAGS_PC(cq_flags) (cq_flags & 0x1) +#define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1) + +typedef struct NvmeCreateSq { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t rsvd1[5]; + uint64_t prp1; + uint64_t rsvd8; + uint16_t sqid; + uint16_t qsize; + uint16_t sq_flags; + uint16_t cqid; + uint32_t rsvd12[4]; +} NvmeCreateSq; + +#define NVME_SQ_FLAGS_PC(sq_flags) (sq_flags & 0x1) +#define NVME_SQ_FLAGS_QPRIO(sq_flags) ((sq_flags >> 1) & 0x3) + +enum NvmeQueueFlags { + NVME_Q_PC =3D 1, + NVME_Q_PRIO_URGENT =3D 0, + NVME_Q_PRIO_HIGH =3D 1, + NVME_Q_PRIO_NORMAL =3D 2, + NVME_Q_PRIO_LOW =3D 3, +}; + +typedef struct NvmeIdentify { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint64_t rsvd2[2]; + uint64_t prp1; + uint64_t prp2; + uint32_t cns; + uint32_t rsvd11[5]; +} NvmeIdentify; + +typedef struct NvmeRwCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint64_t rsvd2; + uint64_t mptr; + uint64_t prp1; + uint64_t prp2; + uint64_t slba; + uint16_t nlb; + uint16_t control; + uint32_t dsmgmt; + uint32_t reftag; + uint16_t apptag; + uint16_t appmask; +} NvmeRwCmd; + +enum { + NVME_RW_LR =3D 1 << 15, + NVME_RW_FUA =3D 1 << 14, + NVME_RW_DSM_FREQ_UNSPEC =3D 0, + NVME_RW_DSM_FREQ_TYPICAL =3D 1, + NVME_RW_DSM_FREQ_RARE =3D 2, + NVME_RW_DSM_FREQ_READS =3D 3, + NVME_RW_DSM_FREQ_WRITES =3D 4, + NVME_RW_DSM_FREQ_RW =3D 5, + NVME_RW_DSM_FREQ_ONCE =3D 6, + NVME_RW_DSM_FREQ_PREFETCH =3D 7, + NVME_RW_DSM_FREQ_TEMP =3D 8, + NVME_RW_DSM_LATENCY_NONE =3D 0 << 4, + NVME_RW_DSM_LATENCY_IDLE =3D 1 << 4, + NVME_RW_DSM_LATENCY_NORM =3D 2 << 4, + NVME_RW_DSM_LATENCY_LOW =3D 3 << 4, + NVME_RW_DSM_SEQ_REQ =3D 1 << 6, + NVME_RW_DSM_COMPRESSED =3D 1 << 7, + NVME_RW_PRINFO_PRACT =3D 1 << 13, + 
NVME_RW_PRINFO_PRCHK_GUARD =3D 1 << 12, + NVME_RW_PRINFO_PRCHK_APP =3D 1 << 11, + NVME_RW_PRINFO_PRCHK_REF =3D 1 << 10, +}; + +typedef struct NvmeDsmCmd { + uint8_t opcode; + uint8_t flags; + uint16_t cid; + uint32_t nsid; + uint64_t rsvd2[2]; + uint64_t prp1; + uint64_t prp2; + uint32_t nr; + uint32_t attributes; + uint32_t rsvd12[4]; +} NvmeDsmCmd; + +enum { + NVME_DSMGMT_IDR =3D 1 << 0, + NVME_DSMGMT_IDW =3D 1 << 1, + NVME_DSMGMT_AD =3D 1 << 2, +}; + +typedef struct NvmeDsmRange { + uint32_t cattr; + uint32_t nlb; + uint64_t slba; +} NvmeDsmRange; + +enum NvmeAsyncEventRequest { + NVME_AER_TYPE_ERROR =3D 0, + NVME_AER_TYPE_SMART =3D 1, + NVME_AER_TYPE_IO_SPECIFIC =3D 6, + NVME_AER_TYPE_VENDOR_SPECIFIC =3D 7, + NVME_AER_INFO_ERR_INVALID_SQ =3D 0, + NVME_AER_INFO_ERR_INVALID_DB =3D 1, + NVME_AER_INFO_ERR_DIAG_FAIL =3D 2, + NVME_AER_INFO_ERR_PERS_INTERNAL_ERR =3D 3, + NVME_AER_INFO_ERR_TRANS_INTERNAL_ERR =3D 4, + NVME_AER_INFO_ERR_FW_IMG_LOAD_ERR =3D 5, + NVME_AER_INFO_SMART_RELIABILITY =3D 0, + NVME_AER_INFO_SMART_TEMP_THRESH =3D 1, + NVME_AER_INFO_SMART_SPARE_THRESH =3D 2, +}; + +typedef struct NvmeAerResult { + uint8_t event_type; + uint8_t event_info; + uint8_t log_page; + uint8_t resv; +} NvmeAerResult; + +typedef struct NvmeCqe { + uint32_t result; + uint32_t rsvd; + uint16_t sq_head; + uint16_t sq_id; + uint16_t cid; + uint16_t status; +} NvmeCqe; + +enum NvmeStatusCodes { + NVME_SUCCESS =3D 0x0000, + NVME_INVALID_OPCODE =3D 0x0001, + NVME_INVALID_FIELD =3D 0x0002, + NVME_CID_CONFLICT =3D 0x0003, + NVME_DATA_TRAS_ERROR =3D 0x0004, + NVME_POWER_LOSS_ABORT =3D 0x0005, + NVME_INTERNAL_DEV_ERROR =3D 0x0006, + NVME_CMD_ABORT_REQ =3D 0x0007, + NVME_CMD_ABORT_SQ_DEL =3D 0x0008, + NVME_CMD_ABORT_FAILED_FUSE =3D 0x0009, + NVME_CMD_ABORT_MISSING_FUSE =3D 0x000a, + NVME_INVALID_NSID =3D 0x000b, + NVME_CMD_SEQ_ERROR =3D 0x000c, + NVME_LBA_RANGE =3D 0x0080, + NVME_CAP_EXCEEDED =3D 0x0081, + NVME_NS_NOT_READY =3D 0x0082, + NVME_NS_RESV_CONFLICT =3D 0x0083, + 
NVME_INVALID_CQID =3D 0x0100, + NVME_INVALID_QID =3D 0x0101, + NVME_MAX_QSIZE_EXCEEDED =3D 0x0102, + NVME_ACL_EXCEEDED =3D 0x0103, + NVME_RESERVED =3D 0x0104, + NVME_AER_LIMIT_EXCEEDED =3D 0x0105, + NVME_INVALID_FW_SLOT =3D 0x0106, + NVME_INVALID_FW_IMAGE =3D 0x0107, + NVME_INVALID_IRQ_VECTOR =3D 0x0108, + NVME_INVALID_LOG_ID =3D 0x0109, + NVME_INVALID_FORMAT =3D 0x010a, + NVME_FW_REQ_RESET =3D 0x010b, + NVME_INVALID_QUEUE_DEL =3D 0x010c, + NVME_FID_NOT_SAVEABLE =3D 0x010d, + NVME_FID_NOT_NSID_SPEC =3D 0x010f, + NVME_FW_REQ_SUSYSTEM_RESET =3D 0x0110, + NVME_CONFLICTING_ATTRS =3D 0x0180, + NVME_INVALID_PROT_INFO =3D 0x0181, + NVME_WRITE_TO_RO =3D 0x0182, + NVME_WRITE_FAULT =3D 0x0280, + NVME_UNRECOVERED_READ =3D 0x0281, + NVME_E2E_GUARD_ERROR =3D 0x0282, + NVME_E2E_APP_ERROR =3D 0x0283, + NVME_E2E_REF_ERROR =3D 0x0284, + NVME_CMP_FAILURE =3D 0x0285, + NVME_ACCESS_DENIED =3D 0x0286, + NVME_MORE =3D 0x2000, + NVME_DNR =3D 0x4000, + NVME_NO_COMPLETE =3D 0xffff, +}; + +typedef struct NvmeFwSlotInfoLog { + uint8_t afi; + uint8_t reserved1[7]; + uint8_t frs1[8]; + uint8_t frs2[8]; + uint8_t frs3[8]; + uint8_t frs4[8]; + uint8_t frs5[8]; + uint8_t frs6[8]; + uint8_t frs7[8]; + uint8_t reserved2[448]; +} NvmeFwSlotInfoLog; + +typedef struct NvmeErrorLog { + uint64_t error_count; + uint16_t sqid; + uint16_t cid; + uint16_t status_field; + uint16_t param_error_location; + uint64_t lba; + uint32_t nsid; + uint8_t vs; + uint8_t resv[35]; +} NvmeErrorLog; + +typedef struct NvmeSmartLog { + uint8_t critical_warning; + uint8_t temperature[2]; + uint8_t available_spare; + uint8_t available_spare_threshold; + uint8_t percentage_used; + uint8_t reserved1[26]; + uint64_t data_units_read[2]; + uint64_t data_units_written[2]; + uint64_t host_read_commands[2]; + uint64_t host_write_commands[2]; + uint64_t controller_busy_time[2]; + uint64_t power_cycles[2]; + uint64_t power_on_hours[2]; + uint64_t unsafe_shutdowns[2]; + uint64_t media_errors[2]; + uint64_t number_of_error_log_entries[2]; 
+ uint8_t reserved2[320]; +} NvmeSmartLog; + +enum NvmeSmartWarn { + NVME_SMART_SPARE =3D 1 << 0, + NVME_SMART_TEMPERATURE =3D 1 << 1, + NVME_SMART_RELIABILITY =3D 1 << 2, + NVME_SMART_MEDIA_READ_ONLY =3D 1 << 3, + NVME_SMART_FAILED_VOLATILE_MEDIA =3D 1 << 4, +}; + +enum LogIdentifier { + NVME_LOG_ERROR_INFO =3D 0x01, + NVME_LOG_SMART_INFO =3D 0x02, + NVME_LOG_FW_SLOT_INFO =3D 0x03, +}; + +typedef struct NvmePSD { + uint16_t mp; + uint16_t reserved; + uint32_t enlat; + uint32_t exlat; + uint8_t rrt; + uint8_t rrl; + uint8_t rwt; + uint8_t rwl; + uint8_t resv[16]; +} NvmePSD; + +typedef struct NvmeIdCtrl { + uint16_t vid; + uint16_t ssvid; + uint8_t sn[20]; + uint8_t mn[40]; + uint8_t fr[8]; + uint8_t rab; + uint8_t ieee[3]; + uint8_t cmic; + uint8_t mdts; + uint8_t rsvd255[178]; + uint16_t oacs; + uint8_t acl; + uint8_t aerl; + uint8_t frmw; + uint8_t lpa; + uint8_t elpe; + uint8_t npss; + uint8_t rsvd511[248]; + uint8_t sqes; + uint8_t cqes; + uint16_t rsvd515; + uint32_t nn; + uint16_t oncs; + uint16_t fuses; + uint8_t fna; + uint8_t vwc; + uint16_t awun; + uint16_t awupf; + uint8_t rsvd703[174]; + uint8_t rsvd2047[1344]; + NvmePSD psd[32]; + uint8_t vs[1024]; +} NvmeIdCtrl; + +enum NvmeIdCtrlOacs { + NVME_OACS_SECURITY =3D 1 << 0, + NVME_OACS_FORMAT =3D 1 << 1, + NVME_OACS_FW =3D 1 << 2, +}; + +enum NvmeIdCtrlOncs { + NVME_ONCS_COMPARE =3D 1 << 0, + NVME_ONCS_WRITE_UNCORR =3D 1 << 1, + NVME_ONCS_DSM =3D 1 << 2, + NVME_ONCS_WRITE_ZEROS =3D 1 << 3, + NVME_ONCS_FEATURES =3D 1 << 4, + NVME_ONCS_RESRVATIONS =3D 1 << 5, +}; + +#define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf) +#define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf) +#define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf) +#define NVME_CTRL_CQES_MAX(cqes) (((cqes) >> 4) & 0xf) + +typedef struct NvmeFeatureVal { + uint32_t arbitration; + uint32_t power_mgmt; + uint32_t temp_thresh; + uint32_t err_rec; + uint32_t volatile_wc; + uint32_t num_queues; + uint32_t int_coalescing; + uint32_t *int_vector_config; + 
uint32_t write_atomicity; + uint32_t async_config; + uint32_t sw_prog_marker; +} NvmeFeatureVal; + +#define NVME_ARB_AB(arb) (arb & 0x7) +#define NVME_ARB_LPW(arb) ((arb >> 8) & 0xff) +#define NVME_ARB_MPW(arb) ((arb >> 16) & 0xff) +#define NVME_ARB_HPW(arb) ((arb >> 24) & 0xff) + +#define NVME_INTC_THR(intc) (intc & 0xff) +#define NVME_INTC_TIME(intc) ((intc >> 8) & 0xff) + +enum NvmeFeatureIds { + NVME_ARBITRATION =3D 0x1, + NVME_POWER_MANAGEMENT =3D 0x2, + NVME_LBA_RANGE_TYPE =3D 0x3, + NVME_TEMPERATURE_THRESHOLD =3D 0x4, + NVME_ERROR_RECOVERY =3D 0x5, + NVME_VOLATILE_WRITE_CACHE =3D 0x6, + NVME_NUMBER_OF_QUEUES =3D 0x7, + NVME_INTERRUPT_COALESCING =3D 0x8, + NVME_INTERRUPT_VECTOR_CONF =3D 0x9, + NVME_WRITE_ATOMICITY =3D 0xa, + NVME_ASYNCHRONOUS_EVENT_CONF =3D 0xb, + NVME_SOFTWARE_PROGRESS_MARKER =3D 0x80 +}; + +typedef struct NvmeRangeType { + uint8_t type; + uint8_t attributes; + uint8_t rsvd2[14]; + uint64_t slba; + uint64_t nlb; + uint8_t guid[16]; + uint8_t rsvd48[16]; +} NvmeRangeType; + +typedef struct NvmeLBAF { + uint16_t ms; + uint8_t ds; + uint8_t rp; +} NvmeLBAF; + +typedef struct NvmeIdNs { + uint64_t nsze; + uint64_t ncap; + uint64_t nuse; + uint8_t nsfeat; + uint8_t nlbaf; + uint8_t flbas; + uint8_t mc; + uint8_t dpc; + uint8_t dps; + uint8_t res30[98]; + NvmeLBAF lbaf[16]; + uint8_t res192[192]; + uint8_t vs[3712]; +} NvmeIdNs; + +#define NVME_ID_NS_NSFEAT_THIN(nsfeat) ((nsfeat & 0x1)) +#define NVME_ID_NS_FLBAS_EXTENDED(flbas) ((flbas >> 4) & 0x1) +#define NVME_ID_NS_FLBAS_INDEX(flbas) ((flbas & 0xf)) +#define NVME_ID_NS_MC_SEPARATE(mc) ((mc >> 1) & 0x1) +#define NVME_ID_NS_MC_EXTENDED(mc) ((mc & 0x1)) +#define NVME_ID_NS_DPC_LAST_EIGHT(dpc) ((dpc >> 4) & 0x1) +#define NVME_ID_NS_DPC_FIRST_EIGHT(dpc) ((dpc >> 3) & 0x1) +#define NVME_ID_NS_DPC_TYPE_3(dpc) ((dpc >> 2) & 0x1) +#define NVME_ID_NS_DPC_TYPE_2(dpc) ((dpc >> 1) & 0x1) +#define NVME_ID_NS_DPC_TYPE_1(dpc) ((dpc & 0x1)) +#define NVME_ID_NS_DPC_TYPE_MASK 0x7 + +enum NvmeIdNsDps { + 
DPS_TYPE_NONE =3D 0, + DPS_TYPE_1 =3D 1, + DPS_TYPE_2 =3D 2, + DPS_TYPE_3 =3D 3, + DPS_TYPE_MASK =3D 0x7, + DPS_FIRST_EIGHT =3D 8, +}; + +static inline void _nvme_check_size(void) +{ + QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) !=3D 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) !=3D 16); + QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) !=3D 16); + QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeDeleteQ) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeCreateCq) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeCreateSq) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) !=3D 64); + QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) !=3D 512); + QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) !=3D 512); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) !=3D 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) !=3D 4096); +} +#endif --=20 2.14.3 From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518057527815670.4135968531536; Wed, 7 Feb 2018 18:38:47 -0800 (PST) Received: from localhost ([::1]:60421 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejc6l-0005ER-Vy for importer@patchew.org; Wed, 07 Feb 2018 21:38:44 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41796) by lists.gnu.org with esmtp (Exim 
4.71) (envelope-from ) id 1ejbpB-0006OX-3O for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:34 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbpA-0004X1-4e for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:33 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:53818 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbp9-0004Wn-Vr for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:32 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A191E818532A; Thu, 8 Feb 2018 02:20:31 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7D64CB0795; Thu, 8 Feb 2018 02:20:30 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:51 +0800 Message-Id: <20180208021953.7354-15-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Feb 2018 02:20:31 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Feb 2018 02:20:31 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 14/16] docs: Add section for NVMe VFIO driver X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Signed-off-by: Fam Zheng Reviewed-by: Stefan Hajnoczi Message-Id: <20180116060901.17413-9-famz@redhat.com> Signed-off-by: Fam Zheng --- docs/qemu-block-drivers.texi | 37 +++++++++++++++++++++++++++++++++++++ qemu-doc.texi | 1 + 2 files changed, 38 insertions(+) diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi index 503c1847aa..cd74767ed3 100644 --- a/docs/qemu-block-drivers.texi +++ b/docs/qemu-block-drivers.texi @@ -785,6 +785,43 @@ warning: ssh server @code{ssh.example.com:22} does not= support fsync With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is supported. =20 +@node disk_images_nvme +@subsection NVMe disk images + +NVM Express (NVMe) storage controllers can be accessed directly by a users= pace +driver in QEMU. This bypasses the host kernel file system and block layers +while retaining QEMU block layer functionalities, such as block jobs, I/O +throttling, image formats, etc. Disk I/O performance is typically higher = than +with @code{-drive file=3D/dev/sda} using either thread pool or linux-aio. + +The controller will be exclusively used by the QEMU process once started. = To be +able to share storage between multiple VMs and other applications on the h= ost, +please use the file based protocols. + +Before starting QEMU, bind the host NVMe controller to the host vfio-pci +driver. 
For example: + +@example +# modprobe vfio-pci +# lspci -n -s 0000:06:0d.0 +06:0d.0 0401: 1102:0002 (rev 08) +# echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind +# echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id + +# qemu-system-x86_64 -drive file=3Dnvme://@var{host}:@var{bus}:@var{slot}.= @var{func}/@var{namespace} +@end example + +Alternative syntax using properties: + +@example +qemu-system-x86_64 -drive file.driver=3Dnvme,file.device=3D@var{host}:@var= {bus}:@var{slot}.@var{func},file.namespace=3D@var{namespace} +@end example + +@var{host}:@var{bus}:@var{slot}.@var{func} is the NVMe controller's PCI de= vice +address on the host. + +@var{namespace} is the NVMe namespace number, starting from 1. + @node disk_image_locking @subsection Disk image file locking =20 diff --git a/qemu-doc.texi b/qemu-doc.texi index 19a82bfea3..769968aba4 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -621,6 +621,7 @@ encrypted disk images. * disk_images_iscsi:: iSCSI LUNs * disk_images_gluster:: GlusterFS disk images * disk_images_ssh:: Secure Shell (ssh) disk images +* disk_images_nvme:: NVMe userspace driver * disk_image_locking:: Disk image file locking @end menu =20 --=20 2.14.3 From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518056815481762.793445473057; Wed, 7 Feb 2018 18:26:55 -0800 (PST) Received: from localhost ([::1]:59246 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 
4.71) (envelope-from ) id 1ejbvK-0003az-Cw for importer@patchew.org; Wed, 07 Feb 2018 21:26:54 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41826) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbpC-0006Qg-Mv for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbpB-0004Xq-TJ for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:34 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60064 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbpB-0004XX-Nw for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:33 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5CCB3404008D; Thu, 8 Feb 2018 02:20:33 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id 31FE2B0794; Thu, 8 Feb 2018 02:20:31 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:52 +0800 Message-Id: <20180208021953.7354-16-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 08 Feb 2018 02:20:33 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Thu, 08 Feb 2018 02:20:33 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 15/16] 
qapi: Add NVMe driver options to the schema X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Signed-off-by: Fam Zheng Reviewed-by: Stefan Hajnoczi Message-Id: <20180116060901.17413-10-famz@redhat.com> Signed-off-by: Fam Zheng --- qapi/block-core.json | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 8225308904..8046c2da23 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2248,6 +2248,7 @@ # # @vxhs: Since 2.10 # @throttle: Since 2.11 +# @nvme: Since 2.12 # # Since: 2.9 ## @@ -2255,7 +2256,7 @@ 'data': [ 'blkdebug', 'blkverify', 'bochs', 'cloop', 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs', - 'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed', + 'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', '= qed', 'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] } =20 @@ -2296,6 +2297,19 @@ { 'struct': 'BlockdevOptionsNull', 'data': { '*size': 'int', '*latency-ns': 'uint64' } } =20 +## +# @BlockdevOptionsNVMe: +# +# Driver specific block device options for the NVMe backend. +# +# @device: controller address of the NVMe device. +# @namespace: namespace number of the device, starting from 1. 
+# +# Since: 2.12 +## +{ 'struct': 'BlockdevOptionsNVMe', + 'data': { 'device': 'str', 'namespace': 'int' } } + ## # @BlockdevOptionsVVFAT: # @@ -3201,6 +3215,7 @@ 'nfs': 'BlockdevOptionsNfs', 'null-aio': 'BlockdevOptionsNull', 'null-co': 'BlockdevOptionsNull', + 'nvme': 'BlockdevOptionsNVMe', 'parallels': 'BlockdevOptionsGenericFormat', 'qcow2': 'BlockdevOptionsQcow2', 'qcow': 'BlockdevOptionsQcow', --=20 2.14.3 From nobody Mon Apr 29 10:10:24 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1518057621088717.7370996736998; Wed, 7 Feb 2018 18:40:21 -0800 (PST) Received: from localhost ([::1]:60549 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejc8I-0006LO-2f for importer@patchew.org; Wed, 07 Feb 2018 21:40:18 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41856) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ejbpH-0006Wu-Do for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ejbpD-0004YR-V6 for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:39 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:49650 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ejbpD-0004YL-P3 for qemu-devel@nongnu.org; Wed, 07 Feb 2018 21:20:35 -0500 Received: from smtp.corp.redhat.com 
(int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6118A7C6BB; Thu, 8 Feb 2018 02:20:35 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-87.pek2.redhat.com [10.72.12.87]) by smtp.corp.redhat.com (Postfix) with ESMTP id E6678B0794; Thu, 8 Feb 2018 02:20:33 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 8 Feb 2018 10:19:53 +0800 Message-Id: <20180208021953.7354-17-famz@redhat.com> In-Reply-To: <20180208021953.7354-1-famz@redhat.com> References: <20180208021953.7354-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Thu, 08 Feb 2018 02:20:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Thu, 08 Feb 2018 02:20:35 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PULL 16/16] docs: Add docs/devel/testing.rst X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" To make our efforts on QEMU testing easier to consume by contributors, let's add a document. For example, Patchew reports build errors on patches that should be relatively easy to reproduce with a few steps, and it is much nicer if there is such a documentation that it can refer to. 
This focuses on how to run existing tests and how to write new test cases, without going into the frameworks themselves. The VM based testing section is moved from tests/vm/README which now is a single line pointing to the new doc. Signed-off-by: Fam Zheng Reviewed-by: Stefan Hajnoczi Message-Id: <20180201022046.9425-1-famz@redhat.com> Signed-off-by: Fam Zheng --- docs/devel/testing.rst | 486 +++++++++++++++++++++++++++++++++++++++++++++= ++++ tests/vm/README | 90 +-------- 2 files changed, 487 insertions(+), 89 deletions(-) create mode 100644 docs/devel/testing.rst diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst new file mode 100644 index 0000000000..0ca1a2d4b5 --- /dev/null +++ b/docs/devel/testing.rst @@ -0,0 +1,486 @@ +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Testing in QEMU +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +This document describes the testing infrastructure in QEMU. + +Testing with "make check" +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D + +The "make check" testing family includes most of the C based tests in QEMU= . For +a quick help, run ``make check-help`` from the source tree. + +The usual way to run these tests is: + +.. code:: + + make check + +which includes QAPI schema tests, unit tests, and QTests. Different sub-ty= pes +of "make check" tests will be explained below. + +Before running tests, it is best to build QEMU programs first. Some tests +expect the executables to exist and will fail with obscure messages if they +cannot find them. + +Unit tests +---------- + +Unit tests, which can be invoked with ``make check-unit``, are simple C te= sts +that typically link to individual QEMU object files and exercise them by +calling exported functions. + +If you are writing new code in QEMU, consider adding a unit test, especial= ly +for utility modules that are relatively stateless or have few dependencies= . To +add a new unit test: + +1. Create a new source file. 
For example, ``tests/foo-test.c``. + +2. Write the test. Normally you would include the header file which exports + the module API, then verify the interface behaves as expected from your + test. The test code should be organized with the glib testing framework. + Copying and modifying an existing test is usually a good idea. + +3. Add the test to ``tests/Makefile.include``. First, name the unit test + program and add it to ``$(check-unit-y)``; then add a rule to build the + executable. Optionally, you can add a magical variable to support ``gco= v``. + For example: + +.. code:: + + check-unit-y +=3D tests/foo-test$(EXESUF) + tests/foo-test$(EXESUF): tests/foo-test.o $(test-util-obj-y) + ... + gcov-files-foo-test-y =3D util/foo.c + +Since unit tests don't require environment variables, the simplest way to = debug +a unit test failure is often directly invoking it or even running it under +``gdb``. However there can still be differences in behavior between ``make= `` +invocations and your manual run, due to ``$MALLOC_PERTURB_`` environment +variable (which affects memory reclamation and catches invalid pointers be= tter) +and gtester options. If necessary, you can run + +.. code:: + make check-unit V=3D1 + +and copy the actual command line which executes the unit test, then run +it from the command line. + +QTest +----- + +QTest is a device emulation testing framework. It can be very useful to t= est +device models; it could also control certain aspects of QEMU (such as virt= ual +clock stepping), with a special purpose "qtest" protocol. Refer to the +documentation in ``qtest.c`` for more details of the protocol. + +QTest cases can be executed with + +.. code:: + + make check-qtest + +The QTest library is implemented by ``tests/libqtest.c`` and the API is de= fined +in ``tests/libqtest.h``. + +Consider adding a new QTest case when you are introducing a new virtual +hardware, or extending one if you are adding functionalities to an existing +virtual device. 
+ +On top of libqtest, a higher level library, ``libqos``, was created to +encapsulate common tasks of device drivers, such as memory management and +communicating with system buses or devices. Many virtual device tests use +libqos instead of directly calling into libqtest. + +Steps to add a new QTest case are: + +1. Create a new source file for the test. (More than one file can be added= as + necessary.) For example, ``tests/test-foo-device.c``. + +2. Write the test code with the glib and libqtest/libqos API. See also exi= sting + tests and the library headers for reference. + +3. Register the new test in ``tests/Makefile.include``. Add the test execu= table + name to an appropriate ``check-qtest-*-y`` variable. For example: + + ``check-qtest-generic-y +=3D tests/test-foo-device$(EXESUF)`` + +4. Add object dependencies of the executable in the Makefile, including the + test source file(s) and other interesting objects. For example: + + ``tests/test-foo-device$(EXESUF): tests/test-foo-device.o $(libqos-obj-= y)`` + +Debugging a QTest failure is slightly harder than debugging a unit test because the +tests look up QEMU program names in the environment variables, such as +``QTEST_QEMU_BINARY`` and ``QTEST_QEMU_IMG``, and also because it is not e= asy +to attach gdb to the QEMU process spawned from the test. But manually invoki= ng +and using gdb on the test is still simple to do: find out the actual comma= nd +from the output of + +.. code:: + make check-qtest V=3D1 + +which you can run manually. + +QAPI schema tests +----------------- + +The QAPI schema tests validate the QAPI parser used by QMP, by feeding +predefined input to the parser and comparing the result with the reference +output. + +The input/output data is managed under the ``tests/qapi-schema`` directory.
+Each test case includes four files that have a common base name: + + * ``${casename}.json`` - the file contains the JSON input for feeding the + parser + * ``${casename}.out`` - the file contains the expected stdout from the p= arser + * ``${casename}.err`` - the file contains the expected stderr from the p= arser + * ``${casename}.exit`` - the expected error code + +Consider adding a new QAPI schema test when you are making a change on the= QAPI +parser (either fixing a bug or extending/modifying the syntax). To do this: + +1. Add four files for the new case as explained above. For example: + + ``$EDITOR tests/qapi-schema/foo.{json,out,err,exit}``. + +2. Add the new test in ``tests/Makefile.include``. For example: + + ``qapi-schema +=3D foo.json`` + +check-block +----------- + +``make check-block`` is a legacy command to invoke block layer iotests and= is +rarely used. See "QEMU iotests" section below for more information. + +GCC gcov support +---------------- + +``gcov`` is a GCC tool to analyze the testing coverage by instrumenting the +tested code. To use it, configure QEMU with ``--enable-gcov`` option and b= uild. +Then run ``make check`` as usual. There will be additional ``gcov`` output= as +the testing goes on, showing the test coverage percentage numbers per anal= yzed +source file. More detailed reports can be obtained by running ``gcov`` com= mand +on the output files under ``$build_dir/tests/``, please read the ``gcov`` +documentation for more information. + +QEMU iotests +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +QEMU iotests, under the directory ``tests/qemu-iotests``, is the testing +framework widely used to test block layer related features. It is higher l= evel +than "make check" tests and 99% of the code is written in bash or Python +scripts. The testing success criteria is golden output comparison, and the +test files are named with numbers. 
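The golden output criterion described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how the ``./check`` script is implemented: the function name ``run_golden_case``, the ``.out.bad`` scratch file, the temporary directory and the toy case ``001`` are all assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): run a numbered test script and
# compare its combined stdout/stderr with the reference file of the same number.
run_golden_case() {
    case_dir=$1
    seq=$2
    # Capture everything the test prints.
    sh "$case_dir/$seq" > "$case_dir/$seq.out.bad" 2>&1
    # The case passes only if the captured output matches the reference exactly.
    if diff -u "$case_dir/$seq.out" "$case_dir/$seq.out.bad" > /dev/null; then
        rm -f "$case_dir/$seq.out.bad"
        echo "pass"
    else
        echo "fail"
    fi
}

# Toy case "001" with a matching reference output.
demo_dir=$(mktemp -d)
printf 'echo "hello from 001"\n' > "$demo_dir/001"
printf 'hello from 001\n' > "$demo_dir/001.out"
result_ok=$(run_golden_case "$demo_dir" 001)

# Changing the reference makes the same case fail.
printf 'something else\n' > "$demo_dir/001.out"
result_bad=$(run_golden_case "$demo_dir" 001)

echo "$result_ok $result_bad"
rm -rf "$demo_dir"
```

Because the comparison is textual, any change in what a test prints, even a harmless one, requires updating the reference file together with the script.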
+ +To run iotests, make sure QEMU is built successfully, then switch to the +``tests/qemu-iotests`` directory under the build directory, and run ``./ch= eck`` +with desired arguments from there. + +By default, "raw" format and "file" protocol are used; all tests will be +executed, except the unsupported ones. You can override the format and pro= tocol +with arguments: + +.. code:: + + # test with qcow2 format + ./check -qcow2 + # or test a different protocol + ./check -nbd + +It's also possible to list test numbers explicitly: + +.. code:: + + # run selected cases with qcow2 format + ./check -qcow2 001 030 153 + +Cache mode can be selected with the "-c" option, which may help reveal bugs +that are specific to a certain cache mode. + +More options are supported by the ``./check`` script; run ``./check -h`` f= or +help. + +Writing a new test case +----------------------- + +Consider writing a test case when you are making any changes to the block +layer. An iotest case is usually the choice for that. There are already ma= ny +test cases, so it is possible that extending one of them may achieve the g= oal +and save the boilerplate to create one. (Unfortunately, there isn't a 100% +reliable way to find a related one out of hundreds of tests. One approach= is +using ``git grep``.) + +Usually an iotest case consists of two files. One is an executable that +produces output to stdout and stderr, the other is the expected reference +output. They are given the same number in file names, e.g. test script ``0= 55`` +and reference output ``055.out``. + +In rare cases, when outputs differ between cache mode ``none`` and others,= a +``.out.nocache`` file is added. In other cases, when outputs differ between +image formats, more than one ``.out`` file is created, each ending with the +respective format name, e.g. ``178.out.qcow2`` and ``178.out.raw``. + +There isn't a hard rule about how to write a test script, but a new test is +usually a (copy and) modification of an existing case.
There are a few +commonly used ways to create a test: + +* A Bash script. It will make use of several environmental variables relat= ed + to the testing procedure, and could source a group of ``common.*`` libra= ries + for some common helper routines. + +* A Python unittest script. Import ``iotests`` and create a subclass of + ``iotests.QMPTestCase``, then call ``iotests.main`` method. The downside= of + this approach is that the output is too scarce, and the script is consid= ered + harder to debug. + +* A simple Python script without using unittest module. This could also im= port + ``iotests`` for launching QEMU and utilities etc, but it doesn't inherit + from ``iotests.QMPTestCase`` therefore doesn't use the Python unittest + execution. This is a combination of 1 and 2. + +Pick the language per your preference since both Bash and Python have +comparable library support for invoking and interacting with QEMU programs= . If +you opt for Python, it is strongly recommended to write Python 3 compatible +code. + +Docker based tests +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Introduction +------------ + +The Docker testing framework in QEMU utilizes public Docker images to buil= d and +test QEMU in predefined and widely accessible Linux environments. This ma= kes +it possible to expand the test coverage across distros, toolchain flavors = and +library versions. + +Prerequisites +------------- + +Install "docker" with the system package manager and start the Docker serv= ice +on your development machine, then make sure you have the privilege to run +Docker commands. Typically it means setting up passwordless ``sudo docker`` +command or login as root. For example: + +.. code:: + + $ sudo yum install docker + $ # or `apt-get install docker` for Ubuntu, etc. + $ sudo systemctl start docker + $ sudo docker ps + +The last command should print an empty table, to verify the system is read= y. 
+ +An alternative method to set up permissions is by adding the current user = to +the "docker" group and making the docker daemon socket file (by default +``/var/run/docker.sock``) accessible to the group: + +.. code:: + + $ sudo groupadd docker + $ sudo usermod $USER -a -G docker + $ sudo chown :docker /var/run/docker.sock + +Note that any one of the above configurations makes it possible for the user to +exploit the whole host with Docker bind mounting or other privileged +operations. So only do it on development machines. + +Quickstart +---------- + +From the source tree, type ``make docker`` to see the help. Testing can be sta= rted +without configuring or building QEMU (``configure`` and ``make`` are done = in +the container, with parameters defined by the make target): + +.. code:: + + make docker-test-build@min-glib + +This will create a container instance using the ``min-glib`` image (the im= age +is downloaded and initialized automatically), in which the ``test-build`` = job +is executed. + +Images +------ + +Along with many other images, the ``min-glib`` image is defined in a Docke= rfile +in ``tests/docker/dockerfiles/``, called ``min-glib.docker``. The ``make docker= `` +command will list all the available images. + +To add a new image, simply create a new ``.docker`` file under the +``tests/docker/dockerfiles/`` directory. + +A ``.pre`` script can be added beside the ``.docker`` file, which will be +executed before building the image under the build context directory. This= is +mainly used to do necessary host side setup. One such setup is ``binfmt_mi= sc``, +for example, to make qemu-user powered cross build containers work. + +Tests +----- + +Different tests are added to cover various configurations to build and test +QEMU. Docker tests are the executables under ``tests/docker`` named +``test-*``. They are typically shell scripts and are built on top of a she= ll +library, ``tests/docker/common.rc``, which provides helpers to find the QE= MU +source and build it.
+
+The full list of tests is printed in the ``make docker`` help.
+
+Tools
+-----
+
+There are executables that are created to run in a specific Docker environment.
+This makes it easy to write scripts that have heavy or special dependencies,
+but are still very easy to use.
+
+Currently the only tool is ``travis``, which mimics the Travis-CI tests in a
+container. It runs in the ``travis`` image:
+
+.. code::
+
+  make docker-travis@travis
+
+Debugging a Docker test failure
+-------------------------------
+
+When CI tasks, maintainers or you report a Docker test failure, follow the
+steps below to debug it:
+
+1. Locally reproduce the failure with the reported command line. E.g. run
+   ``make docker-test-mingw@fedora J=8``.
+2. Add "V=1" to the command line, try again, to see the verbose output.
+3. Further add "DEBUG=1" to the command line. This will pause in a shell prompt
+   in the container right before testing starts. You could either manually
+   build QEMU and run tests from there, or press Ctrl-D to let the Docker
+   testing continue.
+4. If you press Ctrl-D, the same building and testing procedure will begin, and
+   will hopefully run into the error again. After that, you will be dropped to
+   the prompt for debugging.
+
+Options
+-------
+
+Various options can be used to affect how Docker tests are done. The full
+list is in the ``make docker`` help text. The frequently used ones are:
+
+* ``V=1``: the same as in top level ``make``. It will be propagated to the
+  container and enable verbose output.
+* ``J=$N``: the number of parallel tasks in make commands in the container,
+  similar to the ``-j $N`` option in top level ``make``. (The ``-j`` option in
+  top level ``make`` will not be propagated into the container.)
+* ``DEBUG=1``: enables debugging. See the previous "Debugging a Docker test
+  failure" section.
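To make the relationship between the target name and the options concrete, here is a small sketch. ``docker_test_cmd`` is a hypothetical wrapper, not part of the QEMU tree; it only assembles and prints the command line from a test name, an image name, and the ``J``, ``V`` and ``DEBUG`` environment variables described above, so nothing is executed.

```shell
# Hypothetical wrapper, not part of QEMU: print the make command line that
# runs test $1 in image $2. J, V and DEBUG are read from the environment.
docker_test_cmd() {
    cmd="make docker-${1}@${2}"
    # "-j" is not propagated into the container, so parallelism goes via J.
    if [ -n "${J:-}" ]; then cmd="$cmd J=$J"; fi
    if [ "${V:-}" = 1 ]; then cmd="$cmd V=1"; fi
    if [ "${DEBUG:-}" = 1 ]; then cmd="$cmd DEBUG=1"; fi
    echo "$cmd"
}
```

For example, ``J=8 V=1 docker_test_cmd test-mingw fedora`` prints ``make docker-test-mingw@fedora J=8 V=1``.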
+
+VM testing
+==========
+
+This test suite contains scripts that bootstrap various guest images that have
+necessary packages to build QEMU. The basic usage is documented in ``Makefile``
+help which is displayed with ``make vm-test``.
+
+Quickstart
+----------
+
+Run ``make vm-test`` to list available make targets. Invoke a specific make
+command to run a build test in an image. For example,
+``make vm-build-freebsd`` will build the source tree in the FreeBSD image. The
+command can be executed from either the source tree or the build dir; if the
+former, ``./configure`` is not needed. The command will then generate the test
+image in ``./tests/vm/`` under the working directory.
+
+Note: images created by the scripts accept a well-known RSA key pair for SSH
+access, so they SHOULD NOT be exposed to external interfaces if you are
+concerned about attackers taking control of the guest and potentially
+exploiting a QEMU security bug to compromise the host.
+
+QEMU binary
+-----------
+
+By default, qemu-system-x86_64 is searched for in $PATH to run the guest. If
+there isn't one, or if it is older than 2.10, the test won't work. In this
+case, provide the QEMU binary in env var: ``QEMU=/path/to/qemu-2.10+``.
+
+Make jobs
+---------
+
+The ``-j$X`` option in the make command line is not propagated into the VM;
+specify ``J=$X`` to control the make jobs in the guest.
+
+Debugging
+---------
+
+Add ``DEBUG=1`` and/or ``V=1`` to the make command to allow interactive
+debugging and verbose output. If this is not enough, see the next section.
+
+Manual invocation
+-----------------
+
+Each guest script is an executable script with the same command line options.
+For example, to work with the netbsd guest, use ``$QEMU_SRC/tests/vm/netbsd``:
+
+.. code::
+
+    $ cd $QEMU_SRC/tests/vm
+
+    # To bootstrap the image
+    $ ./netbsd --build-image --image /var/tmp/netbsd.img
+    <...>
+
+    # To run an arbitrary command in guest (the output will not be echoed unless
+    # --debug is added)
+    $ ./netbsd --debug --image /var/tmp/netbsd.img uname -a
+
+    # To build QEMU in guest
+    $ ./netbsd --debug --image /var/tmp/netbsd.img --build-qemu $QEMU_SRC
+
+    # To get to an interactive shell
+    $ ./netbsd --interactive --image /var/tmp/netbsd.img sh
+
+Adding new guests
+-----------------
+
+Please look at existing guest scripts for how to add new guests.
+
+Most importantly, create a subclass of BaseVM, implement the
+``build_image()`` method and define ``BUILD_SCRIPT``, then finally call
+``basevm.main()`` from the script's ``main()``.
+
+* Usually in ``build_image()``, a template image is downloaded from a
+  predefined URL. ``BaseVM._download_with_cache()`` takes care of the cache and
+  the checksum, so consider using it.
+
+* Once the image is downloaded, users, SSH server and QEMU build deps should
+  be set up:
+
+  - Root password set to ``BaseVM.ROOT_PASS``
+  - User ``BaseVM.GUEST_USER`` is created, and password set to
+    ``BaseVM.GUEST_PASS``
+  - SSH service is enabled and started on boot,
+    ``$QEMU_SRC/tests/keys/id_rsa.pub`` is added to ssh's ``authorized_keys``
+    file of both root and the normal user
+  - DHCP client service is enabled and started on boot, so that it can
+    automatically configure the virtio-net-pci NIC and communicate with QEMU
+    user net (10.0.2.2)
+  - Necessary packages are installed to untar the source tarball and build
+    QEMU
+
+* Write a proper ``BUILD_SCRIPT`` template, which should be a shell script that
+  untars a raw virtio-blk block device, which is the tarball data blob of the
+  QEMU source tree, then configure/build it. Running "make check" is also
+  recommended.
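As a rough illustration of the last point, the shape of such a build script could look like the output of the sketch below. Everything concrete here is an assumption: ``gen_build_script`` is a hypothetical helper that only prints the script text, the device path passed to it (``/dev/vdb`` in the usage example) depends on the guest OS, and the existing guest scripts remain the reference for the actual template.

```shell
# Hypothetical generator, not part of QEMU: print a guest-side build script.
#   $1: raw virtio-blk device holding the source tarball (guest-specific)
#   $2: number of parallel make jobs
gen_build_script() {
    src_dev=$1
    jobs=$2
    # "\$" keeps the mktemp call literal so that it runs inside the guest.
    cat <<EOF
set -e
cd \$(mktemp -d)
tar -xf $src_dev
./configure
make -j$jobs
make check
EOF
}
```

For example, ``gen_build_script /dev/vdb 4`` prints a six-line script that unpacks the tarball straight from the block device, configures, builds with four jobs and then runs ``make check``.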
+
+Image fuzzer testing
+====================
+
+An image fuzzer was added to exercise format drivers. Currently only qcow2 is
+supported. To start the fuzzer, run
+
+.. code::
+
+  tests/image-fuzzer/runner.py -c '[["qemu-img", "info", "$test_img"]]' /tmp/test qcow2
+
+Alternatively, some command different from "qemu-img info" can be tested, by
+changing the ``-c`` option.
diff --git a/tests/vm/README b/tests/vm/README
index ae53dce6ee..f9c04cc0e7 100644
--- a/tests/vm/README
+++ b/tests/vm/README
@@ -1,89 +1 @@
-=== VM test suite to run build in guests ===
-
-== Intro ==
-
-This test suite contains scripts that bootstrap various guest images that have
-necessary packages to build QEMU. The basic usage is documented in Makefile
-help which is displayed with "make vm-test".
-
-== Quick start ==
-
-Run "make vm-test" to list available make targets. Invoke a specific make
-command to run build test in an image. For example, "make vm-build-freebsd"
-will build the source tree in the FreeBSD image. The command can be executed
-from either the source tree or the build dir; if the former, ./configure is not
-needed. The command will then generate the test image in ./tests/vm/ under the
-working directory.
-
-Note: images created by the scripts accept a well-known RSA key pair for SSH
-access, so they SHOULD NOT be exposed to external interfaces if you are
-concerned about attackers taking control of the guest and potentially
-exploiting a QEMU security bug to compromise the host.
-
-== QEMU binary ==
-
-By default, qemu-system-x86_64 is searched in $PATH to run the guest. If there
-isn't one, or if it is older than 2.10, the test won't work. In this case,
-provide the QEMU binary in env var: QEMU=/path/to/qemu-2.10+.
-
-== Make jobs ==
-
-The "-j$X" option in the make command line is not propagated into the VM,
-specify "J=$X" to control the make jobs in the guest.
-
-== Debugging ==
-
-Add "DEBUG=1" and/or "V=1" to the make command to allow interactive debugging
-and verbose output. If this is not enough, see the next section.
-
-== Manual invocation ==
-
-Each guest script is an executable script with the same command line options.
-For example to work with the netbsd guest, use $QEMU_SRC/tests/vm/netbsd:
-
-    $ cd $QEMU_SRC/tests/vm
-
-    # To bootstrap the image
-    $ ./netbsd --build-image --image /var/tmp/netbsd.img
-    <...>
-
-    # To run an arbitrary command in guest (the output will not be echoed unless
-    # --debug is added)
-    $ ./netbsd --debug --image /var/tmp/netbsd.img uname -a
-
-    # To build QEMU in guest
-    $ ./netbsd --debug --image /var/tmp/netbsd.img --build-qemu $QEMU_SRC
-
-    # To get to an interactive shell
-    $ ./netbsd --interactive --image /var/tmp/netbsd.img sh
-
-== Adding new guests ==
-
-Please look at existing guest scripts for how to add new guests.
-
-Most importantly, create a subclass of BaseVM and implement build_image()
-method and define BUILD_SCRIPT, then finally call basevm.main() from the
-script's main().
-
-  - Usually in build_image(), a template image is downloaded from a predefined
-    URL. BaseVM._download_with_cache() takes care of the cache and the
-    checksum, so consider using it.
-
-  - Once the image is downloaded, users, SSH server and QEMU build deps should
-    be set up:
-
-    * Root password set to BaseVM.ROOT_PASS
-    * User BaseVM.GUEST_USER is created, and password set to BaseVM.GUEST_PASS
-    * SSH service is enabled and started on boot,
-      $QEMU_SRC/tests/keys/id_rsa.pub is added to ssh's "authorized_keys" file
-      of both root and the normal user
-    * DHCP client service is enabled and started on boot, so that it can
-      automatically configure the virtio-net-pci NIC and communicate with QEMU
-      user net (10.0.2.2)
-    * Necessary packages are installed to untar the source tarball and build
-      QEMU
-
-  - Write a proper BUILD_SCRIPT template, which should be a shell script that
-    untars a raw virtio-blk block device, which is the tarball data blob of the
-    QEMU source tree, then configure/build it. Running "make check" is also
-    recommended.
+See docs/devel/testing.rst for help.
-- 
2.14.3