From: Stefan Hajnoczi
To: virtio-dev@lists.oasis-open.org
Date: Fri, 19 Jan 2018 12:49:08 +0000
Message-Id:
 <20180119124908.23637-1-stefanha@redhat.com>
Subject: [Qemu-devel] [RFC virtio-dev v2] vhost-user: add vhost-user device type
Cc: Maxime Coquelin, Wei Wang, qemu-devel@nongnu.org, Stefan Hajnoczi,
 "Michael S. Tsirkin"

The vhost-user device backend facilitates vhost-user device emulation through
vhost-user protocol exchanges and access to shared memory. Software-defined
networking, storage, and other I/O appliances can provide services through this
device.

For more information about virtio-vhost-user, see
https://wiki.qemu.org/Features/VirtioVhostUser

This device is based on Wei Wang's vhost-pci work. The virtio vhost-user
device differs from vhost-pci because it is a single virtio device type that
exposes the vhost-user protocol instead of a family of new virtio device types,
one for each vhost-user device type.

This device supports vhost-user slave and vhost-user master reconnection. It
also contains a UUID so that vhost-user slave programs can identify a specific
device among many without using bus addresses.

It is somewhat unconventional for a virtio device because it makes use of
additional resources called doorbells, notifications, and shared memory. A
mapping of these resources to the virtio PCI transport is provided.
Other transports, such as CCW, may not be able to support this device.

Cc: Wei Wang
Cc: Michael S. Tsirkin
Cc: Maxime Coquelin
Signed-off-by: Stefan Hajnoczi
---
v2:
 * Call the device a "vhost-user device backend" instead of a "vhost-user
   slave"
 * Use rxq/txq layout suggested by Wei Wang

 content.tex      | 295 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 introduction.tex |   1 +
 2 files changed, 296 insertions(+)

diff --git a/content.tex b/content.tex
index c840588..d890151 100644
--- a/content.tex
+++ b/content.tex
@@ -3022,6 +3022,8 @@ Device ID & Virtio Device \\
 \hline
 22 & pstore device \\
 \hline
+24 & vhost-user device backend \\
+\hline
 \end{tabular}
 
 Some of the devices above are unspecified by this document,
@@ -5819,6 +5821,299 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{Vhost-user Device Backend}\label{sec:Device Types / Vhost-user Device Backend}
+
+The vhost-user device backend facilitates vhost-user device emulation through
+vhost-user protocol exchanges and access to shared memory. Software-defined
+networking, storage, and other I/O appliances can provide services through this
+device.
+
+This section relies on definitions from the \hyperref[intro:Vhost-user
+Protocol]{Vhost-user Protocol}. Knowledge of the vhost-user protocol is a
+prerequisite for understanding this device.
+
+The \hyperref[intro:Vhost-user Protocol]{Vhost-user Protocol} was originally
+designed for processes on a single system communicating over UNIX domain
+sockets. The virtio vhost-user device backend allows the vhost-user slave to
+communicate with the vhost-user master over the device instead of a UNIX domain
+socket. This allows the slave and master to run on two separate systems such
+as a virtual machine and a hypervisor.
+
+The vhost-user slave program exchanges vhost-user protocol messages with the
+vhost-user master through this device. How the device implementation
+communicates with the vhost-user master is beyond the scope of this
+specification. One possible device implementation uses a UNIX domain socket to
+relay messages to a vhost-user master process running on the same host.
+
+Existing vhost-user slave programs that communicate over UNIX domain sockets
+can support the virtio vhost-user device backend without invasive changes
+because the pre-existing vhost-user wire protocol is used.
+
+\subsection{Device ID}\label{sec:Device Types / Vhost-user Device Backend / Device ID}
+  24
+
+\subsection{Virtqueues}\label{sec:Device Types / Vhost-user Device Backend / Virtqueues}
+
+\begin{description}
+\item[0] rxq (device-to-driver vhost-user protocol messages)
+\item[1] txq (driver-to-device vhost-user protocol messages)
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Vhost-user Device Backend / Feature bits}
+
+No feature bits are defined at this time.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Vhost-user Device Backend / Device configuration layout}
+
+  All fields of this configuration are always available.
+
+\begin{lstlisting}
+struct virtio_vhost_user_config {
+        le32 status;
+#define VIRTIO_VHOST_USER_STATUS_SLAVE_UP 0
+#define VIRTIO_VHOST_USER_STATUS_MASTER_UP 1
+        le32 max_vhost_queues;
+        u8 uuid[16];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{status}] contains the vhost-user operational status. The default
+    value of this field is 0.
+
+    The driver sets VIRTIO_VHOST_USER_STATUS_SLAVE_UP to indicate readiness for
+    the vhost-user master to connect. The vhost-user master cannot connect
+    unless the driver has set this bit first.
+
+    When the driver clears VIRTIO_VHOST_USER_STATUS_SLAVE_UP while the vhost-user
+    master is connected, the vhost-user master is disconnected.
+
+    When the vhost-user master disconnects, both
+    VIRTIO_VHOST_USER_STATUS_SLAVE_UP and VIRTIO_VHOST_USER_STATUS_MASTER_UP
+    are cleared by the device. Communication can be restarted by the driver
+    setting VIRTIO_VHOST_USER_STATUS_SLAVE_UP again.
+
+    A configuration change notification is sent when the device changes
+    this field unless a write to the field by the driver caused the change.
+
+\item[\field{max_vhost_queues}] is the maximum number of vhost-user queues
+    supported by this device. This field is always greater than 0.
+
+\item[\field{uuid}] is the Universally Unique Identifier (UUID) for this
+    device. If the device has no UUID then this field contains the nil
+    UUID (all zeroes). The UUID allows vhost-user slave programs to identify a
+    specific vhost-user device backend among many without relying on bus
+    addresses.
+\end{description}
+
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Vhost-user Device Backend / Device configuration layout}
+
+The driver MUST NOT write to device configuration fields other than
+\field{status}.
+
+The driver MUST NOT set undefined bits in the \field{status} configuration field.
+
+\drivernormative{\subsection}{Device Initialization}{Device Types / Vhost-user Device Backend / Device Initialization}
+
+The driver SHOULD check the \field{max_vhost_queues} configuration field to
+determine how many queues the vhost-user slave will be able to support.
+
+The driver SHOULD fetch the \field{uuid} configuration field to allow
+vhost-user slave programs to identify a specific device among many.
+
+The driver SHOULD place at least one buffer in rxq before setting the
+VIRTIO_VHOST_USER_STATUS_SLAVE_UP bit in the \field{status} configuration field.
+
+The driver MUST handle rxq virtqueue notifications that occur before the
+configuration change notification.
+It is possible that a vhost-user protocol message from the vhost-user master
+arrives before the driver has seen the configuration change notification for
+the VIRTIO_VHOST_USER_STATUS_MASTER_UP \field{status} change.
+
+\subsection{Device Operation}\label{sec:Device Types / Vhost-user Device Backend / Device Operation}
+
+Device operation consists of operating request queues and response queues.
+
+\subsubsection{Device Operation: Request Queues}\label{sec:Device Types / Vhost-user Device Backend / Device Operation / Device Operation: Request Queues}
+
+The driver receives vhost-user protocol messages from the vhost-user master on
+rxq. The driver sends responses to the vhost-user master on txq.
+
+The driver sends slave-initiated requests on txq. The driver receives
+responses from the vhost-user master on rxq.
+
+All virtqueues offer in-order guaranteed delivery semantics for vhost-user
+protocol messages.
+
+Each buffer is a vhost-user protocol message as defined by the
+\hyperref[intro:Vhost-user Protocol]{Vhost-user Protocol}. In order to enable
+cross-endian communication, all message fields are little-endian instead of the
+native byte order normally used by the protocol.
+
+The appropriate size of rxq buffers is at least as large as the largest message
+defined by the \hyperref[intro:Vhost-user Protocol]{Vhost-user Protocol}
+standard version that the driver supports. If the vhost-user master sends a
+message that is too large for an rxq buffer then DEVICE_NEEDS_RESET is set and
+the driver must reset the device.
+
+File descriptor passing is handled differently by the vhost-user device
+backend. When a message is received that carries one or more file descriptors
+according to the vhost-user protocol, additional device resources become
+available to the driver.
+
+\subsection{Additional Device Resources over PCI}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI}
+
+The vhost-user device backend contains additional device resources beyond
+configuration space and virtqueues. The nature of these resources is
+transport-specific and therefore only virtio transports that provide these
+resources support the vhost-user device backend.
+
+The following additional resources exist:
+\begin{description}
+  \item[Doorbells] The driver signals the vhost-user master through doorbells. The signal does not carry any data; it is purely an event.
+  \item[Notifications] The vhost-user master signals the driver for events besides virtqueue activity and configuration changes by sending notifications.
+  \item[Shared memory] The vhost-user master gives access to memory that can be mapped by the driver.
+\end{description}
+
+\subsubsection{Doorbell Numbering}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Doorbell Numbering}
+
+Doorbells are laid out as follows:
+
+\begin{description}
+\item[0] Vring call for vhost-user queue 0
+\item[\ldots]
+\item[N] Vring err for vhost-user queue 0
+\item[\ldots]
+\item[2N] Log
+\end{description}
+
+\subsubsection{Notifications}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Notifications}
+
+Notifications are laid out as follows:
+
+\begin{description}
+\item[0] Vring kick for vhost-user queue 0
+\item[\ldots]
+\item[N-1] Vring kick for vhost-user queue N-1
+\end{description}
+
+\subsubsection{Shared Memory Layout}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Shared Memory Layout}
+
+Shared memory is laid out as follows:
+
+\begin{description}
+\item[0] Vhost memory region 0
+\item[SIZE0] Vhost memory region 1
+\item[\ldots]
+\item[SIZE0 + SIZE1 + \ldots] Log
+\end{description}
+
+The size of vhost memory region 0 is
+\field{SIZE0}, the size of vhost memory region 1 is \field{SIZE1}, and so on.
+
+\subsubsection{Availability of Additional Resources}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Availability of Additional Resources}
+
+The following vhost-user protocol messages convey access to additional device
+resources:
+
+\begin{description}
+\item[VHOST_USER_SET_MEM_TABLE] Contents of vhost memory regions are available to the driver in shared memory. Region contents are laid out in the same order as the vhost memory region list.
+\item[VHOST_USER_SET_LOG_BASE] Contents of the log are available to the driver in shared memory.
+\item[VHOST_USER_SET_LOG_FD] The log doorbell is available to the driver. Writes to the log doorbell before this message is received produce no effect.
+\item[VHOST_USER_SET_VRING_KICK] The vring kick notification for this queue is available to the driver. The first notification may occur before the driver has processed this message.
+\item[VHOST_USER_SET_VRING_CALL] The vring call doorbell for this queue is available to the driver. Writes to the vring call doorbell before this message is received produce no effect.
+\item[VHOST_USER_SET_VRING_ERR] The vring err doorbell for this queue is available to the driver. Writes to the vring err doorbell before this message is received produce no effect.
+\item[VHOST_USER_SET_SLAVE_REQ_FD] The driver may send vhost-user protocol slave messages on txq. Buffers put onto txq before this message is received are discarded by the device.
+\end{description}
+
+Additional resources are configured on the virtio PCI transport by the following \field{struct virtio_pci_cap.cfg_type} values:
+
+\begin{lstlisting}
+#define VIRTIO_PCI_CAP_DOORBELL_CFG 6
+#define VIRTIO_PCI_CAP_NOTIFICATION_CFG 7
+#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
+\end{lstlisting}
+
+\subsubsection{Doorbell structure layout}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Doorbell capability}
+
+The doorbell location is found using the VIRTIO_PCI_CAP_DOORBELL_CFG
+capability. This capability is immediately followed by an additional
+field, like so:
+
+\begin{lstlisting}
+struct virtio_pci_doorbell_cap {
+        struct virtio_pci_cap cap;
+        le32 doorbell_off_multiplier;
+};
+\end{lstlisting}
+
+The doorbell address within a BAR is calculated as follows:
+
+\begin{lstlisting}
+        cap.offset + doorbell_idx * doorbell_off_multiplier
+\end{lstlisting}
+
+The \field{cap.offset} and \field{doorbell_off_multiplier} are taken from the
+doorbell capability structure above, and the \field{doorbell_idx} is the
+doorbell number.
+
+\devicenormative{\paragraph}{Doorbell capability}{Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Doorbell capability}
+The device MUST present at least one doorbell capability.
+
+The \field{cap.offset} MUST be 2-byte aligned.
+
+The device MUST either present \field{doorbell_off_multiplier} as an even power of 2,
+or present \field{doorbell_off_multiplier} as 0.
+
+The value \field{cap.length} presented by the device MUST be at least 2
+and MUST be large enough to support doorbell offsets for all supported
+doorbells in all possible configurations.
+
+The value \field{cap.length} presented by the device MUST satisfy:
+\begin{lstlisting}
+cap.length >= num_doorbells * doorbell_off_multiplier + 2
+\end{lstlisting}
+
+The number of doorbells is \field{num_doorbells} and is dependent on the
+device.
+
+\subsubsection{Notification structure layout}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Notification capability}
+
+The notification structure allows MSI-X vectors to be configured for
+notification interrupts. If MSI-X is not available, bit 2 of the ISR status
+indicates that a notification occurred.
+
+The notification structure is found using the VIRTIO_PCI_CAP_NOTIFICATION_CFG
+capability.
+
+\begin{lstlisting}
+struct virtio_pci_notification_cfg {
+        le16 notification_select;              /* read-write */
+        le16 notification_msix_vector;         /* read-write */
+};
+\end{lstlisting}
+
+The driver indicates which notification is of interest by writing the
+\field{notification_select} field. The driver then writes the MSI-X vector or
+\field{VIRTIO_MSI_NO_VECTOR} to \field{notification_msix_vector} to change the
+MSI-X vector for that notification.
+
+\subsubsection{Shared memory capability}\label{sec:Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Shared Memory capability}
+
+The shared memory location is found using the VIRTIO_PCI_CAP_SHARED_MEMORY_CFG
+capability.
+
+\devicenormative{\paragraph}{Shared Memory capability}{Device Types / Vhost-user Device Backend / Additional Device Resources over PCI / Shared Memory capability}
+The device MUST present exactly one shared memory capability.
+
+The device MUST locate shared memory in a Memory Space BAR.
+
+The device SHOULD locate shared memory in a Prefetchable BAR.
+
+The \field{cap.offset} MUST be 4096-byte aligned.
+
+The value \field{cap.length} presented by the device MUST be non-zero and 4096-byte aligned.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently these device-independent feature bits defined:
diff --git a/introduction.tex b/introduction.tex
index 979881e..0bf400d 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -60,6 +60,7 @@ Levels'', BCP 14, RFC 2119, March 1997.
\newline\url{http://www.ietf.org/rfc/rfc
         \phantomsection\label{intro:SCSI MMC}\textbf{[SCSI MMC]} &
         SCSI Multimedia Commands,
         \newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=mmc6r00.pdf}\\
+        \phantomsection\label{intro:Vhost-user Protocol}\textbf{[Vhost-user Protocol]} & Vhost-user Protocol, \newline\url{https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/interop/vhost-user.txt;hb=HEAD}, and any future revisions\\
 
 \end{longtable}
 
-- 
2.14.3
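
Illustration (not part of the patch): a minimal sketch of the status field
handshake described under "Device configuration layout". It assumes that the
two #define values are bit numbers rather than masks, which the text suggests
by calling them bits but does not state outright. The helper names are
hypothetical, and the functions only model the rules on a plain integer; they
do not touch a real configuration space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the patch; treated here as bit numbers (an assumption). */
#define VIRTIO_VHOST_USER_STATUS_SLAVE_UP  0
#define VIRTIO_VHOST_USER_STATUS_MASTER_UP 1

/* Hypothetical helper: turn a bit number into a mask. */
static inline uint32_t status_bit(unsigned int nr)
{
    assert(nr < 32);
    return UINT32_C(1) << nr;
}

/* Driver side: signal readiness for the vhost-user master to connect. */
static inline uint32_t driver_set_slave_up(uint32_t status)
{
    return status | status_bit(VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
}

/* Device side: the master cannot connect unless SLAVE_UP was set first. */
static inline bool master_may_connect(uint32_t status)
{
    return (status & status_bit(VIRTIO_VHOST_USER_STATUS_SLAVE_UP)) != 0;
}

/* Device side: when the master disconnects, both bits are cleared. */
static inline uint32_t device_master_disconnected(uint32_t status)
{
    return status & ~(status_bit(VIRTIO_VHOST_USER_STATUS_SLAVE_UP) |
                      status_bit(VIRTIO_VHOST_USER_STATUS_MASTER_UP));
}

/* The driver MUST NOT set undefined bits in the status field. */
static inline bool status_write_valid(uint32_t value)
{
    uint32_t defined = status_bit(VIRTIO_VHOST_USER_STATUS_SLAVE_UP) |
                       status_bit(VIRTIO_VHOST_USER_STATUS_MASTER_UP);
    return (value & ~defined) == 0;
}
```

After a disconnect the modelled status is 0 again, so the driver restarts
communication by calling driver_set_slave_up() once more.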
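
Illustration (not part of the patch): a sketch of the doorbell arithmetic from
"Doorbell Numbering" and "Doorbell structure layout". It assumes that N in the
numbering table is the number of vhost-user queues, giving 2N + 1 doorbells,
and that the capability fields have already been read and converted to host
byte order. struct doorbell_cap_info and every function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-endian copy of the relevant capability fields. */
struct doorbell_cap_info {
    uint32_t offset;                  /* cap.offset */
    uint32_t length;                  /* cap.length */
    uint32_t doorbell_off_multiplier; /* from struct virtio_pci_doorbell_cap */
};

/* Doorbell numbers: call doorbells first, then err doorbells, then log. */
static inline uint32_t vring_call_doorbell(uint32_t n, uint32_t queue)
{
    assert(queue < n);
    return queue;
}

static inline uint32_t vring_err_doorbell(uint32_t n, uint32_t queue)
{
    assert(queue < n);
    return n + queue;
}

static inline uint32_t log_doorbell(uint32_t n)
{
    return 2 * n;
}

/* Assumed total: N call + N err + 1 log doorbell. */
static inline uint32_t num_doorbells(uint32_t n)
{
    return 2 * n + 1;
}

/* cap.offset + doorbell_idx * doorbell_off_multiplier, within the BAR. */
static inline uint64_t doorbell_bar_offset(const struct doorbell_cap_info *cap,
                                           uint32_t doorbell_idx)
{
    return (uint64_t)cap->offset +
           (uint64_t)doorbell_idx * cap->doorbell_off_multiplier;
}

/* cap.length >= num_doorbells * doorbell_off_multiplier + 2 */
static inline bool doorbell_cap_length_ok(const struct doorbell_cap_info *cap,
                                          uint32_t ndoorbells)
{
    uint64_t needed = (uint64_t)ndoorbells * cap->doorbell_off_multiplier + 2;
    return cap->length >= needed;
}
```

With two queues and a multiplier of 4, the log doorbell is number 4 and sits 16
bytes past cap.offset. The 64-bit intermediates keep the products from
wrapping for large multipliers.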
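
Illustration (not part of the patch): a sketch of the "Shared Memory Layout"
table, where each vhost memory region starts at the sum of the sizes of the
regions before it and the log follows the last region. Offsets are relative to
the start of the shared memory capability. The function names and the sizes
array are hypothetical; a real driver would take the sizes from the
VHOST_USER_SET_MEM_TABLE message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Offset of vhost memory region idx: SIZE0 + SIZE1 + ... + SIZE(idx-1).
 * Passing idx == nregions yields the offset just past the last region.
 */
static inline uint64_t shm_region_offset(const uint64_t *sizes,
                                         size_t nregions, size_t idx)
{
    uint64_t offset = 0;

    assert(idx <= nregions);
    for (size_t i = 0; i < idx; i++) {
        offset += sizes[i];
    }
    return offset;
}

/* The log is placed after all vhost memory regions. */
static inline uint64_t shm_log_offset(const uint64_t *sizes, size_t nregions)
{
    return shm_region_offset(sizes, nregions, nregions);
}
```

For region sizes 0x1000, 0x2000 and 0x4000 this places the regions at 0,
0x1000 and 0x3000, and the log at 0x7000.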