[RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability

David Woodhouse posted 3 patches 1 month, 1 week ago
Posted by David Woodhouse 1 month, 1 week ago
From: David Woodhouse <dwmw@amazon.co.uk>

Add a VIRTIO_PCI_CAP_SWIOTLB capability which advertises a SWIOTLB bounce
buffer similar to the existing `restricted-dma-pool` device-tree feature.

The difference is that this is per-device; each device needs to have its
own. Perhaps we should add a UUID to the capability, and have a way for
a device to not *provide* its own buffer, but just to reference the UUID
of a buffer elsewhere?

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 transport-pci.tex | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..23e0d57 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
 \item ISR Status
 \item Device-specific configuration (optional)
 \item PCI configuration access
+\item SWIOTLB bounce buffer
 \end{itemize}
 
 Each structure can be mapped by a Base Address register (BAR) belonging to
@@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
 #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
 /* Vendor-specific data */
 #define VIRTIO_PCI_CAP_VENDOR_CFG        9
+/* Software IOTLB bounce buffer */
+#define VIRTIO_PCI_CAP_SWIOTLB           10
 \end{lstlisting}
 
         Any other value is reserved for future use.
@@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
 The driver MUST qualify the \field{vendor_id} before
 interpreting or writing into the Vendor data capability.
 
+\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Software IOTLB bounce buffer capability}
+
+The optional Software IOTLB bounce buffer capability allows the
+device to provide a memory region which can be used by the
+driver for bounce buffering. This allows a device on the PCI
+transport to operate without DMA access to system memory addresses.
+
+The Software IOTLB region is referenced by the
+VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
+range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
+feature, if negotiated.
+
+\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Software IOTLB bounce buffer capability}
+
+Devices which present the Software IOTLB bounce buffer capability
+SHOULD also offer the VIRTIO_F_SWIOTLB feature.
+
+\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Layout /
+Software IOTLB bounce buffer capability}
+
+The driver SHOULD use the offered buffer in preference to passing system
+memory addresses to the device. If the driver accepts the VIRTIO_F_SWIOTLB
+feature, then the driver MUST use the offered buffer and never pass system
+memory addresses to the device.
+
 \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
 
 The VIRTIO_PCI_CAP_PCI_CFG capability
-- 
2.49.0
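
As an illustration of how a driver might locate the proposed capability, here is a minimal sketch in C. It assumes the new capability reuses the generic 16-byte `struct virtio_pci_cap` layout (`cap_vndr`, `cap_next`, `cap_len`, `cfg_type`, `bar`, `id`, padding, `offset`, `length`), which the patch does not spell out. `find_swiotlb_cap` and `struct swiotlb_region` are hypothetical names, and the configuration space is a plain 256-byte snapshot rather than real hardware access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS              0x06  /* low byte of the Status register */
#define PCI_STATUS_CAP_LIST     0x10  /* capability list present */
#define PCI_CAPABILITY_LIST     0x34  /* pointer to first capability */
#define PCI_CAP_ID_VNDR         0x09  /* vendor-specific capability ID */
#define VIRTIO_PCI_CAP_SWIOTLB  10    /* cfg_type value proposed by this patch */
#define VIRTIO_PCI_CAP_LEN      16    /* size of the generic virtio_pci_cap */

/* Hypothetical result type: where the bounce buffer region lives. */
struct swiotlb_region {
    uint8_t  bar;     /* BAR index holding the region */
    uint32_t offset;  /* offset within that BAR */
    uint32_t length;  /* length of the region in bytes */
};

/* Read a little-endian 32-bit value from config space bytes. */
static uint32_t le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Walk the PCI capability list in a 256-byte config space snapshot.
 * Returns 1 and fills *out if a SWIOTLB capability is found, else 0.
 * Assumption: the capability uses the generic virtio_pci_cap layout.
 */
static int find_swiotlb_cap(const uint8_t cfg[256], struct swiotlb_region *out)
{
    assert(cfg != NULL && out != NULL);

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    int guard = 48;  /* bound the walk in case the list is malformed */

    while (pos >= 0x40 && guard-- > 0) {
        uint8_t id   = cfg[pos];
        uint8_t next = cfg[pos + 1] & 0xfc;

        if (id == PCI_CAP_ID_VNDR &&
            pos <= 256 - VIRTIO_PCI_CAP_LEN &&
            cfg[pos + 2] >= VIRTIO_PCI_CAP_LEN &&
            cfg[pos + 3] == VIRTIO_PCI_CAP_SWIOTLB) {
            out->bar    = cfg[pos + 4];
            out->offset = le32_at(&cfg[pos + 8]);
            out->length = le32_at(&cfg[pos + 12]);
            return 1;
        }
        pos = next;
    }
    return 0;
}
```

A driver following this sketch would then map `length` bytes at `offset` within BAR `bar` and use that mapping as its bounce pool. A region larger than 4 GiB would need 64-bit offset and length fields, which this sketch does not cover.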
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Michael S. Tsirkin 1 month, 1 week ago
On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:



So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
like this, performance will suffer. But if you do normal WB, since device
accesses do not go on the bus, they do not get synchronized with driver
writes and there's really no way to synchronize them.

First, this needs to be addressed.

In this age of accelerators for everything, building pci based
interfaces that can't be efficiently accelerated seems shortsighted ...

-- 
MST
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by David Woodhouse 1 month, 1 week ago
On Thu, 2025-04-03 at 03:27 -0400, Michael S. Tsirkin wrote:
> 
> So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
> like this, performance will suffer. But if you do normal WB, since device
> accesses do not go on the bus, they do not get synchronized with driver
> writes and there's really no way to synchronize them.
> 
> First, this needs to be addressed.

I was assuming the bounce buffer region would generally be in a BAR of
its own. Would a write-combining mapping not suffice?

In the case of a virtual device where the hypervisor *knows* it's all
just host memory anyway and is cache-coherent, doesn't the hypervisor
get to just make it normally cached anyway, regardless of what the
guest asks for? I forget all the bizarre rules about guest/host PAT
combinations now, and that's just x86 anyway...

I think it's OK to have a feature which makes more sense for a virtual
device than it does for a physical device.

For example, it doesn't make any sense for a physical device *not* to
have VIRTIO_F_ACCESS_PLATFORM, does it? That is only possible because
virtual devices are "special" and can have the bug^Wmicro-optimisation
of which we spoke.

The intended use case for this bounce buffering *is* more targeted at
virtual devices than physical, and yes, it'll probably perform better
on virtual devices than physical too.

But if a physical device finds itself in a system where it actually
*cannot* do DMA to system memory, and provides this bounce-buffer...
then however slow it is, it's still going to have better performance
than the complete lack of functionality that would otherwise result :)
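
To make the intended data flow concrete, here is a minimal user-space sketch in C of bounce buffering through a device-provided region. It is an illustration under stated assumptions, not the Linux SWIOTLB implementation: `struct bounce_pool`, `bounce_map_to_device` and `bounce_unmap_from_device` are hypothetical names, the "region" is an ordinary byte array standing in for a mapped BAR, and allocation is a simple bump allocator that never frees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_ALIGN 64u              /* arbitrary slot alignment for this sketch */
#define BOUNCE_FAIL  ((size_t)-1)     /* returned when the pool is exhausted */

/* Hypothetical stand-in for the device-provided bounce region. */
struct bounce_pool {
    uint8_t *base;  /* CPU mapping of the region */
    size_t   size;  /* region length in bytes */
    size_t   next;  /* bump-allocator cursor */
};

/*
 * Copy driver data into the pool and return its offset within the region.
 * The driver would hand the device this offset (as an address inside the
 * region) instead of a system memory address.
 */
static size_t bounce_map_to_device(struct bounce_pool *p,
                                   const void *src, size_t len)
{
    size_t start = (p->next + BOUNCE_ALIGN - 1) & ~(size_t)(BOUNCE_ALIGN - 1);

    if (start > p->size || len > p->size - start)
        return BOUNCE_FAIL;

    memcpy(p->base + start, src, len);
    p->next = start + len;
    return start;
}

/* Copy data the device wrote into the pool back into driver memory. */
static void bounce_unmap_from_device(const struct bounce_pool *p, size_t off,
                                     void *dst, size_t len)
{
    assert(off <= p->size && len <= p->size - off);
    memcpy(dst, p->base + off, len);
}
```

Every transfer costs one extra copy by the CPU, which is why the caching attributes of the mapping discussed above matter so much for performance.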



Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Zhu Lingshan 1 month, 1 week ago
On 4/3/2025 3:27 PM, Michael S. Tsirkin wrote:
> On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
>
>
> So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
> like this, performance will suffer. But if you do normal WB, since device
And this could even cause TLB thrashing, which would be worse.

Thanks
Zhu Lingshan
> accesses do not go on the bus, they do not get synchronized with driver
> writes and there's really no way to synchronize them.
>
> First, this needs to be addressed.
>
> In this age of accelerators for everything, building pci based
> interfaces that can't be efficiently accelerated seems shortsighted ...
>
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Michael S. Tsirkin 1 month, 1 week ago
On Thu, Apr 03, 2025 at 03:36:04PM +0800, Zhu Lingshan wrote:
> On 4/3/2025 3:27 PM, Michael S. Tsirkin wrote:
> > On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
> >
> >
> > So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
> > like this, performance will suffer. But if you do normal WB, since device
> And this could even cause TLB thrashing, which would be worse.
> 
> Thanks
> Zhu Lingshan


Hmm which TLB? I don't get it.

> > accesses do not go on the bus, they do not get synchronized with driver
> > writes and there's really no way to synchronize them.
> >
> > First, this needs to be addressed.
> >
> > In this age of accelerators for everything, building pci based
> > interfaces that can't be efficiently accelerated seems shortsighted ...
> >
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Zhu Lingshan 1 month, 1 week ago
On 4/3/2025 3:37 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 03, 2025 at 03:36:04PM +0800, Zhu Lingshan wrote:
>> On 4/3/2025 3:27 PM, Michael S. Tsirkin wrote:
>>> On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
>>>
>>> So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
>>> like this, performance will suffer. But if you do normal WB, since device
>> And this could even cause TLB thrashing, which would be worse.
>>
>> Thanks
>> Zhu Lingshan
>
> Hmm which TLB? I don't get it.
The CPU TLB. A device-side bounce buffer design requires mapping
device memory into the CPU address space so that the CPU can help copy
data, and that causes more frequent TLB switches.

TLB thrashing will occur when many devices are doing DMA through
the device-side bounce buffer, or when the DMA is scattered.

If the bounce buffer resides in the hypervisor, for example QEMU,
then the TLB switch happens on QEMU process context switches, which
already occur all the time.

Thanks
Zhu Lingshan
>
>>> accesses do not go on the bus, they do not get synchronized with driver
>>> writes and there's really no way to synchronize them.
>>>
>>> First, this needs to be addressed.
>>>
>>> In this age of accelerators for everything, building pci based
>>> interfaces that can't be efficiently accelerated seems shortsighted ...
>>>
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Michael S. Tsirkin 1 month, 1 week ago
On Thu, Apr 03, 2025 at 04:12:20PM +0800, Zhu Lingshan wrote:
> On 4/3/2025 3:37 PM, Michael S. Tsirkin wrote:
> > On Thu, Apr 03, 2025 at 03:36:04PM +0800, Zhu Lingshan wrote:
> >> On 4/3/2025 3:27 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
> >>>
> >>> So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
> >>> like this, performance will suffer. But if you do normal WB, since device
> >> And this could even cause TLB thrashing, which would be worse.
> >>
> >> Thanks
> >> Zhu Lingshan
> >
> > Hmm which TLB? I don't get it.
> The CPU TLB. A device-side bounce buffer design requires mapping
> device memory into the CPU address space so that the CPU can help copy
> data, and that causes more frequent TLB switches.

Lost me here. It's mapped, why switch?

> TLB thrashing will occur when many devices are doing DMA through
> the device-side bounce buffer, or when the DMA is scattered.

Yea I don't think this idea even works. Each device can only use
its own swiotlb.

> If the bounce buffer resides in the hypervisor, for example QEMU,
> then the TLB switch happens on QEMU process context switches, which
> already occur all the time.
> 
> Thanks
> Zhu Lingshan
> >
> >>> accesses do not go on the bus, they do not get synchronized with driver
> >>> writes and there's really no way to synchronize them.
> >>>
> >>> First, this needs to be addressed.
> >>>
> >>> In this age of accelerators for everything, building pci based
> >>> interfaces that can't be efficiently accelerated seems shortsighted ...
> >>>
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Zhu Lingshan 1 month, 1 week ago
On 4/3/2025 4:16 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 03, 2025 at 04:12:20PM +0800, Zhu Lingshan wrote:
>> On 4/3/2025 3:37 PM, Michael S. Tsirkin wrote:
>>> On Thu, Apr 03, 2025 at 03:36:04PM +0800, Zhu Lingshan wrote:
>>>> On 4/3/2025 3:27 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
>>>>> So on the PCI option. The normal mapping (ioremap) for BAR is uncached. If done
>>>>> like this, performance will suffer. But if you do normal WB, since device
>>>> and this can possibly even cause TLB thrashing, which is a worse case.
>>>>
>>>> Thanks
>>>> Zhu Lingshan
>>> Hmm which TLB? I don't get it.
>> CPU TLB, because a device-side bounce buffer design requires mapping
>> device memory into the CPU address space so that the CPU can help copy
>> data, which causes more frequent TLB switches.
> Lost me here. It's mapped, why switch?
Because the number of TLB entries is quite limited.
But never mind, this is not a key topic of this discussion.

Thanks
Zhu Lingshan
>
>> TLB thrashing will occur when many devices are doing DMA through
>> the device-side bounce buffer, or doing scattered DMA.
> Yea I don't think this idea even works. Each device can only use
> its own swiotlb.
>
>> If the bounce buffer resides in the hypervisor, for example QEMU,
>> then the TLB switches happen during QEMU process context switches,
>> which already occur all the time.
>>
>> Thanks
>> Zhu Lingshan
>>>>> accesses do not go on the bus, they do not get synchronized with driver
>>>>> writes and there's really no way to synchronize them.
>>>>>
>>>>> First, this needs to be addressed.
>>>>>
>>>>> In this age of accelerators for everything, building pci based
>>>>> interfaces that can't be efficiently accelerated seems shortsighted ...
>>>>>
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by Michael S. Tsirkin 1 month, 1 week ago
On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Add a VIRTIO_PCI_CAP_SWIOTLB capability which advertises a SWIOTLB bounce
> buffer similar to the existing `restricted-dma-pool` device-tree feature.
> 
> The difference is that this is per-device; each device needs to have its
> own. Perhaps we should add a UUID to the capability, and have a way for
> a device to not *provide* its own buffer, but just to reference the UUID
> of a buffer elsewhere?

Such a UUID approach will be really messy with physical devices.
Also messy with confidential since config space is not encrypted.
Really, if you want something complex like this, just use virtio-iommu.
It does not require parsing page tables or anything complex like that.
The attraction of the same BAR proposal is that it is, at least, simple.


> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  transport-pci.tex | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..23e0d57 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
>  \item ISR Status
>  \item Device-specific configuration (optional)
>  \item PCI configuration access
> +\item SWIOTLB bounce buffer
>  \end{itemize}
>  
>  Each structure can be mapped by a Base Address register (BAR) belonging to
> @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
>  #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
>  /* Vendor-specific data */
>  #define VIRTIO_PCI_CAP_VENDOR_CFG        9
> +/* Software IOTLB bounce buffer */
> +#define VIRTIO_PCI_CAP_SWIOTLB           10
>  \end{lstlisting}
>  
>          Any other value is reserved for future use.
> @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
>  The driver MUST qualify the \field{vendor_id} before
>  interpreting or writing into the Vendor data capability.
>  
> +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> +Software IOTLB bounce buffer capability}
> +
> +The optional Software IOTLB bounce buffer capability allows the
> +device to provide a memory region which can be used by the
> +driver for bounce buffering. This allows a device on the PCI
> +transport to operate without DMA access to system memory addresses.
> +
> +The Software IOTLB region is referenced by the
> +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> +feature, if negotiated.
> +
> +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> +Software IOTLB bounce buffer capability}
> +
> +Devices which present the Software IOTLB bounce buffer capability
> +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> +
> +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> +Software IOTLB bounce buffer capability}
> +
> +The driver SHOULD use the offered buffer in preference to passing system
> +memory addresses to the device. If the driver accepts the VIRTIO_F_SWIOTLB
> +feature, then the driver MUST use the offered buffer and never pass system
> +memory addresses to the device.
> +
>  \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
>  
>  The VIRTIO_PCI_CAP_PCI_CFG capability
> -- 
> 2.49.0
>
Re: [RFC PATCH 3/3] transport-pci: Add SWIOTLB bounce buffer capability
Posted by David Woodhouse 1 month, 1 week ago
On Wed, 2025-04-02 at 10:58 -0400, Michael S. Tsirkin wrote:
> On Wed, Apr 02, 2025 at 12:04:47PM +0100, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > Add a VIRTIO_PCI_CAP_SWIOTLB capability which advertises a SWIOTLB bounce
> > buffer similar to the existing `restricted-dma-pool` device-tree feature.
> > 
> > The difference is that this is per-device; each device needs to have its
> > own. Perhaps we should add a UUID to the capability, and have a way for
> > a device to not *provide* its own buffer, but just to reference the UUID
> > of a buffer elsewhere?
> 
> > Such a UUID approach will be really messy with physical devices.

Yes, although it could work with multifunction devices all sharing a
SWIOTLB buffer that just *one* of the functions provides. Although in
that case, it's simpler just to add a function number as well as the
existing BAR#/offset/length of the virtio_pci_cap.
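
To make the "function number" idea concrete, here is a rough sketch only.
The first struct is the generic virtio_pci_cap layout from the virtio
specification. The second struct, its `function` field, and the helper
are hypothetical names invented for illustration; nothing in this patch
series defines them. The sketch assumes one of the two existing padding
bytes could carry the function number, so the capability keeps its size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Generic virtio PCI capability as laid out in the virtio specification.
 * offset and length are little-endian in PCI config space. */
struct virtio_pci_cap {
    uint8_t  cap_vndr;   /* Generic PCI field: PCI_CAP_ID_VNDR */
    uint8_t  cap_next;   /* Generic PCI field: next pointer */
    uint8_t  cap_len;    /* Generic PCI field: capability length */
    uint8_t  cfg_type;   /* Identifies the structure */
    uint8_t  bar;        /* Where to find it */
    uint8_t  id;         /* Multiple capabilities of the same type */
    uint8_t  padding[2]; /* Pad to full dword */
    uint32_t offset;     /* Offset within bar */
    uint32_t length;     /* Length of the structure, in bytes */
};

/* HYPOTHETICAL variant: reuse one padding byte as a PCI function number,
 * so a function could point at a buffer provided by a sibling function. */
struct virtio_pci_swiotlb_cap_sketch {
    uint8_t  cap_vndr;
    uint8_t  cap_next;
    uint8_t  cap_len;
    uint8_t  cfg_type;   /* VIRTIO_PCI_CAP_SWIOTLB (10 in the patch) */
    uint8_t  bar;
    uint8_t  id;
    uint8_t  function;   /* HYPOTHETICAL: function that provides the buffer */
    uint8_t  padding;
    uint32_t offset;
    uint32_t length;
};

/* The sketch must not change the size of the capability. */
static_assert(sizeof(struct virtio_pci_swiotlb_cap_sketch) ==
              sizeof(struct virtio_pci_cap),
              "sketch must keep the generic capability size");

/* HYPOTHETICAL helper: true if the buffer lives on the function that
 * carries this capability, false if it refers to a sibling function. */
static bool swiotlb_cap_is_own_buffer(
        const struct virtio_pci_swiotlb_cap_sketch *cap,
        uint8_t own_function)
{
    return cap->function == own_function;
}
```

With a layout like this, the per-device case discussed in the patch is
simply the one where `function` matches the function being probed.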

> Also messy with confidential since config space is not encrypted.

It's messy on the guest side too, as both provider and consumers of the
same SWIOTLB buffer have to be in the same IOMMU group and can't be
shared to different (nested) guests or userspace VFIO users.

> Really, if you want something complex like this, just use virtio-iommu.
> It does not require parsing page tables or anything complex like that.
> The attraction of the same BAR proposal is that it is, at least, simple.

Agreed. I think the simple option of a device having its *own* SWIOTLB
region makes most sense, without overengineering a sharing model.
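
For the simple per-device model, the driver-side rule in the quoted patch
could be read roughly as below. This is a sketch of one reading of the
SHOULD/MUST wording, not code from any existing driver; the enum and
function names are made up for illustration, and the two inputs are
assumed to come from capability discovery and feature negotiation.

```c
#include <stdbool.h>

/* HYPOTHETICAL names, for illustration only. */
enum swiotlb_dma_policy {
    DMA_DIRECT_ALLOWED,   /* no buffer offered: system memory addresses are fine */
    DMA_BOUNCE_PREFERRED, /* buffer offered, feature not accepted: SHOULD bounce */
    DMA_BOUNCE_REQUIRED   /* VIRTIO_F_SWIOTLB accepted: MUST bounce */
};

/* cap_present:        device exposes a VIRTIO_PCI_CAP_SWIOTLB capability.
 * feature_negotiated: driver accepted the VIRTIO_F_SWIOTLB feature. */
static enum swiotlb_dma_policy swiotlb_policy(bool cap_present,
                                              bool feature_negotiated)
{
    if (feature_negotiated)
        return DMA_BOUNCE_REQUIRED;   /* never pass system memory addresses */
    if (cap_present)
        return DMA_BOUNCE_PREFERRED;  /* use the offered buffer in preference */
    return DMA_DIRECT_ALLOWED;
}
```

Keeping the decision in one place like this makes it easy to see that
accepting the feature is what turns the preference into a requirement.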