[Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions

Peter Maydell posted 1 patch 5 years ago
Test docker-clang@ubuntu passed
Test asan passed
Test checkpatch passed
Test docker-mingw@fedora passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20190318112938.8298-1-peter.maydell@linaro.org
include/qemu/bswap.h | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
[Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions
Posted by Peter Maydell 5 years ago
In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
to perform a load or store to a pointer which might not be aligned
for the size of the type. We rely on the compiler to optimize this
memcpy() into an efficient load or store instruction where possible.
This is required for good performance, but at the moment it is also
required for correct operation, because some users of these functions
require that the access is atomic if the pointer is aligned, which
will only be the case if the compiler has optimized out the memcpy().
(The particular example where we discovered this is the virtio
vring_avail_idx() which calls virtio_lduw_phys_cached() which
eventually ends up calling lduw_he_p().)

Unfortunately some compile environments, such as the fortify-source
setup used in Alpine Linux, define memcpy() to a wrapper function
in a way that inhibits this compiler optimization.
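
(Purely as a hypothetical sketch of the kind of wrapper meant here; the
names below are invented and are not Alpine's actual implementation. Once
memcpy() is a macro expanding to an out-of-line checking function, the
compiler no longer sees a plain fixed-size copy it can turn into a single
load or store.)

    #include <stddef.h>

    /* hypothetical fortify-style wrapper, for illustration only */
    void *__chk_memcpy(void *dst, const void *src, size_t n, size_t dst_size);
    #define memcpy(dst, src, n) \
        __chk_memcpy((dst), (src), (n), __builtin_object_size((dst), 0))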

The correct long-term fix here is to add a set of functions for
doing atomic accesses into AddressSpaces (and to other relevant
families of accessor functions like the virtio_*_phys_cached()
ones), and make sure that callsites which want atomic behaviour
use the correct functions.
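
(Again only a sketch of the direction, not a proposed QEMU API; the
function name is invented. An explicitly-atomic accessor would use a real
atomic load when the pointer is suitably aligned, instead of relying on a
memcpy() being optimized into one.)

    #include <stdint.h>

    /* hypothetical, illustrative only: not an existing QEMU function */
    static inline uint16_t lduw_he_atomic_sketch(const void *ptr)
    {
        uint16_t r;
        if (((uintptr_t)ptr & (sizeof(r) - 1)) == 0) {
            /* aligned: a genuinely atomic load (gcc/clang builtin) */
            r = __atomic_load_n((const uint16_t *)ptr, __ATOMIC_RELAXED);
        } else {
            /* unaligned: no atomicity expected, fall back to a byte copy */
            __builtin_memcpy(&r, ptr, sizeof(r));
        }
        return r;
    }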

In the meantime, switch to using __builtin_memcpy() in the
bswap.h accessor functions. This will make us robust against things
like this fortify library in the short term. In the longer term
it will mean that we don't end up with these functions being really
badly-performing even if the semantics of the out-of-line memcpy()
are correct.

Reported-by: Fernando Casas Schössow <casasfernando@outlook.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/qemu/bswap.h | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 5a70f78c0ba..2a9f3fe783e 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -316,51 +316,57 @@ static inline void stb_p(void *ptr, uint8_t v)
     *(uint8_t *)ptr = v;
 }
 
-/* Any compiler worth its salt will turn these memcpy into native unaligned
-   operations.  Thus we don't need to play games with packed attributes, or
-   inline byte-by-byte stores.  */
+/*
+ * Any compiler worth its salt will turn these memcpy into native unaligned
+ * operations.  Thus we don't need to play games with packed attributes, or
+ * inline byte-by-byte stores.
+ * Some compilation environments (eg some fortify-source implementations)
+ * may intercept memcpy() in a way that defeats the compiler optimization,
+ * though, so we use __builtin_memcpy() to give ourselves the best chance
+ * of good performance.
+ */
 
 static inline int lduw_he_p(const void *ptr)
 {
     uint16_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline int ldsw_he_p(const void *ptr)
 {
     int16_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline void stw_he_p(void *ptr, uint16_t v)
 {
-    memcpy(ptr, &v, sizeof(v));
+    __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
 static inline int ldl_he_p(const void *ptr)
 {
     int32_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline void stl_he_p(void *ptr, uint32_t v)
 {
-    memcpy(ptr, &v, sizeof(v));
+    __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
 static inline uint64_t ldq_he_p(const void *ptr)
 {
     uint64_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline void stq_he_p(void *ptr, uint64_t v)
 {
-    memcpy(ptr, &v, sizeof(v));
+    __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
 static inline int lduw_le_p(const void *ptr)
-- 
2.20.1


Re: [Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions
Posted by Richard Henderson 5 years ago
On 3/18/19 4:29 AM, Peter Maydell wrote:
> In the meantime, switch to using __builtin_memcpy() in the
> bswap.h accessor functions. This will make us robust against things
> like this fortify library in the short term. In the longer term
> it will mean that we don't end up with these functions being really
> badly-performing even if the semantics of the out-of-line memcpy()
> are correct.
> 
> Reported-by: Fernando Casas Schössow <casasfernando@outlook.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

Re: [Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions
Posted by Paolo Bonzini 5 years ago
On 18/03/19 12:29, Peter Maydell wrote:
> In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
> to perform a load or store to a pointer which might not be aligned
> for the size of the type. We rely on the compiler to optimize this
> memcpy() into an efficient load or store instruction where possible.
> This is required for good performance, but at the moment it is also
> required for correct operation, because some users of these functions
> require that the access is atomic if the pointer is aligned, which
> will only be the case if the compiler has optimized out the memcpy().
> (The particular example where we discovered this is the virtio
> vring_avail_idx() which calls virtio_lduw_phys_cached() which
> eventually ends up calling lduw_he_p().)
> 
> Unfortunately some compile environments, such as the fortify-source
> setup used in Alpine Linux, define memcpy() to a wrapper function
> in a way that inhibits this compiler optimization.
> 
> The correct long-term fix here is to add a set of functions for
> doing atomic accesses into AddressSpaces (and to other relevant
> families of accessor functions like the virtio_*_phys_cached()
> ones), and make sure that callsites which want atomic behaviour
> use the correct functions.
> 
> In the meantime, switch to using __builtin_memcpy() in the
> bswap.h accessor functions. This will make us robust against things
> like this fortify library in the short term. In the longer term
> it will mean that we don't end up with these functions being really
> badly-performing even if the semantics of the out-of-line memcpy()
> are correct.
> 
> Reported-by: Fernando Casas Schössow <casasfernando@outlook.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  include/qemu/bswap.h | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
> index 5a70f78c0ba..2a9f3fe783e 100644
> --- a/include/qemu/bswap.h
> +++ b/include/qemu/bswap.h
> @@ -316,51 +316,57 @@ static inline void stb_p(void *ptr, uint8_t v)
>      *(uint8_t *)ptr = v;
>  }
>  
> -/* Any compiler worth its salt will turn these memcpy into native unaligned
> -   operations.  Thus we don't need to play games with packed attributes, or
> -   inline byte-by-byte stores.  */
> +/*
> + * Any compiler worth its salt will turn these memcpy into native unaligned
> + * operations.  Thus we don't need to play games with packed attributes, or
> + * inline byte-by-byte stores.
> + * Some compilation environments (eg some fortify-source implementations)
> + * may intercept memcpy() in a way that defeats the compiler optimization,
> + * though, so we use __builtin_memcpy() to give ourselves the best chance
> + * of good performance.
> + */
>  
>  static inline int lduw_he_p(const void *ptr)
>  {
>      uint16_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline int ldsw_he_p(const void *ptr)
>  {
>      int16_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline void stw_he_p(void *ptr, uint16_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
>  static inline int ldl_he_p(const void *ptr)
>  {
>      int32_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline void stl_he_p(void *ptr, uint32_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
>  static inline uint64_t ldq_he_p(const void *ptr)
>  {
>      uint64_t r;
> -    memcpy(&r, ptr, sizeof(r));
> +    __builtin_memcpy(&r, ptr, sizeof(r));
>      return r;
>  }
>  
>  static inline void stq_he_p(void *ptr, uint64_t v)
>  {
> -    memcpy(ptr, &v, sizeof(v));
> +    __builtin_memcpy(ptr, &v, sizeof(v));
>  }
>  
>  static inline int lduw_le_p(const void *ptr)
> 

Queued, thanks.

Paolo

Re: [Qemu-devel] [PATCH for-4.0] include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions
Posted by Peter Maydell 4 years, 11 months ago
On Mon, 18 Mar 2019 at 17:07, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 18/03/19 12:29, Peter Maydell wrote:
> > In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
> > to perform a load or store to a pointer which might not be aligned
> > for the size of the type. We rely on the compiler to optimize this
> > memcpy() into an efficient load or store instruction where possible.
> > This is required for good performance, but at the moment it is also
> > required for correct operation, because some users of these functions
> > require that the access is atomic if the pointer is aligned, which
> > will only be the case if the compiler has optimized out the memcpy().
> > (The particular example where we discovered this is the virtio
> > vring_avail_idx() which calls virtio_lduw_phys_cached() which
> > eventually ends up calling lduw_he_p().)
> >
> > Unfortunately some compile environments, such as the fortify-source
> > setup used in Alpine Linux, define memcpy() to a wrapper function
> > in a way that inhibits this compiler optimization.
> >
> > The correct long-term fix here is to add a set of functions for
> > doing atomic accesses into AddressSpaces (and to other relevant
> > families of accessor functions like the virtio_*_phys_cached()
> > ones), and make sure that callsites which want atomic behaviour
> > use the correct functions.
> >
> > In the meantime, switch to using __builtin_memcpy() in the
> > bswap.h accessor functions. This will make us robust against things
> > like this fortify library in the short term. In the longer term
> > it will mean that we don't end up with these functions being really
> > badly-performing even if the semantics of the out-of-line memcpy()
> > are correct.
> >
> > Reported-by: Fernando Casas Schössow <casasfernando@outlook.com>
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


> Queued, thanks.

Ping? This hasn't hit master yet, and today is the last day
for rc3 which I'm hoping might be the last rc for 4.0...

thanks
-- PMM