[PATCH v2] util/oslib-posix: increase memprealloc thread count to 32

Jon Kohler posted 1 patch 1 week ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20251106163143.4185468-1-jon@nutanix.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>
util/oslib-posix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v2] util/oslib-posix: increase memprealloc thread count to 32
Posted by Jon Kohler 1 week ago
Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
touched in 2017 [1] and, since then, physical machines and the VMs
running on them have continued to get bigger, both on average and at
the extremes.

For very large VMs, using 16 threads to preallocate memory can be a
non-trivial bottleneck during VM start-up and migration. Increasing
this limit to 32 threads reduces the time taken for these operations.

Test results from quad socket Intel 8490H (4x 60 cores) show fairly
linear scaling: doubling the thread count cuts start-up time by about
50%.

---------------------------------------------
Idle Guest w/ 2M HugePages   | Start-up time
---------------------------------------------
240 vCPU, 7.5TB (16 threads) | 2m41.955s
---------------------------------------------
240 vCPU, 7.5TB (32 threads) | 1m19.404s
---------------------------------------------
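
As a quick check on the table above, the arithmetic can be sketched in
C. The helper names (to_seconds, speedup, time_saved_pct, report_table)
are made up for illustration and are not part of QEMU.

```c
#include <stdio.h>

/* Convert a "XmY.YYYs" style measurement into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Speedup factor: how many times faster the new run is. */
static double speedup(double old_s, double new_s)
{
    return old_s / new_s;
}

/* Percentage of wall-clock time saved by the new run. */
static double time_saved_pct(double old_s, double new_s)
{
    return 100.0 * (old_s - new_s) / old_s;
}

/* Print the comparison for the two runs listed in the table. */
static void report_table(void)
{
    double t16 = to_seconds(2, 41.955); /* 16 threads: 2m41.955s */
    double t32 = to_seconds(1, 19.404); /* 32 threads: 1m19.404s */

    printf("speedup: %.2fx, time saved: %.1f%%\n",
           speedup(t16, t32), time_saved_pct(t16, t32));
}
```

Calling report_table() should show a speedup of roughly 2x and about
half of the start-up time saved, which is what "fairly linear" means
here for a 2x thread count increase.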

Note: Going above 32 threads shows diminishing returns; at that point
memory bandwidth and context switching costs appear to limit linear
scaling. For posterity, on the same system as above:
- 32 threads: 1m19s
- 48 threads: 1m4s
- 64 threads: 59s
- 240 threads: 50s
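
To put a number on the diminishing returns in the list above, here is a
minimal sketch that computes scaling efficiency relative to the
16-thread run. The struct and function names are hypothetical, and the
16-thread baseline is rounded to 162s from the 2m41.955s measurement.

```c
#include <stddef.h>
#include <stdio.h>

/* One measurement: thread count and start-up time in seconds. */
struct prealloc_sample {
    int threads;
    double seconds;
};

/* Figures from this commit message, rounded to whole seconds. */
static const struct prealloc_sample samples[] = {
    { 16, 162.0 },  /* 2m41.955s, rounded */
    { 32, 79.0 },
    { 48, 64.0 },
    { 64, 59.0 },
    { 240, 50.0 },
};

/*
 * Efficiency relative to a baseline run: speedup divided by the
 * increase in thread count. 1.0 means perfectly linear scaling.
 */
static double scaling_efficiency(const struct prealloc_sample *base,
                                 const struct prealloc_sample *s)
{
    double speedup = base->seconds / s->seconds;
    double thread_ratio = (double)s->threads / base->threads;

    return speedup / thread_ratio;
}

/* Print speedup and efficiency for every run after the baseline. */
static void print_scaling(void)
{
    size_t n = sizeof(samples) / sizeof(samples[0]);

    for (size_t i = 1; i < n; i++) {
        printf("%3d threads: %.2fx speedup, efficiency %.2f\n",
               samples[i].threads,
               samples[0].seconds / samples[i].seconds,
               scaling_efficiency(&samples[0], &samples[i]));
    }
}
```

With these inputs the efficiency stays close to 1.0 at 32 threads and
falls to roughly 0.84, 0.69 and 0.22 at 48, 64 and 240 threads.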

Additional threads also get less interesting as the amount of
memory to be preallocated gets smaller. Putting that all together,
32 threads appears to be a sane number with a solid speedup on fairly
modern hardware. To go faster, we'd either need to improve the hardware
(CPU/memory) itself or improve clear_pages_*() on the kernel side to
be more efficient.
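
For readers unfamiliar with how such a cap is typically used, below is a
minimal sketch, assuming the limit acts as an upper bound alongside the
host CPU count, the requested thread count and the number of pages. The
function names are hypothetical and the actual logic in
util/oslib-posix.c may differ. _SC_NPROCESSORS_ONLN is a common
extension rather than a POSIX guarantee.

```c
#include <stddef.h>
#include <unistd.h>

#define MAX_MEM_PREALLOC_THREAD_COUNT 32

/*
 * Hypothetical helper (not the QEMU function): pick a thread count
 * that never exceeds the cap, the online CPU count, the caller's
 * request, or the number of pages to touch.
 */
static int pick_prealloc_threads(long host_cpus, int requested,
                                 size_t numpages)
{
    int n = 1; /* single-threaded fallback if the CPU count is unknown */

    if (host_cpus > 0) {
        n = host_cpus < MAX_MEM_PREALLOC_THREAD_COUNT
            ? (int)host_cpus : MAX_MEM_PREALLOC_THREAD_COUNT;
        if (requested < n) {
            n = requested;
        }
    }

    /* More threads than pages would leave some threads idle. */
    if (numpages < (size_t)n) {
        n = (int)numpages;
    }

    return n > 0 ? n : 1;
}

/* Same as above, but query the online CPU count from the host. */
static int pick_prealloc_threads_for_host(int requested, size_t numpages)
{
    return pick_prealloc_threads(sysconf(_SC_NPROCESSORS_ONLN),
                                 requested, numpages);
}
```

Under these assumptions, a 240-CPU host asked for 240 threads would get
32, while a small guest with only a handful of huge pages would get no
more threads than it has pages.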

[1] 1e356fc14bea ("mem-prealloc: reduce large guest start-up and migration time.")

Signed-off-by: Jon Kohler <jon@nutanix.com>
---
 util/oslib-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 3c14b72665..dc001da66d 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -61,7 +61,7 @@
 #include "qemu/memalign.h"
 #include "qemu/mmap-alloc.h"
 
-#define MAX_MEM_PREALLOC_THREAD_COUNT 16
+#define MAX_MEM_PREALLOC_THREAD_COUNT 32
 
 struct MemsetThread;
 
-- 
2.43.0
Re: [PATCH v2] util/oslib-posix: increase memprealloc thread count to 32
Posted by Daniel P. Berrangé 1 week ago
On Thu, Nov 06, 2025 at 09:31:43AM -0700, Jon Kohler wrote:
> Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
> touched in 2017 [1] and, since then, physical machines and the VMs
> running on them have continued to get bigger, both on average and at
> the extremes.
> 
> For very large VMs, using 16 threads to preallocate memory can be a
> non-trivial bottleneck during VM start-up and migration. Increasing
> this limit to 32 threads reduces the time taken for these operations.
> 
> Test results from quad socket Intel 8490H (4x 60 cores) show fairly
> linear scaling: doubling the thread count cuts start-up time by about
> 50%.
> 
> ---------------------------------------------
> Idle Guest w/ 2M HugePages   | Start-up time
> ---------------------------------------------
> 240 vCPU, 7.5TB (16 threads) | 2m41.955s
> ---------------------------------------------
> 240 vCPU, 7.5TB (32 threads) | 1m19.404s
> ---------------------------------------------
> 
> Note: Going above 32 threads shows diminishing returns; at that point
> memory bandwidth and context switching costs appear to limit linear
> scaling. For posterity, on the same system as above:
> - 32 threads: 1m19s
> - 48 threads: 1m4s
> - 64 threads: 59s
> - 240 threads: 50s
> 
> Additional threads also get less interesting as the amount of
> memory to be preallocated gets smaller. Putting that all together,
> 32 threads appears to be a sane number with a solid speedup on fairly
> modern hardware. To go faster, we'd either need to improve the hardware
> (CPU/memory) itself or improve clear_pages_*() on the kernel side to
> be more efficient.
> 
> [1] 1e356fc14bea ("mem-prealloc: reduce large guest start-up and migration time.")
> 
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> ---
>  util/oslib-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 3c14b72665..dc001da66d 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -61,7 +61,7 @@
>  #include "qemu/memalign.h"
>  #include "qemu/mmap-alloc.h"
>  
> -#define MAX_MEM_PREALLOC_THREAD_COUNT 16
> +#define MAX_MEM_PREALLOC_THREAD_COUNT 32
>  
>  struct MemsetThread;
>  
> -- 
> 2.43.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|