Now that the size of "struct kvm" is less than 2KiB, switch back to using
kzalloc() to allocate the VM structures. Add compile-time assertions in
vendor code to ensure the size is an order-0 allocation, i.e. to prevent
unknowingly letting the size balloon in the future.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/vmx.c | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e523d7d8a107..6c7fd7db6f11 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1940,7 +1940,7 @@ void kvm_x86_vendor_exit(void);
#define __KVM_HAVE_ARCH_VM_ALLOC
static inline struct kvm *kvm_arch_alloc_vm(void)
{
- return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ return kzalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT);
}
#define __KVM_HAVE_ARCH_VM_FREE
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8abeab91d329..589adc5f92e0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5536,6 +5536,7 @@ static int __init svm_init(void)
if (r)
goto err_kvm_init;
+ BUILD_BUG_ON(get_order(sizeof(struct kvm_svm) != 0));
return 0;
err_kvm_init:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b70ed72c1783..01264842bf45 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8755,6 +8755,7 @@ static int __init vmx_init(void)
if (r)
goto err_kvm_init;
+ BUILD_BUG_ON(get_order(sizeof(struct kvm_vmx) != 0));
return 0;
err_kvm_init:
--
2.49.0.472.ge94155a9ec-goog
On 2025-04-01 08:57:13, Sean Christopherson wrote:
> Now that the size of "struct kvm" is less than 2KiB, switch back to using
> kzalloc() to allocate the VM structures. Add compile-time assertions in
> vendor code to ensure the size is an order-0 allocation, i.e. to prevent
> unknowingly letting the size balloon in the future.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/vmx/vmx.c | 1 +
> 3 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e523d7d8a107..6c7fd7db6f11 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1940,7 +1940,7 @@ void kvm_x86_vendor_exit(void);
> #define __KVM_HAVE_ARCH_VM_ALLOC
> static inline struct kvm *kvm_arch_alloc_vm(void)
> {
> - return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + return kzalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT);
> }
>
> #define __KVM_HAVE_ARCH_VM_FREE
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8abeab91d329..589adc5f92e0 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5536,6 +5536,7 @@ static int __init svm_init(void)
> if (r)
> goto err_kvm_init;
>
> + BUILD_BUG_ON(get_order(sizeof(struct kvm_svm) != 0));
There is a typo here. It is checking sizeof(struct kvm_svm) != 0, instead
of checking get_order(...) != 0.
> return 0;
>
> err_kvm_init:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b70ed72c1783..01264842bf45 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8755,6 +8755,7 @@ static int __init vmx_init(void)
> if (r)
> goto err_kvm_init;
>
> + BUILD_BUG_ON(get_order(sizeof(struct kvm_vmx) != 0));
Same as above.
> return 0;
>
> err_kvm_init:
> --
> 2.49.0.472.ge94155a9ec-goog
>
On 2025-04-16 11:24:37, Vipin Sharma wrote:
> On 2025-04-01 08:57:13, Sean Christopherson wrote:
> >
> > + BUILD_BUG_ON(get_order(sizeof(struct kvm_svm) != 0));
>
> There is a typo here. It is checking sizeof(struct kvm_svm) != 0, instead
> of checking get_order(...) != 0.
>
> > return 0;
> >
> > err_kvm_init:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index b70ed72c1783..01264842bf45 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -8755,6 +8755,7 @@ static int __init vmx_init(void)
> > if (r)
> > goto err_kvm_init;
> >
> > + BUILD_BUG_ON(get_order(sizeof(struct kvm_vmx) != 0));
>
> Same as above.
>
After fixing the typo build is failing.
Checked via pahole, sizes of struct have reduced but still not under 4k.
After applying the patch:
struct kvm{} - 4104
struct kvm_svm{} - 4320
struct kvm_vmx{} - 4128
Also, this BUILD_BUG_ON() might not be reliable unless all of the ifdefs
under kvm_[vmx|svm] and its children are enabled. Won't that be an
issue?
On Wed, Apr 16, 2025, Vipin Sharma wrote:
> On 2025-04-16 11:24:37, Vipin Sharma wrote:
> > On 2025-04-01 08:57:13, Sean Christopherson wrote:
> > >
> > > + BUILD_BUG_ON(get_order(sizeof(struct kvm_svm) != 0));
> >
> > There is a typo here. It is checking sizeof(struct kvm_svm) != 0, instead
> > of checking get_order(...) != 0.
> >
> > > return 0;
> > >
> > > err_kvm_init:
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index b70ed72c1783..01264842bf45 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -8755,6 +8755,7 @@ static int __init vmx_init(void)
> > > if (r)
> > > goto err_kvm_init;
> > >
> > > + BUILD_BUG_ON(get_order(sizeof(struct kvm_vmx) != 0));
> >
> > Same as above.
Ugh. That's what I get for violating the kernel's "don't check for '0'" rule
(I thought it would make the code more understandable). Bad me.
> After fixing the typo build is failing.
>
> Checked via pahole, sizes of struct have reduced but still not under 4k.
> After applying the patch:
>
> struct kvm{} - 4104
> struct kvm_svm{} - 4320
> struct kvm_vmx{} - 4128
>
> Also, this BUILD_BUG_ON() might not be reliable unless all of the ifdefs
> under kvm_[vmx|svm] and its children are enabled. Won't that be an
> issue?
That's what build bots (and to a lesser extent, maintainers) are for. An individual
developer might miss a particular config, but the build bots that run allyesconfig
will very quickly detect the issue, and then we fix it.
I also build what is effectively an "allkvmconfig" before officially applying
anything, so in general things like this shouldn't even make it to the bots.
On Wed, 2025-04-16 at 12:57 -0700, Sean Christopherson wrote:
> On Wed, Apr 16, 2025, Vipin Sharma wrote:
> > On 2025-04-16 11:24:37, Vipin Sharma wrote:
> > > On 2025-04-01 08:57:13, Sean Christopherson wrote:
> > > >
> > > > + BUILD_BUG_ON(get_order(sizeof(struct kvm_svm) != 0));
> > >
> > > There is a typo here. It is checking sizeof(struct kvm_svm) != 0, instead
> > > of checking get_order(...) != 0.
> > >
> > > > return 0;
> > > >
> > > > err_kvm_init:
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index b70ed72c1783..01264842bf45 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -8755,6 +8755,7 @@ static int __init vmx_init(void)
> > > > if (r)
> > > > goto err_kvm_init;
> > > >
> > > > + BUILD_BUG_ON(get_order(sizeof(struct kvm_vmx) != 0));
> > >
> > > Same as above.
>
> Ugh. That's what I get for violating the kernel's "don't check for '0'" rule
> (I thought it would make the code more understandable). Bad me.
>
> > After fixing the typo build is failing.
> >
> > Checked via pahole, sizes of struct have reduced but still not under 4k.
> > After applying the patch:
> >
> > struct kvm{} - 4104
> > struct kvm_svm{} - 4320
> > struct kvm_vmx{} - 4128
> >
> > Also, this BUILD_BUG_ON() might not be reliable unless all of the ifdefs
> > under kvm_[vmx|svm] and its children are enabled. Won't that be an
> > issue?
>
> That's what build bots (and to a lesser extent, maintainers) are for. An individual
> developer might miss a particular config, but the build bots that run allyesconfig
> will very quickly detect the issue, and then we fix it.
>
> I also build what is effectively an "allkvmconfig" before officially applying
> anything, so in general things like this shouldn't even make it to the bots.
>
Just want to understand the intention here:
What if someday a developer really needs to add some new field(s) to, lets say
'struct kvm_vmx', and that makes the size exceed 4K?
What should the developer do? Is it a hard requirement that the size should
never go beyond 4K? Or, should the assert of order 0 allocation be changed to
the assert of order 1 allocation?
On Tue, Apr 22, 2025, Kai Huang wrote:
> On Wed, 2025-04-16 at 12:57 -0700, Sean Christopherson wrote:
> > On Wed, Apr 16, 2025, Vipin Sharma wrote:
> > > Checked via pahole, sizes of struct have reduced but still not under 4k.
> > > After applying the patch:
> > >
> > > struct kvm{} - 4104
> > > struct kvm_svm{} - 4320
> > > struct kvm_vmx{} - 4128
> > >
> > > Also, this BUILD_BUG_ON() might not be reliable unless all of the ifdefs
> > > under kvm_[vmx|svm] and its children are enabled. Won't that be an
> > > issue?
> >
> > That's what build bots (and to a lesser extent, maintainers) are for. An individual
> > developer might miss a particular config, but the build bots that run allyesconfig
> > will very quickly detect the issue, and then we fix it.
> >
> > I also build what is effectively an "allkvmconfig" before officially applying
> > anything, so in general things like this shouldn't even make it to the bots.
> >
>
> Just want to understand the intention here:
>
> What if someday a developer really needs to add some new field(s) to, lets say
> 'struct kvm_vmx', and that makes the size exceed 4K?
If it helps, here's the changelog I plan on posting for v3:
Allocate VM structs via kvzalloc(), i.e. try to use a contiguous physical
allocation before falling back to __vmalloc(), to avoid the overhead of
establishing the virtual mappings. The SVM and VMX (and TDX) structures
are now just above 4096 bytes, i.e. are order-1 allocations, and will
likely remain that way for quite some time.
Add compile-time assertions in vendor code to ensure the size is an
order-0 or order-1 allocation, i.e. to prevent unknowingly letting the
size balloon in the future. There's nothing fundamentally wrong with a
larger kvm_{svm,vmx,tdx} size, but given that the size is barely above
4096 after 18+ years of existence, exceeding exceed 8192 bytes would be
quite notable.
> What should the developer do? Is it a hard requirement that the size should
> never go beyond 4K? Or, should the assert of order 0 allocation be changed to
> the assert of order 1 allocation?
It depends. Now that Vipin has corrected my math, the assertion will be that the
VM struct is order-1 or smaller, i.e. <= 8KiB. That gives us a _lot_ of room to
grow. E.g. KVM has existed for ~18 years and is barely about 4KiB, so for organic
growth (small additions here and there), I don't expect to hit the 8KiB limit in
the next decade (famous last words). And the memory landscape will likely be
quite different 10+ years from now, i.e. the assertion may be completely unnecessary
by the time it fires.
What I'm most interested in detecting and prevent is things like mmu_page_hash,
where a massive field is embedded in struct kvm for an *optional* feature. I.e.
if a new feature adds a massive field, then it should probably be placed in a
separate, dynamically allocated structure. And for those, it should be quite
obvious that a separate allocation is the way to go.
On Wed, 2025-04-23 at 10:07 -0700, Sean Christopherson wrote:
> On Tue, Apr 22, 2025, Kai Huang wrote:
> > On Wed, 2025-04-16 at 12:57 -0700, Sean Christopherson wrote:
> > > On Wed, Apr 16, 2025, Vipin Sharma wrote:
> > > > Checked via pahole, sizes of struct have reduced but still not under 4k.
> > > > After applying the patch:
> > > >
> > > > struct kvm{} - 4104
> > > > struct kvm_svm{} - 4320
> > > > struct kvm_vmx{} - 4128
> > > >
> > > > Also, this BUILD_BUG_ON() might not be reliable unless all of the ifdefs
> > > > under kvm_[vmx|svm] and its children are enabled. Won't that be an
> > > > issue?
> > >
> > > That's what build bots (and to a lesser extent, maintainers) are for. An individual
> > > developer might miss a particular config, but the build bots that run allyesconfig
> > > will very quickly detect the issue, and then we fix it.
> > >
> > > I also build what is effectively an "allkvmconfig" before officially applying
> > > anything, so in general things like this shouldn't even make it to the bots.
> > >
> >
> > Just want to understand the intention here:
> >
> > What if someday a developer really needs to add some new field(s) to, lets say
> > 'struct kvm_vmx', and that makes the size exceed 4K?
>
> If it helps, here's the changelog I plan on posting for v3:
>
> Allocate VM structs via kvzalloc(), i.e. try to use a contiguous physical
> allocation before falling back to __vmalloc(), to avoid the overhead of
> establishing the virtual mappings. The SVM and VMX (and TDX) structures
> are now just above 4096 bytes, i.e. are order-1 allocations, and will
> likely remain that way for quite some time.
>
> Add compile-time assertions in vendor code to ensure the size is an
> order-0 or order-1 allocation, i.e. to prevent unknowingly letting the
> size balloon in the future. There's nothing fundamentally wrong with a
> larger kvm_{svm,vmx,tdx} size, but given that the size is barely above
> 4096 after 18+ years of existence, exceeding exceed 8192 bytes would be
> quite notable.
Yeah looks reasonable.
Nit: I am not quite following "falling back to __vmalloc()" part. We are
replacing __vmalloc() with kzalloc() AFAICT, therefore there should be no
"falling back"?
>
>
> > What should the developer do? Is it a hard requirement that the size should
> > never go beyond 4K? Or, should the assert of order 0 allocation be changed to
> > the assert of order 1 allocation?
>
> It depends. Now that Vipin has corrected my math, the assertion will be that the
> VM struct is order-1 or smaller, i.e. <= 8KiB. That gives us a _lot_ of room to
> grow. E.g. KVM has existed for ~18 years and is barely about 4KiB, so for organic
> growth (small additions here and there), I don't expect to hit the 8KiB limit in
> the next decade (famous last words). And the memory landscape will likely be
> quite different 10+ years from now, i.e. the assertion may be completely unnecessary
> by the time it fires.
>
> What I'm most interested in detecting and prevent is things like mmu_page_hash,
> where a massive field is embedded in struct kvm for an *optional* feature. I.e.
> if a new feature adds a massive field, then it should probably be placed in a
> separate, dynamically allocated structure. And for those, it should be quite
> obvious that a separate allocation is the way to go.
Agreed. Thanks for explaining.
On Wed, Apr 23, 2025, Kai Huang wrote:
> On Wed, 2025-04-23 at 10:07 -0700, Sean Christopherson wrote:
> > On Tue, Apr 22, 2025, Kai Huang wrote:
> > > On Wed, 2025-04-16 at 12:57 -0700, Sean Christopherson wrote:
> > > > On Wed, Apr 16, 2025, Vipin Sharma wrote:
> > > > > Checked via pahole, sizes of struct have reduced but still not under 4k.
> > > > > After applying the patch:
> > > > >
> > > > > struct kvm{} - 4104
> > > > > struct kvm_svm{} - 4320
> > > > > struct kvm_vmx{} - 4128
> > > > >
> > > > > Also, this BUILD_BUG_ON() might not be reliable unless all of the ifdefs
> > > > > under kvm_[vmx|svm] and its children are enabled. Won't that be an
> > > > > issue?
> > > >
> > > > That's what build bots (and to a lesser extent, maintainers) are for. An individual
> > > > developer might miss a particular config, but the build bots that run allyesconfig
> > > > will very quickly detect the issue, and then we fix it.
> > > >
> > > > I also build what is effectively an "allkvmconfig" before officially applying
> > > > anything, so in general things like this shouldn't even make it to the bots.
> > > >
> > >
> > > Just want to understand the intention here:
> > >
> > > What if someday a developer really needs to add some new field(s) to, lets say
> > > 'struct kvm_vmx', and that makes the size exceed 4K?
> >
> > If it helps, here's the changelog I plan on posting for v3:
> >
> > Allocate VM structs via kvzalloc(), i.e. try to use a contiguous physical
> > allocation before falling back to __vmalloc(), to avoid the overhead of
> > establishing the virtual mappings. The SVM and VMX (and TDX) structures
> > are now just above 4096 bytes, i.e. are order-1 allocations, and will
> > likely remain that way for quite some time.
> >
> > Add compile-time assertions in vendor code to ensure the size is an
> > order-0 or order-1 allocation, i.e. to prevent unknowingly letting the
> > size balloon in the future. There's nothing fundamentally wrong with a
> > larger kvm_{svm,vmx,tdx} size, but given that the size is barely above
> > 4096 after 18+ years of existence, exceeding exceed 8192 bytes would be
> > quite notable.
>
> Yeah looks reasonable.
>
> Nit: I am not quite following "falling back to __vmalloc()" part. We are
> replacing __vmalloc() with kzalloc() AFAICT, therefore there should be no
> "falling back"?
Correct, not in this version. In the next version, my plan is to use kvzalloc()
(though honestly, I'm not sure that's worth doing; it'll be an order-1 allocation,
and if that fails...).
© 2016 - 2026 Red Hat, Inc.