[PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory

fanhuang posted 1 patch 2 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20251209093841.2250527-1-FangSheng.Huang@amd.com
Maintainers: Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>, "Michael S. Tsirkin" <mst@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>
[PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by fanhuang 2 months ago
Hi all,

This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
for the detailed review.

Changes in v4 (addressing Jonathan's feedback):
- Added architecture check: spm=on now reports error on non-x86 machines
- Simplified return logic in e820_update_entry_type() (return true/false directly)
- Changed 4GB boundary spanning from warn_report to error_report + exit
- Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
- Removed unnecessary comments

Use case:
This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
host to guest VM, useful for memory reserved for specific PCI devices
(e.g., GPU memory via VFIO-PCI). The SPM memory appears as soft reserved
to the guest and is managed by device drivers rather than the OS memory
allocator.

Example usage:
  -object memory-backend-ram,size=8G,id=m0
  -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
  -numa node,nodeid=0,memdev=m0
  -numa node,nodeid=1,memdev=m1,spm=on

Please review. Thanks!

Best regards,
Jerry Huang

-- 
2.34.1
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Gregory Price 1 month ago
On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>   -numa node,nodeid=0,memdev=m0
>   -numa node,nodeid=1,memdev=m1,spm=on
> 

Per discussion with Jonathan - whatever form this ends up taking, can
we change this from [on,off] to [normal,spm,reserved] and apply the
appropriate types accordingly?

don't know what to name the tag in that case, something like..

memmap_type=[normal,spm,reserved] ?

(not married to this, open to suggestions)

~Gregory
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by David Hildenbrand (Red Hat) 1 month ago
On 1/2/26 17:30, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>>    -numa node,nodeid=0,memdev=m0
>>    -numa node,nodeid=1,memdev=m1,spm=on
>>
> 
> Per discussion with Jonathan - whatever form this ends up taking, can
> we change this from [on,off] to [normal,spm,reserved] and apply the
> appropriate types accordingly?
> 
> don't know what to name the tag in that case, something like..
> 
> memmap_type=[normal,spm,reserved] ?

That looks more extensible indeed.

The semantics would be unchanged compared to spm=on: only applies to 
boot memory. Although, as discussed, mixing and matching types per node 
should be avoided either way.

-- 
Cheers

David
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Huang, FangSheng (Jerry) 1 month ago

On 1/5/2026 11:29 PM, David Hildenbrand (Red Hat) wrote:
> On 1/2/26 17:30, Gregory Price wrote:
>> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>>>    -numa node,nodeid=0,memdev=m0
>>>    -numa node,nodeid=1,memdev=m1,spm=on
>>>
>>
>> Per discussion with Jonathan - whatever form this ends up taking, can
>> we change this from [on,off] to [normal,spm,reserved] and apply the
>> appropriate types accordingly?
>>
>> don't know what to name the tag in that case, something like..
>>
>> memmap_type=[normal,spm,reserved] ?
> 
> That looks more extensible indeed.
> 
> The semantics would be unchanged compared to spm=on: only applies to 
> boot memory. Although, as discussed, mixing and matching types per node 
> should be avoided either way.
> 
Hi Gregory, David,

Thank you for the suggestion on making this more extensible.

I agree that `memmap_type=[normal,spm,reserved]` is a better approach
than the simple boolean `spm=on|off`.

I've analyzed the required changes and will prepare an updated patch
implementing this. However, I need to go through an internal review
process before submitting to the community, which may take some time.

In the meantime, any feedback or suggestions on the design
are welcome.

Best Regards,
Jerry Huang

Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Igor Mammedov 1 month ago
On Tue, 9 Dec 2025 17:38:40 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> Hi all,
> 
> This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
> for the detailed review.
> 
> Changes in v4 (addressing Jonathan's feedback):
> - Added architecture check: spm=on now reports error on non-x86 machines
> - Simplified return logic in e820_update_entry_type() (return true/false directly)
> - Changed 4GB boundary spanning from warn_report to error_report + exit
> - Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
> - Removed unnecessary comments
> 
> Use case:
> This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
> host to guest VM, useful for memory reserved for specific PCI devices
> (e.g., GPU memory via VFIO-PCI). The SPM memory appears as soft reserved
> to the guest and is managed by device drivers rather than the OS memory
> allocator.
> 
> Example usage:
>   -object memory-backend-ram,size=8G,id=m0
>   -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>   -numa node,nodeid=0,memdev=m0
>   -numa node,nodeid=1,memdev=m1,spm=on

I'm still not fond of an 'spm' toggle on the numa node itself (even though on
AMD hardware such memory has a 1:1 mapping) without a device model in between.

Can we try the following instead:
  * add an 'spm' property to the DIMM device and disable hotplug on it in that case
  * make E820 enumerate DIMMs marked as spm/not-hotpluggable.

That will let us later have mixed memory on the node, if such a need arises,
without breaking the QEMU CLI.

> Please review. Thanks!
> 
> Best regards,
> Jerry Huang
>
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Gregory Price 1 month ago
On Fri, Jan 02, 2026 at 02:09:22PM +0100, Igor Mammedov wrote:
> That will let us later to have mixed memory on the node 

We were just discussing strongly dissuading such a configuration from
a linux perspective, even if it's technically allowed.

If only because it makes reasoning about placement policy on such a node
completely impossible.

~Gregory
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Gregory Price 1 month, 1 week ago
On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
> Example usage:
>   -object memory-backend-ram,size=8G,id=m0
>   -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>   -numa node,nodeid=0,memdev=m0
>   -numa node,nodeid=1,memdev=m1,spm=on
> 

Interesting that you added spm= to NUMA rather than the memory backend,
but then in the patch you consume it to apply to the EFI/E820 memory
maps.

Sorry i've missed prior versions, is numa the right place to put this,
considering that the node is not necessarily 100% SPM on a real system?

(in practice it should be, but not technically required to be)

~Gregory
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Huang, FangSheng (Jerry) 1 month, 1 week ago
Hi Gregory,

Thanks for your review and good question!

On 12/30/2025 2:26 AM, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>> Example usage:
>>    -object memory-backend-ram,size=8G,id=m0
>>    -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>>    -numa node,nodeid=0,memdev=m0
>>    -numa node,nodeid=1,memdev=m1,spm=on
>>
> 
> Interesting that you added spm= to NUMA rather than the memory backend,
> but then in the patch you consume it to apply to the EFI/E820 memory
> maps.
> 
> Sorry i've missed prior versions, is numa the right place to put this,
> considering that the node is not necessarily 100% SPM on a real system?
> 

The decision to add `spm=` to NUMA rather than the memory backend was
based on earlier feedback from David during our initial RFC discussions.

David raised a concern that if we put the spm flag on the memory
backend, a user could accidentally pass such a memory backend to
DIMM/virtio-mem/boot memory, which would have very undesired side
effects.

> (in practice it should be, but not technically required to be)

You're right that on a real system, a NUMA node is not technically
required to be 100% SPM. However, in AMD's use case, the entire NUMA
node memory (backed by memdev) is intended to be SPM, and this approach
provides a cleaner and safer configuration interface.

> 
> ~Gregory

Please let me know if you have further concerns or suggestions.

Best Regards,
Jerry Huang
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Gregory Price 1 month, 1 week ago
On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
> Hi Gregory,
> 
> > Sorry i've missed prior versions, is numa the right place to put this,
> > considering that the node is not necessarily 100% SPM on a real system?
> > 
> 
> The decision to add `spm=` to NUMA rather than the memory backend was
> based on earlier feedback from David during our initial RFC discussions.
> 
> David raised a concern that if we put the spm flag on the memory
> backend, a user could accidentally pass such a memory backend to
> DIMM/virtio-mem/boot memory, which would have very undesired side
> effects.
> 

This makes sense, and in fact I almost wonder if we should actually
encode a warning in linux in general if a single NUMA node contains
both normal and SPM.  That would help drive consistency between QEMU/KVM
and real platforms from the direction of linux.

> > (in practice it should be, but not technically required to be)
> 
> You're right that on a real system, a NUMA node is not technically
> required to be 100% SPM. However, in AMD's use case, the entire NUMA
> node memory (backed by memdev) is intended to be SPM, and this approach
> provides a cleaner and safer configuration interface.
> 

I figured this was the case, and honestly this just provides more
evidence that any given NUMA node probably should only have 1 "type" of
memory (or otherwise stated: uniform access within a node, non-uniform
across nodes).

---

bit of an aside - but at LPC we also talked about SPM NUMA nodes:
https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/

Would be cool to be able to detect this in the drivers and have hotplug
automatically mark a node SPM unless a driver overrides it.
(MHP flag? Sorry David :P)

> > 
> > ~Gregory
> 
> Please let me know if you have further concerns or suggestions.
> 

I'll look at the patch details a bit more, but generally I like the
direction - with the obvious note that I am biased given the above.

~Gregory
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by David Hildenbrand (Red Hat) 1 month, 1 week ago
On 12/30/25 15:06, Gregory Price wrote:
> On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
>> Hi Gregory,
>>
>>> Sorry i've missed prior versions, is numa the right place to put this,
>>> considering that the node is not necessarily 100% SPM on a real system?
>>>
>>
>> The decision to add `spm=` to NUMA rather than the memory backend was
>> based on earlier feedback from David during our initial RFC discussions.
>>
>> David raised a concern that if we put the spm flag on the memory
>> backend, a user could accidentally pass such a memory backend to
>> DIMM/virtio-mem/boot memory, which would have very undesired side
>> effects.
>>
> 
> This makes sense, and in fact I almost wonder if we should actually
> encode a warning in linux in general if a single NUMA node contains
> both normal and SPM.  That would help drive consistency between QEMU/KVM
> and real platforms from the direction of linux.

Yeah, in theory we would have a "memory device" for all boot memory 
(boot DIMM, not sure ...) and that one would actually be marked as "spm".

It's not really a property of the memory backend after all; it's only
about how that memory is exposed to the VM.

And given we don't have a boot memory device, the idea was to set it for 
the Node, where it means "all boot memory is SPM". And we only allow one 
type of boot memory (one memory backend) per node in QEMU.

The tricky question is what happens with memory hotplug (DIMMs etc) on 
such a node. I'd argue that it's simply not SPM.

> 
>>> (in practice it should be, but not technically required to be)
>>
>> You're right that on a real system, a NUMA node is not technically
>> required to be 100% SPM. However, in AMD's use case, the entire NUMA
>> node memory (backed by memdev) is intended to be SPM, and this approach
>> provides a cleaner and safer configuration interface.
>>
> 
> I figured this was the case, and honestly this just provides more
> evidence that any given NUMA node probably should only have 1 "type" of
> memory (or otherwise stated: uniform access within a node, non-uniform
> across nodes).

That makes sense.

> 
> ---
> 
> bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
> 
> Would be cool to be able to detect this in the drivers and have hotplug
> automatically mark a node SPM unless a driver overrides it.
> (MHP flag? Sorry David :P)

:)

If it's a per-node thing, MHP flags feel a bit like "too late". It 
should be configured earlier for the node somehow.

> 
>>>
>>> ~Gregory
>>
>> Please let me know if you have further concerns or suggestions.
>>
> 
> I'll look at the patch details a bit more, but generally I like the
> direction - with the obvious note that I am biased given the above.


Thanks for taking a look!


-- 
Cheers

David
Re: [PATCH v4 0/1] numa: add 'spm' option for Specific Purpose Memory
Posted by Gregory Price 1 month, 1 week ago
On Tue, Dec 30, 2025 at 09:15:34PM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/30/25 15:06, Gregory Price wrote:
> 
> And given we don't have a boot memory device, the idea was to set it for the
> Node, where it means "all boot memory is SPM". And we only allow one type of
> boot memory (one memory backend) per node in QEMU.
> 
> The tricky question is what happens with memory hotplug (DIMMs etc) on such
> a node. I'd argue that it's simply not SPM.
>

...

+++ .../docs/whatever

+ Don't do that.

:]

> > 
> > ---
> > 
> > bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> > https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
> > 
> > Would be cool to be able to detect this in the drivers and have hotplug
> > automatically mark a node SPM unless a driver overrides it.
> > (MHP flag? Sorry David :P)
> 
> :)
> 
> If it's a per-node thing, MHP flags feel a bit like "too late". It should be
> configured earlier for the node somehow.
> 

Just a clarification: the flag would be an override to have mhp mark a
node N_MEMORY instead of N_SPM.

As it stands right now, a node is "online with memory" if N_MEMORY is
set for that node.

https://elixir.bootlin.com/linux/v6.14-rc6/source/mm/memory_hotplug.c#L717

I imagine hotplugged N_SPM would operate the same.

So mhp code would look like

if (node_data->is_spm && !override)
	node_set_state(node, N_SPM);
else
	node_set_state(node, N_MEMORY);

Basically this would allow SPM nodes to operate the same as they did
before when hotplugged, retaining existing behavior.

(Sorry, I'm thinking waaaaaaaaaaaaay far ahead here)

~Gregory