Hi all,

This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
for the detailed review.

Changes in v4 (addressing Jonathan's feedback):
- Added architecture check: spm=on now reports an error on non-x86 machines
- Simplified return logic in e820_update_entry_type() (return true/false directly)
- Changed 4GB boundary spanning from warn_report to error_report + exit
- Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
- Removed unnecessary comments

Use case:
This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
the host to a guest VM, which is useful for memory reserved for specific
PCI devices (e.g., GPU memory via VFIO-PCI). The SPM memory appears as
soft reserved to the guest and is managed by device drivers rather than
the OS memory allocator.

Example usage:
-object memory-backend-ram,size=8G,id=m0
-object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
-numa node,nodeid=0,memdev=m0
-numa node,nodeid=1,memdev=m1,spm=on

Please review. Thanks!

Best regards,
Jerry Huang
--
2.34.1
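As a quick guest-side sanity check for the example above (not part of the
patch; the address range shown is illustrative only), a Linux guest should
expose the SPM range as "Soft Reserved" in /proc/iomem:

  $ grep -i "soft reserved" /proc/iomem
  100000000-2ffffffff : Soft Reserved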
On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on
>

Should discuss with Jonathan - whatever form this ends up taking, can we
change this from [on,off] to [normal,spm,reserved] and apply the
appropriate types accordingly?

Don't know what to name the tag in that case, something like..

memmap_type=[normal,spm,reserved] ?

(not married to this, open to suggestions)

~Gregory
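A hypothetical QAPI sketch of that shape (the enum name and member below
are illustrative only, not from the patch; NumaNodeOptions is the existing
struct in qapi/machine.json):

  { 'enum': 'NumaMemmapType',
    'data': [ 'normal', 'spm', 'reserved' ] }

  # new optional member of NumaNodeOptions:
  '*memmap-type': 'NumaMemmapType'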
On 1/2/26 17:30, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>> -numa node,nodeid=0,memdev=m0
>> -numa node,nodeid=1,memdev=m1,spm=on
>>
>
> Should discuss with Jonathan - whatever form this ends up taking, can
> we change this from [on,off] to [normal,spm,reserved] and apply the
> appropriate types accordingly?
>
> Don't know what to name the tag in that case, something like..
>
> memmap_type=[normal,spm,reserved] ?

That looks more extensible indeed.

The semantics would be unchanged compared to spm=on: it only applies to
boot memory. Although, as discussed, mixing and matching types per node
should be avoided either way.

--
Cheers

David
On 1/5/2026 11:29 PM, David Hildenbrand (Red Hat) wrote:
> On 1/2/26 17:30, Gregory Price wrote:
>> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>>> -numa node,nodeid=0,memdev=m0
>>> -numa node,nodeid=1,memdev=m1,spm=on
>>>
>>
>> Should discuss with Jonathan - whatever form this ends up taking, can
>> we change this from [on,off] to [normal,spm,reserved] and apply the
>> appropriate types accordingly?
>>
>> Don't know what to name the tag in that case, something like..
>>
>> memmap_type=[normal,spm,reserved] ?
>
> That looks more extensible indeed.
>
> The semantics would be unchanged compared to spm=on: it only applies to
> boot memory. Although, as discussed, mixing and matching types per node
> should be avoided either way.
>

Hi Gregory, David,

Thank you for the suggestion on making this more extensible. I agree that
`memmap_type=[normal,spm,reserved]` is a better approach than the simple
boolean `spm=on|off`.

I've analyzed the required changes and will prepare an updated patch
implementing this. However, I need to go through an internal review
process before submitting to the community, which may take some time.

In the meantime, any feedback or suggestions on the design are welcome.

Best Regards,
Jerry Huang
On Tue, 9 Dec 2025 17:38:40 +0800
fanhuang <FangSheng.Huang@amd.com> wrote:

> Hi all,
>
> This is v4 of the SPM (Specific Purpose Memory) patch. Thank you Jonathan
> for the detailed review.
>
> Changes in v4 (addressing Jonathan's feedback):
> - Added architecture check: spm=on now reports an error on non-x86 machines
> - Simplified return logic in e820_update_entry_type() (return true/false directly)
> - Changed 4GB boundary spanning from warn_report to error_report + exit
> - Updated QAPI documentation to be architecture-agnostic (removed E820 reference)
> - Removed unnecessary comments
>
> Use case:
> This feature allows passing EFI_MEMORY_SP (Specific Purpose Memory) from
> the host to a guest VM, which is useful for memory reserved for specific
> PCI devices (e.g., GPU memory via VFIO-PCI). The SPM memory appears as
> soft reserved to the guest and is managed by device drivers rather than
> the OS memory allocator.
>
> Example usage:
> -object memory-backend-ram,size=8G,id=m0
> -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on

I'm still not fond of an 'spm' toggle on the numa node itself (even though
on AMD hardware such memory has a 1:1 mapping) without a device model in
between.

Can we try the following instead:
* add an 'spm' property to the DIMM device and disable hotplug on it in
  such a case
* make E820 enumerate DIMMs marked spm/not hotpluggable

That would let us later have mixed memory on the node, if such a need
arises, without breaking the QEMU CLI.

> Please review. Thanks!
>
> Best regards,
> Jerry Huang
>
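A rough sketch of what Igor's suggestion could look like in QEMU (the
'spm' field on PCDIMMDevice is hypothetical and not part of this patch;
DEFINE_PROP_BOOL, DEVICE() and error_setg are existing QEMU APIs):

  /* hw/mem/pc-dimm.c: new bool property on the DIMM device (sketch) */
  DEFINE_PROP_BOOL("spm", PCDIMMDevice, spm, false),

  /* at plug time (sketch): refuse to hotplug an SPM-marked DIMM */
  if (dimm->spm && DEVICE(dimm)->hotplugged) {
      error_setg(errp, "SPM-marked DIMMs cannot be hotplugged");
      return;
  }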
On Fri, Jan 02, 2026 at 02:09:22PM +0100, Igor Mammedov wrote:
> That would let us later have mixed memory on the node

We were just discussing strongly dissuading such a configuration from a
Linux perspective, even if it's technically allowed, if only because it
makes reasoning about placement policy on such a node completely
impossible.

~Gregory
On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
> Example usage:
> -object memory-backend-ram,size=8G,id=m0
> -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
> -numa node,nodeid=0,memdev=m0
> -numa node,nodeid=1,memdev=m1,spm=on
>

Interesting that you added spm= to NUMA rather than the memory backend,
but then in the patch you consume it to apply to the EFI/E820 memory
maps.

Sorry I've missed prior versions - is numa the right place to put this,
considering that the node is not necessarily 100% SPM on a real system?
(in practice it should be, but not technically required to be)

~Gregory
Hi Gregory,

Thanks for your review and good question!

On 12/30/2025 2:26 AM, Gregory Price wrote:
> On Tue, Dec 09, 2025 at 05:38:40PM +0800, fanhuang wrote:
>> Example usage:
>> -object memory-backend-ram,size=8G,id=m0
>> -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0
>> -numa node,nodeid=0,memdev=m0
>> -numa node,nodeid=1,memdev=m1,spm=on
>>
>
> Interesting that you added spm= to NUMA rather than the memory backend,
> but then in the patch you consume it to apply to the EFI/E820 memory
> maps.
>
> Sorry I've missed prior versions - is numa the right place to put this,
> considering that the node is not necessarily 100% SPM on a real system?
>

The decision to add `spm=` to NUMA rather than the memory backend was
based on earlier feedback from David during our initial RFC discussions.

David raised a concern that if we put the spm flag on the memory backend,
a user could accidentally pass such a memory backend to
DIMM/virtio-mem/boot memory, which would have very undesired side
effects.

> (in practice it should be, but not technically required to be)

You're right that on a real system, a NUMA node is not technically
required to be 100% SPM. However, in AMD's use case, the entire NUMA node
memory (backed by memdev) is intended to be SPM, and this approach
provides a cleaner and safer configuration interface.

>
> ~Gregory

Please let me know if you have further concerns or suggestions.

Best Regards,
Jerry Huang
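To make that concern concrete, a hypothetical misuse if the flag lived on
the backend instead (a backend-level spm property does not exist; the
pc-dimm device and its memdev/node options are real):

  -object memory-backend-file,size=8G,id=m1,mem-path=/dev/dax0.0,spm=on
  -device pc-dimm,memdev=m1,node=1

Nothing at the backend level would stop the same spm-marked backend from
being wired up as a hotpluggable DIMM - exactly the accidental pairing
the NUMA-level flag avoids.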
On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
> Hi Gregory,
>
>> Sorry I've missed prior versions - is numa the right place to put this,
>> considering that the node is not necessarily 100% SPM on a real system?
>>
>
> The decision to add `spm=` to NUMA rather than the memory backend was
> based on earlier feedback from David during our initial RFC discussions.
>
> David raised a concern that if we put the spm flag on the memory
> backend, a user could accidentally pass such a memory backend to
> DIMM/virtio-mem/boot memory, which would have very undesired side
> effects.
>

This makes sense, and in fact I almost wonder if we should actually
encode a warning in Linux in general if a single NUMA node contains both
normal and SPM. That would help drive consistency between QEMU/KVM and
real platforms from the direction of Linux.

>> (in practice it should be, but not technically required to be)
>
> You're right that on a real system, a NUMA node is not technically
> required to be 100% SPM. However, in AMD's use case, the entire NUMA
> node memory (backed by memdev) is intended to be SPM, and this approach
> provides a cleaner and safer configuration interface.
>

I figured this was the case, and honestly this just provides more
evidence that any given NUMA node probably should only have one "type" of
memory (or otherwise stated: uniform access within a node, non-uniform
across nodes).

---

Bit of an aside - but at LPC we also talked about SPM NUMA nodes:
https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/

Would be cool to be able to detect this in the drivers and have hotplug
automatically mark a node SPM unless a driver overrides it.
(MHP flag? Sorry David :P)

>>
>> ~Gregory
>
> Please let me know if you have further concerns or suggestions.
>

I'll look at the patch details a bit more, but generally I like the
direction - with an obvious note that I am biased, given the above.

~Gregory
On 12/30/25 15:06, Gregory Price wrote:
> On Tue, Dec 30, 2025 at 10:55:02AM +0800, Huang, FangSheng (Jerry) wrote:
>> Hi Gregory,
>>
>>> Sorry I've missed prior versions - is numa the right place to put
>>> this, considering that the node is not necessarily 100% SPM on a real
>>> system?
>>>
>>
>> The decision to add `spm=` to NUMA rather than the memory backend was
>> based on earlier feedback from David during our initial RFC
>> discussions.
>>
>> David raised a concern that if we put the spm flag on the memory
>> backend, a user could accidentally pass such a memory backend to
>> DIMM/virtio-mem/boot memory, which would have very undesired side
>> effects.
>>
>
> This makes sense, and in fact I almost wonder if we should actually
> encode a warning in Linux in general if a single NUMA node contains
> both normal and SPM. That would help drive consistency between QEMU/KVM
> and real platforms from the direction of Linux.

Yeah, in theory we would have a "memory device" for all boot memory (boot
DIMM, not sure ...) and that one would actually be marked as "spm". It's
not really a property of the memory backend after all, it's only about
how that memory is exposed to the VM.

And given we don't have a boot memory device, the idea was to set it for
the node, where it means "all boot memory is SPM". And we only allow one
type of boot memory (one memory backend) per node in QEMU.

The tricky question is what happens with memory hotplug (DIMMs etc.) on
such a node. I'd argue that it's simply not SPM.

>
>>> (in practice it should be, but not technically required to be)
>>
>> You're right that on a real system, a NUMA node is not technically
>> required to be 100% SPM. However, in AMD's use case, the entire NUMA
>> node memory (backed by memdev) is intended to be SPM, and this
>> approach provides a cleaner and safer configuration interface.
>>
>
> I figured this was the case, and honestly this just provides more
> evidence that any given NUMA node probably should only have one "type"
> of memory (or otherwise stated: uniform access within a node,
> non-uniform across nodes).

That makes sense.

>
> ---
>
> Bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
>
> Would be cool to be able to detect this in the drivers and have hotplug
> automatically mark a node SPM unless a driver overrides it.
> (MHP flag? Sorry David :P)

:)

If it's a per-node thing, MHP flags feel a bit like "too late". It should
be configured earlier for the node somehow.

>
>>>
>>> ~Gregory
>>
>> Please let me know if you have further concerns or suggestions.
>>
>
> I'll look at the patch details a bit more, but generally I like the
> direction - with an obvious note that I am biased, given the above.

Thanks for taking a look!

--
Cheers

David
On Tue, Dec 30, 2025 at 09:15:34PM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/30/25 15:06, Gregory Price wrote:
>
> And given we don't have a boot memory device, the idea was to set it for
> the node, where it means "all boot memory is SPM". And we only allow one
> type of boot memory (one memory backend) per node in QEMU.
>
> The tricky question is what happens with memory hotplug (DIMMs etc.) on
> such a node. I'd argue that it's simply not SPM.
>

...
+++ .../docs/whatever
+ Don't do that.

:]

>
> ---
>
> Bit of an aside - but at LPC we also talked about SPM NUMA nodes:
> https://lore.kernel.org/linux-mm/20251112192936.2574429-1-gourry@gourry.net/
>
> Would be cool to be able to detect this in the drivers and have hotplug
> automatically mark a node SPM unless a driver overrides it.
> (MHP flag? Sorry David :P)

:)

> If it's a per-node thing, MHP flags feel a bit like "too late". It
> should be configured earlier for the node somehow.
>

Just a clarification: the flag would be an override to have mhp mark a
node N_MEMORY instead of N_SPM.

As it stands right now, a node is "online with memory" if N_MEMORY is set
for that node:
https://elixir.bootlin.com/linux/v6.14-rc6/source/mm/memory_hotplug.c#L717

I imagine hotplugged N_SPM would operate the same. So the mhp code would
look like:

	if (node_data->is_spm && !override)
		node_set_state(node, N_SPM);
	else
		node_set_state(node, N_MEMORY);

Basically this would allow SPM nodes to operate the same as they did
before when hotplugged, to retain existing behavior.

(Sorry, I'm thinking waaaaaaaaaaaaay far ahead here)

~Gregory