[PATCH v2 0/2] Register device memory for poison handling

ankita@nvidia.com posted 2 patches 3 weeks, 2 days ago
drivers/vfio/pci/nvgrace-gpu/main.c | 113 +++++++++++++++++++++++++++-
include/linux/memory-failure.h      |  13 +++-
2 files changed, 120 insertions(+), 6 deletions(-)
Posted by ankita@nvidia.com 3 weeks, 2 days ago
From: Ankit Agrawal <ankita@nvidia.com>

Linux MM provides interfaces to allow a driver to [un]register device
memory not backed by struct page for poison handling through
memory_failure.

The device memory on NVIDIA Grace based systems is not added to the
kernel and is not backed by struct pages. The nvgrace-gpu module,
which manages the device memory, can therefore use these interfaces to
get the benefit of poison handling. Make nvgrace-gpu register the device
memory with the MM on open.

Moreover, stubs are added to accommodate CONFIG_MEMORY_FAILURE being
disabled.

Patch 1/2 introduces stubs for when CONFIG_MEMORY_FAILURE is disabled.
Patch 2/2 registers the device memory at the time of open instead of mmap.
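
As a rough illustration of the stub pattern described above, here is a
minimal userspace sketch. It is not the code from either patch. The
demo_* names, the struct fields, the DEMO_MEMORY_FAILURE macro (standing
in for CONFIG_MEMORY_FAILURE), the -EOPNOTSUPP return value of the stub,
and the choice to let open succeed when registration is unsupported are
all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a PFN range that has no struct pages.
 * Build with -DDEMO_MEMORY_FAILURE to model the option being enabled.
 */
struct demo_pfn_space {
	unsigned long start_pfn;
	unsigned long last_pfn;
	int registered;
};

#ifdef DEMO_MEMORY_FAILURE
/* "Real" variant: validate the range and record the registration. */
static inline int demo_register_pfn_space(struct demo_pfn_space *s)
{
	if (!s || s->start_pfn > s->last_pfn)
		return -EINVAL;
	if (s->registered)
		return -EBUSY;
	s->registered = 1;
	return 0;
}

static inline void demo_unregister_pfn_space(struct demo_pfn_space *s)
{
	if (s)
		s->registered = 0;
}
#else
/* Stub variant: callers still compile, but nothing is registered. */
static inline int demo_register_pfn_space(struct demo_pfn_space *s)
{
	(void)s;
	return -EOPNOTSUPP;
}

static inline void demo_unregister_pfn_space(struct demo_pfn_space *s)
{
	(void)s;
}
#endif

/*
 * Open path: register once per open. In this sketch a missing feature
 * (-EOPNOTSUPP) is tolerated, while any other error fails the open.
 */
static int demo_device_open(struct demo_pfn_space *s)
{
	int ret = demo_register_pfn_space(s);

	if (ret && ret != -EOPNOTSUPP)
		return ret;
	return 0;
}

/* Close path: undo the registration made at open time. */
static void demo_device_close(struct demo_pfn_space *s)
{
	demo_unregister_pfn_space(s);
}
```

With stubs of this shape, the driver can call the registration helpers
unconditionally and needs no #ifdef at the call sites.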

Note that this is a repost of an earlier series [1], part of which
(patch 1/3) was merged in v6.19-rc4. This series covers the remaining
patches. Many thanks to Jason Gunthorpe (jgg@nvidia.com) and Alex
Williamson (alex@shazbot.org) for valuable suggestions.

Link: https://lore.kernel.org/all/20251213044708.3610-1-ankita@nvidia.com/ [1]

Changelog:
v2:
- Fixed nit to clean up nvgrace_gpu_vfio_pci_register_pfn_range
  (Thanks Jiaqi Yan)
Link: https://lore.kernel.org/all/20260108153548.7386-1-ankita@nvidia.com/ [v1]

Ankit Agrawal (2):
  mm: add stubs for PFNMAP memory failure registration functions
  vfio/nvgrace-gpu: register device memory for poison handling

 drivers/vfio/pci/nvgrace-gpu/main.c | 113 +++++++++++++++++++++++++++-
 include/linux/memory-failure.h      |  13 +++-
 2 files changed, 120 insertions(+), 6 deletions(-)

-- 
2.34.1
Re: [PATCH v2 0/2] Register device memory for poison handling
Posted by Alex Williamson 2 weeks, 5 days ago
On Thu, 15 Jan 2026 20:28:47 +0000
<ankita@nvidia.com> wrote:

> From: Ankit Agrawal <ankita@nvidia.com>
> 
> Linux MM provides interfaces to allow a driver to [un]register device
> memory not backed by struct page for poison handling through
> memory_failure.
> 
> The device memory on NVIDIA Grace based systems is not added to the
> kernel and is not backed by struct pages. The nvgrace-gpu module,
> which manages the device memory, can therefore use these interfaces to
> get the benefit of poison handling. Make nvgrace-gpu register the device
> memory with the MM on open.
> 
> Moreover, stubs are added to accommodate CONFIG_MEMORY_FAILURE being
> disabled.
> 
> Patch 1/2 introduces stubs for when CONFIG_MEMORY_FAILURE is disabled.
> Patch 2/2 registers the device memory at the time of open instead of mmap.
> 
> Note that this is a repost of an earlier series [1], part of which
> (patch 1/3) was merged in v6.19-rc4. This series covers the remaining
> patches. Many thanks to Jason Gunthorpe (jgg@nvidia.com) and Alex
> Williamson (alex@shazbot.org) for valuable suggestions.
> 
> Link: https://lore.kernel.org/all/20251213044708.3610-1-ankita@nvidia.com/ [1]
> 
> Changelog:
> v2:
> - Fixed nit to clean up nvgrace_gpu_vfio_pci_register_pfn_range
>   (Thanks Jiaqi Yan)
> Link: https://lore.kernel.org/all/20260108153548.7386-1-ankita@nvidia.com/ [v1]
> 
> Ankit Agrawal (2):
>   mm: add stubs for PFNMAP memory failure registration functions
>   vfio/nvgrace-gpu: register device memory for poison handling
> 
>  drivers/vfio/pci/nvgrace-gpu/main.c | 113 +++++++++++++++++++++++++++-
>  include/linux/memory-failure.h      |  13 +++-
>  2 files changed, 120 insertions(+), 6 deletions(-)
> 

Applied to vfio next branch for v6.20/7.0.  Thanks,

Alex