From: Arjan van de Ven <arjan@linux.intel.com>
RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
resources. The gfx_v12_0 initialisation code correctly leaves
adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
zero to reflect this.
amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
each of these resources regardless of size. When the size is zero,
amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.
Guard against this by returning 0 early from
amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
resource manager registration for hardware resources that are absent,
without affecting any other GPU type.
Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index afaaab6496def..8075ac735321e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
 				    unsigned int type,
 				    uint64_t size_in_page)
 {
+	if (!size_in_page)
+		return 0;
+
 	return ttm_range_man_init(&adev->mman.bdev, type,
 				  false, size_in_page);
 }
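
The effect of the guard can be illustrated with a small user-space model. This is only a sketch, not kernel code: model_range_man_init(), model_init_on_chip(), range_is_invalid() and managers_registered are hypothetical stand-ins invented for illustration, and the model returns -1 where the real drm_mm_init() would hit DRM_MM_BUG_ON().

```c
#include <assert.h>	/* only needed if you add checks like the ones described below */
#include <stdint.h>

/* Hypothetical stand-in for the range check quoted in the commit message:
 * true when [start, start + size) is empty or wraps around. */
static int range_is_invalid(uint64_t start, uint64_t size)
{
	return start + size <= start;
}

/* Counts how many modelled resource managers were registered. */
static int managers_registered;

/* Hypothetical model of ttm_range_man_init(): rejects an invalid range
 * (the kernel would BUG here instead of returning an error). */
static int model_range_man_init(uint64_t size_in_page)
{
	if (range_is_invalid(0, size_in_page))
		return -1;
	managers_registered++;
	return 0;
}

/* Hypothetical model of amdgpu_ttm_init_on_chip() with the patch applied:
 * an absent resource (size zero) is skipped and reported as success. */
static int model_init_on_chip(uint64_t size_in_page)
{
	if (!size_in_page)
		return 0;
	return model_range_man_init(size_in_page);
}
```

In this model, model_init_on_chip(0) returns 0 without registering anything, while calling model_range_man_init(0) directly reports the invalid range. A non-zero size still goes through the normal registration path, which matches the claim that other GPU types are unaffected.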
On 4/20/26 23:57, arjan@linux.intel.com wrote:
>
> RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
> resources. The gfx_v12_0 initialisation code correctly leaves
> adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
> zero to reflect this.
>
> amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
> each of these resources regardless of size. When the size is zero,
> amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
> which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
> DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
> zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.
Mhm in general not a bad idea, but we are having tons of GFX 12 systems in our test machines and nothing is crashing there.
We are clearly missing something here. Is that on an upstream kernel or something backported?
Regards,
Christian.
>
> Guard against this by returning 0 early from
> amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
> resource manager registration for hardware resources that are absent,
> without affecting any other GPU type.
>
> Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
> Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
> Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index afaaab6496def..8075ac735321e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
> unsigned int type,
> uint64_t size_in_page)
> {
> + if (!size_in_page)
> + return 0;
> +
> return ttm_range_man_init(&adev->mman.bdev, type,
> false, size_in_page);
> }
On Tue, Apr 21, 2026 at 2:59 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 4/20/26 23:57, arjan@linux.intel.com wrote:
> >
> > RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
> > resources. The gfx_v12_0 initialisation code correctly leaves
> > adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
> > zero to reflect this.
> >
> > amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
> > each of these resources regardless of size. When the size is zero,
> > amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
> > which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
> > DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
> > zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.
>
> Mhm in general not a bad idea, but we are having tons of GFX 12 systems in our test machines and nothing is crashing there.
>
> We are clearly missing something here. Is that on an upstream kernel or something backported?
Looks like that check only asserts if CONFIG_DRM_DEBUG_MM is set in
the user's kernel config. I guess no one uses that option. These
chips have been in the market for over a year and no one has reported
that until now. Applied with a note about this in the commit message.
Thanks!
Alex
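
Why the same zero-size call is silent on most kernels can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel's definition: MODEL_DEBUG_MM, MODEL_MM_BUG_ON() and model_mm_init() are hypothetical names that mimic a check which is only active when a debug option (CONFIG_DRM_DEBUG_MM in the real kernel) is enabled.

```c
#include <assert.h>	/* only needed if you add checks like the ones described below */
#include <stdint.h>

/* Build with -DMODEL_DEBUG_MM to mimic a kernel with the debug option set. */
#ifdef MODEL_DEBUG_MM
/* Debug build: the condition is actually evaluated. */
#define MODEL_MM_BUG_ON(expr) ((expr) ? 1 : 0)
#else
/* Non-debug build: the expression is type-checked but never evaluated,
 * so the check can never fire. */
#define MODEL_MM_BUG_ON(expr) ((void)sizeof(!!(expr)), 0)
#endif

/* Hypothetical model of drm_mm_init(): returns -1 where the kernel
 * would crash, 0 otherwise. */
static int model_mm_init(uint64_t start, uint64_t size)
{
	if (MODEL_MM_BUG_ON(start + size <= start))
		return -1;
	return 0;
}
```

Built without MODEL_DEBUG_MM, model_mm_init(0, 0) returns 0, which corresponds to the test machines that never saw a crash. Built with -DMODEL_DEBUG_MM, the same call returns -1, corresponding to the reporter's configuration.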
>
> Regards,
> Christian.
>
> >
> > Guard against this by returning 0 early from
> > amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
> > resource manager registration for hardware resources that are absent,
> > without affecting any other GPU type.
> >
> > Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
> > Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
> > Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
> > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-kernel@vger.kernel.org
> >
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index afaaab6496def..8075ac735321e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
> > unsigned int type,
> > uint64_t size_in_page)
> > {
> > + if (!size_in_page)
> > + return 0;
> > +
> > return ttm_range_man_init(&adev->mman.bdev, type,
> > false, size_in_page);
> > }
>
On 4/20/2026 11:42 PM, Christian König wrote:
> On 4/20/26 23:57, arjan@linux.intel.com wrote:
>>
>> RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
>> resources. The gfx_v12_0 initialisation code correctly leaves
>> adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
>> zero to reflect this.
>>
>> amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
>> each of these resources regardless of size. When the size is zero,
>> amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
>> which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
>> DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
>> zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.
>
> Mhm in general not a bad idea, but we are having tons of GFX 12 systems in our test machines and nothing is crashing there.
>
> We are clearly missing something here. Is that on an upstream kernel or something backported?
>
the reported oops/etc say 6.18.22 so that does not sound like something crazy
backported (https://bugzilla.kernel.org/show_bug.cgi?id=221376)