From nobody Sat Jun 13 00:24:00 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F50B3FD131; Mon, 11 May 2026 17:30:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778520657; cv=none; b=oqOq/F3DZn0DnSuqwbvImcQLiaeYSHgtsoM2E47wdvOygzQniOyawWXI6Ir45b1yPAchv99NKixyCdKTHCMCSTRJvZEGITodCJFgISNP/YC4t/HdXWhMfsGjD1tC+CUvR7D+74mmJXhDfB8wqkuZGceCSkckZj3GoaP2F90nOt4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778520657; c=relaxed/simple; bh=Z5tnKsqQw7EiBeSDr2ywtUGax4C0M+Mxkthf6nfbtWw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Cv750ScBUK+71UQL1coRQ622hWFPS4gYI349OWdthYxtMfmzNhUiSlJ6JKz+XHFWd7rUujCKyAvtHpYWx2T6ppthpkX+JMNiH+58IVGXi1Yxg98Vk7RCHuSZ/3pjfKuNGBpHCFpSyUh6d216kzmZz/D8TZYKgQ8EJTg+yGwhiyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hE4ZFYc/; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hE4ZFYc/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778520655; x=1810056655; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Z5tnKsqQw7EiBeSDr2ywtUGax4C0M+Mxkthf6nfbtWw=; 
b=hE4ZFYc/2/nB0/Y+xi27khGjg96mQK1WGrvduCh+zcRdasiKIJGF/ZLk ppc9Quq/yAidwJkXtWU1g/MpnXxWAspoEk7f8GPP/eI//s5LaEs8bd2EP DXoBPg82KaxfDoIhb1HKPfdxpRXc97eZWQxHJgJSBePWAlURix2jltFTa PCd5n4oebSyT/bR9KiSPpPXz+QuUlnUMyqhtFCDeheLEmvR7shhLzgaKJ S8eZGePWsFKtXr3T3WfR0LcDzK+QOmGuDaJBk9ql/pPZLK2ED70C9kHji RrpLs2rKpEjmSJ7sglvx764sYZCRAUSNAnicqH2EGn9I8iFCUp5A4gK76 w==; X-CSE-ConnectionGUID: lzDWTOeLQw2U91JTJuLDrQ== X-CSE-MsgGUID: iXZh2kymReGM5jjDpkLnPQ== X-IronPort-AV: E=McAfee;i="6800,10657,11783"; a="79314118" X-IronPort-AV: E=Sophos;i="6.23,229,1770624000"; d="scan'208";a="79314118" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2026 10:30:45 -0700 X-CSE-ConnectionGUID: cQZX9EhnR/iT6c7D0YB+FQ== X-CSE-MsgGUID: RVLexnFOSyCbdPBgRcYhSg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,229,1770624000"; d="scan'208";a="261000339" Received: from pgcooper-mobl3.ger.corp.intel.com (HELO fedora) ([10.245.244.248]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2026 10:30:39 -0700 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= To: intel-xe@lists.freedesktop.org Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Sashiko-bot , Friedrich Vock , Maarten Lankhorst , Tejun Heo , Maxime Ripard , =?UTF-8?q?Christian=20K=C3=B6nig?= , Alex Deucher , amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, stable@vger.kernel.org, Natalie Vock , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , cgroups@vger.kernel.org, Huang Rui , Matthew Brost , Matthew Auld , Maarten Lankhorst , Thomas Zimmermann , Simona Vetter , David Airlie , Rodrigo Vivi , linux-kernel@vger.kernel.org Subject: [PATCH v3 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Date: Mon, 11 May 2026 19:30:04 +0200 Message-ID: <20260511173008.36526-2-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com> References: 
<20260511173008.36526-1-thomas.hellstrom@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable drmm_cgroup_register_region() is called before INIT_LIST_HEAD() and gpu_buddy_init() in amdgpu_vram_mgr_init(). If it fails, the function returns early and bypasses those initializations. Since adev->mman.initialized is set to true before amdgpu_vram_mgr_init() is called, a failure triggers amdgpu_ttm_fini(), which calls amdgpu_vram_mgr_fini(), which then: - Calls list_for_each_entry_safe() on reservations_pending and reserved_pages, whose list_head::next pointers are zero-initialized (NULL). The loop does not recognize them as empty and dereferences NULL. - Calls gpu_buddy_fini(), which iterates free_trees[] unconditionally via for_each_free_tree(). Since mm->free_trees is NULL (never allocated), this dereferences NULL. Both result in a kernel panic on the module load error path. Fix by moving drmm_cgroup_register_region() to after the list and buddy allocator are fully initialized, so the teardown path is safe to run. 
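The first failure mode can be reproduced outside the kernel. The sketch below is a user-space illustration only, not code from this patch: struct list_head is re-declared locally, and init_list_head(), list_is_empty(), walk_would_visit() and check_init_ordering() are hypothetical stand-ins written for this example. It assumes only that an empty list is one whose next pointer refers back to the head itself, which is why a zero-filled head is not seen as empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same idea as INIT_LIST_HEAD(): an empty list points back at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same test as list_empty(): "empty" means next == head, not next == NULL. */
static int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Mimics the termination test of a list walk, which stops only once the
 * cursor is back at the head. Returns the number of entries visited, or -1
 * where a real walk would have to dereference a NULL cursor.
 */
static int walk_would_visit(const struct list_head *head)
{
	const struct list_head *pos = head->next;
	int visited = 0;

	while (pos != head) {
		if (!pos)
			return -1;
		visited++;
		pos = pos->next;
	}
	return visited;
}

/* Contrast a zero-filled head (the early-return case) with an initialized one. */
void check_init_ordering(void)
{
	struct list_head zeroed = { NULL, NULL };
	struct list_head ready;

	init_list_head(&ready);

	assert(!list_is_empty(&zeroed));         /* zeroed head is not "empty" */
	assert(walk_would_visit(&zeroed) == -1); /* teardown would hit NULL */
	assert(list_is_empty(&ready));
	assert(walk_would_visit(&ready) == 0);   /* teardown is a no-op */
}
```

Calling check_init_ordering() from a small main() exercises both cases. The same reasoning motivates the reordering: once the list heads are initialized before any step that can fail, the teardown walk over them terminates immediately.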
Reported-by: Sashiko-bot Closes: https://sashiko.dev/#/patchset/20260428073116.15687-1-thomas.hellst= rom@linux.intel.com?part=3D4 Fixes: 2b624a2c1865 ("drm/ttm: Handle cgroup based eviction in TTM") Cc: Friedrich Vock Cc: Maarten Lankhorst Cc: Tejun Heo Cc: Maxime Ripard Cc: Christian K=C3=B6nig Cc: Alex Deucher Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: # v6.14+ Assisted-by: GitHub_Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm= /amd/amdgpu/amdgpu_vram_mgr.c index 2a241a5b12c4..ac3f71d77140 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -918,9 +918,6 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) struct ttm_resource_manager *man =3D &mgr->manager; int err; =20 - man->cg =3D drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->= gmc.real_vram_size); - if (IS_ERR(man->cg)) - return PTR_ERR(man->cg); ttm_resource_manager_init(man, &adev->mman.bdev, adev->gmc.real_vram_size); =20 @@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) if (err) return err; =20 + man->cg =3D drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->= gmc.real_vram_size); + if (IS_ERR(man->cg)) + return PTR_ERR(man->cg); + ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager); ttm_resource_manager_set_used(man, true); return 0; --=20 2.54.0 From nobody Sat Jun 13 00:24:00 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30CAD4508FF; Mon, 11 May 2026 17:30:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778520658; cv=none; b=tUT9SkrEXYhRmbSSYVZ6bSTx9uDa4VQ/4KTmW26dnkJEy+pMMYhcKgd0lgsoqJtxRJHuSGbLnQqp7SewuU66V4/BrAWYtWJ4c1QDkIzNM6zl2ZV6DU18MH/8Cghz7b5Xo6/3HNY0XiYjw7tAfhn89WjR9ic5rei0scVQ5UWqMPU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778520658; c=relaxed/simple; bh=bQWecHFcBq87ss3OM4J3cujuOBJnVEKB2A85X2D8cPQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=IiKPL7g33xyxW6LuiS6xq8YkDAc8G9x4dI5ULXZpJwCo/HI+NkvmNLi0rLWDHqR8UUj0L6wXrNTUGmIxDnij+SoK0a1tCH4UdIZY7AeZk24SMXVfc/buf8n7YC9BO+Lul6JQCgajOGoQBn99A9uUpXFo/9Zho/803cIG1gXLklQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Zf0QM4mB; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Zf0QM4mB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778520657; x=1810056657; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bQWecHFcBq87ss3OM4J3cujuOBJnVEKB2A85X2D8cPQ=; b=Zf0QM4mBXUTy7haG26vBgKL27vnE1EwT/iIhzvrN6zwGYlrdDYt918rr /kSwSRjVoTVsUatR7ATvEIvXWPs6wl9RuuUj8MtSBEUzbUMoLO/PRpVJ5 D9bV+EPDrifqoFrH/fm48+YWNdwvpd2BwbobJsORZjKwrncncrxMsTprn VQIqOtwY3AxFtekf3nI2GHucqgq1JyZxCPCLRtsaQb8y7/uadKuhFxifP q3mTqfqc9AO8r7aHGHd6+vzjD5aeVhskLHJEACWd2+p8Q9VuHL0t2C+eQ +Ytvzs1w9QehUKAB3sVJmmbI2nuFWAVvEst2kLyTNmhzlhBj3NHzKE93Y w==; X-CSE-ConnectionGUID: RxRjXKK/TMSCCRZawa4hvg== X-CSE-MsgGUID: 
2hwJpjGlQXC/2TwBReJ+8Q== X-IronPort-AV: E=McAfee;i="6800,10657,11783"; a="79314148" X-IronPort-AV: E=Sophos;i="6.23,229,1770624000"; d="scan'208";a="79314148" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2026 10:30:49 -0700 X-CSE-ConnectionGUID: m3qXd0HlSQe47iinqetkOQ== X-CSE-MsgGUID: 0/ieesWXQZGi4uGfltaS1w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,229,1770624000"; d="scan'208";a="261000363" Received: from pgcooper-mobl3.ger.corp.intel.com (HELO fedora) ([10.245.244.248]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2026 10:30:44 -0700 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= To: intel-xe@lists.freedesktop.org Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Natalie Vock , Johannes Weiner , Tejun Heo , =?UTF-8?q?Michal=20Koutn=C3=BD?= , cgroups@vger.kernel.org, Huang Rui , Matthew Brost , Matthew Auld , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Simona Vetter , David Airlie , =?UTF-8?q?Christian=20K=C3=B6nig?= , Alex Deucher , Rodrigo Vivi , dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage Date: Mon, 11 May 2026 19:30:05 +0200 Message-ID: <20260511173008.36526-3-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com> References: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add an optional reclaim callback to struct dmem_cgroup_region. 
When dmem.max is set below the current usage of a cgroup pool, the new limit is applied immediately (so that concurrent allocations are throttled while reclaim is in progress) and then the driver is asked to evict memory to bring usage back below the limit. Reclaim is attempted up to a bounded number of times. No error is returned to userspace if usage remains above the limit after reclaim, and a pending signal will abort the reclaim loop early. This matches the behavior of memory.max in the memory cgroup controller. Also honor O_NONBLOCK so that if that flag is set during the max value write, no reclaim is initiated. The idea is to avoid charging the reclaim cost to the writer of the max value. v2: - Write max before reclaim is attempted (Maarten) - Let signals abort the reclaim without error (Maarten) - If a new max value is written with the O_NONBLOCK flag, reclaim is not attempted (Maarten) - Extract region from the pool parameter rather than passing it explicitly to set_resource_xxx(). v3: - Use an rwsem to protect reclaim callback registration and region unregister against concurrent reclaim invocations, ensuring reclaim_priv is visible when the callback is invoked. (Sashiko-bot) Assisted-by: GitHub_Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellstr=C3=B6m --- include/linux/cgroup_dmem.h | 24 ++++++++ kernel/cgroup/dmem.c | 106 +++++++++++++++++++++++++++++++++--- 2 files changed, 121 insertions(+), 9 deletions(-) diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h index dd4869f1d736..c3bce21cbe80 100644 --- a/include/linux/cgroup_dmem.h +++ b/include/linux/cgroup_dmem.h @@ -14,6 +14,21 @@ struct dmem_cgroup_pool_state; /* Opaque definition of a cgroup region, used internally */ struct dmem_cgroup_region; =20 +/** + * typedef dmem_cgroup_reclaim_fn_t - Reclaim callback for a dmem cgroup r= egion. + * @pool: The cgroup pool that needs memory reclaimed. + * @target_bytes: Minimum number of bytes the driver should attempt to fre= e. 
+ * @priv: Private data registered with dmem_cgroup_region_set_reclaim(). + * + * Called by the dmem cgroup controller when dmem.max is set below the cur= rent + * usage of @pool. The driver should evict at least @target_bytes of memory + * from @pool. May be called multiple times if usage remains above the lim= it. + * + * Return: 0 if progress was made, negative error code otherwise. + */ +typedef int (*dmem_cgroup_reclaim_fn_t)(struct dmem_cgroup_pool_state *poo= l, + u64 target_bytes, void *priv); + #if IS_ENABLED(CONFIG_CGROUP_DMEM) struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const cha= r *name_fmt, ...) __printf(2,3); void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region); @@ -26,6 +41,9 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_= pool_state *limit_pool, bool ignore_low, bool *ret_hit_low); =20 void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool); +void dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region, + dmem_cgroup_reclaim_fn_t reclaim, + void *priv); #else static inline __printf(2,3) struct dmem_cgroup_region * dmem_cgroup_register_region(u64 size, const char *name_fmt, ...) @@ -62,5 +80,11 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup= _pool_state *limit_pool, static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_stat= e *pool) { } =20 +static inline void +dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region, + dmem_cgroup_reclaim_fn_t reclaim, + void *priv) +{ } + #endif #endif /* _CGROUP_DMEM_H */ diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c index 1ab1fb47f271..5fd5a1634d21 100644 --- a/kernel/cgroup/dmem.c +++ b/kernel/cgroup/dmem.c @@ -51,6 +51,20 @@ struct dmem_cgroup_region { * No new pools should be added to the region afterwards. */ bool unregistered; + + /** + * @reclaim: Optional callback invoked when dmem.max is set below the + * current usage of a pool. 
The driver should attempt to free at least + * @target_bytes from @pool. May be called multiple times if usage + * remains above the limit after returning. + */ + dmem_cgroup_reclaim_fn_t reclaim; + + /** @reclaim_priv: Private data passed to @reclaim. */ + void *reclaim_priv; + + /** @unregister_sem: Protect @reclaim while it is running. */ + struct rw_semaphore unregister_sem; }; =20 struct dmemcg_state { @@ -145,21 +159,58 @@ static void free_cg_pool(struct dmem_cgroup_pool_stat= e *pool) } =20 static void -set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val) +set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblo= ck) { page_counter_set_min(&pool->cnt, val); } =20 static void -set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val) +set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblo= ck) { page_counter_set_low(&pool->cnt, val); } =20 static void -set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val) +set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblo= ck) { - page_counter_set_max(&pool->cnt, val); + struct dmem_cgroup_region *region =3D pool->region; + + /* + * Always update the limit, even if usage currently exceeds it. + * Concurrent allocations will be throttled against the new limit + * while reclaim is in progress. + */ + xchg(&pool->cnt.max, (unsigned long)val); + + if (nonblock || !READ_ONCE(region->reclaim)) + return; + + for (int retries =3D 5; retries > 0; retries--) { + u64 usage =3D page_counter_read(&pool->cnt); + int ret; + + if (usage <=3D val) + break; + + if (signal_pending(current)) + break; + + /* Block unregister until the reclaim callback completes. 
*/ + if (down_read_interruptible(&region->unregister_sem)) + break; + + if (!region->reclaim) { + up_read(&region->unregister_sem); + break; + } + + ret =3D region->reclaim(pool, usage - val, region->reclaim_priv); + up_read(&region->unregister_sem); + if (ret) + break; + + cond_resched(); + } } =20 static u64 get_resource_low(struct dmem_cgroup_pool_state *pool) @@ -184,9 +235,9 @@ static u64 get_resource_current(struct dmem_cgroup_pool= _state *pool) =20 static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool) { - set_resource_min(rpool, 0); - set_resource_low(rpool, 0); - set_resource_max(rpool, PAGE_COUNTER_MAX); + set_resource_min(rpool, 0, false); + set_resource_low(rpool, 0, false); + set_resource_max(rpool, PAGE_COUNTER_MAX, false); } =20 static void dmemcs_offline(struct cgroup_subsys_state *css) @@ -491,6 +542,12 @@ void dmem_cgroup_unregister_region(struct dmem_cgroup_= region *region) region->unregistered =3D true; spin_unlock(&dmemcg_lock); =20 + /* Ensure all reclaim() callbacks have finished. */ + down_write(&region->unregister_sem); + /* Pairs with READ_ONCE() in set_resource_max() */ + WRITE_ONCE(region->reclaim, NULL); + up_write(&region->unregister_sem); + kref_put(&region->ref, dmemcg_free_region); } EXPORT_SYMBOL_GPL(dmem_cgroup_unregister_region); @@ -530,6 +587,7 @@ struct dmem_cgroup_region *dmem_cgroup_register_region(= u64 size, const char *fmt INIT_LIST_HEAD(&ret->pools); ret->name =3D region_name; ret->size =3D size; + init_rwsem(&ret->unregister_sem); kref_init(&ret->ref); =20 spin_lock(&dmemcg_lock); @@ -568,6 +626,34 @@ void dmem_cgroup_pool_state_put(struct dmem_cgroup_poo= l_state *pool) } EXPORT_SYMBOL_GPL(dmem_cgroup_pool_state_put); =20 +/** + * dmem_cgroup_region_set_reclaim() - Register a reclaim callback on a reg= ion. + * @region: The region to register the callback for. + * @reclaim: Callback to invoke when dmem.max is set below current usage. 
+ * Called with the pool that needs reclaiming and the number of + * bytes to free. Returns 0 on progress, negative on failure. + * @priv: Opaque pointer passed back to @reclaim. + * + * When dmem.max is lowered below the current usage of a cgroup pool, the + * dmem controller will call @reclaim with a target number of bytes to fre= e. + * After @reclaim returns the controller retries setting the limit; if usa= ge + * is still too high it calls @reclaim again, up to a bounded retry count. + */ +void dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region, + dmem_cgroup_reclaim_fn_t reclaim, + void *priv) +{ + if (!region) + return; + + down_write(&region->unregister_sem); + region->reclaim_priv =3D priv; + /* Pairs with READ_ONCE() in set_resource_max() */ + WRITE_ONCE(region->reclaim, reclaim); + up_write(&region->unregister_sem); +} +EXPORT_SYMBOL_GPL(dmem_cgroup_region_set_reclaim); + static struct dmem_cgroup_pool_state * get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *r= egion) { @@ -725,9 +811,10 @@ static int dmemcg_parse_limit(char *options, u64 *new_= limit) =20 static ssize_t dmemcg_limit_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off, - void (*apply)(struct dmem_cgroup_pool_state *, u64)) + void (*apply)(struct dmem_cgroup_pool_state *, u64, bool)) { struct dmemcg_state *dmemcs =3D css_to_dmemcs(of_css(of)); + bool nonblock =3D of->file->f_flags & O_NONBLOCK; int err =3D 0; =20 while (buf && !err) { @@ -772,7 +859,8 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_fi= le *of, } =20 /* And commit */ - apply(pool, new_limit); + apply(pool, new_limit, nonblock); + dmemcg_pool_put(pool); =20 out_put: --=20 2.54.0 From nobody Sat Jun 13 00:24:00 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 05EE8441022; Mon, 11 May 
2026 17:30:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778520660; cv=none; b=VKoolyhi1F3HBaeBlrqoZ6jzLsgdXT+iVaymy4JjXcxNOf5yEFHbAbKKFi/C7qbViQ515bPtJdd1cZCcSLWaJn2cvhvbvHeBYXn84l0aBzaJtjQqAAq9Vhbaw5bQeBn1+BgQJtGctihsozi9EDUU+/SZ4WI+ruyiPE737+MDEKI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778520660; c=relaxed/simple; bh=7fkP4XhAcBV425BQiDnb4rljqYeWpD8bk5sSN20uFww=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=anfhGB905UuA0nvT0+eDF/JW7IXLuf5F3akb6vcdUFpfg/WbQTGnGadD4hVrcWenlqDgFe1evUasn9vQnDUGmEwI5s2jxIIRDu7jZVUOChnjs4hla9uR8rvHF4z/BbDMD/T6z7W9MU1TBgFwGHaO3+bSTgTtYrgL0dQfZEKgPgs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OOJZNPx3; arc=none smtp.client-ip=198.175.65.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OOJZNPx3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778520659; x=1810056659; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=7fkP4XhAcBV425BQiDnb4rljqYeWpD8bk5sSN20uFww=; b=OOJZNPx3DVqvPfnZ4iHOEQSZf/MOrOVHvggGvb0Qgl4D2LPrrMRAQAhP fcIj867PfEzECA1DHG11f/PMcCT4KMTq9wdU+NBKxyqcM0VyqfqeZH5Xt JlaQ19AbBc3oTmItP6cbQG13/icV7PJpnVUHi6fd05aKuoVLGqQ2TNrwg dKJr7NSIJB4Rdz5NEyUSc3Wf99ui52BhoxDs51EhLhelA7LUT6+HVC36F WTmWzkgVPHifVEa07rAYE6aF3mhIY5Yr2yyJlBHZtWWJavAJhpNs9pzBN 
shF4+u1wPbmN4msVEucBVNfWgD/fiaEOoPbm9gfjfeCDUy/HDp0Oqn632 g==; X-CSE-ConnectionGUID: gzYq+4drSye1KkveP/DWlA== X-CSE-MsgGUID: M1kPDlJjSGKbrpDI9Lcdig== X-IronPort-AV: E=McAfee;i="6800,10657,11783"; a="79314183" X-IronPort-AV: E=Sophos;i="6.23,229,1770624000"; d="scan'208";a="79314183" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by orvoesa113.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2026 10:30:54 -0700 X-CSE-ConnectionGUID: GS+/le3pTiKc3aMWed5d2A== X-CSE-MsgGUID: BPn03mK8TbmzaqvnYA+zBA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,229,1770624000"; d="scan'208";a="261000394" Received: from pgcooper-mobl3.ger.corp.intel.com (HELO fedora) ([10.245.244.248]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 May 2026 10:30:49 -0700 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= To: intel-xe@lists.freedesktop.org Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Natalie Vock , Johannes Weiner , Tejun Heo , =?UTF-8?q?Michal=20Koutn=C3=BD?= , cgroups@vger.kernel.org, Huang Rui , Matthew Brost , Matthew Auld , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , Simona Vetter , David Airlie , =?UTF-8?q?Christian=20K=C3=B6nig?= , Alex Deucher , Rodrigo Vivi , dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Date: Mon, 11 May 2026 19:30:06 +0200 Message-ID: <20260511173008.36526-4-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com> References: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add ttm_bo_evict_cgroup() to evict buffer objects charged to a specific dmem cgroup pool from a 
resource manager's LRU until a byte target is met. Add ttm_resource_manager_set_dmem_region() to register the TTM eviction path as the reclaim callback for a dmem cgroup region. The eviction context is interruptible; signals abort the operation and propagate back through the write() syscall. Introduce a new mode for the bo LRU walker so that sleeping locks can be taken. This can be used when the caller doesn't hold any previous dma_resv locks, and where it intends to hold at most one lock at a time. Like the rest of the TTM eviction this should sooner than later be converted to full WW transactions. v3: - Fix ttm_resource_manager_set_dmem_region() storing an error pointer in man->cg unconditionally. (Sashiko-bot) - Fix kernel-doc function name format for ttm_bo_evict_cgroup() and ttm_resource_manager_set_dmem_region(). Assisted-by: GitHub_Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/ttm/ttm_bo.c | 95 +++++++++++++++++++++++++++++- drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +- drivers/gpu/drm/ttm/ttm_resource.c | 37 ++++++++++++ include/drm/ttm/ttm_bo.h | 10 ++++ include/drm/ttm/ttm_resource.h | 4 ++ 5 files changed, 145 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index d85f0a37ac35..249d626dc061 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -515,12 +515,20 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk,= struct ttm_buffer_object * { struct ttm_bo_evict_walk *evict_walk =3D container_of(walk, typeof(*evict_walk), walk); + /* Capture size before eviction in case res is cleared. */ + s64 bo_size =3D bo->base.size; s64 lret; =20 if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resourc= e->css, evict_walk->try_low, &evict_walk->hit_low)) return 0; =20 + /* + * evict_walk->place is NULL in cgroup drain mode. 
Drivers' + * eviction_valuable() callbacks must handle a NULL place, treating it + * as "any placement": the TTM base implementation already does so via + * ttm_resource_intersects(). + */ if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->= place)) return 0; =20 @@ -536,11 +544,15 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk,= struct ttm_buffer_object * goto out; =20 evict_walk->evicted++; - if (evict_walk->res) + if (evict_walk->res) { lret =3D ttm_resource_alloc(evict_walk->evictor, evict_walk->place, evict_walk->res, NULL); - if (lret =3D=3D 0) - return 1; + if (lret =3D=3D 0) + return 1; + } else { + /* Cgroup drain: return bytes freed for byte-denominated progress. */ + return bo_size; + } out: /* Errors that should terminate the walk. */ if (lret =3D=3D -ENOSPC) @@ -614,6 +626,83 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev, return 0; } =20 +/** + * ttm_bo_evict_cgroup() - Evict buffer objects charged to a specific cgro= up. + * @bdev: The TTM device. + * @man: The resource manager whose LRU to walk. + * @limit_pool: The cgroup pool state whose members should be evicted. + * @target_bytes: Number of bytes to free. + * @ctx: The TTM operation context. + * + * Walk the LRU of @man and evict buffer objects that are charged to the + * cgroup identified by @limit_pool, until at least @target_bytes have been + * freed. Mirrors the two-pass (trylock -> sleeping-lock, low-watermark) + * strategy used by ttm_bo_evict_alloc(). + * + * Return: >=3D @target_bytes on full success, 0..target_bytes-1 if partia= l, + * negative error code on fatal error. 
+ */ +s64 ttm_bo_evict_cgroup(struct ttm_device *bdev, + struct ttm_resource_manager *man, + struct dmem_cgroup_pool_state *limit_pool, + s64 target_bytes, + struct ttm_operation_ctx *ctx) +{ + struct ttm_bo_evict_walk evict_walk =3D { + .walk =3D { + .ops =3D &ttm_evict_walk_ops, + .arg =3D { .ctx =3D ctx }, + }, + .limit_pool =3D limit_pool, + /* place, evictor, res left NULL: selects cgroup drain mode */ + }; + s64 lret, pass; + + evict_walk.walk.arg.trylock_only =3D true; + lret =3D ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, target_bytes= ); + if (lret < 0 || lret >=3D target_bytes) + return lret; + + /* Second pass: also evict BOs at the low watermark. */ + if (evict_walk.hit_low) { + evict_walk.try_low =3D true; + pass =3D ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, + target_bytes - lret); + if (pass < 0) + return pass; + lret +=3D pass; + if (lret >=3D target_bytes) + return lret; + } + + /* Full sleeping-lock pass for remaining target. */ + evict_walk.try_low =3D evict_walk.hit_low =3D false; + evict_walk.walk.arg.trylock_only =3D false; + +retry: + evict_walk.walk.arg.sleeping_lock =3D true; + do { + evict_walk.evicted =3D 0; + pass =3D ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, + target_bytes - lret); + if (pass < 0) { + lret =3D pass; + goto out; + } + lret +=3D pass; + } while (lret < target_bytes && evict_walk.evicted); + + /* One more attempt if we hit the low limit during sleeping-lock pass. */ + if (lret < target_bytes && evict_walk.hit_low && !evict_walk.try_low) { + evict_walk.try_low =3D true; + goto retry; + } + +out: + return lret; +} +EXPORT_SYMBOL(ttm_bo_evict_cgroup); + /** * ttm_bo_pin - Pin the buffer object. 
* @bo: The buffer object to pin diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo= _util.c index f83b7d5ec6c6..81c6a674c462 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -999,7 +999,8 @@ __ttm_bo_lru_cursor_next(struct ttm_bo_lru_cursor *curs) bo =3D res->bo; if (ttm_lru_walk_trylock(curs, bo)) bo_locked =3D true; - else if (!arg->ticket || arg->ctx->no_wait_gpu || arg->trylock_only) + else if ((!arg->ticket && !arg->sleeping_lock) || arg->ctx->no_wait_gpu = || + arg->trylock_only) continue; =20 if (!ttm_bo_get_unless_zero(bo)) { diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_r= esource.c index 9f36631d48b6..6867ada16545 100644 --- a/drivers/gpu/drm/ttm/ttm_resource.c +++ b/drivers/gpu/drm/ttm/ttm_resource.c @@ -937,3 +937,40 @@ void ttm_resource_manager_create_debugfs(struct ttm_re= source_manager *man, #endif } EXPORT_SYMBOL(ttm_resource_manager_create_debugfs); + +static int ttm_resource_manager_dmem_reclaim(struct dmem_cgroup_pool_state= *pool, + u64 target_bytes, void *priv) +{ + struct ttm_resource_manager *man =3D priv; + struct ttm_operation_ctx ctx =3D { .interruptible =3D true }; + s64 freed; + + freed =3D ttm_bo_evict_cgroup(man->bdev, man, pool, target_bytes, &ctx); + if (freed < 0) + return freed; + + return freed >=3D (s64)target_bytes ? 0 : -ENOSPC; +} + +/** + * ttm_resource_manager_set_dmem_region() - Associate a dmem cgroup region= with a + * resource manager and register a = reclaim + * callback. + * @man: The resource manager. + * @region: The dmem cgroup region to associate, may be NULL or IS_ERR(). + * + * Sets @man->cg and registers ttm_resource_manager_dmem_reclaim() so that + * writing to dmem.max below current usage triggers TTM eviction rather th= an + * returning -EBUSY to userspace. 
+ */
+void ttm_resource_manager_set_dmem_region(struct ttm_resource_manager *man,
+					  struct dmem_cgroup_region *region)
+{
+	if (!IS_ERR_OR_NULL(region)) {
+		man->cg = region;
+		dmem_cgroup_region_set_reclaim(region,
+					       ttm_resource_manager_dmem_reclaim,
+					       man);
+	}
+}
+EXPORT_SYMBOL(ttm_resource_manager_set_dmem_region);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8310bc3d55f9..32791c4db2a9 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -226,6 +226,11 @@ struct ttm_lru_walk_arg {
 	struct ww_acquire_ctx *ticket;
 	/** @trylock_only: Only use trylock for locking. */
 	bool trylock_only;
+	/**
+	 * @sleeping_lock: Use sleeping locks even with %NULL @ticket.
+	 * @trylock_only has precedence over this field.
+	 */
+	bool sleeping_lock;
 };
 
 /**
@@ -431,6 +436,11 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
 int ttm_bo_evict_first(struct ttm_device *bdev,
 		       struct ttm_resource_manager *man,
 		       struct ttm_operation_ctx *ctx);
+s64 ttm_bo_evict_cgroup(struct ttm_device *bdev,
+			struct ttm_resource_manager *man,
+			struct dmem_cgroup_pool_state *limit_pool,
+			s64 target_bytes,
+			struct ttm_operation_ctx *ctx);
 int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
 		  void *buf, int len, int write);
 vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 33e80f30b8b8..c187e6c8b871 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -39,6 +39,7 @@
 
 struct dentry;
 struct dmem_cgroup_device;
+struct dmem_cgroup_region;
 struct drm_printer;
 struct ttm_device;
 struct ttm_resource_manager;
@@ -475,6 +476,9 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 			       struct ttm_device *bdev,
 			       uint64_t size);
 
+void ttm_resource_manager_set_dmem_region(struct ttm_resource_manager *man,
+					  struct dmem_cgroup_region *region);
+
 int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 				   struct ttm_resource_manager *man);
 
-- 
2.54.0

From nobody Sat Jun 13 00:24:00 2026
From: Thomas Hellström
To: intel-xe@lists.freedesktop.org
Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups@vger.kernel.org, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel@lists.freedesktop.org,
	amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager
Date: Mon, 11 May 2026 19:30:07 +0200
Message-ID: <20260511173008.36526-5-thomas.hellstrom@linux.intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com>
References: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Register the VRAM manager with the dmem cgroup reclaim infrastructure so
that lowering dmem.max below current VRAM usage triggers TTM eviction
rather than failing with -EBUSY.

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 5fd0d5506a7e..1bdcb3fee901 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -303,13 +303,6 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	struct ttm_resource_manager *man = &mgr->manager;
 	int err;
 
-	if (mem_type != XE_PL_STOLEN) {
-		const char *name = mem_type == XE_PL_VRAM0 ? "vram0" : "vram1";
-		man->cg = drmm_cgroup_register_region(&xe->drm, name, size);
-		if (IS_ERR(man->cg))
-			return PTR_ERR(man->cg);
-	}
-
 	man->func = &xe_ttm_vram_mgr_func;
 	mgr->mem_type = mem_type;
 	mutex_init(&mgr->lock);
@@ -318,6 +311,18 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	mgr->visible_avail = io_size;
 
 	ttm_resource_manager_init(man, &xe->ttm, size);
+
+	if (mem_type != XE_PL_STOLEN) {
+		const char *name = mem_type == XE_PL_VRAM0 ? "vram0" : "vram1";
+		struct dmem_cgroup_region *cg =
+			drmm_cgroup_register_region(&xe->drm, name, size);
+
+		if (IS_ERR(cg))
+			return PTR_ERR(cg);
+
+		ttm_resource_manager_set_dmem_region(man, cg);
+	}
+
 	err = gpu_buddy_init(&mgr->mm, man->size, default_page_size);
 	if (err)
 		return err;
-- 
2.54.0

From nobody Sat Jun 13 00:24:00 2026
From: Thomas Hellström
To: intel-xe@lists.freedesktop.org
Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups@vger.kernel.org, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel@lists.freedesktop.org,
	amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 5/5] drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
Date: Mon, 11 May 2026 19:30:08 +0200
Message-ID: <20260511173008.36526-6-thomas.hellstrom@linux.intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com>
References: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Register the VRAM manager with the dmem cgroup reclaim infrastructure so
that lowering dmem.max below current VRAM usage triggers TTM eviction
rather than failing with -EBUSY.

Guard place->flags in amdgpu_ttm_bo_eviction_valuable() against NULL,
as the TTM reclaim path passes a NULL place in cgroup drain mode.

v3:
- Rebased on fix for uninitialized list and buddy allocator on the
  drmm_cgroup_register_region() error path.

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c      | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 0dc68fb9d88e..334a177ae8d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1485,7 +1485,7 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
 				DMA_RESV_USAGE_BOOKKEEP, f) {
 		if (amdkfd_fence_check_mm(f, current->mm) &&
-		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
+		    !(place && (place->flags & TTM_PL_FLAG_CONTIGUOUS)))
 			return false;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index ac3f71d77140..a1f1ae264a40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -916,6 +916,7 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 {
 	struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
 	struct ttm_resource_manager *man = &mgr->manager;
+	struct dmem_cgroup_region *cg;
 	int err;
 
 	ttm_resource_manager_init(man, &adev->mman.bdev,
@@ -932,9 +933,11 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	if (err)
 		return err;
 
-	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
-	if (IS_ERR(man->cg))
-		return PTR_ERR(man->cg);
+	cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
+					 adev->gmc.real_vram_size);
+	if (IS_ERR(cg))
+		return PTR_ERR(cg);
+	ttm_resource_manager_set_dmem_region(man, cg);
 
 	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
 	ttm_resource_manager_set_used(man, true);
-- 
2.54.0
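
Note on the series: two small pieces of logic carry most of the behaviour change, namely the return mapping in ttm_resource_manager_dmem_reclaim() and the relaxed skip condition in __ttm_bo_lru_cursor_next(). The following is a userspace sketch of just those two expressions, not kernel code. The function names reclaim_result(), skip_after_failed_trylock() and reclaim_sketch_selfcheck() are hypothetical, the kernel types s64/u64 are replaced by int64_t/uint64_t, and the ticket pointer is reduced to a boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the return mapping in
 * ttm_resource_manager_dmem_reclaim(): a negative value from the eviction
 * helper is passed through as an errno, a shortfall is reported as -ENOSPC,
 * and reaching the target yields 0.
 */
int reclaim_result(int64_t freed, uint64_t target_bytes)
{
	if (freed < 0)
		return (int)freed;

	return freed >= (int64_t)target_bytes ? 0 : -ENOSPC;
}

/*
 * Hypothetical model of the condition evaluated after a failed trylock in
 * __ttm_bo_lru_cursor_next(). Returns true when the walk skips the buffer
 * object instead of falling back to a sleeping lock. "has_ticket" stands in
 * for a non-NULL arg->ticket.
 */
bool skip_after_failed_trylock(bool has_ticket, bool sleeping_lock,
			       bool no_wait_gpu, bool trylock_only)
{
	return (!has_ticket && !sleeping_lock) || no_wait_gpu || trylock_only;
}

/* Usage example: the cases the commit messages describe. */
void reclaim_sketch_selfcheck(void)
{
	/* Target reached or exceeded: success. */
	assert(reclaim_result(8192, 4096) == 0);
	/* Eviction fell short of the target. */
	assert(reclaim_result(1024, 4096) == -ENOSPC);
	/* Errors from the eviction helper are propagated unchanged. */
	assert(reclaim_result(-EINTR, 4096) == -EINTR);

	/* No ticket, but sleeping_lock set: the walk may now sleep. */
	assert(!skip_after_failed_trylock(false, true, false, false));
	/* trylock_only still takes precedence over sleeping_lock. */
	assert(skip_after_failed_trylock(false, true, false, true));
}
```

In this model, a walk without a ww ticket skips contended objects unless sleeping_lock is set, which matches the kernel-doc statement that @trylock_only has precedence over @sleeping_lock.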