From nobody Thu Apr 9 13:31:31 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8278730B53E for ; Mon, 2 Mar 2026 16:33:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772469194; cv=none; b=quQDu0rUdbQUw7pOmyhFJ1evZX9u30eWbnUPZEBo+a1Ze9Y2K8MfGoiLWnYaIwj4d8WrYjNav6SrD/hCDXt1jcKyWmydF0U/2+vM4uhSlOkVZ5R0avUY3sW4yHZbaB7AetLXPeocm1OgChR91bqUZIscEsvwOfY5Dnjj7lSkrf8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772469194; c=relaxed/simple; bh=+dB6I97jnF3C+kz4LYr+QDqcLMSjhH89nLttR0xMEkA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nLrB9yo1ErVY8q9dy7zSLhHI8fPhbSeX0LPSf7fe4U35zRYL5u9tPM4+jIY2/4zRkyWLAT1eRY3HUDE5U04Gc9oO9pMEZokQwk00jtWwoGcOpUuQg0KKQ6ctIzUWWv7veA6Vkfz35GWzUoofVaNGH0ncgMsIpapGlzFau7T+d/g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=DAEWngh1; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="DAEWngh1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772469192; x=1804005192; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+dB6I97jnF3C+kz4LYr+QDqcLMSjhH89nLttR0xMEkA=; 
b=DAEWngh1L3YtMSMVSfyiutybU0aR45Nh2nJArTB/HGDtKh5D00Eelvi0 DYSzDKk4Ik9XX/P6IUbYP+T7sMwIIK3cjVFK82RVxkjL1u6t2rpQcDpji REFJXphP1nOjTRSiyPvQ7TOLV0rmSZwK8XPMf7eZDI3/S1dt+cbwrXEb3 4vDak+lxqNxwfZfwHJ7yusVeoL0EofsIqHBOUXhbvXeBalJEi/F/d2GGt 0BcqauyaWB0OMrqzYVXKdeg3VZ4ZTQjz+5TeNneEE4uwqOXFp+4HCHwRV AgY8Uyxx3lGXSHHqqv1RT2V/czuq3aanIpmBNr7mrJYoInhfBj5xnyeM6 A==; X-CSE-ConnectionGUID: IG6GrbiOSZWQBBI0PNV+IQ== X-CSE-MsgGUID: 0t1z8UN5TDW7K92eJxd/DA== X-IronPort-AV: E=McAfee;i="6800,10657,11717"; a="73447819" X-IronPort-AV: E=Sophos;i="6.21,320,1763452800"; d="scan'208";a="73447819" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Mar 2026 08:33:12 -0800 X-CSE-ConnectionGUID: pUth11IySXeudJp5F/KqPA== X-CSE-MsgGUID: nOQS+yt6QGaZqsL5agaPdQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.21,320,1763452800"; d="scan'208";a="255564512" Received: from smoticic-mobl1.ger.corp.intel.com (HELO fedora) ([10.245.244.81]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Mar 2026 08:33:10 -0800 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= To: intel-xe@lists.freedesktop.org Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Jason Gunthorpe , Andrew Morton , Simona Vetter , Dave Airlie , Alistair Popple , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Matthew Brost , =?UTF-8?q?Christian=20K=C3=B6nig?= Subject: [PATCH v2 1/4] mm/mmu_notifier: Allow two-pass struct mmu_interval_notifiers Date: Mon, 2 Mar 2026 17:32:45 +0100 Message-ID: <20260302163248.105454-2-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com> References: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable GPU 
use-cases for mmu_interval_notifiers with HMM often involve starting a GPU operation and then waiting for it to complete. These operations are typically context preemption or TLB flushing. With single-pass notifiers per GPU this doesn't scale in multi-GPU scenarios. In those scenarios we'd want to first start preemption or TLB flushing on all GPUs and, as a second pass, wait for them to complete. One can do this on a per-driver basis by multiplexing per-driver notifiers, but that would mean sharing the notifier "user" lock across all GPUs, which doesn't scale well either, so adding support for multi-pass in the core appears to be the right choice. Implement two-pass capability in the mmu_interval_notifier. Use a linked list for the final passes to minimize the impact on use-cases that don't need the multi-pass functionality by avoiding a second interval tree walk, and to be able to easily pass data between the two passes. v1: - Restrict to two passes (Jason Gunthorpe) - Improve the documentation (Jason Gunthorpe) - Improve the function naming (Alistair Popple) v2: - Include the invalidate_finish() callback in struct mmu_interval_notifier_ops. - Update documentation (GitHub Copilot:claude-sonnet-4.6) - Use a lockless list for list management. Cc: Jason Gunthorpe Cc: Andrew Morton Cc: Simona Vetter Cc: Dave Airlie Cc: Alistair Popple Cc: Cc: Cc: Assisted-by: GitHub Copilot:claude-sonnet-4.6 # Documentation only. 
Signed-off-by: Thomas Hellstr=C3=B6m --- include/linux/mmu_notifier.h | 38 +++++++++++++++++++++ mm/mmu_notifier.c | 64 +++++++++++++++++++++++++++++++----- 2 files changed, 93 insertions(+), 9 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 07a2bbaf86e9..de0e742ea808 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -233,16 +233,54 @@ struct mmu_notifier { unsigned int users; }; =20 +/** + * struct mmu_interval_notifier_finish - mmu_interval_notifier two-pass ab= straction + * @link: List link for the notifiers pending pass list + * @notifier: The mmu_interval_notifier for which the finish pass is calle= d. + * + * Allocate, typically using GFP_NOWAIT in the interval notifier's first p= ass. + * If allocation fails (which is not unlikely under memory pressure), fall= back + * to single-pass operation. Note that with a large number of notifiers + * implementing two passes, allocation with GFP_NOWAIT will become increas= ingly + * likely to fail, so consider implementing a small pool instead of using + * kmalloc() allocations. + * + * If the implementation needs to pass data between the two passes, + * the recommended way is to embed struct mmu_interval_notifier_finish int= o a larger + * structure that also contains the data needed to be shared. Keep in mind= that + * a notifier callback can be invoked in parallel, and each invocation nee= ds its + * own struct mmu_interval_notifier_finish. + */ +struct mmu_interval_notifier_finish { + struct llist_node link; + struct mmu_interval_notifier *notifier; +}; + /** * struct mmu_interval_notifier_ops * @invalidate: Upon return the caller must stop using any SPTEs within th= is * range. This function can sleep. Return false only if sleep= ing * was required but mmu_notifier_range_blockable(range) is fa= lse. 
+ * @invalidate_start: Similar to @invalidate, but intended for two-pass no= tifier + * callbacks where the call to @invalidate_start is the= first + * pass and any struct mmu_interval_notifier_finish poi= nter + * returned in the @finish parameter describes the fina= l pass. + * If @finish is %NULL on return, then no final pass wi= ll be + * called. + * @invalidate_finish: Called as the second pass for any notifier that ret= urned + * a non-NULL @finish from @invalidate_start. The @fin= ish + * pointer passed here is the same one returned by + * @invalidate_start. */ struct mmu_interval_notifier_ops { bool (*invalidate)(struct mmu_interval_notifier *interval_sub, const struct mmu_notifier_range *range, unsigned long cur_seq); + bool (*invalidate_start)(struct mmu_interval_notifier *interval_sub, + const struct mmu_notifier_range *range, + unsigned long cur_seq, + struct mmu_interval_notifier_finish **finish); + void (*invalidate_finish)(struct mmu_interval_notifier_finish *finish); }; =20 struct mmu_interval_notifier { diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index a6cdf3674bdc..38acd5ef8eb0 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -260,6 +260,15 @@ mmu_interval_read_begin(struct mmu_interval_notifier *= interval_sub) } EXPORT_SYMBOL_GPL(mmu_interval_read_begin); =20 +static void mn_itree_finish_pass(struct llist_head *finish_passes) +{ + struct llist_node *first =3D llist_reverse_order(__llist_del_all(finish_p= asses)); + struct mmu_interval_notifier_finish *f, *next; + + llist_for_each_entry_safe(f, next, first, link) + f->notifier->ops->invalidate_finish(f); +} + static void mn_itree_release(struct mmu_notifier_subscriptions *subscripti= ons, struct mm_struct *mm) { @@ -271,6 +280,7 @@ static void mn_itree_release(struct mmu_notifier_subscr= iptions *subscriptions, .end =3D ULONG_MAX, }; struct mmu_interval_notifier *interval_sub; + LLIST_HEAD(finish_passes); unsigned long cur_seq; bool ret; =20 @@ -278,11 +288,27 @@ static void 
mn_itree_release(struct mmu_notifier_subs= criptions *subscriptions, mn_itree_inv_start_range(subscriptions, &range, &cur_seq); interval_sub; interval_sub =3D mn_itree_inv_next(interval_sub, &range)) { - ret =3D interval_sub->ops->invalidate(interval_sub, &range, - cur_seq); + if (interval_sub->ops->invalidate_start) { + struct mmu_interval_notifier_finish *finish =3D NULL; + + ret =3D interval_sub->ops->invalidate_start(interval_sub, + &range, + cur_seq, + &finish); + if (ret && finish) { + finish->notifier =3D interval_sub; + __llist_add(&finish->link, &finish_passes); + } + + } else { + ret =3D interval_sub->ops->invalidate(interval_sub, + &range, + cur_seq); + } WARN_ON(!ret); } =20 + mn_itree_finish_pass(&finish_passes); mn_itree_inv_end(subscriptions); } =20 @@ -430,7 +456,9 @@ static int mn_itree_invalidate(struct mmu_notifier_subs= criptions *subscriptions, const struct mmu_notifier_range *range) { struct mmu_interval_notifier *interval_sub; + LLIST_HEAD(finish_passes); unsigned long cur_seq; + int err =3D 0; =20 for (interval_sub =3D mn_itree_inv_start_range(subscriptions, range, &cur_seq); @@ -438,23 +466,41 @@ static int mn_itree_invalidate(struct mmu_notifier_su= bscriptions *subscriptions, interval_sub =3D mn_itree_inv_next(interval_sub, range)) { bool ret; =20 - ret =3D interval_sub->ops->invalidate(interval_sub, range, - cur_seq); + if (interval_sub->ops->invalidate_start) { + struct mmu_interval_notifier_finish *finish =3D NULL; + + ret =3D interval_sub->ops->invalidate_start(interval_sub, + range, + cur_seq, + &finish); + if (ret && finish) { + finish->notifier =3D interval_sub; + __llist_add(&finish->link, &finish_passes); + } + + } else { + ret =3D interval_sub->ops->invalidate(interval_sub, + range, + cur_seq); + } if (!ret) { if (WARN_ON(mmu_notifier_range_blockable(range))) continue; - goto out_would_block; + err =3D -EAGAIN; + break; } } - return 0; =20 -out_would_block: + mn_itree_finish_pass(&finish_passes); + /* * On -EAGAIN the 
non-blocking caller is not allowed to call * invalidate_range_end() */ - mn_itree_inv_end(subscriptions); - return -EAGAIN; + if (err) + mn_itree_inv_end(subscriptions); + + return err; } =20 static int mn_hlist_invalidate_range_start( --=20 2.53.0 From nobody Thu Apr 9 13:31:31 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 921A741C31D for ; Mon, 2 Mar 2026 16:33:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772469196; cv=none; b=EjqPbE+Bsc8eySyppF2sLDY6zaSf0F51Efmh+fTe5NgJad3tQMoTroXFDiG3Y/m7AfPHB5XelGiOrQtogqeaosBy+MU+2jCPw/dR/ICCoLfEfHNbQVpu1hIaqOEB/Pu1AprViUELzvFksRMT3x5DWMG04IzcPPE3c5tGytOnpwI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772469196; c=relaxed/simple; bh=xybYJgKcjIrQv217e/P+1Z/wnP7KMdw9tzTC5IHcWI4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=H+aBKYxpzzzQw6+P4GK2L+1H19ySad5sO7qP4IApE9M7OAm5nls51O+5E6Lb1KNXWG2G+VCjEEVKDa+jQzJF2LWvDaJ0df+h/5PQnfbvtgi1qE7odJttJPu8VUUhx4gBFh5DVz4ZPOzUMVfgD4Fa+pex6S/wG/mpeVB161emYQg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=WSjMvcuh; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WSjMvcuh" DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772469195; x=1804005195; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xybYJgKcjIrQv217e/P+1Z/wnP7KMdw9tzTC5IHcWI4=; b=WSjMvcuhwR06HWo8Azk/rAMeNL+R9lboSTtTDukAvJ7jXOkLGkO6leID nVVgeFfvjwxUyvQieVU0ATbvzQnZFGrgDYjuxgFgGJ1eYgA2q+5e9YJaB +JXd8FhX7yJ9u9XZgV7H9E6QhuJ6Kov0HU948tBqlyM/1ZFFD+QcbKPWM lAWMbaftO0OwksYtV89qyed1QPYK/xLPySbXByI5GrhGpaw9PUNvO9gAy eKRNhD6UXo+ejJI+VFPtKEVy1s6YkNW58Z2RnXkyYrcMEUa9qojmC7ITi FFVdCYOFemSN6frwy7JSvhzqn0AJRnwim2UUbl8N4nk2DAozEqY5h6/16 Q==; X-CSE-ConnectionGUID: dstFj4nJQ0K67kcLI+Ijxw== X-CSE-MsgGUID: G8d7sWNES3m/ucwPGkaj3g== X-IronPort-AV: E=McAfee;i="6800,10657,11717"; a="73447837" X-IronPort-AV: E=Sophos;i="6.21,320,1763452800"; d="scan'208";a="73447837" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Mar 2026 08:33:15 -0800 X-CSE-ConnectionGUID: HyEPAEjNRT+ouo20DbAkTw== X-CSE-MsgGUID: wN2u68H7QY274Oz8dz2kMw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.21,320,1763452800"; d="scan'208";a="255564528" Received: from smoticic-mobl1.ger.corp.intel.com (HELO fedora) ([10.245.244.81]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Mar 2026 08:33:13 -0800 From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= To: intel-xe@lists.freedesktop.org Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Matthew Brost , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, Jason Gunthorpe , Andrew Morton , Simona Vetter , Dave Airlie , Alistair Popple , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 2/4] drm/xe/userptr: Convert invalidation to two-pass MMU notifier Date: Mon, 2 Mar 2026 17:32:46 +0100 Message-ID: <20260302163248.105454-3-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com> References: 
<20260302163248.105454-1-thomas.hellstrom@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable In multi-GPU scenarios, asynchronous GPU job latency is a bottleneck if each notifier waits for its own GPU before returning. The two-pass mmu_interval_notifier infrastructure allows deferring the wait to a second pass, so all GPUs can be signaled in the first pass before any of them are waited on. Convert the userptr invalidation to use the two-pass model: Use invalidate_start as the first pass to mark the VMA for repin and enable software signaling on the VM reservation fences to start any GPU work needed for signaling. Fall back to completing the work synchronously if all fences are already signaled, or if a concurrent invalidation is already using the embedded finish structure. Use invalidate_finish as the second pass to wait for the reservation fences to complete, invalidate the GPU TLB in fault mode, and unmap the gpusvm pages. Embed a struct mmu_interval_notifier_finish in struct xe_userptr to avoid dynamic allocation in the notifier callback. Use a finish_inuse flag to prevent two concurrent invalidations from using it simultaneously; the second caller falls back to the synchronous path. 
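As a rough illustration of the embedded-finish fallback described above, the following userspace C sketch models the decision taken in the first pass. All names here (fake_userptr, finish_token, pass1, pass2, gpu_busy) are hypothetical stand-ins and are not part of xe or the mmu_notifier API. Locking is omitted and assumed to be provided by the caller, in the way the notifier lock does in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the finish structure; not a kernel type. */
struct finish_token {
	struct finish_token *next;   /* list link a core walker could use */
};

/* Hypothetical stand-in for the per-userptr state. */
struct fake_userptr {
	struct finish_token finish;  /* embedded, so pass 1 never allocates */
	bool finish_inuse;           /* true while a deferred pass 2 is pending */
	bool gpu_busy;               /* models "some fence is not yet signaled" */
	int sync_invals;             /* invalidations completed inside pass 1 */
	int deferred_invals;         /* invalidations completed inside pass 2 */
};

/* Pass 1: defer the wait only if there is work pending and the token is free. */
static struct finish_token *pass1(struct fake_userptr *u)
{
	if (!u->gpu_busy || u->finish_inuse) {
		/* Nothing to wait for, or the token is taken: finish right away. */
		u->sync_invals++;
		return NULL;
	}
	u->finish_inuse = true;
	return &u->finish;
}

/* Pass 2: complete the deferred work and release the embedded token. */
static void pass2(struct finish_token *f)
{
	/* Recover the containing object, as container_of() would. */
	struct fake_userptr *u = (struct fake_userptr *)
		((char *)f - offsetof(struct fake_userptr, finish));

	assert(u->finish_inuse);
	u->gpu_busy = false;         /* models the completed fence wait */
	u->finish_inuse = false;
	u->deferred_invals++;
}
```

In this model a second caller that arrives while the token is in use gets NULL back and completes synchronously, which mirrors the fallback path the commit message describes.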
Assisted-by: GitHub Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/xe/xe_userptr.c | 96 +++++++++++++++++++++++++-------- drivers/gpu/drm/xe/xe_userptr.h | 14 +++++ 2 files changed, 88 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userpt= r.c index e120323c43bc..440b0a79d16f 100644 --- a/drivers/gpu/drm/xe/xe_userptr.c +++ b/drivers/gpu/drm/xe/xe_userptr.c @@ -73,18 +73,42 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvm= a) &ctx); } =20 -static void __vma_userptr_invalidate(struct xe_vm *vm, struct xe_userptr_v= ma *uvma) +static void xe_vma_userptr_do_inval(struct xe_vm *vm, struct xe_userptr_vm= a *uvma, + bool is_deferred) { struct xe_userptr *userptr =3D &uvma->userptr; struct xe_vma *vma =3D &uvma->vma; - struct dma_resv_iter cursor; - struct dma_fence *fence; struct drm_gpusvm_ctx ctx =3D { .in_notifier =3D true, .read_only =3D xe_vma_read_only(vma), }; long err; =20 + err =3D dma_resv_wait_timeout(xe_vm_resv(vm), + DMA_RESV_USAGE_BOOKKEEP, + false, MAX_SCHEDULE_TIMEOUT); + XE_WARN_ON(err <=3D 0); + + if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) { + err =3D xe_vm_invalidate_vma(vma); + XE_WARN_ON(err); + } + + if (is_deferred) + userptr->finish_inuse =3D false; + drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma->userptr.pages, + xe_vma_size(vma) >> PAGE_SHIFT, &ctx); +} + +static struct mmu_interval_notifier_finish * +xe_vma_userptr_invalidate_pass1(struct xe_vm *vm, struct xe_userptr_vma *u= vma) +{ + struct xe_userptr *userptr =3D &uvma->userptr; + struct xe_vma *vma =3D &uvma->vma; + struct dma_resv_iter cursor; + struct dma_fence *fence; + bool signaled =3D true; + /* * Tell exec and rebind worker they need to repin and rebind this * userptr. 
@@ -105,27 +129,32 @@ static void __vma_userptr_invalidate(struct xe_vm *vm= , struct xe_userptr_vma *uv */ dma_resv_iter_begin(&cursor, xe_vm_resv(vm), DMA_RESV_USAGE_BOOKKEEP); - dma_resv_for_each_fence_unlocked(&cursor, fence) + dma_resv_for_each_fence_unlocked(&cursor, fence) { dma_fence_enable_sw_signaling(fence); + if (signaled && !dma_fence_is_signaled(fence)) + signaled =3D false; + } dma_resv_iter_end(&cursor); =20 - err =3D dma_resv_wait_timeout(xe_vm_resv(vm), - DMA_RESV_USAGE_BOOKKEEP, - false, MAX_SCHEDULE_TIMEOUT); - XE_WARN_ON(err <=3D 0); - - if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) { - err =3D xe_vm_invalidate_vma(vma); - XE_WARN_ON(err); + /* + * Only one caller at a time can use the multi-pass state. + * If it's already in use, or all fences are already signaled, + * proceed directly to invalidation without deferring. + */ + if (signaled || userptr->finish_inuse) { + xe_vma_userptr_do_inval(vm, uvma, false); + return NULL; } =20 - drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma->userptr.pages, - xe_vma_size(vma) >> PAGE_SHIFT, &ctx); + userptr->finish_inuse =3D true; + + return &userptr->finish; } =20 -static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni, - const struct mmu_notifier_range *range, - unsigned long cur_seq) +static bool xe_vma_userptr_invalidate_start(struct mmu_interval_notifier *= mni, + const struct mmu_notifier_range *range, + unsigned long cur_seq, + struct mmu_interval_notifier_finish **p_finish) { struct xe_userptr_vma *uvma =3D container_of(mni, typeof(*uvma), userptr.= notifier); struct xe_vma *vma =3D &uvma->vma; @@ -138,21 +167,40 @@ static bool vma_userptr_invalidate(struct mmu_interva= l_notifier *mni, return false; =20 vm_dbg(&xe_vma_vm(vma)->xe->drm, - "NOTIFIER: addr=3D0x%016llx, range=3D0x%016llx", + "NOTIFIER PASS1: addr=3D0x%016llx, range=3D0x%016llx", xe_vma_start(vma), xe_vma_size(vma)); =20 down_write(&vm->svm.gpusvm.notifier_lock); mmu_interval_set_seq(mni, cur_seq); =20 - 
__vma_userptr_invalidate(vm, uvma); + *p_finish =3D xe_vma_userptr_invalidate_pass1(vm, uvma); + up_write(&vm->svm.gpusvm.notifier_lock); - trace_xe_vma_userptr_invalidate_complete(vma); + if (!*p_finish) + trace_xe_vma_userptr_invalidate_complete(vma); =20 return true; } =20 +static void xe_vma_userptr_invalidate_finish(struct mmu_interval_notifier_= finish *finish) +{ + struct xe_userptr_vma *uvma =3D container_of(finish, typeof(*uvma), userp= tr.finish); + struct xe_vma *vma =3D &uvma->vma; + struct xe_vm *vm =3D xe_vma_vm(vma); + + vm_dbg(&xe_vma_vm(vma)->xe->drm, + "NOTIFIER PASS2: addr=3D0x%016llx, range=3D0x%016llx", + xe_vma_start(vma), xe_vma_size(vma)); + + down_write(&vm->svm.gpusvm.notifier_lock); + xe_vma_userptr_do_inval(vm, uvma, true); + up_write(&vm->svm.gpusvm.notifier_lock); + trace_xe_vma_userptr_invalidate_complete(vma); +} + static const struct mmu_interval_notifier_ops vma_userptr_notifier_ops =3D= { - .invalidate =3D vma_userptr_invalidate, + .invalidate_start =3D xe_vma_userptr_invalidate_start, + .invalidate_finish =3D xe_vma_userptr_invalidate_finish, }; =20 #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT) @@ -164,6 +212,7 @@ static const struct mmu_interval_notifier_ops vma_userp= tr_notifier_ops =3D { */ void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma) { + struct mmu_interval_notifier_finish *finish; struct xe_vm *vm =3D xe_vma_vm(&uvma->vma); =20 /* Protect against concurrent userptr pinning */ @@ -179,7 +228,10 @@ void xe_vma_userptr_force_invalidate(struct xe_userptr= _vma *uvma) if (!mmu_interval_read_retry(&uvma->userptr.notifier, uvma->userptr.pages.notifier_seq)) uvma->userptr.pages.notifier_seq -=3D 2; - __vma_userptr_invalidate(vm, uvma); + + finish =3D xe_vma_userptr_invalidate_pass1(vm, uvma); + if (finish) + xe_vma_userptr_do_inval(vm, uvma, true); } #endif =20 diff --git a/drivers/gpu/drm/xe/xe_userptr.h b/drivers/gpu/drm/xe/xe_userpt= r.h index ef801234991e..4f42db61fd62 100644 --- 
a/drivers/gpu/drm/xe/xe_userptr.h +++ b/drivers/gpu/drm/xe/xe_userptr.h @@ -57,12 +57,26 @@ struct xe_userptr { */ struct mmu_interval_notifier notifier; =20 + /** + * @finish: MMU notifier finish structure for two-pass invalidation. + * Embedded here to avoid allocation in the notifier callback. + * Protected by @vm::svm.gpusvm.notifier_lock. + */ + struct mmu_interval_notifier_finish finish; + /** + * @finish_inuse: Whether @finish is currently in use by an in-progress + * two-pass invalidation. + * Protected by @vm::svm.gpusvm.notifier_lock. + */ + bool finish_inuse; + /** * @initial_bind: user pointer has been bound at least once. * write: vm->svm.gpusvm.notifier_lock in read mode and vm->resv held. * read: vm->svm.gpusvm.notifier_lock in write mode or vm->resv held. */ bool initial_bind; + #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT) u32 divisor; #endif --=20 2.53.0 From nobody Thu Apr 9 13:31:31 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8133E4218B9 for ; Mon, 2 Mar 2026 16:33:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772469199; cv=none; b=sknO9vcYBkyC56uaGCOJCSSlnhTGgoDvWe8+bciZfWeyhTUptBXS8j7D/rtykfksx6q+ajipu9XCn5+bSWhRLinKb96eWwMJOj7dNLDvLUBshOYFja/jSxC2UP0H0bZnQorYhq8HVGLrdGy6+Mloo8T7UT6CDHSwFOtvnQwsWyM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772469199; c=relaxed/simple; bh=REVi10SRu9z3ZPsNcbifTyehMK0hQmtR42Guy7n5z44=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hARZaKh7n2GPjZs6GF5s/d+zenuUyzJKzSSjtVToTVldDjRxH03gRAcg3kRLOg+zyyG+dyCcgYkkXU3E8Zquy6MItKgDA2q/MUFW7xZuxFYKhpkY4yDkM2kJ1Rw4h248f1mDn8rnOZsQTcu9skvm88qYiBYvDO39+n+rq6w8g44= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RNVMLb8X; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RNVMLb8X" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772469198; x=1804005198; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=REVi10SRu9z3ZPsNcbifTyehMK0hQmtR42Guy7n5z44=; b=RNVMLb8X3kBYmmiA2WPHwYDpcQg400rnpf5K7eVwTeZH9ZKnsfIimY2n IV7tFjZWEPnCuS1Vk72JeO5IWxDsFZ5xaPOqne+y5ZOJ4c+uKPIuhKuqt utdC+5FRyQ7BkfRnJ3VlKOgJ4+4ySYqeu8QrRyCmKYjuZnbr0rgplbMS1 Pe0rsmq8HEnkGAa5e3P1LnyADXsjRhpVS9JzxR8qqJf1TpVy4kjvDEgPK J+wDRJpZBYAwgSZseOcQu11F6qxsuJyeyvzBhs8MPF7o47nUwHDz2xvt/ 8+PsdvRkXoWf3qEgkJLTrVM02qMPcco6NeI7QP6vc0dSGTynYX+OU30NG A==; X-CSE-ConnectionGUID: sR91HJWgTDaf7TWxZn14wg== X-CSE-MsgGUID: ZSg7aY9zRxiAZPnPoZTd0w== X-IronPort-AV: E=McAfee;i="6800,10657,11717"; a="73447848" X-IronPort-AV: E=Sophos;i="6.21,320,1763452800"; d="scan'208";a="73447848" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Mar 2026 08:33:18 -0800 X-CSE-ConnectionGUID: K+CRsj+RQZ6ru4vAsX+Wvw== X-CSE-MsgGUID: p97XZGPvQkm1YvytgY0cqQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.21,320,1763452800"; d="scan'208";a="255564533" Received: from smoticic-mobl1.ger.corp.intel.com (HELO fedora) ([10.245.244.81]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Mar 2026 08:33:16 -0800 From: 
=?UTF-8?q?Thomas=20Hellstr=C3=B6m?= To: intel-xe@lists.freedesktop.org Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= , Matthew Brost , =?UTF-8?q?Christian=20K=C3=B6nig?= , dri-devel@lists.freedesktop.org, Jason Gunthorpe , Andrew Morton , Simona Vetter , Dave Airlie , Alistair Popple , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 3/4] drm/xe: Split TLB invalidation into submit and wait steps Date: Mon, 2 Mar 2026 17:32:47 +0100 Message-ID: <20260302163248.105454-4-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com> References: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable xe_vm_range_tilemask_tlb_inval() submits TLB invalidation requests to all GTs in a tile mask and then immediately waits for them to complete before returning. This is fine for the existing callers, but a subsequent patch will need to defer the wait in order to overlap TLB invalidations across multiple VMAs. Introduce xe_tlb_inval_range_tilemask_submit() and xe_tlb_inval_batch_wait() in xe_tlb_inval.c as the submit and wait halves respectively. The batch of fences is carried in the new xe_tlb_inval_batch structure. Remove xe_vm_range_tilemask_tlb_inval() and convert all three call sites to the new API. 
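The benefit of the split can be sketched in userspace C as follows. This is only a model under stated assumptions: fake_fence, fake_batch, batch_submit, batch_wait and MAX_UNITS are hypothetical names, a "wait" is modeled by simply marking the fence signaled, and none of this is the xe API itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UNITS 4 /* hypothetical; stands in for tiles times GTs per tile */

struct fake_fence {
	bool signaled;
};

/* Hypothetical counterpart of a fence batch: fixed array plus a count. */
struct fake_batch {
	struct fake_fence fence[MAX_UNITS];
	unsigned int num_fences;
};

static unsigned int waits_done; /* counts modeled fence waits */

/* Wait half: wait for every submitted fence, then reset for reuse. */
static void batch_wait(struct fake_batch *b)
{
	for (unsigned int i = 0; i < b->num_fences; ++i) {
		b->fence[i].signaled = true; /* models blocking until completion */
		waits_done++;
	}
	b->num_fences = 0;
}

/* Submit half: queue one request per unit in the mask, without waiting. */
static int batch_submit(unsigned int unit_mask, struct fake_batch *b)
{
	b->num_fences = 0;
	for (unsigned int id = 0; id < MAX_UNITS; ++id) {
		if (!(unit_mask & (1u << id)))
			continue;
		b->fence[b->num_fences++].signaled = false;
	}
	return 0;
}
```

A caller that wants overlap would submit several batches first and only then wait on each of them, whereas a caller that wants the previous behavior calls batch_wait() directly after batch_submit().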
Assisted-by: GitHub Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/xe/xe_svm.c | 6 +- drivers/gpu/drm/xe/xe_tlb_inval.c | 82 +++++++++++++++++++++++++ drivers/gpu/drm/xe/xe_tlb_inval.h | 6 ++ drivers/gpu/drm/xe/xe_tlb_inval_types.h | 14 +++++ drivers/gpu/drm/xe/xe_vm.c | 69 +++------------------ drivers/gpu/drm/xe/xe_vm.h | 3 - drivers/gpu/drm/xe/xe_vm_madvise.c | 9 ++- drivers/gpu/drm/xe/xe_vm_types.h | 1 + 8 files changed, 123 insertions(+), 67 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c index 002b6c22ad3f..6ea4972c2791 100644 --- a/drivers/gpu/drm/xe/xe_svm.c +++ b/drivers/gpu/drm/xe/xe_svm.c @@ -19,6 +19,7 @@ #include "xe_pt.h" #include "xe_svm.h" #include "xe_tile.h" +#include "xe_tlb_inval.h" #include "xe_ttm_vram_mgr.h" #include "xe_vm.h" #include "xe_vm_types.h" @@ -225,6 +226,7 @@ static void xe_svm_invalidate(struct drm_gpusvm *gpusvm, const struct mmu_notifier_range *mmu_range) { struct xe_vm *vm =3D gpusvm_to_vm(gpusvm); + struct xe_tlb_inval_batch _batch; struct xe_device *xe =3D vm->xe; struct drm_gpusvm_range *r, *first; struct xe_tile *tile; @@ -276,7 +278,9 @@ static void xe_svm_invalidate(struct drm_gpusvm *gpusvm, =20 xe_device_wmb(xe); =20 - err =3D xe_vm_range_tilemask_tlb_inval(vm, adj_start, adj_end, tile_mask); + err =3D xe_tlb_inval_range_tilemask_submit(xe, vm->usm.asid, adj_start, a= dj_end, + tile_mask, &_batch); + xe_tlb_inval_batch_wait(&_batch); WARN_ON_ONCE(err); =20 range_notifier_event_end: diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_= inval.c index 933f30fb617d..343e37cfe715 100644 --- a/drivers/gpu/drm/xe/xe_tlb_inval.c +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c @@ -486,3 +486,85 @@ bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval) guard(spinlock_irq)(&tlb_inval->pending_lock); return list_is_singular(&tlb_inval->pending_fences); } + +/** + * xe_tlb_inval_batch_wait() - Wait for all fences in a TLB invalidation b= atch + * 
@batch: Batch of TLB invalidation fences to wait on + * + * Waits for every fence in @batch to signal, then resets @batch so it can= be + * reused for a subsequent invalidation. + */ +void xe_tlb_inval_batch_wait(struct xe_tlb_inval_batch *batch) +{ + struct xe_tlb_inval_fence *fence =3D &batch->fence[0]; + unsigned int i; + + for (i =3D 0; i < batch->num_fences; ++i) + xe_tlb_inval_fence_wait(fence++); + + batch->num_fences =3D 0; +} + +/** + * xe_tlb_inval_range_tilemask_submit() - Submit TLB invalidations for an + * address range on a tile mask + * @xe: The xe device + * @asid: Address space ID + * @start: start address + * @end: end address + * @tile_mask: mask for which gt's issue tlb invalidation + * @batch: Batch of tlb invalidate fences + * + * Issue a range based TLB invalidation for gt's in tilemask + * + * Returns 0 for success, negative error code otherwise. + */ +int xe_tlb_inval_range_tilemask_submit(struct xe_device *xe, u32 asid, + u64 start, u64 end, u8 tile_mask, + struct xe_tlb_inval_batch *batch) +{ + struct xe_tlb_inval_fence *fence =3D &batch->fence[0]; + struct xe_tile *tile; + u32 fence_id =3D 0; + u8 id; + int err; + + batch->num_fences =3D 0; + if (!tile_mask) + return 0; + + for_each_tile(tile, xe, id) { + if (!(tile_mask & BIT(id))) + continue; + + xe_tlb_inval_fence_init(&tile->primary_gt->tlb_inval, + &fence[fence_id], true); + + err =3D xe_tlb_inval_range(&tile->primary_gt->tlb_inval, + &fence[fence_id], start, end, + asid, NULL); + if (err) + goto wait; + ++fence_id; + + if (!tile->media_gt) + continue; + + xe_tlb_inval_fence_init(&tile->media_gt->tlb_inval, + &fence[fence_id], true); + + err =3D xe_tlb_inval_range(&tile->media_gt->tlb_inval, + &fence[fence_id], start, end, + asid, NULL); + if (err) + goto wait; + ++fence_id; + } + +wait: + batch->num_fences =3D fence_id; + if (err) + xe_tlb_inval_batch_wait(batch); + + return err; +} diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h b/drivers/gpu/drm/xe/xe_tlb_= inval.h index 
62089254fa23..a76b7823a5f2 100644 --- a/drivers/gpu/drm/xe/xe_tlb_inval.h +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h @@ -45,4 +45,10 @@ void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_= inval, int seqno); =20 bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval); =20 +int xe_tlb_inval_range_tilemask_submit(struct xe_device *xe, u32 asid, + u64 start, u64 end, u8 tile_mask, + struct xe_tlb_inval_batch *batch); + +void xe_tlb_inval_batch_wait(struct xe_tlb_inval_batch *batch); + #endif /* _XE_TLB_INVAL_ */ diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/x= e_tlb_inval_types.h index 3b089f90f002..3d1797d186fd 100644 --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h @@ -9,6 +9,8 @@ #include #include =20 +#include "xe_device_types.h" + struct drm_suballoc; struct xe_tlb_inval; =20 @@ -132,4 +134,16 @@ struct xe_tlb_inval_fence { ktime_t inval_time; }; =20 +/** + * struct xe_tlb_inval_batch - Batch of TLB invalidation fences + * + * Holds one fence per GT covered by a TLB invalidation request. + */ +struct xe_tlb_inval_batch { + /** @fence: per-GT TLB invalidation fences */ + struct xe_tlb_inval_fence fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_T= ILE]; + /** @num_fences: number of valid entries in @fence */ + unsigned int num_fences; +}; + #endif diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 548b0769b3ef..7f29d2b2972d 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -3966,66 +3966,6 @@ void xe_vm_unlock(struct xe_vm *vm) dma_resv_unlock(xe_vm_resv(vm)); } =20 -/** - * xe_vm_range_tilemask_tlb_inval - Issue a TLB invalidation on this tilem= ask for an - * address range - * @vm: The VM - * @start: start address - * @end: end address - * @tile_mask: mask for which gt's issue tlb invalidation - * - * Issue a range based TLB invalidation for gt's in tilemask - * - * Returns 0 for success, negative error code otherwise. 
- */ -int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start, - u64 end, u8 tile_mask) -{ - struct xe_tlb_inval_fence - fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE]; - struct xe_tile *tile; - u32 fence_id =3D 0; - u8 id; - int err; - - if (!tile_mask) - return 0; - - for_each_tile(tile, vm->xe, id) { - if (!(tile_mask & BIT(id))) - continue; - - xe_tlb_inval_fence_init(&tile->primary_gt->tlb_inval, - &fence[fence_id], true); - - err =3D xe_tlb_inval_range(&tile->primary_gt->tlb_inval, - &fence[fence_id], start, end, - vm->usm.asid, NULL); - if (err) - goto wait; - ++fence_id; - - if (!tile->media_gt) - continue; - - xe_tlb_inval_fence_init(&tile->media_gt->tlb_inval, - &fence[fence_id], true); - - err =3D xe_tlb_inval_range(&tile->media_gt->tlb_inval, - &fence[fence_id], start, end, - vm->usm.asid, NULL); - if (err) - goto wait; - ++fence_id; - } - -wait: - for (id =3D 0; id < fence_id; ++id) - xe_tlb_inval_fence_wait(&fence[id]); - - return err; -} - /** * xe_vm_invalidate_vma - invalidate GPU mappings for VMA without a lock * @vma: VMA to invalidate @@ -4040,6 +3980,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma) { struct xe_device *xe =3D xe_vma_vm(vma)->xe; struct xe_vm *vm =3D xe_vma_vm(vma); + struct xe_tlb_inval_batch _batch; struct xe_tile *tile; u8 tile_mask =3D 0; int ret =3D 0; @@ -4080,12 +4021,16 @@ int xe_vm_invalidate_vma(struct xe_vma *vma) =20 xe_device_wmb(xe); =20 - ret =3D xe_vm_range_tilemask_tlb_inval(xe_vma_vm(vma), xe_vma_start(vma), - xe_vma_end(vma), tile_mask); + ret =3D xe_tlb_inval_range_tilemask_submit(xe, xe_vma_vm(vma)->usm.asid, + xe_vma_start(vma), xe_vma_end(vma), + tile_mask, &_batch); =20 /* WRITE_ONCE pairs with READ_ONCE in xe_vm_has_valid_gpu_mapping() */ WRITE_ONCE(vma->tile_invalidated, vma->tile_mask); =20 + if (!ret) + xe_tlb_inval_batch_wait(&_batch); + return ret; } =20 diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h index f849e369432b..62f4b6fec0bc 100644 --- 
a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -240,9 +240,6 @@ struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm, struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm, struct xe_svm_range *range); =20 -int xe_vm_range_tilemask_tlb_inval(struct xe_vm *vm, u64 start, - u64 end, u8 tile_mask); - int xe_vm_invalidate_vma(struct xe_vma *vma); =20 int xe_vm_validate_protected(struct xe_vm *vm); diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_= madvise.c index 95bf53cc29e3..39717026e84f 100644 --- a/drivers/gpu/drm/xe/xe_vm_madvise.c +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c @@ -12,6 +12,7 @@ #include "xe_pat.h" #include "xe_pt.h" #include "xe_svm.h" +#include "xe_tlb_inval.h" =20 struct xe_vmas_in_madvise_range { u64 addr; @@ -235,13 +236,19 @@ static u8 xe_zap_ptes_in_madvise_range(struct xe_vm *= vm, u64 start, u64 end) static int xe_vm_invalidate_madvise_range(struct xe_vm *vm, u64 start, u64= end) { u8 tile_mask =3D xe_zap_ptes_in_madvise_range(vm, start, end); + struct xe_tlb_inval_batch batch; + int err; =20 if (!tile_mask) return 0; =20 xe_device_wmb(vm->xe); =20 - return xe_vm_range_tilemask_tlb_inval(vm, start, end, tile_mask); + err =3D xe_tlb_inval_range_tilemask_submit(vm->xe, vm->usm.asid, start, e= nd, + tile_mask, &batch); + xe_tlb_inval_batch_wait(&batch); + + return err; } =20 static bool madvise_args_are_sane(struct xe_device *xe, const struct drm_x= e_madvise *args) diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_ty= pes.h index 1f6f7e30e751..de6544165cfa 100644 --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -18,6 +18,7 @@ #include "xe_device_types.h" #include "xe_pt_types.h" #include "xe_range_fence.h" +#include "xe_tlb_inval_types.h" #include "xe_userptr.h" =20 struct drm_pagemap; --=20 2.53.0 From nobody Thu Apr 9 13:31:31 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7ACA8421EE7 for ; Mon, 2 Mar 2026 16:33:21 +0000 (UTC)
From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?=
To: intel-xe@lists.freedesktop.org
Cc: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?=, Matthew Brost, =?UTF-8?q?Christian=20K=C3=B6nig?=, dri-devel@lists.freedesktop.org, Jason Gunthorpe, Andrew Morton, Simona Vetter, Dave Airlie, Alistair Popple, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 4/4] drm/xe/userptr: Defer Waiting for TLB invalidation to the second pass if possible
Date: Mon, 2 Mar 2026 17:32:48 +0100
Message-ID: <20260302163248.105454-5-thomas.hellstrom@linux.intel.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com>
References: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Now that the two-pass notifier flow uses xe_vma_userptr_do_inval() for the fence-wait +
TLB-invalidate work, extend it to support a further deferred TLB wait:

- xe_vma_userptr_do_inval(): when the embedded finish handle is free,
  submit the TLB invalidation asynchronously
  (xe_vm_invalidate_vma_submit()) and return &userptr->finish so the
  mmu_notifier core schedules a second pass. When the handle is
  occupied by a concurrent invalidation, fall back to the synchronous
  xe_vm_invalidate_vma() path.

- xe_vma_userptr_complete_tlb_inval(): new helper called from
  invalidate_finish when tlb_inval_submitted is set. Waits for the
  previously submitted batch and unmaps the gpusvm pages.

xe_vma_userptr_invalidate_finish() dispatches between the two helpers
via tlb_inval_submitted, making the three possible flows explicit:

  pass1 (fences pending)  -> invalidate_finish -> do_inval (sync TLB)
  pass1 (fences done)     -> do_inval -> invalidate_finish
                          -> complete_tlb_inval (deferred TLB)
  pass1 (finish occupied) -> do_inval (sync TLB, inline)

In multi-GPU scenarios this allows TLB flushes to be submitted on all
GPUs in one pass before any of them are waited on.

Also add xe_vm_invalidate_vma_submit(), which submits the TLB range
invalidation without blocking, populating a xe_tlb_inval_batch that
the caller waits on separately.
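
For illustration only, the three flows can be modelled in plain userspace
C. This is a sketch, not driver code: every model_* name is hypothetical,
the VM is assumed to be in fault mode with the userptr already bound, and
finish_inuse is assumed to be released only on the deferred path (the
guard for that line lies outside the hunks in this patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; mirrors the two flags in struct xe_userptr. */
struct model_userptr {
	bool finish_inuse;        /* models userptr->finish_inuse */
	bool tlb_inval_submitted; /* models userptr->tlb_inval_submitted */
	int sync_tlb_waits;       /* TLB waits performed inline (blocking) */
	int deferred_tlb_waits;   /* TLB waits performed in the finish pass */
};

/* Returns true when a finish pass is requested (models &userptr->finish). */
static bool model_do_inval(struct model_userptr *u, bool is_deferred)
{
	if (!u->finish_inuse) {
		/* Finish handle free: submit only, wait in the next pass. */
		u->finish_inuse = true;
		u->tlb_inval_submitted = true;
		return true;
	}
	/* Finish handle occupied: submit and wait synchronously. */
	u->sync_tlb_waits++;
	if (is_deferred) /* assumption: only the deferred path releases it */
		u->finish_inuse = false;
	return false;
}

/* Models xe_vma_userptr_complete_tlb_inval(): wait, then release. */
static void model_complete_tlb_inval(struct model_userptr *u)
{
	assert(u->tlb_inval_submitted);
	u->deferred_tlb_waits++;
	u->tlb_inval_submitted = false;
	u->finish_inuse = false;
}

/* Models pass1; returns true when a finish pass is requested. */
static bool model_pass1(struct model_userptr *u, bool fences_signaled)
{
	if (fences_signaled || u->finish_inuse)
		return model_do_inval(u, false);

	/* Fences still pending: defer everything to the finish pass. */
	u->finish_inuse = true;
	return true;
}

/* Models the dispatch in xe_vma_userptr_invalidate_finish(). */
static void model_invalidate_finish(struct model_userptr *u)
{
	if (u->tlb_inval_submitted)
		model_complete_tlb_inval(u);
	else
		model_do_inval(u, true);
}
```

In this model a TLB wait is deferred only in the "fences done" flow; the
other two flows end with a synchronous wait, as the list above describes.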
Assisted-by: GitHub Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellstr=C3=B6m --- drivers/gpu/drm/xe/xe_userptr.c | 60 +++++++++++++++++++++++++++------ drivers/gpu/drm/xe/xe_userptr.h | 18 ++++++++++ drivers/gpu/drm/xe/xe_vm.c | 38 ++++++++++++++++----- drivers/gpu/drm/xe/xe_vm.h | 2 ++ 4 files changed, 99 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userpt= r.c index 440b0a79d16f..a62b796afb93 100644 --- a/drivers/gpu/drm/xe/xe_userptr.c +++ b/drivers/gpu/drm/xe/xe_userptr.c @@ -8,6 +8,7 @@ =20 #include =20 +#include "xe_tlb_inval.h" #include "xe_trace_bo.h" =20 /** @@ -73,8 +74,8 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma) &ctx); } =20 -static void xe_vma_userptr_do_inval(struct xe_vm *vm, struct xe_userptr_vm= a *uvma, - bool is_deferred) +static struct mmu_interval_notifier_finish * +xe_vma_userptr_do_inval(struct xe_vm *vm, struct xe_userptr_vma *uvma, boo= l is_deferred) { struct xe_userptr *userptr =3D &uvma->userptr; struct xe_vma *vma =3D &uvma->vma; @@ -84,12 +85,23 @@ static void xe_vma_userptr_do_inval(struct xe_vm *vm, s= truct xe_userptr_vma *uvm }; long err; =20 - err =3D dma_resv_wait_timeout(xe_vm_resv(vm), - DMA_RESV_USAGE_BOOKKEEP, + err =3D dma_resv_wait_timeout(xe_vm_resv(vm), DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); XE_WARN_ON(err <=3D 0); =20 if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) { + if (!userptr->finish_inuse) { + /* + * Defer the TLB wait to an extra pass so the caller + * can pipeline TLB flushes across GPUs before waiting + * on any of them. 
+ */ + userptr->finish_inuse =3D true; + userptr->tlb_inval_submitted =3D true; + err =3D xe_vm_invalidate_vma_submit(vma, &userptr->inval_batch); + XE_WARN_ON(err); + return &userptr->finish; + } err =3D xe_vm_invalidate_vma(vma); XE_WARN_ON(err); } @@ -98,6 +110,24 @@ static void xe_vma_userptr_do_inval(struct xe_vm *vm, s= truct xe_userptr_vma *uvm userptr->finish_inuse =3D false; drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma->userptr.pages, xe_vma_size(vma) >> PAGE_SHIFT, &ctx); + return NULL; +} + +static void +xe_vma_userptr_complete_tlb_inval(struct xe_vm *vm, struct xe_userptr_vma = *uvma) +{ + struct xe_userptr *userptr =3D &uvma->userptr; + struct xe_vma *vma =3D &uvma->vma; + struct drm_gpusvm_ctx ctx =3D { + .in_notifier =3D true, + .read_only =3D xe_vma_read_only(vma), + }; + + xe_tlb_inval_batch_wait(&userptr->inval_batch); + userptr->tlb_inval_submitted =3D false; + userptr->finish_inuse =3D false; + drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma->userptr.pages, + xe_vma_size(vma) >> PAGE_SHIFT, &ctx); } =20 static struct mmu_interval_notifier_finish * @@ -141,13 +171,11 @@ xe_vma_userptr_invalidate_pass1(struct xe_vm *vm, str= uct xe_userptr_vma *uvma) * If it's already in use, or all fences are already signaled, * proceed directly to invalidation without deferring. */ - if (signaled || userptr->finish_inuse) { - xe_vma_userptr_do_inval(vm, uvma, false); - return NULL; - } + if (signaled || userptr->finish_inuse) + return xe_vma_userptr_do_inval(vm, uvma, false); =20 + /* Defer: the notifier core will call invalidate_finish once done. 
*/ userptr->finish_inuse =3D true; - return &userptr->finish; } =20 @@ -193,7 +221,15 @@ static void xe_vma_userptr_invalidate_finish(struct mm= u_interval_notifier_finish xe_vma_start(vma), xe_vma_size(vma)); =20 down_write(&vm->svm.gpusvm.notifier_lock); - xe_vma_userptr_do_inval(vm, uvma, true); + /* + * If a TLB invalidation was previously submitted (deferred from the + * synchronous pass1 fallback), wait for it and unmap pages. + * Otherwise, fences have now completed: invalidate the TLB and unmap. + */ + if (uvma->userptr.tlb_inval_submitted) + xe_vma_userptr_complete_tlb_inval(vm, uvma); + else + xe_vma_userptr_do_inval(vm, uvma, true); up_write(&vm->svm.gpusvm.notifier_lock); trace_xe_vma_userptr_invalidate_complete(vma); } @@ -231,7 +267,9 @@ void xe_vma_userptr_force_invalidate(struct xe_userptr_= vma *uvma) =20 finish =3D xe_vma_userptr_invalidate_pass1(vm, uvma); if (finish) - xe_vma_userptr_do_inval(vm, uvma, true); + finish =3D xe_vma_userptr_do_inval(vm, uvma, true); + if (finish) + xe_vma_userptr_complete_tlb_inval(vm, uvma); } #endif =20 diff --git a/drivers/gpu/drm/xe/xe_userptr.h b/drivers/gpu/drm/xe/xe_userpt= r.h index 4f42db61fd62..7477009651c2 100644 --- a/drivers/gpu/drm/xe/xe_userptr.h +++ b/drivers/gpu/drm/xe/xe_userptr.h @@ -14,6 +14,8 @@ =20 #include =20 +#include "xe_tlb_inval_types.h" + struct xe_vm; struct xe_vma; struct xe_userptr_vma; @@ -63,6 +65,15 @@ struct xe_userptr { * Protected by @vm::svm.gpusvm.notifier_lock. */ struct mmu_interval_notifier_finish finish; + + /** + * @inval_batch: TLB invalidation batch for deferred completion. + * Stores an in-flight TLB invalidation submitted during a two-pass + * notifier so the wait can be deferred to a subsequent pass, allowing + * multiple GPUs to be signalled before any of them are waited on. + * Protected by @vm::svm.gpusvm.notifier_lock. + */ + struct xe_tlb_inval_batch inval_batch; /** * @finish_inuse: Whether @finish is currently in use by an in-progress * two-pass invalidation. 
@@ -70,6 +81,13 @@ struct xe_userptr { */ bool finish_inuse; =20 + /** + * @tlb_inval_submitted: Whether a TLB invalidation has been submitted + * via @inval_batch and is pending completion. When set, the next pass + * must call xe_tlb_inval_batch_wait() before reusing @inval_batch. + * Protected by @vm::svm.gpusvm.notifier_lock. + */ + bool tlb_inval_submitted; /** * @initial_bind: user pointer has been bound at least once. * write: vm->svm.gpusvm.notifier_lock in read mode and vm->resv held. diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c index 7f29d2b2972d..fdad9329dfb4 100644 --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -3967,20 +3967,23 @@ void xe_vm_unlock(struct xe_vm *vm) } =20 /** - * xe_vm_invalidate_vma - invalidate GPU mappings for VMA without a lock + * xe_vm_invalidate_vma_submit - Submit a job to invalidate GPU mappings f= or + * VMA. * @vma: VMA to invalidate + * @batch: TLB invalidation batch to populate; caller must later call + * xe_tlb_inval_batch_wait() on it to wait for completion * * Walks a list of page tables leaves which it memset the entries owned by= this - * VMA to zero, invalidates the TLBs, and block until TLBs invalidation is - * complete. + * VMA to zero, invalidates the TLBs, but doesn't block waiting for TLB fl= ush + * to complete, but instead populates @batch which can be waited on using + * xe_tlb_inval_batch_wait(). * * Returns 0 for success, negative error code otherwise. 
*/ -int xe_vm_invalidate_vma(struct xe_vma *vma) +int xe_vm_invalidate_vma_submit(struct xe_vma *vma, struct xe_tlb_inval_ba= tch *batch) { struct xe_device *xe =3D xe_vma_vm(vma)->xe; struct xe_vm *vm =3D xe_vma_vm(vma); - struct xe_tlb_inval_batch _batch; struct xe_tile *tile; u8 tile_mask =3D 0; int ret =3D 0; @@ -4023,14 +4026,33 @@ int xe_vm_invalidate_vma(struct xe_vma *vma) =20 ret =3D xe_tlb_inval_range_tilemask_submit(xe, xe_vma_vm(vma)->usm.asid, xe_vma_start(vma), xe_vma_end(vma), - tile_mask, &_batch); + tile_mask, batch); =20 /* WRITE_ONCE pairs with READ_ONCE in xe_vm_has_valid_gpu_mapping() */ WRITE_ONCE(vma->tile_invalidated, vma->tile_mask); + return ret; +} + +/** + * xe_vm_invalidate_vma - invalidate GPU mappings for VMA without a lock + * @vma: VMA to invalidate + * + * Walks a list of page tables leaves which it memset the entries owned by= this + * VMA to zero, invalidates the TLBs, and block until TLBs invalidation is + * complete. + * + * Returns 0 for success, negative error code otherwise. + */ +int xe_vm_invalidate_vma(struct xe_vma *vma) +{ + struct xe_tlb_inval_batch batch; + int ret; =20 - if (!ret) - xe_tlb_inval_batch_wait(&_batch); + ret =3D xe_vm_invalidate_vma_submit(vma, &batch); + if (ret) + return ret; =20 + xe_tlb_inval_batch_wait(&batch); return ret; } =20 diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h index 62f4b6fec0bc..0bc7ed23eeae 100644 --- a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -242,6 +242,8 @@ struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm, =20 int xe_vm_invalidate_vma(struct xe_vma *vma); =20 +int xe_vm_invalidate_vma_submit(struct xe_vma *vma, struct xe_tlb_inval_ba= tch *batch); + int xe_vm_validate_protected(struct xe_vm *vm); =20 static inline void xe_vm_queue_rebind_worker(struct xe_vm *vm) --=20 2.53.0
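
As a rough illustration of the submit/wait split introduced by
xe_tlb_inval_range_tilemask_submit() and xe_tlb_inval_batch_wait(), the
userspace sketch below models a batch holding one fence per GT. All
model_* names and MODEL_* constants are hypothetical stand-ins, and the
"wait" simply marks fences as signaled instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_TILES   2 /* hypothetical stand-in for XE_MAX_TILES_PER_DEVICE */
#define MODEL_GT_PER_TILE 2 /* hypothetical stand-in for XE_MAX_GT_PER_TILE */
#define MODEL_MAX_FENCES  (MODEL_MAX_TILES * MODEL_GT_PER_TILE)

struct model_fence {
	bool signaled;
};

/* Models struct xe_tlb_inval_batch: one fence per GT, plus a valid count. */
struct model_batch {
	struct model_fence fence[MODEL_MAX_FENCES];
	unsigned int num_fences;
};

/* Queue one invalidation per GT selected by tile_mask; never blocks. */
static int model_submit(uint8_t tile_mask, bool has_media_gt,
			struct model_batch *batch)
{
	unsigned int id;

	batch->num_fences = 0;
	for (id = 0; id < MODEL_MAX_TILES; ++id) {
		if (!(tile_mask & (1u << id)))
			continue;

		/* Primary GT of this tile. */
		assert(batch->num_fences < MODEL_MAX_FENCES);
		batch->fence[batch->num_fences++].signaled = false;

		/* Media GT, when the tile has one. */
		if (has_media_gt) {
			assert(batch->num_fences < MODEL_MAX_FENCES);
			batch->fence[batch->num_fences++].signaled = false;
		}
	}
	return 0;
}

/* Wait on every fence, then reset the batch. Returns the number waited on. */
static unsigned int model_wait(struct model_batch *batch)
{
	unsigned int i, waited = batch->num_fences;

	for (i = 0; i < batch->num_fences; ++i)
		batch->fence[i].signaled = true; /* stands in for a blocking wait */

	batch->num_fences = 0;
	return waited;
}
```

A caller handling several devices would call model_submit() once per
device and only then call model_wait() on each batch, so that all
invalidations are in flight before the first wait begins. Waiting on an
already drained batch is a no-op because num_fences is reset to zero.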