From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:35 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog
Message-ID: <20260203220948.2176157-2-skhawaja@google.com>
Subject: [PATCH 01/14] iommu: Implement IOMMU LU FLB callbacks
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson,
 Shuah Khan, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
 Leon Romanovsky, William Tu, Pratyush Yadav, Pasha Tatashin,
 David Matlack, Andrew Morton, Chris Li, Pranjal Shrivastava,
 Vipin Sharma, YiFei Zhu
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add liveupdate FLB for IOMMU state preservation. Use KHO preserve memory
alloc/free helper functions to allocate memory for the IOMMU LU FLB
object and the serialization structs for device, domain and iommu.

During retrieve, walk through the preserved objs nodes and restore each
folio. Also recreate the FLB obj.

Signed-off-by: Samiullah Khawaja
---
 drivers/iommu/Kconfig         |  11 +++
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/liveupdate.c    | 177 ++++++++++++++++++++++++++++++++++
 include/linux/iommu-lu.h      |  17 ++++
 include/linux/kho/abi/iommu.h | 119 +++++++++++++++++++++++
 5 files changed, 325 insertions(+)
 create mode 100644 drivers/iommu/liveupdate.c
 create mode 100644 include/linux/iommu-lu.h
 create mode 100644 include/linux/kho/abi/iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f86262b11416..fdcfbedee5ed 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -11,6 +11,17 @@ config IOMMUFD_DRIVER
 	bool
 	default n
 
+config IOMMU_LIVEUPDATE
+	bool "IOMMU live update state preservation support"
+	depends on LIVEUPDATE && IOMMUFD
+	help
+	  Enable support for preserving IOMMU state across a kexec live update.
+
+	  This allows devices managed by iommufd to maintain their DMA mappings
+	  during a kexec-based kernel update.
+
+	  If unsure, say N.
+
 menuconfig IOMMU_SUPPORT
 	bool "IOMMU Hardware Support"
 	depends on MMU
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 0275821f4ef9..b3715c5a6b97 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE_KUNIT_TEST) += io-pgtable-arm-selftests.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
+obj-$(CONFIG_IOMMU_LIVEUPDATE) += liveupdate.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c
new file mode 100644
index 000000000000..6189ba32ff2c
--- /dev/null
+++ b/drivers/iommu/liveupdate.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (C) 2025, Google LLC
+ * Author: Samiullah Khawaja
+ */
+
+#define pr_fmt(fmt) "iommu: liveupdate: " fmt
+
+#include
+#include
+#include
+#include
+#include
+
+static void iommu_liveupdate_restore_objs(u64 next)
+{
+	struct iommu_objs_ser *objs;
+
+	while (next) {
+		BUG_ON(!kho_restore_folio(next));
+		objs = __va(next);
+		next = objs->next_objs;
+	}
+}
+
+static void iommu_liveupdate_free_objs(u64 next, bool incoming)
+{
+	struct iommu_objs_ser *objs;
+
+	while (next) {
+		objs = __va(next);
+		next = objs->next_objs;
+
+		if (!incoming)
+			kho_unpreserve_free(objs);
+		else
+			folio_put(virt_to_folio(objs));
+	}
+}
+
+static void iommu_liveupdate_flb_free(struct iommu_lu_flb_obj *obj)
+{
+	if (obj->iommu_domains)
+		iommu_liveupdate_free_objs(obj->ser->iommu_domains_phys, false);
+
+	if (obj->devices)
+		iommu_liveupdate_free_objs(obj->ser->devices_phys, false);
+
+	if (obj->iommus)
+		iommu_liveupdate_free_objs(obj->ser->iommus_phys, false);
+
+	kho_unpreserve_free(obj->ser);
+	kfree(obj);
+}
+
+static int iommu_liveupdate_flb_preserve(struct liveupdate_flb_op_args *argp)
+{
+	struct iommu_lu_flb_obj *obj;
+	struct iommu_lu_flb_ser *ser;
+	void *mem;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return -ENOMEM;
+
+	mutex_init(&obj->lock);
+	mem = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(mem))
+		goto err_free;
+
+	ser = mem;
+	obj->ser = ser;
+
+	mem = kho_alloc_preserve(PAGE_SIZE);
+	if (IS_ERR(mem))
+		goto err_free;
+
+	obj->iommu_domains = mem;
+	ser->iommu_domains_phys = virt_to_phys(obj->iommu_domains);
+
+	mem = kho_alloc_preserve(PAGE_SIZE);
+	if (IS_ERR(mem))
+		goto err_free;
+
+	obj->devices = mem;
+	ser->devices_phys = virt_to_phys(obj->devices);
+
+	mem = kho_alloc_preserve(PAGE_SIZE);
+	if (IS_ERR(mem))
+		goto err_free;
+
+	obj->iommus = mem;
+	ser->iommus_phys = virt_to_phys(obj->iommus);
+
+	argp->obj = obj;
+	argp->data = virt_to_phys(ser);
+	return 0;
+
+err_free:
+	iommu_liveupdate_flb_free(obj);
+	return PTR_ERR(mem);
+}
+
+static void iommu_liveupdate_flb_unpreserve(struct liveupdate_flb_op_args *argp)
+{
+	iommu_liveupdate_flb_free(argp->obj);
+}
+
+static void iommu_liveupdate_flb_finish(struct liveupdate_flb_op_args *argp)
+{
+	struct iommu_lu_flb_obj *obj = argp->obj;
+
+	if (obj->iommu_domains)
+		iommu_liveupdate_free_objs(obj->ser->iommu_domains_phys, true);
+
+	if (obj->devices)
+		iommu_liveupdate_free_objs(obj->ser->devices_phys, true);
+
+	if (obj->iommus)
+		iommu_liveupdate_free_objs(obj->ser->iommus_phys, true);
+
+	folio_put(virt_to_folio(obj->ser));
+	kfree(obj);
+}
+
+static int iommu_liveupdate_flb_retrieve(struct liveupdate_flb_op_args *argp)
+{
+	struct iommu_lu_flb_obj *obj;
+	struct iommu_lu_flb_ser *ser;
+
+	obj = kzalloc(sizeof(*obj), GFP_ATOMIC);
+	if (!obj)
+		return -ENOMEM;
+
+	mutex_init(&obj->lock);
+	BUG_ON(!kho_restore_folio(argp->data));
+	ser = phys_to_virt(argp->data);
+	obj->ser = ser;
+
+	iommu_liveupdate_restore_objs(ser->iommu_domains_phys);
+	obj->iommu_domains = phys_to_virt(ser->iommu_domains_phys);
+
+	iommu_liveupdate_restore_objs(ser->devices_phys);
+	obj->devices = phys_to_virt(ser->devices_phys);
+
+	iommu_liveupdate_restore_objs(ser->iommus_phys);
+	obj->iommus = phys_to_virt(ser->iommus_phys);
+
+	argp->obj = obj;
+
+	return 0;
+}
+
+static struct liveupdate_flb_ops iommu_flb_ops = {
+	.preserve = iommu_liveupdate_flb_preserve,
+	.unpreserve = iommu_liveupdate_flb_unpreserve,
+	.finish = iommu_liveupdate_flb_finish,
+	.retrieve = iommu_liveupdate_flb_retrieve,
+};
+
+static struct liveupdate_flb iommu_flb = {
+	.compatible = IOMMU_LUO_FLB_COMPATIBLE,
+	.ops = &iommu_flb_ops,
+};
+
+int iommu_liveupdate_register_flb(struct liveupdate_file_handler *handler)
+{
+	return liveupdate_register_flb(handler, &iommu_flb);
+}
+EXPORT_SYMBOL(iommu_liveupdate_register_flb);
+
+int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler)
+{
+	return liveupdate_unregister_flb(handler, &iommu_flb);
+}
+EXPORT_SYMBOL(iommu_liveupdate_unregister_flb);
diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
new file mode 100644
index 000000000000..59095d2f1bb2
--- /dev/null
+++ b/include/linux/iommu-lu.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (C) 2025, Google LLC
+ * Author: Samiullah Khawaja
+ */
+
+#ifndef _LINUX_IOMMU_LU_H
+#define _LINUX_IOMMU_LU_H
+
+#include
+#include
+
+int iommu_liveupdate_register_flb(struct liveupdate_file_handler *handler);
+int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler);
+
+#endif /* _LINUX_IOMMU_LU_H */
diff --git a/include/linux/kho/abi/iommu.h b/include/linux/kho/abi/iommu.h
new file mode 100644
index 000000000000..8e1c05cfe7bb
--- /dev/null
+++ b/include/linux/kho/abi/iommu.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (C) 2025, Google LLC
+ * Author: Samiullah Khawaja
+ */
+
+#ifndef _LINUX_KHO_ABI_IOMMU_H
+#define _LINUX_KHO_ABI_IOMMU_H
+
+#include
+#include
+#include
+
+/**
+ * DOC: IOMMU File-Lifecycle Bound (FLB) Live Update ABI
+ *
+ * This header defines the ABI for preserving IOMMU state across kexec using
+ * Live Update File-Lifecycle Bound (FLB) data.
+ *
+ * This interface is a contract. Any modification to any of the serialization
+ * structs defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the IOMMU_LUO_FLB_COMPATIBLE string.
+ */
+
+#define IOMMU_LUO_FLB_COMPATIBLE "iommu-v1"
+
+enum iommu_lu_type {
+	IOMMU_INVALID,
+	IOMMU_INTEL,
+};
+
+struct iommu_obj_ser {
+	u32 idx;
+	u32 ref_count;
+	u32 deleted:1;
+	u32 incoming:1;
+} __packed;
+
+struct iommu_domain_ser {
+	struct iommu_obj_ser obj;
+	u64 top_table;
+	u64 top_level;
+	struct iommu_domain *restored_domain;
+} __packed;
+
+struct device_domain_iommu_ser {
+	u32 did;
+	u64 domain_phys;
+	u64 iommu_phys;
+} __packed;
+
+struct device_ser {
+	struct iommu_obj_ser obj;
+	u64 token;
+	u32 devid;
+	u32 pci_domain;
+	struct device_domain_iommu_ser domain_iommu_ser;
+	enum iommu_lu_type type;
+} __packed;
+
+struct iommu_intel_ser {
+	u64 phys_addr;
+	u64 root_table;
+} __packed;
+
+struct iommu_ser {
+	struct iommu_obj_ser obj;
+	u64 token;
+	enum iommu_lu_type type;
+	union {
+		struct iommu_intel_ser intel;
+	};
+} __packed;
+
+struct iommu_objs_ser {
+	u64 next_objs;
+	u64 nr_objs;
+} __packed;
+
+struct iommus_ser {
+	struct iommu_objs_ser objs;
+	struct iommu_ser iommus[];
+} __packed;
+
+struct iommu_domains_ser {
+	struct iommu_objs_ser objs;
+	struct iommu_domain_ser iommu_domains[];
+} __packed;
+
+struct devices_ser {
+	struct iommu_objs_ser objs;
+	struct device_ser devices[];
+} __packed;
+
+#define MAX_IOMMU_SERS ((PAGE_SIZE - sizeof(struct iommus_ser)) / sizeof(struct iommu_ser))
+#define MAX_IOMMU_DOMAIN_SERS \
+	((PAGE_SIZE - sizeof(struct iommu_domains_ser)) / sizeof(struct iommu_domain_ser))
+#define MAX_DEVICE_SERS ((PAGE_SIZE - sizeof(struct devices_ser)) / sizeof(struct device_ser))
+
+struct iommu_lu_flb_ser {
+	u64 iommus_phys;
+	u64 nr_iommus;
+	u64 iommu_domains_phys;
+	u64 nr_domains;
+	u64 devices_phys;
+	u64 nr_devices;
+} __packed;
+
+struct iommu_lu_flb_obj {
+	struct mutex lock;
+	struct iommu_lu_flb_ser *ser;
+
+	struct iommu_domains_ser *iommu_domains;
+	struct iommus_ser *iommus;
+	struct devices_ser *devices;
+} __packed;
+
+#endif /* _LINUX_KHO_ABI_IOMMU_H */
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:36 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog
Message-ID: <20260203220948.2176157-3-skhawaja@google.com>
Subject: [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson,
 Shuah Khan, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
 Leon Romanovsky, William Tu, Pratyush Yadav, Pasha Tatashin,
 David Matlack, Andrew Morton, Chris Li, Pranjal Shrivastava,
 Vipin Sharma, YiFei Zhu
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add IOMMU domain ops that can be implemented by the IOMMU drivers if
they support IOMMU domain preservation across liveupdate. The new IOMMU
domain preserve, unpreserve and restore APIs call these ops to perform
the respective live update operations.

Similarly add IOMMU ops to preserve/unpreserve a device. These can be
implemented by the IOMMU drivers that support preservation of devices
that have their IOMMU domains preserved. During device preservation the
state of the associated IOMMU is also preserved.
The device can only be preserved if the attached iommu domain is
preserved and the associated iommu supports preservation.

The preserved state of the device and IOMMU needs to be fetched during
shutdown and boot in the next kernel. Add APIs that can be used to fetch
the preserved state of a device and IOMMU. The APIs will only be used
during shutdown and after liveupdate, so no locking is needed.

Signed-off-by: Samiullah Khawaja
---
 drivers/iommu/iommu.c      |   3 +
 drivers/iommu/liveupdate.c | 326 +++++++++++++++++++++++++++++++++++++
 include/linux/iommu-lu.h   | 119 ++++++++++++++
 include/linux/iommu.h      |  32 ++++
 4 files changed, 480 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4926a43118e6..c0632cb5b570 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -389,6 +389,9 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
 
 	mutex_init(&param->lock);
 	dev->iommu = param;
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	dev->iommu->device_ser = NULL;
+#endif
 	return param;
 }
 
diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c
index 6189ba32ff2c..83eb609b3fd7 100644
--- a/drivers/iommu/liveupdate.c
+++ b/drivers/iommu/liveupdate.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 
 static void iommu_liveupdate_restore_objs(u64 next)
@@ -175,3 +176,328 @@ int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler)
 	return liveupdate_unregister_flb(handler, &iommu_flb);
 }
 EXPORT_SYMBOL(iommu_liveupdate_unregister_flb);
+
+int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
+				    void *arg)
+{
+	struct iommu_lu_flb_obj *obj;
+	struct devices_ser *devices;
+	int ret, i, idx;
+
+	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
+	if (ret)
+		return -ENOENT;
+
+	devices = __va(obj->ser->devices_phys);
+	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
+		if (idx >= MAX_DEVICE_SERS) {
+			devices = __va(devices->objs.next_objs);
+			idx = 0;
+		}
+
+		if (devices->devices[idx].obj.deleted)
+			continue;
+
+		ret = fn(&devices->devices[idx], arg);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(iommu_for_each_preserved_device);
+
+static inline bool device_ser_match(struct device_ser *match,
+				    struct pci_dev *pdev)
+{
+	return match->devid == pci_dev_id(pdev) && match->pci_domain == pci_domain_nr(pdev->bus);
+}
+
+struct device_ser *iommu_get_device_preserved_data(struct device *dev)
+{
+	struct iommu_lu_flb_obj *obj;
+	struct devices_ser *devices;
+	int ret, i, idx;
+
+	if (!dev_is_pci(dev))
+		return NULL;
+
+	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
+	if (ret)
+		return NULL;
+
+	devices = __va(obj->ser->devices_phys);
+	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
+		if (idx >= MAX_DEVICE_SERS) {
+			devices = __va(devices->objs.next_objs);
+			idx = 0;
+		}
+
+		if (devices->devices[idx].obj.deleted)
+			continue;
+
+		if (device_ser_match(&devices->devices[idx], to_pci_dev(dev))) {
+			devices->devices[idx].obj.incoming = true;
+			return &devices->devices[idx];
+		}
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(iommu_get_device_preserved_data);
+
+struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
+{
+	struct iommu_lu_flb_obj *obj;
+	struct iommus_ser *iommus;
+	int ret, i, idx;
+
+	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
+	if (ret)
+		return NULL;
+
+	iommus = __va(obj->ser->iommus_phys);
+	for (i = 0, idx = 0; i < obj->ser->nr_iommus; ++i, ++idx) {
+		if (idx >= MAX_IOMMU_SERS) {
+			iommus = __va(iommus->objs.next_objs);
+			idx = 0;
+		}
+
+		if (iommus->iommus[idx].obj.deleted)
+			continue;
+
+		if (iommus->iommus[idx].token == token &&
+		    iommus->iommus[idx].type == type)
+			return &iommus->iommus[idx];
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(iommu_get_preserved_data);
+
+static int reserve_obj_ser(struct iommu_objs_ser **objs_ptr, u64 max_objs)
+{
+	struct iommu_objs_ser *next_objs, *objs = *objs_ptr;
+	int idx;
+
+	if (objs->nr_objs == max_objs) {
+		next_objs = kho_alloc_preserve(PAGE_SIZE);
+		if (IS_ERR(next_objs))
+			return PTR_ERR(next_objs);
+
+		objs->next_objs = virt_to_phys(next_objs);
+		objs = next_objs;
+		*objs_ptr = objs;
+		objs->nr_objs = 0;
+		objs->next_objs = 0;
+	}
+
+	idx = objs->nr_objs++;
+	return idx;
+}
+
+int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
+{
+	struct iommu_domain_ser *domain_ser;
+	struct iommu_lu_flb_obj *flb_obj;
+	int idx, ret;
+
+	if (!domain->ops->preserve)
+		return -EOPNOTSUPP;
+
+	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
+	if (ret)
+		return ret;
+
+	guard(mutex)(&flb_obj->lock);
+	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommu_domains,
+			      MAX_IOMMU_DOMAIN_SERS);
+	if (idx < 0)
+		return idx;
+
+	domain_ser = &flb_obj->iommu_domains->iommu_domains[idx];
+	idx = flb_obj->ser->nr_domains++;
+	domain_ser->obj.idx = idx;
+	domain_ser->obj.ref_count = 1;
+
+	ret = domain->ops->preserve(domain, domain_ser);
+	if (ret) {
+		domain_ser->obj.deleted = true;
+		return ret;
+	}
+
+	domain->preserved_state = domain_ser;
+	*ser = domain_ser;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_domain_preserve);
+
+void iommu_domain_unpreserve(struct iommu_domain *domain)
+{
+	struct iommu_domain_ser *domain_ser;
+	struct iommu_lu_flb_obj *flb_obj;
+	int ret;
+
+	if (!domain->ops->unpreserve)
+		return;
+
+	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
+	if (ret)
+		return;
+
+	guard(mutex)(&flb_obj->lock);
+
+	/*
+	 * There is no check for attached devices here. The correctness relies
+	 * on the Live Update Orchestrator's session lifecycle. All resources
+	 * (iommufd, vfio devices) are preserved within a single session. If the
+	 * session is torn down, the .unpreserve callbacks for all files will be
+	 * invoked, ensuring a consistent cleanup without needing explicit
+	 * refcounting for the serialized objects here.
+	 */
+	domain_ser = domain->preserved_state;
+	domain->ops->unpreserve(domain, domain_ser);
+	domain_ser->obj.deleted = true;
+	domain->preserved_state = NULL;
+}
+EXPORT_SYMBOL_GPL(iommu_domain_unpreserve);
+
+static int iommu_preserve_locked(struct iommu_device *iommu)
+{
+	struct iommu_lu_flb_obj *flb_obj;
+	struct iommu_ser *iommu_ser;
+	int idx, ret;
+
+	if (!iommu->ops->preserve)
+		return -EOPNOTSUPP;
+
+	if (iommu->outgoing_preserved_state) {
+		iommu->outgoing_preserved_state->obj.ref_count++;
+		return 0;
+	}
+
+	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
+	if (ret)
+		return ret;
+
+	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommus,
+			      MAX_IOMMU_SERS);
+	if (idx < 0)
+		return idx;
+
+	iommu_ser = &flb_obj->iommus->iommus[idx];
+	idx = flb_obj->ser->nr_iommus++;
+	iommu_ser->obj.idx = idx;
+	iommu_ser->obj.ref_count = 1;
+
+	ret = iommu->ops->preserve(iommu, iommu_ser);
+	if (ret)
+		iommu_ser->obj.deleted = true;
+
+	iommu->outgoing_preserved_state = iommu_ser;
+	return ret;
+}
+
+static void iommu_unpreserve_locked(struct iommu_device *iommu)
+{
+	struct iommu_ser *iommu_ser = iommu->outgoing_preserved_state;
+
+	iommu_ser->obj.ref_count--;
+	if (iommu_ser->obj.ref_count)
+		return;
+
+	iommu->outgoing_preserved_state = NULL;
+	iommu->ops->unpreserve(iommu, iommu_ser);
+	iommu_ser->obj.deleted = true;
+}
+
+int iommu_preserve_device(struct iommu_domain *domain,
+			  struct device *dev, u64 token)
+{
+	struct iommu_lu_flb_obj *flb_obj;
+	struct device_ser *device_ser;
+	struct dev_iommu *iommu;
+	struct pci_dev *pdev;
+	int ret, idx;
+
+	if (!dev_is_pci(dev))
+		return -EOPNOTSUPP;
+
+	if (!domain->preserved_state)
+		return -EINVAL;
+
+	pdev = to_pci_dev(dev);
+	iommu = dev->iommu;
+	if (!iommu->iommu_dev->ops->preserve_device ||
+	    !iommu->iommu_dev->ops->preserve)
+		return -EOPNOTSUPP;
+
+	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
+	if (ret)
+		return ret;
+
+	guard(mutex)(&flb_obj->lock);
+	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->devices,
+			      MAX_DEVICE_SERS);
+	if (idx < 0)
+		return idx;
+
+	device_ser = &flb_obj->devices->devices[idx];
+	idx = flb_obj->ser->nr_devices++;
+	device_ser->obj.idx = idx;
+	device_ser->obj.ref_count = 1;
+
+	ret = iommu_preserve_locked(iommu->iommu_dev);
+	if (ret) {
+		device_ser->obj.deleted = true;
+		return ret;
+	}
+
+	device_ser->domain_iommu_ser.domain_phys = __pa(domain->preserved_state);
+	device_ser->domain_iommu_ser.iommu_phys = __pa(iommu->iommu_dev->outgoing_preserved_state);
+	device_ser->devid = pci_dev_id(pdev);
+	device_ser->pci_domain = pci_domain_nr(pdev->bus);
+	device_ser->token = token;
+
+	ret = iommu->iommu_dev->ops->preserve_device(dev, device_ser);
+	if (ret) {
+		device_ser->obj.deleted = true;
+		iommu_unpreserve_locked(iommu->iommu_dev);
+		return ret;
+	}
+
+	dev->iommu->device_ser = device_ser;
+	return 0;
+}
+
+void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_lu_flb_obj *flb_obj;
+	struct device_ser *device_ser;
+	struct dev_iommu *iommu;
+	struct pci_dev *pdev;
+	int ret;
+
+	if (!dev_is_pci(dev))
+		return;
+
+	pdev = to_pci_dev(dev);
+	iommu = dev->iommu;
+	if (!iommu->iommu_dev->ops->unpreserve_device ||
+	    !iommu->iommu_dev->ops->unpreserve)
+		return;
+
+	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
+	if (WARN_ON(ret))
+		return;
+
+	guard(mutex)(&flb_obj->lock);
+	device_ser = dev_iommu_preserved_state(dev);
+	if (WARN_ON(!device_ser))
+		return;
+
+	iommu->iommu_dev->ops->unpreserve_device(dev, device_ser);
+	dev->iommu->device_ser = NULL;
+
+	iommu_unpreserve_locked(iommu->iommu_dev);
+}
diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
index 59095d2f1bb2..48c07514a776 100644
--- a/include/linux/iommu-lu.h
+++ b/include/linux/iommu-lu.h
@@ -8,9 +8,128 @@
 #ifndef _LINUX_IOMMU_LU_H
 #define _LINUX_IOMMU_LU_H
 
+#include
+#include
 #include
 #include
 
+typedef int (*iommu_preserved_device_iter_fn)(struct device_ser *ser,
+					      void *arg);
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+static inline void *dev_iommu_preserved_state(struct device *dev)
+{
+	struct device_ser *ser;
+
+	if (!dev->iommu)
+		return NULL;
+
+	ser = dev->iommu->device_ser;
+	if (ser && !ser->obj.incoming)
+		return ser;
+
+	return NULL;
+}
+
+static inline void *dev_iommu_restored_state(struct device *dev)
+{
+	struct device_ser *ser;
+
+	if (!dev->iommu)
+		return NULL;
+
+	ser = dev->iommu->device_ser;
+	if (ser && ser->obj.incoming)
+		return ser;
+
+	return NULL;
+}
+
+static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
+{
+	struct iommu_domain_ser *ser;
+
+	ser = domain->preserved_state;
+	if (ser && ser->obj.incoming)
+		return ser;
+
+	return NULL;
+}
+
+static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
+{
+	struct device_ser *ser = dev_iommu_restored_state(dev);
+
+	if (ser && iommu_domain_restored_state(domain))
+		return ser->domain_iommu_ser.did;
+
+	return -1;
+}
+
+int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
+				    void *arg);
+struct device_ser *iommu_get_device_preserved_data(struct device *dev);
+struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type);
+int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser);
+void iommu_domain_unpreserve(struct iommu_domain *domain);
+int iommu_preserve_device(struct iommu_domain *domain,
+			  struct device *dev, u64 token);
+void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev);
+#else
+static inline void *dev_iommu_preserved_state(struct device *dev)
+{
+	return NULL;
+}
+
+static inline void *dev_iommu_restored_state(struct device *dev)
+{
+	return NULL;
+}
+
+static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
+{
+	return -1;
+}
+
+static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
+{
+	return NULL;
+}
+
+static inline int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn, void *arg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline struct device_ser *iommu_get_device_preserved_data(struct device *dev)
+{
+	return NULL;
+}
+
+static inline struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
+{
+	return NULL;
+}
+
+static inline int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void iommu_domain_unpreserve(struct iommu_domain *domain)
+{
+}
+
+static inline int iommu_preserve_device(struct iommu_domain *domain,
+					struct device *dev, u64 token)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
+{
+}
+#endif
+
 int iommu_liveupdate_register_flb(struct liveupdate_file_handler *handler);
 int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 54b8b48c762e..bd949c1ce7c5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
 #define IOMMU_READ (1 << 0)
@@ -248,6 +250,10 @@ struct iommu_domain {
 			struct list_head next;
 		};
 	};
+
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	struct iommu_domain_ser *preserved_state;
+#endif
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -647,6 +653,10 @@ __iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
  *               resources shared/passed to user space IOMMU instance. Associate
  *               it with a nesting @parent_domain. It is required for driver to
  *               set @viommu->ops pointing to its own viommu_ops
+ * @preserve_device: Preserve state of a device for liveupdate.
+ * @unpreserve_device: Unpreserve state that was preserved earlier.
+ * @preserve: Preserve state of iommu translation hardware for liveupdate.
+ * @unpreserve: Unpreserve state of iommu that was preserved earlier.
  * @owner: Driver module providing these ops
  * @identity_domain: An always available, always attachable identity
  *                   translation.
@@ -703,6 +713,11 @@ struct iommu_ops {
 			   struct iommu_domain *parent_domain,
 			   const struct iommu_user_data *user_data);
 
+	int (*preserve_device)(struct device *dev, struct device_ser *device_ser);
+	void (*unpreserve_device)(struct device *dev, struct device_ser *device_ser);
+	int (*preserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
+	void (*unpreserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
+
 	const struct iommu_domain_ops *default_domain_ops;
 	struct module *owner;
 	struct iommu_domain *identity_domain;
@@ -749,6 +764,11 @@ struct iommu_ops {
  *                           specific mechanisms.
  * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
  * @free: Release the domain after use.
+ * @preserve: Preserve the iommu domain for liveupdate.
+ *            Returns 0 on success, a negative errno on failure.
+ * @unpreserve: Unpreserve the iommu domain that was preserved earlier.
+ * @restore: Restore the iommu domain after liveupdate.
+ *           Returns 0 on success, a negative errno on failure.
  */
 struct iommu_domain_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev,
@@ -779,6 +799,9 @@ struct iommu_domain_ops {
 				  unsigned long quirks);
 
 	void (*free)(struct iommu_domain *domain);
+	int (*preserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
+	void (*unpreserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
+	int (*restore)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
 };
 
 /**
@@ -790,6 +813,8 @@ struct iommu_domain_ops {
  * @singleton_group: Used internally for drivers that have only one group
  * @max_pasids: number of supported PASIDs
  * @ready: set once iommu_device_register() has completed successfully
+ * @outgoing_preserved_state: preserved iommu state of outgoing kernel for
+ *                            liveupdate.
  */
 struct iommu_device {
 	struct list_head list;
@@ -799,6 +824,10 @@ struct iommu_device {
 	struct iommu_group *singleton_group;
 	u32 max_pasids;
 	bool ready;
+
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	struct iommu_ser *outgoing_preserved_state;
+#endif
 };
 
 /**
@@ -853,6 +882,9 @@ struct dev_iommu {
 	u32 pci_32bit_workaround:1;
 	u32 require_direct:1;
 	u32 shadow_on_flush:1;
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	struct device_ser *device_ser;
+#endif
 };
 
 int iommu_device_register(struct iommu_device *iommu,
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EEC4137474D for ; Tue, 3 Feb 2026 22:09:55 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156598; cv=none;
b=mqHiik5+gwtRc7KjJr0JBLIKD6t0cO2ocDOmOS/dxz+8Dy2fQ+4IzgxxFoFv5NeS3dzSra5RQYTz+qo5t2t/YLJT8Rpj0K2HzGMUQYpV38+FmWaydBiD+A8Qceaza+SJ+MfCFcPAonDjk71zlp6EbAb0lLeivxADUhAY7jHUcbE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156598; c=relaxed/simple; bh=U2NeIRFaQ3Zq0bn+PGdbije6ufeaFsRzIb+El2Dmm70=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=SKwfx4W1l6a/LmC+JBPPWpGFgIC7qjVzY9WIiP83D5s7pDGNkM758qf3Ea7J9jrIJqu5GYsXaZn/3ooWuzlfGtDIbc59HeIlneheDkjlbgWZ8RjILPNBZiHQBRWs/PzkPNVhxwpquelFdSQ+27Hu87U5sd1u24SmeeSXceCeVwg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=X7SDxHg1; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="X7SDxHg1" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b6ce1b57b9cso3316009a12.1 for ; Tue, 03 Feb 2026 14:09:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156595; x=1770761395; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=7JJwhexRKvGqKj8BmK4wLoDK8FPsWDJ7kCbEwN39qlQ=; b=X7SDxHg1hJz1i1K7/DGmt3D3V53aYOQ+7PoUFFv7giIT7gIphjv9b2zDeC1alQyfgp WfmeIIka5TQk7PzXmr2m7z1UlVxHkX+YoXbQQ8WBSxNNc6Wjege7+lVhf0vUwNRjedp3 0YevDou1ye9dVBobaIn0YMtLSI4QERzhgbbB+BcqAEFb3YJkzV1va5JHSWp75jW7HQ93 fT4A58apSzJsf48eQn3AWjr9UfxVvh/xxnj5zqaf5cgf8zTeYt3HGj5UnbmGY5p3c6kU 
dfBy5C0m1BtU/6LLU/6iDupzCLYb9v0abGSGxkqps0QuKZhSJJuA8zYC33lA8MOi02Bk 01sw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156595; x=1770761395; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7JJwhexRKvGqKj8BmK4wLoDK8FPsWDJ7kCbEwN39qlQ=; b=j57Aoekr5YcS66l4bxbOfclpFqMkvR0Kv2DwYpqiPvhR4imh57bHEx1jc16xjXw5oV luhoMP9nbGgFNLM6ao26cUPp+XRJK3uKA8qcRGQct4mKpmrpGuEn/6Z+mrB3ltXYWpaF lLfVQszzA35vGi+LnEZibR7N+Up7vyB561k/JrlSdZ1lupEu6+U9y2yLs+Dz58B9S2W4 N5QMu4ct9Ixh2v5DiP61ORspOZDOKn769vH/+ibsn7vlgTEvhKWzpA9x8dS00eQf8kI1 KWQQ/x1jixHREMc3heyiGY8MV6eWFoyrOnMmNxDAKGZv1CyPSmYn7HQ7mOeCQywBGsme oF8A== X-Forwarded-Encrypted: i=1; AJvYcCVwFVJvJavxWzWuu0m9IKb/mtVmjMz46oVfBO4oFCCBac8yua5CZ6jLhuw6Flt2EdJlQzCB4uNBwD14KE8=@vger.kernel.org X-Gm-Message-State: AOJu0Ywnj7MOHKrpgi/aIkt5VT59tovqu2zft+5U5D0mqKcTV/mIERah xwxPBNVbfLhdvJreVK/oV6GTB/fwZ62dPLXmD9bDcfBYCCmAYVr3+238Pz3DAq6JBZnCOx/PUaJ Tm5fr+Cqjl1XGKg== X-Received: from pgbdv5.prod.google.com ([2002:a05:6a02:4465:b0:c65:e24e:cef1]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a21:6b85:b0:364:be7:6ffc with SMTP id adf61e73a8af0-393720db58dmr925648637.18.1770156595253; Tue, 03 Feb 2026 14:09:55 -0800 (PST) Date: Tue, 3 Feb 2026 22:09:37 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-4-skhawaja@google.com> Subject: [PATCH 03/14] liveupdate: luo_file: Add internal APIs for file preservation From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: Pasha Tatashin , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, 
linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Samiullah Khawaja , Pratyush Yadav , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Pasha Tatashin The core liveupdate mechanism allows userspace to preserve file descriptors. However, kernel subsystems often manage struct file objects directly and need to participate in the preservation process programmatically without relying solely on userspace interaction. Signed-off-by: Pasha Tatashin --- include/linux/liveupdate.h | 21 ++++++++++ kernel/liveupdate/luo_file.c | 71 ++++++++++++++++++++++++++++++++ kernel/liveupdate/luo_internal.h | 16 +++++++ 3 files changed, 108 insertions(+) diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h index fe82a6c3005f..8e47504ba01e 100644 --- a/include/linux/liveupdate.h +++ b/include/linux/liveupdate.h @@ -23,6 +23,7 @@ struct file; /** * struct liveupdate_file_op_args - Arguments for file operation callbacks. * @handler: The file handler being called. + * @session: The session this file belongs to. * @retrieved: The retrieve status for the 'can_finish / finish' * operation. * @file: The file object. 
For retrieve: [OUT] The callback se= ts @@ -40,6 +41,7 @@ struct file; */ struct liveupdate_file_op_args { struct liveupdate_file_handler *handler; + struct liveupdate_session *session; bool retrieved; struct file *file; u64 serialized_data; @@ -234,6 +236,13 @@ int liveupdate_unregister_flb(struct liveupdate_file_h= andler *fh, =20 int liveupdate_flb_get_incoming(struct liveupdate_flb *flb, void **objp); int liveupdate_flb_get_outgoing(struct liveupdate_flb *flb, void **objp); +/* kernel can internally retrieve files */ +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token, + struct file **filep); + +/* Get a token for an outgoing file, or -ENOENT if file is not preserved */ +int liveupdate_get_token_outgoing(struct liveupdate_session *s, + struct file *file, u64 *tokenp); =20 #else /* CONFIG_LIVEUPDATE */ =20 @@ -281,5 +290,17 @@ static inline int liveupdate_flb_get_outgoing(struct l= iveupdate_flb *flb, return -EOPNOTSUPP; } =20 +static inline int liveupdate_get_file_incoming(struct liveupdate_session *= s, + u64 token, struct file **filep) +{ + return -EOPNOTSUPP; +} + +static inline int liveupdate_get_token_outgoing(struct liveupdate_session = *s, + struct file *file, u64 *tokenp) +{ + return -EOPNOTSUPP; +} + #endif /* CONFIG_LIVEUPDATE */ #endif /* _LINUX_LIVEUPDATE_H */ diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c index 32759e846bc9..7ac591542059 100644 --- a/kernel/liveupdate/luo_file.c +++ b/kernel/liveupdate/luo_file.c @@ -302,6 +302,7 @@ int luo_preserve_file(struct luo_file_set *file_set, u6= 4 token, int fd) mutex_init(&luo_file->mutex); =20 args.handler =3D fh; + args.session =3D luo_session_from_file_set(file_set); args.file =3D file; err =3D fh->ops->preserve(&args); if (err) @@ -355,6 +356,7 @@ void luo_file_unpreserve_files(struct luo_file_set *fil= e_set) struct luo_file, list); =20 args.handler =3D luo_file->fh; + args.session =3D luo_session_from_file_set(file_set); args.file =3D luo_file->file; 
args.serialized_data =3D luo_file->serialized_data; args.private_data =3D luo_file->private_data; @@ -383,6 +385,7 @@ static int luo_file_freeze_one(struct luo_file_set *fil= e_set, struct liveupdate_file_op_args args =3D {0}; =20 args.handler =3D luo_file->fh; + args.session =3D luo_session_from_file_set(file_set); args.file =3D luo_file->file; args.serialized_data =3D luo_file->serialized_data; args.private_data =3D luo_file->private_data; @@ -404,6 +407,7 @@ static void luo_file_unfreeze_one(struct luo_file_set *= file_set, struct liveupdate_file_op_args args =3D {0}; =20 args.handler =3D luo_file->fh; + args.session =3D luo_session_from_file_set(file_set); args.file =3D luo_file->file; args.serialized_data =3D luo_file->serialized_data; args.private_data =3D luo_file->private_data; @@ -590,6 +594,7 @@ int luo_retrieve_file(struct luo_file_set *file_set, u6= 4 token, } =20 args.handler =3D luo_file->fh; + args.session =3D luo_session_from_file_set(file_set); args.serialized_data =3D luo_file->serialized_data; err =3D luo_file->fh->ops->retrieve(&args); if (!err) { @@ -615,6 +620,7 @@ static int luo_file_can_finish_one(struct luo_file_set = *file_set, struct liveupdate_file_op_args args =3D {0}; =20 args.handler =3D luo_file->fh; + args.session =3D luo_session_from_file_set(file_set); args.file =3D luo_file->file; args.serialized_data =3D luo_file->serialized_data; args.retrieved =3D luo_file->retrieved; @@ -632,6 +638,7 @@ static void luo_file_finish_one(struct luo_file_set *fi= le_set, guard(mutex)(&luo_file->mutex); =20 args.handler =3D luo_file->fh; + args.session =3D luo_session_from_file_set(file_set); args.file =3D luo_file->file; args.serialized_data =3D luo_file->serialized_data; args.retrieved =3D luo_file->retrieved; @@ -919,3 +926,67 @@ int liveupdate_unregister_file_handler(struct liveupda= te_file_handler *fh) return err; } EXPORT_SYMBOL_GPL(liveupdate_unregister_file_handler); + +/** + * liveupdate_get_token_outgoing - Get the token for a preserved 
file. + * @s: The outgoing liveupdate session. + * @file: The file object to search for. + * @tokenp: Output parameter for the found token. + * + * Searches the list of preserved files in an outgoing session for a match= ing + * file object. If found, the corresponding user-provided token is returne= d. + * + * This function is intended for in-kernel callers that need to correlate a + * file with its liveupdate token. + * + * Context: Can be called from any context that can acquire the session mu= tex. + * Return: 0 on success, -ENOENT if the file is not preserved in this sess= ion. + */ +int liveupdate_get_token_outgoing(struct liveupdate_session *s, + struct file *file, u64 *tokenp) +{ + struct luo_file_set *file_set =3D luo_file_set_from_session(s); + struct luo_file *luo_file; + int err =3D -ENOENT; + + list_for_each_entry(luo_file, &file_set->files_list, list) { + if (luo_file->file =3D=3D file) { + if (tokenp) + *tokenp =3D luo_file->token; + err =3D 0; + break; + } + } + + return err; +} + +/** + * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel= use. + * @s: The incoming liveupdate session (restored from the previous ke= rnel). + * @token: The unique token identifying the file to retrieve. + * @filep: On success, this will be populated with a pointer to the retri= eved + * 'struct file'. + * + * Provides a kernel-internal API for other subsystems to retrieve their + * preserved files after a live update. This function is a simple wrapper + * around luo_retrieve_file(), allowing callers to find a file by its toke= n. + * + * The operation is idempotent; subsequent calls for the same token will r= eturn + * a pointer to the same 'struct file' object. + * + * The caller receives a new reference to the file and must call fput() wh= en it + * is no longer needed. The file's lifetime is managed by LUO and any user= space + * file descriptors. 
If the caller needs to hold a reference to the file b= eyond + * the immediate scope, it must call get_file() itself. + * + * Context: Can be called from any context in the new kernel that has a ha= ndle + * to a restored session. + * Return: 0 on success. Returns -ENOENT if no file with the matching toke= n is + * found, or any other negative errno on failure. + */ +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token, + struct file **filep) +{ + return luo_retrieve_file(luo_file_set_from_session(s), token, filep); +} diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_inter= nal.h index 8083d8739b09..a24933d24fd9 100644 --- a/kernel/liveupdate/luo_internal.h +++ b/kernel/liveupdate/luo_internal.h @@ -77,6 +77,22 @@ struct luo_session { struct mutex mutex; }; =20 +static inline struct liveupdate_session *luo_session_from_file_set(struct = luo_file_set *file_set) +{ + struct luo_session *session; + + session =3D container_of(file_set, struct luo_session, file_set); + + return (struct liveupdate_session *)session; +} + +static inline struct luo_file_set *luo_file_set_from_session(struct liveup= date_session *s) +{ + struct luo_session *session =3D (struct luo_session *)s; + + return &session->file_set; +} + int luo_session_create(const char *name, struct file **filep); int luo_session_retrieve(const char *name, struct file **filep); int __init luo_session_setup_outgoing(void *fdt); --=20 2.53.0.rc2.204.g2597b5adb4-goog From nobody Sun Feb 8 06:56:17 2026 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9CD88377541 for ; Tue, 3 Feb 2026 22:09:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156599; cv=none; 
b=TUrrDCFTOhcv2NL9sPI8y7xSZCiCC2+0+BFAFxCfwoa/RzW7Xe636TM+uiiesrznjRZcFznjqUNfkHSF5Gg+8N+Tcnw5x+OORrxZy+2E+OtvHUXEkBGk715LsAl7lrF7MM5T/EXzWkeZb22RKBccD+q/yhEJQKbJ/Eic3IyGKjs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156599; c=relaxed/simple; bh=5oimeBpjR9ySAyBwN7duv7K3nB4HOiM5cRVJaWv274o=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=lQVUnz6KgJF9y6yzqGaPZIUo3YnrxitcZHBZ4zrkl37kF67n9YnQxcwhNbR7ZoMMsFDmsnPLOAlD4a4i6UGkGcUzPTT8rR4fy4vfsrFuV4QCoXVxQIvzHsNw9ay4qpYJ34o8zLKlJoBBdXFpujk63nTWrdfH8ii/pMrJiW86lvM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=VodX9BEL; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="VodX9BEL" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-2a78c094ad6so62736205ad.1 for ; Tue, 03 Feb 2026 14:09:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156597; x=1770761397; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Lp/+9SAp3L8mfYjwbkh3T/evpw/YswqyzsO+xOxb2eg=; b=VodX9BELBn/e27xV70jzzO9+sOeUoNAPP9z8FQh07P0fTXS5f1q367/em3ZJxhFEaM hKzjkld4V0tYy480msA/MLdJfoIr1g7SOvliskoqU/WsYdPlVNqp2fxlZePMxYL+ucmT gTdKfUf+irKZQ96s5lKfcpVtKOb3uDxJYrsV3pY0qHWIxNMJUx9LP0f4tnmDXakhFFbz 0Kwog7lmmFflb2UJYZf4f27bu9Yx0fHYA0xq72+hl+bxHdnufl2Fy1VbXdj5+ra3sO+Y 
UIgFx0kYnoW+e+q12h7GNE1OZ45mKIOId0LEhNzxwOubr5937Qf7xqwCVW6sxPwHpU/w KAGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156597; x=1770761397; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Lp/+9SAp3L8mfYjwbkh3T/evpw/YswqyzsO+xOxb2eg=; b=a2qEsVPeq26luOp1Sh+sHouceCGxQygajYJ8ClWgim6CR4BkZGVpmL4+0VBmkCOu/Q Qj9Jskayru4yRYCYq+8ePsfW/9+PjRg5B7+QzmHDhXuBRDG18LBbjkXSpIKpVIqFR9O4 VlSnT0Zc9NOJlW5m0WzFUK0f7GWdZ9qcor5svNRd8jVJHBpF7fmY6lfC4jmUtjDrfI2d SK/NmsdQ0zSAppt8lJBERjsyMIEJNtT2EpsjY0sdgkntgVDk98bsX3cjj/cDtEZ5Flwm LR3tgvWrq652Wyc/+Vpnx2uLTs6uFfq6zaFtNvoPF3/VeT8Ni5jFcjkaBwstHRV1HZuW C/2Q== X-Forwarded-Encrypted: i=1; AJvYcCVeE7F5+fmUR8TAMlgCaxJPCAKSA1NTKntqI8hlNYgkcknJhLZdVsZTatXHB2h9votjBJnsMnXAqDM+vRo=@vger.kernel.org X-Gm-Message-State: AOJu0YzdbTX7kmustyJAFKDWSMYsm67SLu41aMQLzWcb8f7iLvQcVwom 42AAQQoJT/10a7AQDkB5x9vLUeyRhbT+broW5V8ZGoPc5iRKtf3F5x83XlhgSXpBA4lm6hFIe0T aHpcfRu0T+oRa6g== X-Received: from plbmf3.prod.google.com ([2002:a17:902:fc83:b0:2a1:1c0e:70b9]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:c411:b0:295:b46f:a6c2 with SMTP id d9443c01a7336-2a933fba80bmr6525215ad.37.1770156597010; Tue, 03 Feb 2026 14:09:57 -0800 (PST) Date: Tue, 3 Feb 2026 22:09:38 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-5-skhawaja@google.com> Subject: [PATCH 04/14] iommu/pages: Add APIs to preserve/unpreserve/restore iommu pages From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: Samiullah Khawaja , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , 
iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" IOMMU pages are allocated and freed through APIs that operate on struct ioptdesc. Add helper functions so that these pages, and their ioptdesc, can be preserved, unpreserved and restored correctly across a live update. Signed-off-by: Samiullah Khawaja --- drivers/iommu/iommu-pages.c | 74 +++++++++++++++++++++++++++++++++++++ drivers/iommu/iommu-pages.h | 30 +++++++++++++++ 2 files changed, 104 insertions(+) diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c index 3bab175d8557..588a8f19b196 100644 --- a/drivers/iommu/iommu-pages.c +++ b/drivers/iommu/iommu-pages.c @@ -6,6 +6,7 @@ #include "iommu-pages.h" #include #include +#include #include =20 #define IOPTDESC_MATCH(pg_elm, elm) \ @@ -131,6 +132,79 @@ void iommu_put_pages_list(struct iommu_pages_list *lis= t) } EXPORT_SYMBOL_GPL(iommu_put_pages_list); =20 +#if IS_ENABLED(CONFIG_IOMMU_LIVEUPDATE) +void iommu_unpreserve_page(void *virt) +{ + kho_unpreserve_folio(ioptdesc_folio(virt_to_ioptdesc(virt))); +} +EXPORT_SYMBOL_GPL(iommu_unpreserve_page); + +int iommu_preserve_page(void *virt) +{ + return kho_preserve_folio(ioptdesc_folio(virt_to_ioptdesc(virt))); +} +EXPORT_SYMBOL_GPL(iommu_preserve_page); + +void iommu_unpreserve_pages(struct iommu_pages_list *list, int count) +{ + struct ioptdesc *iopt; + + if (!count) + return; + + /* If less than zero then unpreserve all pages. 
*/ + if (count < 0) + count =3D 0; + + list_for_each_entry(iopt, &list->pages, iopt_freelist_elm) { + kho_unpreserve_folio(ioptdesc_folio(iopt)); + if (count > 0 && --count =3D=3D 0) + break; + } +} +EXPORT_SYMBOL_GPL(iommu_unpreserve_pages); + +void iommu_restore_page(u64 phys) +{ + struct ioptdesc *iopt; + struct folio *folio; + unsigned long pgcnt; + unsigned int order; + + folio =3D kho_restore_folio(phys); + BUG_ON(!folio); + + iopt =3D folio_ioptdesc(folio); + + order =3D folio_order(folio); + pgcnt =3D 1UL << order; + mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt); + lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt); +} +EXPORT_SYMBOL_GPL(iommu_restore_page); + +int iommu_preserve_pages(struct iommu_pages_list *list) +{ + struct ioptdesc *iopt; + int count =3D 0; + int ret; + + list_for_each_entry(iopt, &list->pages, iopt_freelist_elm) { + ret =3D kho_preserve_folio(ioptdesc_folio(iopt)); + if (ret) { + iommu_unpreserve_pages(list, count); + return ret; + } + + ++count; + } + + return 0; +} +EXPORT_SYMBOL_GPL(iommu_preserve_pages); + +#endif + /** * iommu_pages_start_incoherent - Setup the page for cache incoherent oper= ation * @virt: The page to setup diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h index ae9da4f571f6..bd336fb56b5f 100644 --- a/drivers/iommu/iommu-pages.h +++ b/drivers/iommu/iommu-pages.h @@ -53,6 +53,36 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size= _t size); void iommu_free_pages(void *virt); void iommu_put_pages_list(struct iommu_pages_list *list); =20 +#if IS_ENABLED(CONFIG_IOMMU_LIVEUPDATE) +int iommu_preserve_page(void *virt); +void iommu_unpreserve_page(void *virt); +int iommu_preserve_pages(struct iommu_pages_list *list); +void iommu_unpreserve_pages(struct iommu_pages_list *list, int count); +void iommu_restore_page(u64 phys); +#else +static inline int iommu_preserve_page(void *virt) +{ + return -EOPNOTSUPP; +} + +static inline void iommu_unpreserve_page(void *virt) +{ 
+} + +static inline int iommu_preserve_pages(struct iommu_pages_list *list) +{ + return -EOPNOTSUPP; +} + +static inline void iommu_unpreserve_pages(struct iommu_pages_list *list, i= nt count) +{ +} + +static inline void iommu_restore_page(u64 phys) +{ +} +#endif + /** * iommu_pages_list_add - add the page to a iommu_pages_list * @list: List to add the page to --=20 2.53.0.rc2.204.g2597b5adb4-goog From nobody Sun Feb 8 06:56:17 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D088378822 for ; Tue, 3 Feb 2026 22:09:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156600; cv=none; b=rAjG0gHYupKEQY38fm0QTfuhCYZmON2kcmRnQgcOj2bIyaSlXAMU4epHpdya3O8JZVrIh0ERIvvJUSCSa5TQuihZfA9OnMr+fjA3B1ahjP4QNRxdnDmW7Q3q62tuPvuy8YNJZaoho1jbvVEIngFhlLpHE9AHxmGI62LQKfUT+GI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156600; c=relaxed/simple; bh=XmH949+cFNYEHjPb6Flo1g0WANgdqrWkFIs/6XqOnbo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=c+J4Br+ZfuKtRPdjnTongQmEI+zPgdNqiYKe2BNAMAI8yOcL84PF9NY03ryZ91Wpz7WYfkf5Szqz28na/UCQ36UteZ5ajEaj9Uj91lhEZVrmNCYNZwfe/nyHKIE4YFudPdF/W33QBrF6r4y84GP0p1i/bvedXnJ5L5sQp8gZ1sQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=G0HCQ5zc; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="G0HCQ5zc" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2a377e15716so162983795ad.3 for ; Tue, 03 Feb 2026 14:09:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156598; x=1770761398; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=yiKJgtR9ONUoXQPN4oexiJxFq7KWJtr+8+pAjx8UEac=; b=G0HCQ5zcM0mFtsB9CivvieMXS54Ms68VJ3V+S54m4DUsk86SjMyjQNybGwxhaYfLAw EDoCajjptvM3QUe8Z5FX9QyX2+3/a8b5Z7W6FS0QHUmM4h6UDPpc40J9oD2EpKa9JkqP 1DSSS0D2HBlV/OZbn1FSfiD7Bsp2fqBYLISQnHiQfJaLS6hzDG6rtHH3er8KhmZsSb1q 0KbLtkb6DOg0ZFithjlzXfHWIzO2aPdF/6bp/0mV3s52EUYRwDHoSPeoRp/PHvafeZt3 07jY5YhGpIbCQEqahVlMtemoIMiThFMLICGIpxhSFXs90dGhAUxtsJJsDMCCPL8BpDMO ahVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156598; x=1770761398; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yiKJgtR9ONUoXQPN4oexiJxFq7KWJtr+8+pAjx8UEac=; b=kLrVSNNlk8jT2Xnc8+0mU7BhJZxTQBXVk4x3Y+fBNd9LHd4GSW59yEuXZYCZP23e7/ kxfN5t8ORcTBNIHejqp59MgFXV6fPc7gpfZwKZCnPEgiYrvkKJ3pKQ86y883KBjwRWEt QdSL7siYhMl4D4GUjf3+29OL5dq1gpI+RDeVMIg8vyZ6Xqc+M/W1QcepHqtx1VotDSrk MyS9d4xRb0y1vGnzTOt4bqIIBOQLRt5CO/p4jeOfTCNa7Ye+xVLFrOc8PTA+iHlD7IhJ 3CRa51ZN7FF8wLfk+PLwyAYFosiFNFzMbzerAZVpbD+wegr6L8vbAlsCx5xyxNIXrD1d tcOw== X-Forwarded-Encrypted: i=1; AJvYcCXJVibDZ67a7DC/xKEQsRNQPaKpC8k2ggCoTkNUuxodHcPLmhAEj5Cz4djdnc+UKqq6RPLVzIbUe43iPus=@vger.kernel.org X-Gm-Message-State: AOJu0YxgqY9k29ZKbyXLxzXhY4ce5kfjEXpwN14cF/WW1ekON2uarLva 8BHFx9AxTEJs2Q/PiOsZGfdZoXpfAJ3Wd9iukdFu7Z2plOqZsvBDw35Er/xKh6RfGCk7d+Z0LiE p+ijDMKndBdagmw== X-Received: from plgz17.prod.google.com 
([2002:a17:903:191:b0:2a7:8c71:aa97]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:1ac3:b0:2a0:d629:9035 with SMTP id d9443c01a7336-2a933bbe729mr7724415ad.3.1770156598471; Tue, 03 Feb 2026 14:09:58 -0800 (PST) Date: Tue, 3 Feb 2026 22:09:39 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-6-skhawaja@google.com> Subject: [PATCH 05/14] iommupt: Implement preserve/unpreserve/restore callbacks From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: Samiullah Khawaja , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the iommu domain ops for preservation, unpreservation and restoration of iommu domains for liveupdate. Use the existing page walker to preserve the ioptdesc of the top_table and the lower tables. Also preserve the top_level so that it can be restored during boot. 
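The flow described above (collect the top table and every lower table with a walker, preserve each page, and undo the work if any page fails) can be modeled in plain userspace C. This is only an illustrative sketch under stated assumptions: struct toy_table, struct toy_list, toy_collect(), toy_preserve_one() and toy_preserve_domain() are hypothetical names that do not exist in the kernel, and the "preserved" flag merely stands in for whatever kho_preserve_folio() records.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TABLES 16
#define FANOUT 2

/* Hypothetical stand-in for a page-table page; not a kernel structure. */
struct toy_table {
	struct toy_table *lower[FANOUT]; /* NULL when the entry is not a table */
	int preserved;                   /* models the "preserved" state */
	int fail_preserve;               /* test hook: force preservation to fail */
};

/* Hypothetical flat list of collected tables, top table first. */
struct toy_list {
	struct toy_table *pages[MAX_TABLES];
	size_t nr;
};

/* Depth-first walk that appends every lower table to the list. */
static void toy_collect(struct toy_table *table, struct toy_list *list)
{
	for (size_t i = 0; i < FANOUT; i++) {
		struct toy_table *lower = table->lower[i];

		if (!lower || list->nr == MAX_TABLES)
			continue;
		list->pages[list->nr++] = lower;
		toy_collect(lower, list);
	}
}

/* Preserve a single table; returns 0 on success, -1 on failure. */
static int toy_preserve_one(struct toy_table *table)
{
	if (table->fail_preserve)
		return -1;
	table->preserved = 1;
	return 0;
}

/*
 * Collect the top table and all lower tables, then preserve them in list
 * order. On failure, unpreserve only the tables preserved so far.
 */
static int toy_preserve_domain(struct toy_table *top)
{
	struct toy_list list = { .nr = 0 };

	list.pages[list.nr++] = top;
	toy_collect(top, &list);

	for (size_t i = 0; i < list.nr; i++) {
		int ret = toy_preserve_one(list.pages[i]);

		if (ret) {
			/* Roll back entries [0, i) in reverse order. */
			while (i--)
				list.pages[i]->preserved = 0;
			return ret;
		}
	}
	return 0;
}
```

Collecting into a list before preserving anything keeps the rollback simple in this model: the failure path only needs the number of entries already handled, not a second walk of the tree.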
Signed-off-by: Samiullah Khawaja --- drivers/iommu/generic_pt/iommu_pt.h | 96 +++++++++++++++++++++++++++++ include/linux/generic_pt/iommu.h | 10 +++ 2 files changed, 106 insertions(+) diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt= /iommu_pt.h index 3327116a441c..0a1adb6312dd 100644 --- a/drivers/iommu/generic_pt/iommu_pt.h +++ b/drivers/iommu/generic_pt/iommu_pt.h @@ -921,6 +921,102 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain,= unsigned long iova, } EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(map_pages), "GENERIC_PT_IOMMU"); =20 +/** + * unpreserve() - Unpreserve page tables and other state of a domain. + * @domain: Domain to unpreserve + */ +void DOMAIN_NS(unpreserve)(struct iommu_domain *domain, struct iommu_domai= n_ser *ser) +{ + struct pt_iommu *iommu_table =3D + container_of(domain, struct pt_iommu, domain); + struct pt_common *common =3D common_from_iommu(iommu_table); + struct pt_range range =3D pt_all_range(common); + struct pt_iommu_collect_args collect =3D { + .free_list =3D IOMMU_PAGES_LIST_INIT(collect.free_list), + }; + + iommu_pages_list_add(&collect.free_list, range.top_table); + pt_walk_range(&range, __collect_tables, &collect); + + iommu_unpreserve_pages(&collect.free_list, -1); +} +EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(unpreserve), "GENERIC_PT_IOMMU"); + +/** + * preserve() - Preserve page tables and other state of a domain. + * @domain: Domain to preserve + * + * Returns: -ERRNO on failure, 0 on success. 
+ */ +int DOMAIN_NS(preserve)(struct iommu_domain *domain, struct iommu_domain_s= er *ser) +{ + struct pt_iommu *iommu_table =3D + container_of(domain, struct pt_iommu, domain); + struct pt_common *common =3D common_from_iommu(iommu_table); + struct pt_range range =3D pt_all_range(common); + struct pt_iommu_collect_args collect =3D { + .free_list =3D IOMMU_PAGES_LIST_INIT(collect.free_list), + }; + int ret; + + iommu_pages_list_add(&collect.free_list, range.top_table); + pt_walk_range(&range, __collect_tables, &collect); + + ret =3D iommu_preserve_pages(&collect.free_list); + if (ret) + return ret; + + ser->top_table =3D virt_to_phys(range.top_table); + ser->top_level =3D range.top_level; + + return 0; +} +EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(preserve), "GENERIC_PT_IOMMU"); + +static int __restore_tables(struct pt_range *range, void *arg, + unsigned int level, struct pt_table_p *table) +{ + struct pt_state pts =3D pt_init(range, level, table); + int ret; + + for_each_pt_level_entry(&pts) { + if (pts.type =3D=3D PT_ENTRY_TABLE) { + iommu_restore_page(virt_to_phys(pts.table_lower)); + ret =3D pt_descend(&pts, arg, __restore_tables); + if (ret) + return ret; + } + } + return 0; +} + +/** + * restore() - Restore page tables and other state of a domain. + * @domain: Domain to restore + * + * Returns: -ERRNO on failure, 0 on success. 
+ */ +int DOMAIN_NS(restore)(struct iommu_domain *domain, struct iommu_domain_se= r *ser) +{ + struct pt_iommu *iommu_table =3D + container_of(domain, struct pt_iommu, domain); + struct pt_common *common =3D common_from_iommu(iommu_table); + struct pt_range range =3D pt_all_range(common); + + iommu_restore_page(ser->top_table); + + /* Free new table */ + iommu_free_pages(range.top_table); + + /* Set the restored top table */ + pt_top_set(common, phys_to_virt(ser->top_table), ser->top_level); + + /* Restore all pages*/ + range =3D pt_all_range(common); + return pt_walk_range(&range, __restore_tables, NULL); +} +EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(restore), "GENERIC_PT_IOMMU"); + struct pt_unmap_args { struct iommu_pages_list free_list; pt_vaddr_t unmapped; diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/io= mmu.h index 9eefbb74efd0..b824a8642571 100644 --- a/include/linux/generic_pt/iommu.h +++ b/include/linux/generic_pt/iommu.h @@ -13,6 +13,7 @@ struct iommu_iotlb_gather; struct pt_iommu_ops; struct pt_iommu_driver_ops; struct iommu_dirty_bitmap; +struct iommu_domain_ser; =20 /** * DOC: IOMMU Radix Page Table @@ -198,6 +199,12 @@ struct pt_iommu_cfg { unsigned long iova, phys_addr_t paddr, \ size_t pgsize, size_t pgcount, \ int prot, gfp_t gfp, size_t *mapped); \ + int pt_iommu_##fmt##_preserve(struct iommu_domain *domain, \ + struct iommu_domain_ser *ser); \ + void pt_iommu_##fmt##_unpreserve(struct iommu_domain *domain, \ + struct iommu_domain_ser *ser); \ + int pt_iommu_##fmt##_restore(struct iommu_domain *domain, \ + struct iommu_domain_ser *ser); \ size_t pt_iommu_##fmt##_unmap_pages( \ struct iommu_domain *domain, unsigned long iova, \ size_t pgsize, size_t pgcount, \ @@ -224,6 +231,9 @@ struct pt_iommu_cfg { #define IOMMU_PT_DOMAIN_OPS(fmt) \ .iova_to_phys =3D &pt_iommu_##fmt##_iova_to_phys, \ .map_pages =3D &pt_iommu_##fmt##_map_pages, \ + .preserve =3D &pt_iommu_##fmt##_preserve, \ + .unpreserve =3D &pt_iommu_##fmt##_unpreserve, \ + 
.restore =3D &pt_iommu_##fmt##_restore, \ .unmap_pages =3D &pt_iommu_##fmt##_unmap_pages #define IOMMU_PT_DIRTY_OPS(fmt) \ .read_and_clear_dirty =3D &pt_iommu_##fmt##_read_and_clear_dirty --=20 2.53.0.rc2.204.g2597b5adb4-goog From nobody Sun Feb 8 06:56:17 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BEEE237A4B5 for ; Tue, 3 Feb 2026 22:10:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156602; cv=none; b=Fo5haLIBXnxThpWK/5f2Vf9IjMOHT3zGm1UVoXm2FDlSnb50oYS9YsZ8SlkoG2BpzPyRyUAwD2c6Uqr04JnCZR8/GgHmPEU2Ac9qxiJXljzhKGKO+lvQS/8hO8ud7uIU47ZWdX9XT3jwPyu3nlLnpgCI31ezHvnf2XeVTNyuYpQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156602; c=relaxed/simple; bh=9LSa2PONwk7+FcHvoNLiH1UnVTuP4jXRhy/pPDerIH8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=gqjxN9tr9WkLUQmlR9siczMqkaA2XHarcv6l86fuPYPcb0BM50+/gvPzlv0AMC05JQswzRAHUVrXjmhziuq03ye0nDhBnA0SxNcri3z3G7maga6C6StRU5kqg6gqCSO5G7n+8PC4E0dvpq9A864rGOubdsrhSD/4C4GOrL9vxU4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=KCiM12B/; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="KCiM12B/" Received: by 
mail-pl1-f201.google.com with SMTP id d9443c01a7336-2a0a4b748a0so130567675ad.1 for ; Tue, 03 Feb 2026 14:10:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156600; x=1770761400; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=eGH1n4CmJ6QyV+GMAsNpvN02UPxRfAHS78Vd94dCot0=; b=KCiM12B/pFSG+zwSv4sSNXvSrDNEM3uem/8E6f9EkC6bjAuFZ6as5K86pO1cHCTaXk ECUX3VbBToPuXKQQyGatAs9WeFDcabJvqcRUJN9Fu/F+q+6hitVv1IqgBEYD4PYdE63T i3Hu3A++t5w+mKRIQw8sa+VL2GjpkREGLPzff+Azq4BI5x2JoF20FtykSZXGyr/hWW+O V3zMjovvv4hgnCdrJJW+JHwejxdOX1JjVkyfIA+m2BOXkQlTQff0uea32SVGrVTeOcLH M2WCdDc6b3mYvNCoXQZ7XUHTjk9TPJI4xxZgUhUgQS1qHMtnlkHNGXGIEPEYUvvYZQ1a vGNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156600; x=1770761400; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eGH1n4CmJ6QyV+GMAsNpvN02UPxRfAHS78Vd94dCot0=; b=an23maAfX6tA627gIDhkyuUqTKTOkomTBLfamRFosZtHMz0LtsxmbqtH8Cks2rpew6 skXcpOHKccAbZAvXsdC+nbYYb3whIqEIrpXvz6+T/vRbBHeoIYRdqYV69iqF/Gxxgo6l 6J93hUaqkN8u4p2419P+iDy7ONJUDKyYCoXu9eoLpPhufQDuvwDPrx6/jXSbEpApGRgd TUHSr7I/ODUaJUSxy14wtlcXvobtRCTUM3a6409rNLvW1sWMomRgLgXGkHE6pOWTFNVo 9okHf1eV2eVz3Xhqb8FeBMiLn7jQr0HAnTELYJxDmbbpemcSlPk5cORsGa+MLaIQUaaa +zJA== X-Forwarded-Encrypted: i=1; AJvYcCVv00cWWOsEx2D3Ce1nGcKuoUdX9GRsbRALR867AlFoSSjxUgBFI+bMPRJfMKCjkeBeqG/fDYFJY8T7TTA=@vger.kernel.org X-Gm-Message-State: AOJu0YzIxvo/fVstiEnaYfaDtt1Y9KuAmsP691o+MfK16s8+S/+il1jM Z17orQjBlnRk+wtQvRgY1AHYNp4ZYXgnv4KbvFq2mxWTN8jWMVkgXdchkLOVHwPegPXQQGTBLaE kO7IhmmpL8i7hXA== X-Received: from plbb8.prod.google.com ([2002:a17:903:c08:b0:2a7:cf29:aee1]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:e94c:b0:2a7:fe78:a344 with SMTP id d9443c01a7336-2a933cdd07dmr7595525ad.6.1770156600054; Tue, 03 Feb 2026 14:10:00 
-0800 (PST) Date: Tue, 3 Feb 2026 22:09:40 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-7-skhawaja@google.com> Subject: [PATCH 06/14] iommu/vt-d: Implement device and iommu preserve/unpreserve ops From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: Samiullah Khawaja , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add implementation of the device and iommu presevation in a separate file. Also set the device and iommu preserve/unpreserve ops in the struct iommu_ops. During normal shutdown the iommu translation is disabled. Since the root table is preserved during live update, it needs to be cleaned up and the context entries of the unpreserved devices need to be cleared. 
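
The shutdown rule described above can be illustrated with a minimal
user-space sketch. This is not the kernel code: every toy_* type and
function below is a hypothetical stand-in for the VT-d structures, and
the sketch models only the decision itself. A preserved IOMMU keeps
translation enabled and drops the context entries of unpreserved
devices; an IOMMU without preserved state has translation disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct toy_dev {
	int iommu_id;		/* IOMMU unit that translates for this device */
	bool preserved;		/* device was marked for preservation */
	bool context_present;	/* context entry is still programmed */
};

struct toy_iommu {
	int id;
	bool outgoing_preserved_state;	/* the unit itself was preserved */
	bool translation_enabled;
};

/*
 * Returns true when the unit carries preserved state. In that case only the
 * context entries of unpreserved devices behind this unit are cleared.
 */
static bool toy_clean_unpreserved(struct toy_iommu *iommu,
				  struct toy_dev *devs, size_t ndevs)
{
	size_t i;

	assert(iommu != NULL);

	if (!iommu->outgoing_preserved_state)
		return false;

	for (i = 0; i < ndevs; i++) {
		if (devs[i].iommu_id != iommu->id)
			continue;	/* device sits behind another unit */
		if (devs[i].preserved)
			continue;	/* keep the entry for the next kernel */
		devs[i].context_present = false;
	}

	return true;
}

/* Shutdown path: disable translation only when nothing was preserved. */
static void toy_shutdown(struct toy_iommu *iommu,
			 struct toy_dev *devs, size_t ndevs)
{
	if (!toy_clean_unpreserved(iommu, devs, ndevs))
		iommu->translation_enabled = false;
}
```

In this model, calling toy_shutdown() on a preserved unit leaves
translation_enabled set and clears context_present only for the
unpreserved devices that sit behind that unit.
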
Signed-off-by: Samiullah Khawaja
---
 drivers/iommu/intel/Makefile     |   1 +
 drivers/iommu/intel/iommu.c      |  47 ++++++++++-
 drivers/iommu/intel/iommu.h      |  27 +++++++
 drivers/iommu/intel/liveupdate.c | 134 +++++++++++++++++++++++++++++++
 4 files changed, 205 insertions(+), 4 deletions(-)
 create mode 100644 drivers/iommu/intel/liveupdate.c

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index ada651c4a01b..d38fc101bc35 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += svm.o
 obj-$(CONFIG_IRQ_REMAP) += irq_remapping.o
 obj-$(CONFIG_INTEL_IOMMU_PERF_EVENTS) += perfmon.o
+obj-$(CONFIG_IOMMU_LIVEUPDATE) += liveupdate.o
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 134302fbcd92..c95de93fb72f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -52,6 +53,8 @@ static int rwbf_quirk;
 
 #define rwbf_required(iommu)	(rwbf_quirk || cap_rwbf((iommu)->cap))
 
+static bool __maybe_clean_unpreserved_context_entries(struct intel_iommu *iommu);
+
 /*
  * set to 1 to panic kernel if can't successfully enable VT-d
  * (used when kernel is launched w/ TXT)
@@ -60,8 +63,6 @@ static int force_on = 0;
 static int intel_iommu_tboot_noforce;
 static int no_platform_optin;
 
-#define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
-
 /*
  * Take a root_entry and return the Lower Context Table Pointer (LCTP)
  * if marked present.
@@ -2378,8 +2379,10 @@ void intel_iommu_shutdown(void)
 		/* Disable PMRs explicitly here. */
 		iommu_disable_protect_mem_regions(iommu);
 
-		/* Make sure the IOMMUs are switched off */
-		iommu_disable_translation(iommu);
+		if (!__maybe_clean_unpreserved_context_entries(iommu)) {
+			/* Make sure the IOMMUs are switched off */
+			iommu_disable_translation(iommu);
+		}
 	}
 }
 
@@ -2902,6 +2905,38 @@ static const struct iommu_dirty_ops intel_second_stage_dirty_ops = {
 	.set_dirty_tracking = intel_iommu_set_dirty_tracking,
 };
 
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+static bool __maybe_clean_unpreserved_context_entries(struct intel_iommu *iommu)
+{
+	struct device_domain_info *info;
+	struct pci_dev *pdev = NULL;
+
+	if (!iommu->iommu.outgoing_preserved_state)
+		return false;
+
+	for_each_pci_dev(pdev) {
+		info = dev_iommu_priv_get(&pdev->dev);
+		if (!info)
+			continue;
+
+		if (info->iommu != iommu)
+			continue;
+
+		if (dev_iommu_preserved_state(&pdev->dev))
+			continue;
+
+		domain_context_clear(info);
+	}
+
+	return true;
+}
+#else
+static bool __maybe_clean_unpreserved_context_entries(struct intel_iommu *iommu)
+{
+	return false;
+}
+#endif
+
 static struct iommu_domain *
 intel_iommu_domain_alloc_second_stage(struct device *dev,
 				      struct intel_iommu *iommu, u32 flags)
@@ -3925,6 +3960,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.is_attach_deferred = intel_iommu_is_attach_deferred,
 	.def_domain_type = device_def_domain_type,
 	.page_response = intel_iommu_page_response,
+	.preserve_device = intel_iommu_preserve_device,
+	.unpreserve_device = intel_iommu_unpreserve_device,
+	.preserve = intel_iommu_preserve,
+	.unpreserve = intel_iommu_unpreserve,
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 25c5e22096d4..70032e86437d 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -557,6 +557,8 @@ struct root_entry {
 	u64 hi;
 };
 
+#define ROOT_ENTRY_NR (VTD_PAGE_SIZE / sizeof(struct root_entry))
+
 /*
  * low 64 bits:
  * 0: present
@@ -1276,6 +1278,31 @@ static inline int iopf_for_domain_replace(struct iommu_domain *new,
 	return 0;
 }
 
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser);
+void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser);
+int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
+void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
+#else
+static inline int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser)
+{
+}
+
+static inline int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser)
+{
+}
+#endif
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
 struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
new file mode 100644
index 000000000000..82ba1daf1711
--- /dev/null
+++ b/drivers/iommu/intel/liveupdate.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (C) 2025, Google LLC
+ * Author: Samiullah Khawaja
+ */
+
+#define pr_fmt(fmt)    "iommu: liveupdate: " fmt
+
+#include
+#include
+#include
+#include
+#include
+
+#include "iommu.h"
+#include "../iommu-pages.h"
+
+static void unpreserve_iommu_context(struct intel_iommu *iommu, int end)
+{
+	struct context_entry *context;
+	int i;
+
+	if (end < 0)
+		end = ROOT_ENTRY_NR;
+
+	for (i = 0; i < end; i++) {
+		context = iommu_context_addr(iommu, i, 0, 0);
+		if (context)
+			iommu_unpreserve_page(context);
+
+		if (!sm_supported(iommu))
+			continue;
+
+		context = iommu_context_addr(iommu, i, 0x80, 0);
+		if (context)
+			iommu_unpreserve_page(context);
+	}
+}
+
+static int preserve_iommu_context(struct intel_iommu *iommu)
+{
+	struct context_entry *context;
+	int ret;
+	int i;
+
+	for (i = 0; i < ROOT_ENTRY_NR; i++) {
+		context = iommu_context_addr(iommu, i, 0, 0);
+		if (context) {
+			ret = iommu_preserve_page(context);
+			if (ret)
+				goto error;
+		}
+
+		if (!sm_supported(iommu))
+			continue;
+
+		context = iommu_context_addr(iommu, i, 0x80, 0);
+		if (context) {
+			ret = iommu_preserve_page(context);
+			if (ret)
+				goto error_sm;
+		}
+	}
+
+	return 0;
+
+error_sm:
+	context = iommu_context_addr(iommu, i, 0, 0);
+	iommu_unpreserve_page(context);
+error:
+	unpreserve_iommu_context(iommu, i);
+	return ret;
+}
+
+int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+	if (!dev_is_pci(dev))
+		return -EOPNOTSUPP;
+
+	if (!info)
+		return -EINVAL;
+
+	device_ser->domain_iommu_ser.did = domain_id_iommu(info->domain, info->iommu);
+	return 0;
+}
+
+void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser)
+{
+}
+
+int intel_iommu_preserve(struct iommu_device *iommu_dev, struct iommu_ser *ser)
+{
+	struct intel_iommu *iommu;
+	int ret;
+
+	iommu = container_of(iommu_dev, struct intel_iommu, iommu);
+
+	spin_lock(&iommu->lock);
+	ret = preserve_iommu_context(iommu);
+	if (ret)
+		goto err;
+
+	ret = iommu_preserve_page(iommu->root_entry);
+	if (ret) {
+		unpreserve_iommu_context(iommu, -1);
+		goto err;
+	}
+
+	ser->intel.phys_addr = iommu->reg_phys;
+	ser->intel.root_table = __pa(iommu->root_entry);
+	ser->type = IOMMU_INTEL;
+	ser->token = ser->intel.phys_addr;
+	spin_unlock(&iommu->lock);
+
+	return 0;
+err:
+	spin_unlock(&iommu->lock);
+	return ret;
+}
+
+void intel_iommu_unpreserve(struct iommu_device *iommu_dev, struct iommu_ser *iommu_ser)
+{
+	struct intel_iommu *iommu;
+
+	iommu = container_of(iommu_dev, struct intel_iommu, iommu);
+
+	spin_lock(&iommu->lock);
+	unpreserve_iommu_context(iommu, -1);
+	iommu_unpreserve_page(iommu->root_entry);
+	spin_unlock(&iommu->lock);
+}
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:41 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog
Message-ID: <20260203220948.2176157-8-skhawaja@google.com>
Subject: [PATCH 07/14] iommu/vt-d: Restore IOMMU state and reclaimed domain ids
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson, Shuah Khan, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed, Adithya Jayachandran, Parav Pandit, Leon Romanovsky, William Tu, Pratyush Yadav, Pasha Tatashin, David Matlack, Andrew Morton, Chris Li, Pranjal Shrivastava, Vipin Sharma, YiFei Zhu

During boot, fetch the preserved state of each IOMMU unit and, if it is
found, restore it:

- Reuse the root table that was preserved in the previous kernel.
- Reclaim the domain IDs of the preserved domains for each preserved
  device so that they are not acquired by another domain.
Signed-off-by: Samiullah Khawaja --- drivers/iommu/intel/iommu.c | 26 +++++++++++++++------ drivers/iommu/intel/iommu.h | 7 ++++++ drivers/iommu/intel/liveupdate.c | 40 ++++++++++++++++++++++++++++++++ 3 files changed, 66 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index c95de93fb72f..8acb7f8a7627 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -222,12 +222,12 @@ static void clear_translation_pre_enabled(struct inte= l_iommu *iommu) iommu->flags &=3D ~VTD_FLAG_TRANS_PRE_ENABLED; } =20 -static void init_translation_status(struct intel_iommu *iommu) +static void init_translation_status(struct intel_iommu *iommu, bool restor= ing) { u32 gsts; =20 gsts =3D readl(iommu->reg + DMAR_GSTS_REG); - if (gsts & DMA_GSTS_TES) + if (!restoring && (gsts & DMA_GSTS_TES)) iommu->flags |=3D VTD_FLAG_TRANS_PRE_ENABLED; } =20 @@ -670,10 +670,16 @@ void dmar_fault_dump_ptes(struct intel_iommu *iommu, = u16 source_id, #endif =20 /* iommu handling */ -static int iommu_alloc_root_entry(struct intel_iommu *iommu) +static int iommu_alloc_root_entry(struct intel_iommu *iommu, struct iommu_= ser *restored_state) { struct root_entry *root; =20 + if (restored_state) { + intel_iommu_liveupdate_restore_root_table(iommu, restored_state); + __iommu_flush_cache(iommu, iommu->root_entry, ROOT_SIZE); + return 0; + } + root =3D iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC, SZ_4K); if (!root) { pr_err("Allocating root entry for %s failed\n", @@ -1614,6 +1620,7 @@ static int copy_translation_tables(struct intel_iommu= *iommu) =20 static int __init init_dmars(void) { + struct iommu_ser *iommu_ser =3D NULL; struct dmar_drhd_unit *drhd; struct intel_iommu *iommu; int ret; @@ -1636,8 +1643,10 @@ static int __init init_dmars(void) intel_pasid_max_id); } =20 + iommu_ser =3D iommu_get_preserved_data(iommu->reg_phys, IOMMU_INTEL); + intel_iommu_init_qi(iommu); - init_translation_status(iommu); + 
init_translation_status(iommu, !!iommu_ser); =20 if (translation_pre_enabled(iommu) && !is_kdump_kernel()) { iommu_disable_translation(iommu); @@ -1651,7 +1660,7 @@ static int __init init_dmars(void) * we could share the same root & context tables * among all IOMMU's. Need to Split it later. */ - ret =3D iommu_alloc_root_entry(iommu); + ret =3D iommu_alloc_root_entry(iommu, iommu_ser); if (ret) goto free_iommu; =20 @@ -2110,15 +2119,18 @@ int dmar_parse_one_satc(struct acpi_dmar_header *hd= r, void *arg) static int intel_iommu_add(struct dmar_drhd_unit *dmaru) { struct intel_iommu *iommu =3D dmaru->iommu; + struct iommu_ser *iommu_ser =3D NULL; int ret; =20 + iommu_ser =3D iommu_get_preserved_data(iommu->reg_phys, IOMMU_INTEL); + /* * Disable translation if already enabled prior to OS handover. */ - if (iommu->gcmd & DMA_GCMD_TE) + if (!iommu_ser && iommu->gcmd & DMA_GCMD_TE) iommu_disable_translation(iommu); =20 - ret =3D iommu_alloc_root_entry(iommu); + ret =3D iommu_alloc_root_entry(iommu, iommu_ser); if (ret) goto out; =20 diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 70032e86437d..d7bf63aff17d 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -1283,6 +1283,8 @@ int intel_iommu_preserve_device(struct device *dev, s= truct device_ser *device_se void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *= device_ser); int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_ser *iom= mu_ser); void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *= iommu_ser); +void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu, + struct iommu_ser *iommu_ser); #else static inline int intel_iommu_preserve_device(struct device *dev, struct d= evice_ser *device_ser) { @@ -1301,6 +1303,11 @@ static inline int intel_iommu_preserve(struct iommu_= device *iommu, struct iommu_ static inline void intel_iommu_unpreserve(struct iommu_device *iommu, stru= ct iommu_ser 
*iommu_ser) { } + +static inline void intel_iommu_liveupdate_restore_root_table(struct intel_= iommu *iommu, + struct iommu_ser *iommu_ser) +{ +} #endif =20 #ifdef CONFIG_INTEL_IOMMU_SVM diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupd= ate.c index 82ba1daf1711..6dcb5783d1db 100644 --- a/drivers/iommu/intel/liveupdate.c +++ b/drivers/iommu/intel/liveupdate.c @@ -73,6 +73,46 @@ static int preserve_iommu_context(struct intel_iommu *io= mmu) return ret; } =20 +static void restore_iommu_context(struct intel_iommu *iommu) +{ + struct context_entry *context; + int i; + + for (i =3D 0; i < ROOT_ENTRY_NR; i++) { + context =3D iommu_context_addr(iommu, i, 0, 0); + if (context) + BUG_ON(!kho_restore_folio(virt_to_phys(context))); + + if (!sm_supported(iommu)) + continue; + + context =3D iommu_context_addr(iommu, i, 0x80, 0); + if (context) + BUG_ON(!kho_restore_folio(virt_to_phys(context))); + } +} + +static int __restore_used_domain_ids(struct device_ser *ser, void *arg) +{ + int id =3D ser->domain_iommu_ser.did; + struct intel_iommu *iommu =3D arg; + + ida_alloc_range(&iommu->domain_ida, id, id, GFP_ATOMIC); + return 0; +} + +void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu, + struct iommu_ser *iommu_ser) +{ + BUG_ON(!kho_restore_folio(iommu_ser->intel.root_table)); + iommu->root_entry =3D __va(iommu_ser->intel.root_table); + + restore_iommu_context(iommu); + iommu_for_each_preserved_device(__restore_used_domain_ids, iommu); + pr_info("Restored IOMMU[0x%llx] Root Table at: 0x%llx\n", + iommu->reg_phys, iommu_ser->intel.root_table); +} + int intel_iommu_preserve_device(struct device *dev, struct device_ser *dev= ice_ser) { struct device_domain_info *info =3D dev_iommu_priv_get(dev); --=20 2.53.0.rc2.204.g2597b5adb4-goog From nobody Sun Feb 8 06:56:17 2026 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client 
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0235137D11C for ; Tue, 3 Feb 2026 22:10:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156605; cv=none; b=M/0sfN/OsAVPglk37YXVD3Dyju8M5p8+bIepOZxgsEvjHBL9yR7LZDDsEqk3oll7ERaD+UgDrwugMOtBxD2KqFSZDgiHsdd8lnVOV8Eyb5fbN5ICtgpIejubZHGvRSylBPoDCA4F89abLFBHLy5pImp1cGaFOLvj0eiuNd+b1m4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156605; c=relaxed/simple; bh=Rz4XRn9dLnn8NMZc+hztQghLkCsGuHOtgIhEzdWTtVY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=J1Zf1LDtYX10mDUedR5YSy2/i9A2+pR022mjMfmMOQiEJtQEUuD6F0ozvYu3P5gFY+c9gL63AGpw311E52hXcm8lk6oBYSuwBj+cBXkyBuFzQ7l1Nilqv4OOpAyd0j/QEFrfhpubnivO7uQuihSlTBg0ciKpsJLsBGPjVRysCh0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TXZ0+F6o; arc=none smtp.client-ip=209.85.210.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="TXZ0+F6o" Received: by mail-pf1-f201.google.com with SMTP id d2e1a72fcca58-81f8c209cfbso3314566b3a.2 for ; Tue, 03 Feb 2026 14:10:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156603; x=1770761403; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xjKGYOflzggluFVjGdAa7AEy+g4yCqhubL98cyjiky0=; 
b=TXZ0+F6oMMX7o6JP2Sbwc21vJ8tTsAigVLLmwDmTWGgqjh9G13OIRUs+IjJ3mrtuCC azyo4G7Vl1bHr7PER8aSWch8gR7PESevYH8npSt7r675vob5AKtyoIbVQv3Fh/Wu9HbN zSrPlsVJYvw41g+qqML1Xz7SXdD/M9Cnh8+UPAsjRodTgerazS0piyhFuq/AffLjH7Dk sOLvcSZSEK+v6d3ysKb6dkFdVspZRPxKbl48+rLItKkn/I/rwy3fMqM9ZIcxjCcr3+QR +f0MZxL+BB0xLym5LAfV2vHM4GeEAk56Nu64FHfpjete/5js9q7NqdV61sE/3aXXlV88 OxVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156603; x=1770761403; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xjKGYOflzggluFVjGdAa7AEy+g4yCqhubL98cyjiky0=; b=K8UQ9USYWa8cQvbw8yjoP1BIAySesEeNX8ZwOrOKxcbrct1rveFu2zuIjkVnVSMDIz AoxWP2MKtj0qjNmlSpkP/zF+32GyO/g5yGzbAw6kKL9RSXCoS/ovCR8IxXOIGbDloymJ snrY/y0g/R9Dns33fdonDEMXm0qJxqU2sCxG5HhP2xxEBRO48ib1ivwo7VBaEnCQV6+H iUMdKMKqYlWQ2MjkYWMkckwVQ2UeE1sqgheRAteW3RSWlquj9e+ZRGwFy8vhl1udEPB4 +TIxNHd6t3R2kiUTFYsjFaTU63GTbLw1/ZBfeA8SYKNtw3I6yNS3QNVl85cV5wRjZtz3 hV8Q== X-Forwarded-Encrypted: i=1; AJvYcCUJlP1Wr22T1SwOl/zCZ23WOyM2liw9ZeV6dUVwHjq6Vxdv1kieImET32VxnCrtjZGzR1oqsau//yi7MHs=@vger.kernel.org X-Gm-Message-State: AOJu0YzukbaDYgX+DIp6YdW+oXw5aU+/8jGJh6pOY9lXjlQ91QUbv+Sk kN4yrchI5HMtIxRsOzYjE8Zt3/qi6IL/1+5KyxzanRcMz1sjk2lXw1q/1BWBr4fdzHI6EAIOOBv q6AINf3nBZpex3g== X-Received: from pfbhk1.prod.google.com ([2002:a05:6a00:8781:b0:7e5:49a7:f55f]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:3905:b0:7a2:7458:7fc8 with SMTP id d2e1a72fcca58-8241c1ad253mr831868b3a.13.1770156603016; Tue, 03 Feb 2026 14:10:03 -0800 (PST) Date: Tue, 3 Feb 2026 22:09:42 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-9-skhawaja@google.com> Subject: 
[PATCH 08/14] iommu: Restore and reattach preserved domains to devices From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: Samiullah Khawaja , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Restore the preserved domains by restoring the page tables using restore IOMMU domain op. Reattach the preserved domain to the device during default domain setup. While attaching, reuse the domain ID that was used in the previous kernel. The context entry setup is not needed as that is preserved during liveupdate. Signed-off-by: Samiullah Khawaja --- drivers/iommu/intel/iommu.c | 40 ++++++++++++++++++------------ drivers/iommu/intel/iommu.h | 3 ++- drivers/iommu/intel/nested.c | 2 +- drivers/iommu/iommu.c | 47 ++++++++++++++++++++++++++++++++++-- drivers/iommu/liveupdate.c | 31 ++++++++++++++++++++++++ include/linux/iommu-lu.h | 8 ++++++ 6 files changed, 112 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 8acb7f8a7627..83faad53f247 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1029,7 +1029,8 @@ static bool first_level_by_default(struct intel_iommu= *iommu) return true; } =20 -int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *io= mmu) +int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *io= mmu, + int restore_did) { struct iommu_domain_info *info, *curr; int num, ret =3D -ENOSPC; @@ -1049,8 +1050,11 @@ int domain_attach_iommu(struct dmar_domain *domain, = struct intel_iommu *iommu) return 0; } =20 - num =3D 
ida_alloc_range(&iommu->domain_ida, IDA_START_DID,
-			      cap_ndoms(iommu->cap) - 1, GFP_KERNEL);
+	if (restore_did >= 0)
+		num = restore_did;
+	else
+		num = ida_alloc_range(&iommu->domain_ida, IDA_START_DID,
+				      cap_ndoms(iommu->cap) - 1, GFP_KERNEL);
 	if (num < 0) {
 		pr_err("%s: No free domain ids\n", iommu->name);
 		goto err_unlock;
@@ -1321,10 +1325,14 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
+	struct device_ser *device_ser = NULL;
 	unsigned long flags;
 	int ret;
 
-	ret = domain_attach_iommu(domain, iommu);
+	device_ser = dev_iommu_restored_state(dev);
+
+	ret = domain_attach_iommu(domain, iommu,
+				  dev_iommu_restore_did(dev, &domain->domain));
 	if (ret)
 		return ret;
 
@@ -1337,16 +1345,18 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 	if (dev_is_real_dma_subdevice(dev))
 		return 0;
 
-	if (!sm_supported(iommu))
-		ret = domain_context_mapping(domain, dev);
-	else if (intel_domain_is_fs_paging(domain))
-		ret = domain_setup_first_level(iommu, domain, dev,
-					       IOMMU_NO_PASID, NULL);
-	else if (intel_domain_is_ss_paging(domain))
-		ret = domain_setup_second_level(iommu, domain, dev,
-						IOMMU_NO_PASID, NULL);
-	else if (WARN_ON(true))
-		ret = -EINVAL;
+	if (!device_ser) {
+		if (!sm_supported(iommu))
+			ret = domain_context_mapping(domain, dev);
+		else if (intel_domain_is_fs_paging(domain))
+			ret = domain_setup_first_level(iommu, domain, dev,
+						       IOMMU_NO_PASID, NULL);
+		else if (intel_domain_is_ss_paging(domain))
+			ret = domain_setup_second_level(iommu, domain, dev,
+							IOMMU_NO_PASID, NULL);
+		else if (WARN_ON(true))
+			ret = -EINVAL;
+	}
 
 	if (ret)
 		goto out_block_translation;
@@ -3630,7 +3640,7 @@ domain_add_dev_pasid(struct iommu_domain *domain,
 	if (!dev_pasid)
 		return ERR_PTR(-ENOMEM);
 
-	ret = domain_attach_iommu(dmar_domain, iommu);
+	ret = domain_attach_iommu(dmar_domain, iommu, -1);
 	if
(ret)
 		goto out_free;
 
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d7bf63aff17d..057bd6035d85 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1174,7 +1174,8 @@ void __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
  */
 #define QI_OPT_WAIT_DRAIN	BIT(0)
 
-int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu,
+			int restore_did);
 void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
 void device_block_translation(struct device *dev);
 int paging_domain_compatible(struct iommu_domain *domain, struct device *dev);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index a3fb8c193ca6..4fed9f5981e5 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -40,7 +40,7 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
 		return ret;
 	}
 
-	ret = domain_attach_iommu(dmar_domain, iommu);
+	ret = domain_attach_iommu(dmar_domain, iommu, -1);
 	if (ret) {
 		dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
 		return ret;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c0632cb5b570..8103b5372364 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -489,6 +490,10 @@ static int iommu_init_device(struct device *dev)
 		goto err_free;
 	}
 
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	dev->iommu->device_ser = iommu_get_device_preserved_data(dev);
+#endif
+
 	iommu_dev = ops->probe_device(dev);
 	if (IS_ERR(iommu_dev)) {
 		ret = PTR_ERR(iommu_dev);
@@ -2149,6 +2154,13 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 	ret = domain->ops->attach_dev(domain, dev, old);
 	if (ret)
 		return ret;
+
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	/* The associated state can be unset once restored.
*/
+	if (dev_iommu_restored_state(dev))
+		WRITE_ONCE(dev->iommu->device_ser, NULL);
+#endif
+
 	dev->iommu->attach_deferred = 0;
 	trace_attach_device_to_domain(dev);
 	return 0;
@@ -3061,6 +3073,34 @@ int iommu_fwspec_add_ids(struct device *dev, const u32 *ids, int num_ids)
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
 
+static struct iommu_domain *__iommu_group_maybe_restore_domain(struct iommu_group *group)
+{
+	struct device_ser *device_ser;
+	struct iommu_domain *domain;
+	struct device *dev;
+
+	dev = iommu_group_first_dev(group);
+	if (!dev_is_pci(dev))
+		return NULL;
+
+	device_ser = dev_iommu_restored_state(dev);
+	if (!device_ser)
+		return NULL;
+
+	domain = iommu_restore_domain(dev, device_ser);
+	if (WARN_ON(IS_ERR(domain)))
+		return NULL;
+
+	/*
+	 * The group is owned by the entity (a preserved iommufd) that provided
+	 * this token in the previous kernel. It will be used to reclaim it
+	 * later.
+	 */
+	group->owner = (void *)device_ser->token;
+	group->owner_cnt = 1;
+	return domain;
+}
+
 /**
  * iommu_setup_default_domain - Set the default_domain for the group
  * @group: Group to change
@@ -3075,8 +3115,8 @@ static int iommu_setup_default_domain(struct iommu_group *group,
 				      int target_type)
 {
 	struct iommu_domain *old_dom = group->default_domain;
+	struct iommu_domain *dom, *restored_domain;
 	struct group_device *gdev;
-	struct iommu_domain *dom;
 	bool direct_failed;
 	int req_type;
 	int ret;
@@ -3120,6 +3160,9 @@ static int iommu_setup_default_domain(struct iommu_group *group,
 	/* We must set default_domain early for __iommu_device_set_domain */
 	group->default_domain = dom;
 	if (!group->domain) {
+		restored_domain = __iommu_group_maybe_restore_domain(group);
+		if (!restored_domain)
+			restored_domain = dom;
 		/*
 		 * Drivers are not allowed to fail the first domain attach.
* The only way to recover from this is to fail attaching the @@ -3127,7 +3170,7 @@ static int iommu_setup_default_domain(struct iommu_gr= oup *group, * in group->default_domain so it is freed after. */ ret =3D __iommu_group_set_domain_internal( - group, dom, IOMMU_SET_DOMAIN_MUST_SUCCEED); + group, restored_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED); if (WARN_ON(ret)) goto out_free_old; } else { diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c index 83eb609b3fd7..6b211436ad25 100644 --- a/drivers/iommu/liveupdate.c +++ b/drivers/iommu/liveupdate.c @@ -501,3 +501,34 @@ void iommu_unpreserve_device(struct iommu_domain *doma= in, struct device *dev) =20 iommu_unpreserve_locked(iommu->iommu_dev); } + +struct iommu_domain *iommu_restore_domain(struct device *dev, struct devic= e_ser *ser) +{ + struct iommu_domain_ser *domain_ser; + struct iommu_lu_flb_obj *flb_obj; + struct iommu_domain *domain; + int ret; + + domain_ser =3D __va(ser->domain_iommu_ser.domain_phys); + + ret =3D liveupdate_flb_get_incoming(&iommu_flb, (void **)&flb_obj); + if (ret) + return ERR_PTR(ret); + + guard(mutex)(&flb_obj->lock); + if (domain_ser->restored_domain) + return domain_ser->restored_domain; + + domain_ser->obj.incoming =3D true; + domain =3D iommu_paging_domain_alloc(dev); + if (IS_ERR(domain)) + return domain; + + ret =3D domain->ops->restore(domain, domain_ser); + if (ret) + return ERR_PTR(ret); + + domain->preserved_state =3D domain_ser; + domain_ser->restored_domain =3D domain; + return domain; +} diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h index 48c07514a776..4879abaf83d3 100644 --- a/include/linux/iommu-lu.h +++ b/include/linux/iommu-lu.h @@ -65,6 +65,8 @@ static inline int dev_iommu_restore_did(struct device *de= v, struct iommu_domain return -1; } =20 +struct iommu_domain *iommu_restore_domain(struct device *dev, + struct device_ser *ser); int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn, void *arg); struct device_ser 
*iommu_get_device_preserved_data(struct device *dev); @@ -95,6 +97,12 @@ static inline void *iommu_domain_restored_state(struct i= ommu_domain *domain) return NULL; } =20 +static inline struct iommu_domain *iommu_restore_domain(struct device *dev, + struct device_ser *ser) +{ + return NULL; +} + static inline int iommu_for_each_preserved_device(iommu_preserved_device_i= ter_fn fn, void *arg) { return -EOPNOTSUPP; --=20 2.53.0.rc2.204.g2597b5adb4-goog From nobody Sun Feb 8 06:56:17 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 76A6F37E30C for ; Tue, 3 Feb 2026 22:10:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156607; cv=none; b=cIzNcsuFntxIM/O0ozjjsiA9ONABsxKEl2mOXj90kqIgbMQXp2Hz9KSwRDhkHAIJqjrlyEFHLLTZHyolWBuymxpKz5IEP5HTIBUReQ2HrBn8Mw5S4zbwaSrZv5JyasABa1TFaj3ftMVqcZiEpouVb1twptf1Ejx5lhQ16PXmKLU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156607; c=relaxed/simple; bh=eZGL6r3Vcbmae1XqI15ANHTzSxVGmOogsWKg6UBegc0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=EFACjnvZLM9iEZw6JUizHSjBC+vIsQCPyyUgGY/zIVIuRCmQJ5XzrQ4+jaMWViMm+SgATCZDW+7Twrc7poojOmwGJkBsH0DuRyuXpANjO0MJ/fb0/TsMK2jNEDm0qE7KHLY3j9ZxdCvnDKPJ+3HhORnoRwg+fJhUgrEKG1tUOXQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=cNwOofNu; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="cNwOofNu" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-c552d1f9eafso12340058a12.0 for ; Tue, 03 Feb 2026 14:10:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156605; x=1770761405; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=VBDlj4MRkwRVhMDxpPQ8tTAs7Fk9pUNiAoP3hey8V2Y=; b=cNwOofNuVw7kVIEla4OgWVjELk6SxyWn8VCGrUmtJjXlWPXk6K0uzmriiHowzxMXuR 6/m1meR6e1RimOaSSX9WoZlEoXWY6w227ilDI0GxzD5TAXRafoMAtdjEXRZfVFSjAp6Z /UEix4Ry1CQ6PdpQgOm4LkKtB2Ici3AkXNmukN1Q0ADTYKmuTBI2K7MuNOAxb/Qxea81 7Hy0fSuBE3yuMtWIOLk+85qReeyDR1Et05j3Q/SR8R7hi58+etaGXxrBPQ0AfZ92ki+V axTL82iMD+EnvdTwAbZ87hTHw9OTIFh1ttJC/07yI79cGkyjPdO8TxXyFFLzCf5mDYb4 XNUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156605; x=1770761405; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=VBDlj4MRkwRVhMDxpPQ8tTAs7Fk9pUNiAoP3hey8V2Y=; b=WnP9ECpWexzthuk0FF6YsmXSLC5z+wzsVi3LNrjl7hQe7B47Bl96nf2vDPLaRztHtk A2IyRy5Wgj+2lGrBiF3yswXS2114rd7PkuX7izQ/rRLF82CBT9yGTVMeAl9ktR54yO8o pzlWC1/YUF0NVtVIAUuDru6bjR0MgV/I0Ju6lVDLMcYCEMeVJtkhpO37sM1qLWk9UY4C xziFNlfchuvaw8pr8fTVRyb0s801EG3gZMSJWeI6qYaIB5ORHwRyyT1qWc2jKTLwmUye rsI/BckfjTALrwYvNR1xoUvj5CGiVUy89/kgGVHOUHM7lFLQeyZ3IktnV+iHW2qiON90 WxsA== X-Forwarded-Encrypted: i=1; AJvYcCVNoolB2TlE5zWkPoW0HZjf9NmKIceYyCU9197aT2BrJX5e8DoHznvPV/jL2Q2vnPXw01hde4/Pu/RoISA=@vger.kernel.org X-Gm-Message-State: AOJu0YxzWlyNsKvhzyeS5hEvDdUN6u1Lz1SW5DjTE8GE2D6D/+E3kG3N q/0dJND5TxDH6RAG4yUo/+bSaaPXOBglpFKVZNRSt+2zHLQrPOkdGERYkdUNGjwBWEZAMS5N5v8 bj1HaXA8FTS519w== X-Received: from 
pge6.prod.google.com ([2002:a05:6a02:2d06:b0:c66:f3a8:71ce]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a21:a10f:b0:393:7891:7ead with SMTP id adf61e73a8af0-393789182dcmr92483637.1.1770156604757; Tue, 03 Feb 2026 14:10:04 -0800 (PST) Date: Tue, 3 Feb 2026 22:09:43 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-10-skhawaja@google.com> Subject: [PATCH 09/14] iommu/vt-d: preserve PASID table of preserved device From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: Samiullah Khawaja , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In scalable mode the PASID table is used to fetch the io page tables. Preserve and restore the PASID table of the preserved devices. 
Signed-off-by: Samiullah Khawaja --- drivers/iommu/intel/iommu.c | 4 +- drivers/iommu/intel/iommu.h | 5 ++ drivers/iommu/intel/liveupdate.c | 130 +++++++++++++++++++++++++++++++ drivers/iommu/intel/pasid.c | 7 +- drivers/iommu/intel/pasid.h | 9 +++ include/linux/kho/abi/iommu.h | 8 ++ 6 files changed, 160 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 83faad53f247..2d0dae57f5a2 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -2944,8 +2944,10 @@ static bool __maybe_clean_unpreserved_context_entrie= s(struct intel_iommu *iommu) if (info->iommu !=3D iommu) continue; =20 - if (dev_iommu_preserved_state(&pdev->dev)) + if (dev_iommu_preserved_state(&pdev->dev)) { + pasid_cleanup_preserved_table(&pdev->dev); continue; + } =20 domain_context_clear(info); } diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 057bd6035d85..d24d6aeaacc0 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -1286,6 +1286,7 @@ int intel_iommu_preserve(struct iommu_device *iommu, = struct iommu_ser *iommu_ser void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *= iommu_ser); void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu, struct iommu_ser *iommu_ser); +void pasid_cleanup_preserved_table(struct device *dev); #else static inline int intel_iommu_preserve_device(struct device *dev, struct d= evice_ser *device_ser) { @@ -1309,6 +1310,10 @@ static inline void intel_iommu_liveupdate_restore_ro= ot_table(struct intel_iommu struct iommu_ser *iommu_ser) { } + +static inline void pasid_cleanup_preserved_table(struct device *dev) +{ +} #endif =20 #ifdef CONFIG_INTEL_IOMMU_SVM diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupd= ate.c index 6dcb5783d1db..53bb5fe3a764 100644 --- a/drivers/iommu/intel/liveupdate.c +++ b/drivers/iommu/intel/liveupdate.c @@ -14,6 +14,7 @@ #include =20 #include "iommu.h" 
+#include "pasid.h"
 #include "../iommu-pages.h"
 
 static void unpreserve_iommu_context(struct intel_iommu *iommu, int end)
@@ -113,9 +114,89 @@ void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu,
 		iommu->reg_phys, iommu_ser->intel.root_table);
 }
 
+enum pasid_lu_op {
+	PASID_LU_OP_PRESERVE = 1,
+	PASID_LU_OP_UNPRESERVE,
+	PASID_LU_OP_RESTORE,
+	PASID_LU_OP_FREE,
+};
+
+static int pasid_lu_do_op(void *table, enum pasid_lu_op op)
+{
+	int ret = 0;
+
+	switch (op) {
+	case PASID_LU_OP_PRESERVE:
+		ret = iommu_preserve_page(table);
+		break;
+	case PASID_LU_OP_UNPRESERVE:
+		iommu_unpreserve_page(table);
+		break;
+	case PASID_LU_OP_RESTORE:
+		iommu_restore_page(virt_to_phys(table));
+		break;
+	case PASID_LU_OP_FREE:
+		iommu_free_pages(table);
+		break;
+	}
+
+	return ret;
+}
+
+static int pasid_lu_handle_pd(struct pasid_dir_entry *dir, enum pasid_lu_op op)
+{
+	struct pasid_entry *table;
+	int ret;
+
+	/* Only preserve first table for NO_PASID. */
+	table = get_pasid_table_from_pde(&dir[0]);
+	if (!table)
+		return -EINVAL;
+
+	ret = pasid_lu_do_op(table, op);
+	if (ret)
+		return ret;
+
+	ret = pasid_lu_do_op(dir, op);
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	if (op == PASID_LU_OP_PRESERVE)
+		pasid_lu_do_op(table, PASID_LU_OP_UNPRESERVE);
+
+	return ret;
+}
+
+void pasid_cleanup_preserved_table(struct device *dev)
+{
+	struct pasid_table *pasid_table;
+	struct pasid_dir_entry *dir;
+	struct pasid_entry *table;
+
+	pasid_table = intel_pasid_get_table(dev);
+	if (!pasid_table)
+		return;
+
+	dir = pasid_table->table;
+	table = get_pasid_table_from_pde(&dir[0]);
+	if (!table)
+		return;
+
+	/* Cleanup everything except the first entry.
*/
+	memset(&table[1], 0, SZ_4K - sizeof(*table));
+	memset(&dir[1], 0, SZ_4K - sizeof(struct pasid_dir_entry));
+
+	clflush_cache_range(&table[0], SZ_4K);
+	clflush_cache_range(&dir[0], SZ_4K);
+}
+
 int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct pasid_table *pasid_table;
+	int ret;
 
 	if (!dev_is_pci(dev))
 		return -EOPNOTSUPP;
@@ -124,11 +205,42 @@ int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_se
 		return -EINVAL;
 
 	device_ser->domain_iommu_ser.did = domain_id_iommu(info->domain, info->iommu);
+
+	if (!sm_supported(info->iommu))
+		return 0;
+
+	pasid_table = intel_pasid_get_table(dev);
+	if (!pasid_table)
+		return -EINVAL;
+
+	ret = pasid_lu_handle_pd(pasid_table->table, PASID_LU_OP_PRESERVE);
+	if (ret)
+		return ret;
+
+	device_ser->intel.pasid_table = virt_to_phys(pasid_table->table);
+	device_ser->intel.max_pasid = pasid_table->max_pasid;
 	return 0;
 }
 
 void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser)
 {
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct pasid_table *pasid_table;
+
+	if (!dev_is_pci(dev))
+		return;
+
+	if (!info)
+		return;
+
+	if (!sm_supported(info->iommu))
+		return;
+
+	pasid_table = intel_pasid_get_table(dev);
+	if (!pasid_table)
+		return;
+
+	pasid_lu_handle_pd(pasid_table->table, PASID_LU_OP_UNPRESERVE);
 }
 
 int intel_iommu_preserve(struct iommu_device *iommu_dev, struct iommu_ser *ser)
@@ -172,3 +284,21 @@ void intel_iommu_unpreserve(struct iommu_device *iommu_dev, struct iommu_ser *io
 	iommu_unpreserve_page(iommu->root_entry);
 	spin_unlock(&iommu->lock);
 }
+
+void *intel_pasid_try_restore_table(struct device *dev, u64 max_pasid)
+{
+	struct device_ser *ser = dev_iommu_restored_state(dev);
+
+	if (!ser)
+		return NULL;
+
+	BUG_ON(pasid_lu_handle_pd(phys_to_virt(ser->intel.pasid_table),
+				  PASID_LU_OP_RESTORE));
+	if
(WARN_ON_ONCE(ser->intel.max_pasid != max_pasid)) {
+		pasid_lu_handle_pd(phys_to_virt(ser->intel.pasid_table),
+				   PASID_LU_OP_FREE);
+		return NULL;
+	}
+
+	return phys_to_virt(ser->intel.pasid_table);
+}
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 3e2255057079..96b9daf9083d 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -60,8 +60,11 @@ int intel_pasid_alloc_table(struct device *dev)
 
 	size = max_pasid >> (PASID_PDE_SHIFT - 3);
 	order = size ? get_order(size) : 0;
-	dir = iommu_alloc_pages_node_sz(info->iommu->node, GFP_KERNEL,
-					1 << (order + PAGE_SHIFT));
+
+	dir = intel_pasid_try_restore_table(dev, max_pasid);
+	if (!dir)
+		dir = iommu_alloc_pages_node_sz(info->iommu->node, GFP_KERNEL,
+						1 << (order + PAGE_SHIFT));
 	if (!dir) {
 		kfree(pasid_table);
 		return -ENOMEM;
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index b4c85242dc79..e8a626c47daf 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -287,6 +287,15 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
 
 extern unsigned int intel_pasid_max_id;
 int intel_pasid_alloc_table(struct device *dev);
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+void *intel_pasid_try_restore_table(struct device *dev, u64 max_pasid);
+#else
+static inline void *intel_pasid_try_restore_table(struct device *dev,
+						  u64 max_pasid)
+{
+	return NULL;
+}
+#endif
 void intel_pasid_free_table(struct device *dev);
 struct pasid_table *intel_pasid_get_table(struct device *dev);
 int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
diff --git a/include/linux/kho/abi/iommu.h b/include/linux/kho/abi/iommu.h
index 8e1c05cfe7bb..111a46c31d92 100644
--- a/include/linux/kho/abi/iommu.h
+++ b/include/linux/kho/abi/iommu.h
@@ -50,6 +50,11 @@ struct device_domain_iommu_ser {
 	u64 iommu_phys;
 } __packed;
 
+struct device_intel_ser {
+	u64 pasid_table;
+	u64 max_pasid;
+} __packed;
+
 struct device_ser {
struct iommu_obj_ser obj; u64 token; @@ -57,6 +62,9 @@ struct device_ser { u32 pci_domain; struct device_domain_iommu_ser domain_iommu_ser; enum iommu_lu_type type; + union { + struct device_intel_ser intel; + }; } __packed; =20 struct iommu_intel_ser { --=20 2.53.0.rc2.204.g2597b5adb4-goog From nobody Sun Feb 8 06:56:17 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0174D37D11A for ; Tue, 3 Feb 2026 22:10:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156608; cv=none; b=us529KbsG25T/jSt/K8/aIvSh2wayz/YgJYp13vbd8Q0wGYHiVckQLbGzNxZ52lSjifMpmZ3+W+hmzT5dw5yDmdTfc+OreIZpCYcIkrcGDNHk4iZYplh0WCLqXj0DAnQFUw1d3LYZaY0t/3RM3cRqkdfCeUIlMSkRmhmhXH8WWs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770156608; c=relaxed/simple; bh=L1gdkrfZpzHvz4nOMEwGAj6a2f4hvV7aHXOYdOFvgIo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=P+f66EnJszTsnBygocWE/BV+CaysC3Qf2J6KuotTgNkd8AgfwXdkw7BkEwt60BpEse20r6JZb7wuFezFwT5iGiSbIbbKdzy2j9PIvCEZ+1XOUramGpWKCoAwY/z4FAu69OzRdxddG3YCyzXwxSmFGZBQT4tADCBljxBpBiPvm5I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=DYU3bnSx; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--skhawaja.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=google.com header.i=@google.com header.b="DYU3bnSx" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-c613929d317so3727660a12.2 for ; Tue, 03 Feb 2026 14:10:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1770156606; x=1770761406; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=FO4QdmXutfGlvqQphmse6DQSnrpdmmMLD/ha8Ivm6lo=; b=DYU3bnSxg6vbbejoZO9VJ4yF8NQzWtpKuiN7OkCLDWOpMS1zHEB5lIdmHHmIMjNMIZ JeftADerTXfMR9ifUGPU1jHUw2n6LFzyD5QdtF51OOWbEnsMobIX3UbbFo5ET6BlD+bM JAgh/UuNnunoAuJEjMS+N11aAlmT3EW43I1Ngp1TsV60j55HRY7gVmvHdIdStQ/BGJlk XPlEQlC0QkVW8yTd1lMPES/C4AfRm7k91mEhZYXa4EIbVrvh4XwfvYjFXJckoyKkyBqa o6epLl+RKudYhI6ESVA96s4yUfVaRN18vX1hHlYv+cI0V0NuKjJDQfIsVm7KepQVxY/B +3AQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770156606; x=1770761406; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=FO4QdmXutfGlvqQphmse6DQSnrpdmmMLD/ha8Ivm6lo=; b=HpDXwvai939XDvpk2Trj/AVrKlHG7CzvC2OeXya7X7rcAkzugJMzzdJd0UvE2z9rA9 q4CzUGKQpO9DGl9bA0sYm8J31D/8nNFP6L0w0K2f+VpJCJFNbUBNqM9ZF3F4AIWMIfFZ sS21bXC7lMpbE8LcERnMp+HDW74tNpOedkINBm/dNg8mylv9EbaSR+VTDmhOgR9d5En+ QzNLqKY3fSL2dsI5zfSlOzcGokTqBa2Q1dB9NFF/Y6gZusY8ZidC24zHYAr53PWILmlV 8WxHYHCt4KypYo88g7P8my16WU7Yw8dOmZlaJLl1IOsmbl0u9Rwr/LIcpyJzUk72nXG2 QUWA== X-Forwarded-Encrypted: i=1; AJvYcCX8zObvXD+Y3QASc1FUB4KLWk6y3GsoAvzmD5W1VshfHwzm3bcbtwiyLb3eLhifnOxkgs3Chi5JV6u4LjQ=@vger.kernel.org X-Gm-Message-State: AOJu0Ywdurvpm3XyyDLLyeLSEXY5P4bq7xb7li9QZFTQor48prtgFcKC Rgv1iXNBsjpaeEJ/yEegiN6iSUXXqQ7FzY0+lSTIFBbL/TSViMOhIt2fsGVt3IHlq7wSJ8WNBHf 6SY6hdinK6onkAg== X-Received: from pgct9.prod.google.com ([2002:a05:6a02:5289:b0:c61:3a73:1448]) (user=skhawaja job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:9185:b0:366:58cc:b74b with SMTP id 
adf61e73a8af0-393720cfdddmr834146637.21.1770156606372; Tue, 03 Feb 2026 14:10:06 -0800 (PST) Date: Tue, 3 Feb 2026 22:09:44 +0000 In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260203220948.2176157-1-skhawaja@google.com> X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog Message-ID: <20260203220948.2176157-11-skhawaja@google.com> Subject: [PATCH 10/14] iommufd-lu: Implement ioctl to let userspace mark an HWPT to be preserved From: Samiullah Khawaja To: David Woodhouse , Lu Baolu , Joerg Roedel , Will Deacon , Jason Gunthorpe Cc: YiFei Zhu , Samiullah Khawaja , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: YiFei Zhu Userspace provides a token, which will then be used at restore to identify this HWPT. The restoration logic is not implemented and will be added later. 
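For readers following along from userspace, here is a minimal sketch of how the new ioctl might be invoked. It is an illustration only and not part of the patch: the struct layout and command number are copied locally from the uAPI hunk in this patch, IOMMUFD_TYPE is assumed to be ';' as in the existing include/uapi/linux/iommufd.h, and make_preserve_cmd() and mark_hwpt_preserved() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of the uAPI struct proposed in this patch (assumption:
 * layout exactly as posted, three 32-bit fields). */
struct iommu_hwpt_lu_set_preserve {
	uint32_t size;
	uint32_t hwpt_id;
	uint32_t hwpt_token;
};
static_assert(sizeof(struct iommu_hwpt_lu_set_preserve) == 12,
	      "three 32-bit fields, no padding expected");

/* Values taken from the patch and the existing iommufd uAPI header. */
#define IOMMUFD_TYPE (';')
#define IOMMUFD_CMD_HWPT_LU_SET_PRESERVE 0x95
#define IOMMU_HWPT_LU_SET_PRESERVE \
	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_LU_SET_PRESERVE)

/* Hypothetical helper: build a zeroed command with size filled in. */
static struct iommu_hwpt_lu_set_preserve
make_preserve_cmd(uint32_t hwpt_id, uint32_t token)
{
	struct iommu_hwpt_lu_set_preserve cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.size = sizeof(cmd);
	cmd.hwpt_id = hwpt_id;
	cmd.hwpt_token = token;
	return cmd;
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
static int mark_hwpt_preserved(int iommufd, uint32_t hwpt_id, uint32_t token)
{
	struct iommu_hwpt_lu_set_preserve cmd =
		make_preserve_cmd(hwpt_id, token);

	return ioctl(iommufd, IOMMU_HWPT_LU_SET_PRESERVE, &cmd);
}
```

Going by the uAPI comment in this patch, a caller would expect errno to be EADDRINUSE when another preserved HWPT in the same iommufd already carries the token, so picking tokens that are unique per iommufd is the caller's job.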
Signed-off-by: YiFei Zhu
Signed-off-by: Samiullah Khawaja
---
 drivers/iommu/iommufd/Makefile          |  1 +
 drivers/iommu/iommufd/iommufd_private.h | 13 +++++++
 drivers/iommu/iommufd/liveupdate.c      | 49 +++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c            |  2 +
 include/uapi/linux/iommufd.h            | 19 ++++++++++
 5 files changed, 84 insertions(+)
 create mode 100644 drivers/iommu/iommufd/liveupdate.c

diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..c3bf0b6452d3 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
 
 iommufd_driver-y := driver.o
 obj-$(CONFIG_IOMMUFD_DRIVER_CORE) += iommufd_driver.o
+obj-$(CONFIG_IOMMU_LIVEUPDATE) += liveupdate.o
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index eb6d1a70f673..6424e7cea5b2 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -374,6 +374,10 @@ struct iommufd_hwpt_paging {
 	bool auto_domain : 1;
 	bool enforce_cache_coherency : 1;
 	bool nest_parent : 1;
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	bool lu_preserve : 1;
+	u32 lu_token;
+#endif
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
 	struct iommufd_sw_msi_maps present_sw_msi;
@@ -707,6 +711,15 @@ iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
 			    struct iommufd_vdevice, obj);
 }
 
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd);
+#else
+static inline int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd)
+{
+	return -ENOTTY;
+}
+#endif
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/liveupdate.c b/drivers/iommu/iommufd/liveupdate.c
new file mode 100644
index 000000000000..ae74f5b54735
--- /dev/null
+++ b/drivers/iommu/iommufd/liveupdate.c
@@ -0,0 +1,49 @@
+//
SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include
+#include
+#include
+
+#include "iommufd_private.h"
+
+int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_lu_set_preserve *cmd = ucmd->cmd;
+	struct iommufd_hwpt_paging *hwpt_target, *hwpt;
+	struct iommufd_ctx *ictx = ucmd->ictx;
+	struct iommufd_object *obj;
+	unsigned long index;
+	int rc = 0;
+
+	hwpt_target = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt_target))
+		return PTR_ERR(hwpt_target);
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type != IOMMUFD_OBJ_HWPT_PAGING)
+			continue;
+
+		hwpt = container_of(obj, struct iommufd_hwpt_paging, common.obj);
+
+		if (hwpt == hwpt_target)
+			continue;
+		if (!hwpt->lu_preserve)
+			continue;
+		if (hwpt->lu_token == cmd->hwpt_token) {
+			rc = -EADDRINUSE;
+			goto out;
+		}
+	}
+
+	hwpt_target->lu_preserve = true;
+	hwpt_target->lu_token = cmd->hwpt_token;
+
+out:
+	xa_unlock(&ictx->objects);
+	iommufd_put_object(ictx, &hwpt_target->common.obj);
+	return rc;
+}
+
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 5cc4b08c25f5..e1a9b3051f65 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -493,6 +493,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 __reserved),
 	IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
 		 struct iommu_viommu_alloc, out_viommu_id),
+	IOCTL_OP(IOMMU_HWPT_LU_SET_PRESERVE, iommufd_hwpt_lu_set_preserve,
+		 struct iommu_hwpt_lu_set_preserve, hwpt_token),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 2c41920b641d..25d8cff987eb 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 	IOMMUFD_CMD_VEVENTQ_ALLOC =
0x93,
 	IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+	IOMMUFD_CMD_HWPT_LU_SET_PRESERVE = 0x95,
 };
 
 /**
@@ -1299,4 +1300,22 @@ struct iommu_hw_queue_alloc {
 	__aligned_u64 length;
 };
 #define IOMMU_HW_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HW_QUEUE_ALLOC)
+
+/**
+ * struct iommu_hwpt_lu_set_preserve - ioctl(IOMMU_HWPT_LU_SET_PRESERVE)
+ * @size: sizeof(struct iommu_hwpt_lu_set_preserve)
+ * @hwpt_id: Iommufd object ID of the target HWPT
+ * @hwpt_token: Token to identify this hwpt upon restore
+ *
+ * The target HWPT will be preserved during iommufd preservation.
+ *
+ * The hwpt_token is provided by userspace. If userspace enters a token
+ * already in use within this iommufd, -EADDRINUSE is returned from this ioctl.
+ */
+struct iommu_hwpt_lu_set_preserve {
+	__u32 size;
+	__u32 hwpt_id;
+	__u32 hwpt_token;
+};
+#define IOMMU_HWPT_LU_SET_PRESERVE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_LU_SET_PRESERVE)
 #endif
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:45 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260203220948.2176157-1-skhawaja@google.com>
Message-ID: <20260203220948.2176157-12-skhawaja@google.com>
Subject: [PATCH 11/14] iommufd-lu: Persist iommu hardware pagetables for live update
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: YiFei Zhu, Samiullah Khawaja, Robin Murphy, Kevin Tian,
 Alex Williamson, Shuah Khan, iommu@lists.linux.dev,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed,
 Adithya Jayachandran, Parav Pandit, Leon Romanovsky, William Tu,
 Pratyush Yadav, Pasha Tatashin, David Matlack, Andrew Morton, Chris Li,
 Pranjal Shrivastava, Vipin Sharma

From: YiFei Zhu

The caller is expected to mark each HWPT to be preserved with an ioctl call,
with a token that will be used on restore. At preserve time,
iommu_domain_preserve() is called on each marked HWPT's domain to preserve
the iommu domain.

HWPTs containing DMA mappings backed by unpreserved memory should not be
preserved. During preservation, check that the mappings in the HWPT being
preserved are all file based and that all of the backing files are preserved.

The memfd preservation check alone is not enough when preserving an iommufd.
The memfd might have shrunk between the mapping and the memfd preservation,
in which case some pages that are still pinned by iommu mappings would not be
preserved with the memfd. Only allow iommufd preservation when all the
iopt_pages are file backed and the memory file had F_SEAL_SEAL, F_SEAL_GROW
and F_SEAL_SHRINK set when it was mapped. This guarantees that all the pages
that were backing the memfd when it was mapped are preserved.

Once an HWPT is preserved, the iopt associated with it is made immutable.
The map and unmap ioctls operate directly on the iopt, which contains an
array of domains, while each hwpt contains only one domain. The logic
therefore becomes that mapping and unmapping are prohibited if any of the
domains in an iopt belongs to a preserved hwpt. However, tracing to the hwpt
through the domain is a lot more tedious than tracing through the ioas, so if
an hwpt is preserved, hwpt->ioas->iopt is made immutable.

When undoing this (making the iopts mutable again), there is never a need to
make some iopts mutable and keep others immutable, since the undo only
happens on unpreserve and on the error path of preserve. Simply iterate over
all the ioas and clear the immutability flag on all their iopts.
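
As a minimal sketch of the seal requirement described above (an illustrative
userspace model, not code from this series): seals_allow_preserve() is a
hypothetical helper name, and the fallback F_SEAL_* values are the ones from
the Linux uapi fcntl header. It mirrors the mask test done in
check_iopt_pages_preserved(), which accepts a mapping only if all three seals
were recorded at map time.

```c
/*
 * Illustrative model of the seal check; seals_allow_preserve() is a
 * hypothetical helper, not a kernel or libc API.
 */
#include <stdbool.h>
#include <stdint.h>

/* Seal bits as defined by the Linux uapi, used if not already visible. */
#ifndef F_SEAL_SEAL
#define F_SEAL_SEAL	0x0001	/* prevent further seals from being set */
#endif
#ifndef F_SEAL_SHRINK
#define F_SEAL_SHRINK	0x0002	/* prevent the file from shrinking */
#endif
#ifndef F_SEAL_GROW
#define F_SEAL_GROW	0x0004	/* prevent the file from growing */
#endif

/* True only if every seal required for preservation was set at map time. */
static bool seals_allow_preserve(uint32_t seals_at_map)
{
	const uint32_t req_seals = F_SEAL_SEAL | F_SEAL_GROW | F_SEAL_SHRINK;

	return (seals_at_map & req_seals) == req_seals;
}
```

For example, seals_allow_preserve(F_SEAL_SEAL | F_SEAL_GROW) is false because
F_SEAL_SHRINK is missing; in this patch such a mapping makes the preserve
step fail with -EINVAL.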
Signed-off-by: YiFei Zhu Signed-off-by: Samiullah Khawaja --- drivers/iommu/iommufd/io_pagetable.c | 17 ++ drivers/iommu/iommufd/io_pagetable.h | 1 + drivers/iommu/iommufd/iommufd_private.h | 25 ++ drivers/iommu/iommufd/liveupdate.c | 300 ++++++++++++++++++++++++ drivers/iommu/iommufd/main.c | 14 +- drivers/iommu/iommufd/pages.c | 8 + include/linux/kho/abi/iommufd.h | 39 +++ 7 files changed, 403 insertions(+), 1 deletion(-) create mode 100644 include/linux/kho/abi/iommufd.h diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/i= o_pagetable.c index 436992331111..43e8a2443793 100644 --- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -270,6 +270,11 @@ static int iopt_alloc_area_pages(struct io_pagetable *= iopt, } =20 down_write(&iopt->iova_rwsem); + if (iopt_lu_map_immutable(iopt)) { + rc =3D -EBUSY; + goto out_unlock; + } + if ((length & (iopt->iova_alignment - 1)) || !length) { rc =3D -EINVAL; goto out_unlock; @@ -328,6 +333,7 @@ static void iopt_abort_area(struct iopt_area *area) WARN_ON(area->pages); if (area->iopt) { down_write(&area->iopt->iova_rwsem); + WARN_ON(iopt_lu_map_immutable(area->iopt)); interval_tree_remove(&area->node, &area->iopt->area_itree); up_write(&area->iopt->iova_rwsem); } @@ -755,6 +761,12 @@ static int iopt_unmap_iova_range(struct io_pagetable *= iopt, unsigned long start, again: down_read(&iopt->domains_rwsem); down_write(&iopt->iova_rwsem); + + if (iopt_lu_map_immutable(iopt)) { + rc =3D -EBUSY; + goto out_unlock_iova; + } + while ((area =3D iopt_area_iter_first(iopt, start, last))) { unsigned long area_last =3D iopt_area_last_iova(area); unsigned long area_first =3D iopt_area_iova(area); @@ -1398,6 +1410,11 @@ int iopt_cut_iova(struct io_pagetable *iopt, unsigne= d long *iovas, int i; =20 down_write(&iopt->iova_rwsem); + if (iopt_lu_map_immutable(iopt)) { + up_write(&iopt->iova_rwsem); + return -EBUSY; + } + for (i =3D 0; i < num_iovas; i++) { struct iopt_area *area; =20 diff 
--git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 14cd052fd320..b64cb4cf300c 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -234,6 +234,7 @@ struct iopt_pages {
 		struct {			/* IOPT_ADDRESS_FILE */
 			struct file *file;
 			unsigned long start;
+			u32 seals;
 		};
 		/* IOPT_ADDRESS_DMABUF */
 		struct iopt_pages_dmabuf dmabuf;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6424e7cea5b2..f8366a23999f 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -94,6 +94,9 @@ struct io_pagetable {
 	/* IOVA that cannot be allocated, struct iopt_reserved */
 	struct rb_root_cached reserved_itree;
 	u8 disable_large_pages;
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	bool lu_map_immutable;
+#endif
 	unsigned long iova_alignment;
 };
 
@@ -712,12 +715,34 @@ iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
 }
 
 #ifdef CONFIG_IOMMU_LIVEUPDATE
+int iommufd_liveupdate_register_lufs(void);
+int iommufd_liveupdate_unregister_lufs(void);
+
 int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd);
+static inline bool iopt_lu_map_immutable(const struct io_pagetable *iopt)
+{
+	return iopt->lu_map_immutable;
+}
 #else
+static inline int iommufd_liveupdate_register_lufs(void)
+{
+	return 0;
+}
+
+static inline int iommufd_liveupdate_unregister_lufs(void)
+{
+	return 0;
+}
+
 static inline int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd)
 {
 	return -ENOTTY;
 }
+
+static inline bool iopt_lu_map_immutable(const struct io_pagetable *iopt)
+{
+	return false;
+}
 #endif
 
 #ifdef CONFIG_IOMMUFD_TEST
diff --git a/drivers/iommu/iommufd/liveupdate.c b/drivers/iommu/iommufd/liveupdate.c
index ae74f5b54735..ec11ae345fe7 100644
--- a/drivers/iommu/iommufd/liveupdate.c
+++ b/drivers/iommu/iommufd/liveupdate.c
@@ -4,9 +4,15 @@
 
 #include
 #include
+#include
+#include
 #include
+#include
+#include
+#include
 
 #include "iommufd_private.h"
+#include "io_pagetable.h"
 
 int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd)
 {
@@ -47,3 +53,297 @@ int iommufd_hwpt_lu_set_preserve(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
+static void iommufd_set_ioas_mutable(struct iommufd_ctx *ictx)
+{
+	struct iommufd_object *obj;
+	struct iommufd_ioas *ioas;
+	unsigned long index;
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type != IOMMUFD_OBJ_IOAS)
+			continue;
+
+		ioas = container_of(obj, struct iommufd_ioas, obj);
+
+		/*
+		 * Not taking any IOAS lock here. All writers take the LUO
+		 * session mutex, and this writer racing with readers is not
+		 * really a problem.
+		 */
+		WRITE_ONCE(ioas->iopt.lu_map_immutable, false);
+	}
+	xa_unlock(&ictx->objects);
+}
+
+static int check_iopt_pages_preserved(struct liveupdate_session *s,
+				      struct iommufd_hwpt_paging *hwpt)
+{
+	u32 req_seals = F_SEAL_SEAL | F_SEAL_GROW | F_SEAL_SHRINK;
+	struct iopt_area *area;
+	int ret;
+
+	for (area = iopt_area_iter_first(&hwpt->ioas->iopt, 0, ULONG_MAX); area;
+	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
+		struct iopt_pages *pages = area->pages;
+
+		/* Only allow file based mapping */
+		if (pages->type != IOPT_ADDRESS_FILE)
+			return -EINVAL;
+
+		/*
+		 * The memory file must have been sealed against growing and
+		 * shrinking, with the seals themselves sealed, when it was
+		 * mapped. The file therefore cannot have grown or shrunk since
+		 * then, so the pages in use remain pinned and preserved.
+		 */
+		if ((pages->seals & req_seals) != req_seals)
+			return -EINVAL;
+
+		/* Make sure that the file was preserved.
*/ + ret =3D liveupdate_get_token_outgoing(s, pages->file, NULL); + if (ret) + return ret; + } + + return 0; +} + +static int iommufd_save_hwpts(struct iommufd_ctx *ictx, + struct iommufd_lu *iommufd_lu, + struct liveupdate_session *session) +{ + struct iommufd_hwpt_paging *hwpt, **hwpts =3D NULL; + struct iommu_domain_ser *domain_ser; + struct iommufd_hwpt_lu *hwpt_lu; + struct iommufd_object *obj; + unsigned int nr_hwpts =3D 0; + unsigned long index; + unsigned int i; + int rc =3D 0; + + if (iommufd_lu) { + hwpts =3D kcalloc(iommufd_lu->nr_hwpts, sizeof(*hwpts), + GFP_KERNEL); + if (!hwpts) + return -ENOMEM; + } + + xa_lock(&ictx->objects); + xa_for_each(&ictx->objects, index, obj) { + if (obj->type !=3D IOMMUFD_OBJ_HWPT_PAGING) + continue; + + hwpt =3D container_of(obj, struct iommufd_hwpt_paging, common.obj); + if (!hwpt->lu_preserve) + continue; + + if (hwpt->ioas) { + /* + * Obtain exclusive access to the IOAS and IOPT while we + * set immutability + */ + mutex_lock(&hwpt->ioas->mutex); + down_write(&hwpt->ioas->iopt.domains_rwsem); + down_write(&hwpt->ioas->iopt.iova_rwsem); + + hwpt->ioas->iopt.lu_map_immutable =3D true; + + up_write(&hwpt->ioas->iopt.iova_rwsem); + up_write(&hwpt->ioas->iopt.domains_rwsem); + mutex_unlock(&hwpt->ioas->mutex); + } + + if (!hwpt->common.domain) { + rc =3D -EINVAL; + xa_unlock(&ictx->objects); + goto out; + } + + if (!iommufd_lu) { + rc =3D check_iopt_pages_preserved(session, hwpt); + if (rc) { + xa_unlock(&ictx->objects); + goto out; + } + } else if (iommufd_lu) { + hwpts[nr_hwpts] =3D hwpt; + hwpt_lu =3D &iommufd_lu->hwpts[nr_hwpts]; + + hwpt_lu->token =3D hwpt->lu_token; + hwpt_lu->reclaimed =3D false; + } + + nr_hwpts++; + } + xa_unlock(&ictx->objects); + + if (WARN_ON(iommufd_lu && iommufd_lu->nr_hwpts !=3D nr_hwpts)) { + rc =3D -EFAULT; + goto out; + } + + if (iommufd_lu) { + /* + * iommu_domain_preserve may sleep and must be called + * outside of xa_lock + */ + for (i =3D 0; i < nr_hwpts; i++) { + hwpt =3D hwpts[i]; + 
hwpt_lu =3D &iommufd_lu->hwpts[i]; + + rc =3D iommu_domain_preserve(hwpt->common.domain, &domain_ser); + if (rc < 0) + goto out; + + hwpt_lu->domain_data =3D __pa(domain_ser); + } + } + + rc =3D nr_hwpts; + +out: + kfree(hwpts); + return rc; +} + +static int iommufd_liveupdate_preserve(struct liveupdate_file_op_args *arg= s) +{ + struct iommufd_ctx *ictx =3D iommufd_ctx_from_file(args->file); + struct iommufd_lu *iommufd_lu; + size_t serial_size; + void *mem; + int rc; + + if (IS_ERR(ictx)) + return PTR_ERR(ictx); + + rc =3D iommufd_save_hwpts(ictx, NULL, args->session); + if (rc < 0) + goto err_ioas_mutable; + + serial_size =3D struct_size(iommufd_lu, hwpts, rc); + + mem =3D kho_alloc_preserve(serial_size); + if (!mem) { + rc =3D -ENOMEM; + goto err_ioas_mutable; + } + + iommufd_lu =3D mem; + iommufd_lu->nr_hwpts =3D rc; + rc =3D iommufd_save_hwpts(ictx, iommufd_lu, args->session); + if (rc < 0) + goto err_free; + + args->serialized_data =3D virt_to_phys(iommufd_lu); + iommufd_ctx_put(ictx); + return 0; + +err_free: + kho_unpreserve_free(mem); +err_ioas_mutable: + iommufd_set_ioas_mutable(ictx); + iommufd_ctx_put(ictx); + return rc; +} + +static int iommufd_liveupdate_freeze(struct liveupdate_file_op_args *args) +{ + /* No-Op; everything should be made read-only */ + return 0; +} + +static void iommufd_liveupdate_unpreserve(struct liveupdate_file_op_args *= args) +{ + struct iommufd_ctx *ictx =3D iommufd_ctx_from_file(args->file); + struct iommufd_hwpt_paging *hwpt; + struct iommufd_object *obj; + unsigned long index; + + if (WARN_ON(IS_ERR(ictx))) + return; + + xa_lock(&ictx->objects); + xa_for_each(&ictx->objects, index, obj) { + if (obj->type !=3D IOMMUFD_OBJ_HWPT_PAGING) + continue; + + hwpt =3D container_of(obj, struct iommufd_hwpt_paging, common.obj); + if (!hwpt->lu_preserve) + continue; + if (!hwpt->common.domain) + continue; + + iommu_domain_unpreserve(hwpt->common.domain); + } + xa_unlock(&ictx->objects); + + 
kho_unpreserve_free(phys_to_virt(args->serialized_data)); + + iommufd_set_ioas_mutable(ictx); + iommufd_ctx_put(ictx); +} + +static int iommufd_liveupdate_retrieve(struct liveupdate_file_op_args *arg= s) +{ + return -EOPNOTSUPP; +} + +static bool iommufd_liveupdate_can_finish(struct liveupdate_file_op_args *= args) +{ + return false; +} + +static void iommufd_liveupdate_finish(struct liveupdate_file_op_args *args) +{ +} + +static bool iommufd_liveupdate_can_preserve(struct liveupdate_file_handler= *handler, + struct file *file) +{ + struct iommufd_ctx *ictx =3D iommufd_ctx_from_file(file); + + if (IS_ERR(ictx)) + return false; + + iommufd_ctx_put(ictx); + return true; +} + +static struct liveupdate_file_ops iommufd_lu_file_ops =3D { + .can_preserve =3D iommufd_liveupdate_can_preserve, + .preserve =3D iommufd_liveupdate_preserve, + .unpreserve =3D iommufd_liveupdate_unpreserve, + .freeze =3D iommufd_liveupdate_freeze, + .retrieve =3D iommufd_liveupdate_retrieve, + .can_finish =3D iommufd_liveupdate_can_finish, + .finish =3D iommufd_liveupdate_finish, +}; + +static struct liveupdate_file_handler iommufd_lu_handler =3D { + .compatible =3D IOMMUFD_LUO_COMPATIBLE, + .ops =3D &iommufd_lu_file_ops, +}; + +int iommufd_liveupdate_register_lufs(void) +{ + int ret; + + ret =3D liveupdate_register_file_handler(&iommufd_lu_handler); + if (ret) + return ret; + + ret =3D iommu_liveupdate_register_flb(&iommufd_lu_handler); + if (ret) + liveupdate_unregister_file_handler(&iommufd_lu_handler); + + return ret; +} + +int iommufd_liveupdate_unregister_lufs(void) +{ + WARN_ON(iommu_liveupdate_unregister_flb(&iommufd_lu_handler)); + + return liveupdate_unregister_file_handler(&iommufd_lu_handler); +} diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index e1a9b3051f65..d7683244c67a 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -775,11 +775,21 @@ static int __init iommufd_init(void) if (ret) goto err_misc; } + + if 
(IS_ENABLED(CONFIG_IOMMU_LIVEUPDATE)) { + ret =3D iommufd_liveupdate_register_lufs(); + if (ret) + goto err_vfio_misc; + } + ret =3D iommufd_test_init(); if (ret) - goto err_vfio_misc; + goto err_lufs; return 0; =20 +err_lufs: + if (IS_ENABLED(CONFIG_IOMMU_LIVEUPDATE)) + iommufd_liveupdate_unregister_lufs(); err_vfio_misc: if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)) misc_deregister(&vfio_misc_dev); @@ -791,6 +801,8 @@ static int __init iommufd_init(void) static void __exit iommufd_exit(void) { iommufd_test_exit(); + if (IS_ENABLED(CONFIG_IOMMU_LIVEUPDATE)) + iommufd_liveupdate_unregister_lufs(); if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)) misc_deregister(&vfio_misc_dev); misc_deregister(&iommu_misc_dev); diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c index dbe51ecb9a20..cc0e3265ba4e 100644 --- a/drivers/iommu/iommufd/pages.c +++ b/drivers/iommu/iommufd/pages.c @@ -55,6 +55,7 @@ #include #include #include +#include #include =20 #include "double_span.h" @@ -1420,6 +1421,7 @@ struct iopt_pages *iopt_alloc_file_pages(struct file = *file, =20 { struct iopt_pages *pages; + int seals; =20 pages =3D iopt_alloc_pages(start_byte, length, writable); if (IS_ERR(pages)) @@ -1427,6 +1429,12 @@ struct iopt_pages *iopt_alloc_file_pages(struct file= *file, pages->file =3D get_file(file); pages->start =3D start - start_byte; pages->type =3D IOPT_ADDRESS_FILE; + + pages->seals =3D 0; + seals =3D memfd_get_seals(file); + if (seals > 0) + pages->seals =3D seals; + return pages; } =20 diff --git a/include/linux/kho/abi/iommufd.h b/include/linux/kho/abi/iommuf= d.h new file mode 100644 index 000000000000..f7393ac78aa9 --- /dev/null +++ b/include/linux/kho/abi/iommufd.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * Copyright (C) 2025, Google LLC + * Author: Samiullah Khawaja + */ + +#ifndef _LINUX_KHO_ABI_IOMMUFD_H +#define _LINUX_KHO_ABI_IOMMUFD_H + +#include +#include +#include + +/** + * DOC: IOMMUFD Live Update ABI + * + * This 
header defines the ABI for preserving the state of an IOMMUFD file
+ * across a kexec reboot using LUO.
+ *
+ * This interface is a contract. Any modification to any of the serialization
+ * structs defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the IOMMUFD_LUO_COMPATIBLE string.
+ */
+
+#define IOMMUFD_LUO_COMPATIBLE "iommufd-v1"
+
+struct iommufd_hwpt_lu {
+	u32 token;
+	u64 domain_data;
+	bool reclaimed;
+} __packed;
+
+struct iommufd_lu {
+	unsigned int nr_hwpts;
+	struct iommufd_hwpt_lu hwpts[];
+};
+
+#endif /* _LINUX_KHO_ABI_IOMMUFD_H */
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:46 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260203220948.2176157-1-skhawaja@google.com>
Message-ID: <20260203220948.2176157-13-skhawaja@google.com>
Subject: [PATCH 12/14] iommufd: Add APIs to preserve/unpreserve a vfio cdev
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson,
 Shuah Khan, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, Saeed Mahameed, Adithya Jayachandran, Parav Pandit,
 Leon Romanovsky, William Tu, Pratyush Yadav, Pasha Tatashin,
 David Matlack, Andrew Morton, Chris Li, Pranjal Shrivastava,
 Vipin Sharma, YiFei Zhu

Add APIs that can be used to preserve and unpreserve a vfio cdev. Use the
APIs exported by the IOMMU core to preserve/unpreserve device. Pass the LUO
preservation token of the attached iommufd into IOMMU preserve device API.
This establishes the ownership of the device with the preserved iommufd.
Signed-off-by: Samiullah Khawaja --- drivers/iommu/iommufd/device.c | 69 ++++++++++++++++++++++++++++++++++ include/linux/iommufd.h | 23 ++++++++++++ 2 files changed, 92 insertions(+) diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 4c842368289f..30cb5218093b 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -2,6 +2,7 @@ /* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES */ #include +#include #include #include #include @@ -1661,3 +1662,71 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +#ifdef CONFIG_IOMMU_LIVEUPDATE +int iommufd_device_preserve(struct liveupdate_session *s, + struct iommufd_device *idev, + u64 *tokenp) +{ + struct iommufd_group *igroup =3D idev->igroup; + struct iommufd_hwpt_paging *hwpt_paging; + struct iommufd_hw_pagetable *hwpt; + struct iommufd_attach *attach; + int ret; + + mutex_lock(&igroup->lock); + attach =3D xa_load(&igroup->pasid_attach, IOMMU_NO_PASID); + if (!attach) { + ret =3D -ENOENT; + goto out; + } + + hwpt =3D attach->hwpt; + hwpt_paging =3D find_hwpt_paging(hwpt); + if (!hwpt_paging || !hwpt_paging->lu_preserve) { + ret =3D -EINVAL; + goto out; + } + + ret =3D liveupdate_get_token_outgoing(s, idev->ictx->file, tokenp); + if (ret) + goto out; + + ret =3D iommu_preserve_device(hwpt_paging->common.domain, + idev->dev, + *tokenp); +out: + mutex_unlock(&igroup->lock); + return ret; +} +EXPORT_SYMBOL_NS_GPL(iommufd_device_preserve, "IOMMUFD"); + +void iommufd_device_unpreserve(struct liveupdate_session *s, + struct iommufd_device *idev, + u64 token) +{ + struct iommufd_group *igroup =3D idev->igroup; + struct iommufd_hwpt_paging *hwpt_paging; + struct iommufd_hw_pagetable *hwpt; + struct iommufd_attach *attach; + + mutex_lock(&igroup->lock); + attach =3D xa_load(&igroup->pasid_attach, IOMMU_NO_PASID); + if (!attach) { + WARN_ON(-ENOENT); + goto out; + } + + hwpt =3D attach->hwpt; + hwpt_paging 
= find_hwpt_paging(hwpt);
+	if (!hwpt_paging || !hwpt_paging->lu_preserve) {
+		WARN_ON(-EINVAL);
+		goto out;
+	}
+
+	iommu_unpreserve_device(hwpt_paging->common.domain, idev->dev);
+out:
+	mutex_unlock(&igroup->lock);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_unpreserve, "IOMMUFD");
+#endif
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 6e7efe83bc5d..c4b3ed5b518c 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -71,6 +72,28 @@ void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid);
 struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
 u32 iommufd_device_to_id(struct iommufd_device *idev);
 
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+int iommufd_device_preserve(struct liveupdate_session *s,
+			    struct iommufd_device *idev,
+			    u64 *tokenp);
+void iommufd_device_unpreserve(struct liveupdate_session *s,
+			       struct iommufd_device *idev,
+			       u64 token);
+#else
+static inline int iommufd_device_preserve(struct liveupdate_session *s,
+					  struct iommufd_device *idev,
+					  u64 *tokenp)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void iommufd_device_unpreserve(struct liveupdate_session *s,
+					     struct iommufd_device *idev,
+					     u64 token)
+{
+}
+#endif
+
 struct iommufd_access_ops {
 	u8 needs_pin_pages : 1;
 	void (*unmap)(void *data, unsigned long iova, unsigned long length);
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:47 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260203220948.2176157-1-skhawaja@google.com>
Message-ID: <20260203220948.2176157-14-skhawaja@google.com>
Subject: [PATCH 13/14] vfio/pci: Preserve the iommufd state of the vfio cdev
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson,
 Shuah Khan, iommu@lists.linux.dev,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed,
 Adithya Jayachandran, Parav Pandit, Leon Romanovsky, William Tu,
 Pratyush Yadav, Pasha Tatashin, David Matlack, Andrew Morton, Chris Li,
 Pranjal Shrivastava, Vipin Sharma, YiFei Zhu
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

If the vfio cdev is attached to an iommufd, also preserve the state of
the attached iommufd: the iommu state of the device and the attached
domain. The token returned by the preservation API will be used to
restore and rebind to the iommufd state after live update.

Signed-off-by: Samiullah Khawaja
---
 drivers/vfio/pci/vfio_pci_liveupdate.c | 28 +++++++++++++++++++++++++-
 include/linux/kho/abi/vfio_pci.h       | 10 +++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
index c52d6bdb455f..af6fbfb7a65c 100644
--- a/drivers/vfio/pci/vfio_pci_liveupdate.c
+++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 
 #include "vfio_pci_priv.h"
 
@@ -39,6 +40,7 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 	struct vfio_pci_core_device_ser *ser;
 	struct vfio_pci_core_device *vdev;
 	struct pci_dev *pdev;
+	u64 token = 0;
 
 	vdev = container_of(device, struct vfio_pci_core_device, vdev);
 	pdev = vdev->pdev;
@@ -49,15 +51,32 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 	if (vfio_pci_is_intel_display(pdev))
 		return -EINVAL;
 
+#ifdef CONFIG_IOMMU_LIVEUPDATE
+	/* If iommufd is attached, preserve the underlying domain */
+	if (device->iommufd_attached) {
+		int err = iommufd_device_preserve(args->session,
+						  device->iommufd_device,
+						  &token);
+		if (err < 0)
+			return err;
+	}
+#endif
+
 	ser = kho_alloc_preserve(sizeof(*ser));
-	if (IS_ERR(ser))
+	if (IS_ERR(ser)) {
+		if (device->iommufd_attached)
+			iommufd_device_unpreserve(args->session,
+						  device->iommufd_device, token);
+
 		return PTR_ERR(ser);
+	}
 
 	pci_liveupdate_outgoing_preserve(pdev);
 
 	ser->bdf = pci_dev_id(pdev);
 	ser->domain = pci_domain_nr(pdev->bus);
 	ser->reset_works = vdev->reset_works;
+	ser->iommufd_ser.token = token;
 
 	args->serialized_data = virt_to_phys(ser);
 	return 0;
@@ -66,6 +85,13 @@ static int vfio_pci_liveupdate_preserve(struct liveupdate_file_op_args *args)
 static void vfio_pci_liveupdate_unpreserve(struct liveupdate_file_op_args *args)
 {
 	struct vfio_device *device = vfio_device_from_file(args->file);
+	struct vfio_pci_core_device_ser *ser;
+
+	ser = phys_to_virt(args->serialized_data);
+	if (device->iommufd_attached)
+		iommufd_device_unpreserve(args->session,
+					  device->iommufd_device,
+					  ser->iommufd_ser.token);
 
 	pci_liveupdate_outgoing_unpreserve(to_pci_dev(device->dev));
 	kho_unpreserve_free(phys_to_virt(args->serialized_data));
diff --git a/include/linux/kho/abi/vfio_pci.h b/include/linux/kho/abi/vfio_pci.h
index 6c3d3c6dfc09..d01bd58711c2 100644
--- a/include/linux/kho/abi/vfio_pci.h
+++ b/include/linux/kho/abi/vfio_pci.h
@@ -28,6 +28,15 @@
 
 #define VFIO_PCI_LUO_FH_COMPATIBLE "vfio-pci-v1"
 
+/**
+ * struct vfio_iommufd_ser - Serialized state of the attached iommufd.
+ *
+ * @token: The token of the bound iommufd state.
+ */
+struct vfio_iommufd_ser {
+	u64 token;
+} __packed;
+
 /**
  * struct vfio_pci_core_device_ser - Serialized state of a single VFIO PCI
  * device.
@@ -40,6 +49,7 @@ struct vfio_pci_core_device_ser {
 	u16 bdf;
 	u16 domain;
 	u8 reset_works;
+	struct vfio_iommufd_ser iommufd_ser;
 } __packed;
 
 #endif /* _LINUX_LIVEUPDATE_ABI_VFIO_PCI_H */
-- 
2.53.0.rc2.204.g2597b5adb4-goog

From nobody Sun Feb 8 06:56:17 2026
Date: Tue, 3 Feb 2026 22:09:48 +0000
In-Reply-To: <20260203220948.2176157-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260203220948.2176157-1-skhawaja@google.com>
X-Mailer: git-send-email 2.53.0.rc2.204.g2597b5adb4-goog
Message-ID: <20260203220948.2176157-15-skhawaja@google.com>
Subject: [PATCH 14/14] iommufd/selftest: Add test to verify iommufd preservation
From: Samiullah Khawaja
To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe
Cc: Samiullah Khawaja, Robin Murphy, Kevin Tian, Alex Williamson, Shuah Khan,
 iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 Saeed Mahameed, Adithya Jayachandran, Parav Pandit, Leon Romanovsky,
 William Tu, Pratyush Yadav, Pasha Tatashin, David Matlack, Andrew Morton,
 Chris Li, Pranjal Shrivastava, Vipin Sharma, YiFei Zhu
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Test iommufd preservation by setting up an iommufd and a vfio cdev and
preserving them across a live update. The test takes the VFIO cdev path
of a device bound to the vfio-pci driver and binds it to the iommufd
being preserved. It also preserves the vfio cdev, so that the iommufd
state associated with it is preserved as well.

The restore path is tested by restoring only the preserved vfio cdev.
The test then tries to finish the session without restoring the iommufd
and confirms that this fails.

Signed-off-by: Samiullah Khawaja
Signed-off-by: YiFei Zhu
---
 tools/testing/selftests/iommu/Makefile        |  12 +
 .../selftests/iommu/iommufd_liveupdate.c      | 209 ++++++++++++++++++
 2 files changed, 221 insertions(+)
 create mode 100644 tools/testing/selftests/iommu/iommufd_liveupdate.c

diff --git a/tools/testing/selftests/iommu/Makefile b/tools/testing/selftests/iommu/Makefile
index 84abeb2f0949..263195af4d6a 100644
--- a/tools/testing/selftests/iommu/Makefile
+++ b/tools/testing/selftests/iommu/Makefile
@@ -7,4 +7,16 @@ TEST_GEN_PROGS :=
 TEST_GEN_PROGS += iommufd
 TEST_GEN_PROGS += iommufd_fail_nth
 
+TEST_GEN_PROGS_EXTENDED += iommufd_liveupdate
+
 include ../lib.mk
+include ../liveupdate/lib/libliveupdate.mk
+
+CFLAGS += -I$(top_srcdir)/tools/include
+CFLAGS += -MD
+CFLAGS += $(EXTRA_CFLAGS)
+
+$(TEST_GEN_PROGS_EXTENDED): %: %.o $(LIBLIVEUPDATE_O)
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -static -o $@
+
+EXTRA_CLEAN += $(LIBLIVEUPDATE_O)
diff --git a/tools/testing/selftests/iommu/iommufd_liveupdate.c b/tools/testing/selftests/iommu/iommufd_liveupdate.c
new file mode 100644
index 000000000000..8b4ea9f2b7e9
--- /dev/null
+++ b/tools/testing/selftests/iommu/iommufd_liveupdate.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Samiullah Khawaja
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#define __EXPORTED_HEADERS__
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest.h"
+
+#define ksft_assert(condition) \
+	do { if (!(condition)) \
+	ksft_exit_fail_msg("Failed: %s at %s %d: %s\n", \
+	#condition, __FILE__, __LINE__, strerror(errno)); } while (0)
+
+int setup_cdev(const char *vfio_cdev_path)
+{
+	int cdev_fd;
+
+	cdev_fd = open(vfio_cdev_path, O_RDWR);
+	if (cdev_fd < 0)
+		ksft_exit_skip("Failed to open VFIO cdev: %s\n", vfio_cdev_path);
+
+	return cdev_fd;
+}
+
+int open_iommufd(void)
+{
+	int iommufd;
+
+	iommufd = open("/dev/iommu", O_RDWR);
+	if (iommufd < 0)
+		ksft_exit_skip("Failed to open /dev/iommu. IOMMUFD support not enabled.\n");
+
+	return iommufd;
+}
+
+int setup_iommufd(int iommufd, int memfd, int cdev_fd, int hwpt_token)
+{
+	int ret;
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct iommu_hwpt_alloc hwpt_alloc = {
+		.size = sizeof(hwpt_alloc),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_hwpt_lu_set_preserve set_preserve = {
+		.size = sizeof(set_preserve),
+		.hwpt_token = hwpt_token,
+	};
+	struct iommu_ioas_map_file map_file = {
+		.size = sizeof(map_file),
+		.length = SZ_1M,
+		.flags = IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_READABLE,
+		.iova = SZ_4G,
+		.fd = memfd,
+		.start = 0,
+	};
+
+	bind.iommufd = iommufd;
+	ret = ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+	ksft_assert(!ret);
+
+	ret = ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	ksft_assert(!ret);
+
+	hwpt_alloc.dev_id = bind.out_devid;
+	hwpt_alloc.pt_id = alloc_data.out_ioas_id;
+	ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &hwpt_alloc);
+	ksft_assert(!ret);
+
+	attach_data.pt_id = hwpt_alloc.out_hwpt_id;
+	ret = ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+	ksft_assert(!ret);
+
+	map_file.ioas_id = alloc_data.out_ioas_id;
+	ret = ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &map_file);
+	ksft_assert(!ret);
+
+	set_preserve.hwpt_id = attach_data.pt_id;
+	ret = ioctl(iommufd, IOMMU_HWPT_LU_SET_PRESERVE, &set_preserve);
+	ksft_assert(!ret);
+
+	return ret;
+}
+
+static int create_sealed_memfd(size_t size)
+{
+	int fd, ret;
+
+	fd = memfd_create("buffer", MFD_ALLOW_SEALING);
+	ksft_assert(fd > 0);
+
+	ret = ftruncate(fd, size);
+	ksft_assert(!ret);
+
+	ret = fcntl(fd, F_ADD_SEALS,
+		    F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL);
+	ksft_assert(!ret);
+
+	return fd;
+}
+
+int main(int argc, char *argv[])
+{
+	int iommufd, cdev_fd, memfd, luo, session, ret;
+	const int token = 0x123456;
+	const int cdev_token = 0x654321;
+	const int hwpt_token = 0x789012;
+	const int memfd_token = 0x890123;
+
+	if (argc < 2) {
+		printf("Usage: ./iommufd_liveupdate \n");
+		return 1;
+	}
+
+	luo = luo_open_device();
+	ksft_assert(luo > 0);
+
+	session = luo_retrieve_session(luo, "iommufd-test");
+	if (session == -ENOENT) {
+		session = luo_create_session(luo, "iommufd-test");
+
+		iommufd = open_iommufd();
+		memfd = create_sealed_memfd(SZ_1M);
+		cdev_fd = setup_cdev(argv[1]);
+
+		ret = setup_iommufd(iommufd, memfd, cdev_fd, hwpt_token);
+		ksft_assert(!ret);
+
+		/* Cannot preserve cdev without iommufd */
+		ret = luo_session_preserve_fd(session, cdev_fd, cdev_token);
+		ksft_assert(ret);
+
+		/* Cannot preserve iommufd without preserving memfd. */
+		ret = luo_session_preserve_fd(session, iommufd, token);
+		ksft_assert(ret);
+
+		ret = luo_session_preserve_fd(session, memfd, memfd_token);
+		ksft_assert(!ret);
+
+		ret = luo_session_preserve_fd(session, iommufd, token);
+		ksft_assert(!ret);
+
+		ret = luo_session_preserve_fd(session, cdev_fd, cdev_token);
+		ksft_assert(!ret);
+
+		close(session);
+		session = luo_create_session(luo, "iommufd-test");
+
+		ret = luo_session_preserve_fd(session, memfd, memfd_token);
+		ksft_assert(!ret);
+
+		ret = luo_session_preserve_fd(session, iommufd, token);
+		ksft_assert(!ret);
+
+		ret = luo_session_preserve_fd(session, cdev_fd, cdev_token);
+		ksft_assert(!ret);
+
+		daemonize_and_wait();
+	} else {
+		struct vfio_device_bind_iommufd bind = {
+			.argsz = sizeof(bind),
+			.flags = 0,
+		};
+
+		cdev_fd = luo_session_retrieve_fd(session, cdev_token);
+		ksft_assert(cdev_fd > 0);
+
+		iommufd = luo_session_retrieve_fd(session, token);
+		ksft_assert(iommufd < 0);
+
+		iommufd = open_iommufd();
+
+		bind.iommufd = iommufd;
+		ret = ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+		ksft_assert(ret);
+		ksft_assert(errno == EPERM);
+
+		/* Should fail */
+		ret = luo_session_finish(session);
+		ksft_assert(ret);
+	}
+
+	return 0;
+}
-- 
2.53.0.rc2.204.g2597b5adb4-goog
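
The preserve path in patch 13 (vfio_pci_liveupdate_preserve()) follows a
preserve-then-allocate pattern with rollback: the iommufd state is
preserved first, and if allocating the serialized record then fails, that
preservation is undone before the error is returned. The following is a
minimal userspace sketch of that pattern only. Everything in it
(struct mock_iommufd, struct mock_ser, mock_preserve(), mock_unpreserve(),
preserve_device() and the fail_alloc switch) is a hypothetical stand-in
invented for illustration, not the kernel API used by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the iommufd side: tracks live preservations. */
struct mock_iommufd {
	int preserved;		/* number of outstanding preservations */
	uint64_t next_token;	/* next token to hand out */
};

/* Hypothetical stand-in for the record handed to the next kernel. */
struct mock_ser {
	uint64_t token;		/* same width as the token that is stored */
};

/* Mock "preserve" call: returns a token through @token. */
static int mock_preserve(struct mock_iommufd *ictx, uint64_t *token)
{
	*token = ictx->next_token++;
	ictx->preserved++;
	return 0;
}

/* Mock of the matching "unpreserve" call. */
static void mock_unpreserve(struct mock_iommufd *ictx, uint64_t token)
{
	(void)token;
	assert(ictx->preserved > 0);
	ictx->preserved--;
}

/*
 * Preserve the dependency first, then allocate the record. If the
 * allocation fails, roll the dependency back so nothing stays preserved.
 * @fail_alloc simulates an allocation failure (assumption for the demo).
 */
static int preserve_device(struct mock_iommufd *ictx, bool attached,
			   bool fail_alloc, struct mock_ser **out)
{
	uint64_t token = 0;
	struct mock_ser *ser;
	int err;

	if (attached) {
		err = mock_preserve(ictx, &token);
		if (err < 0)
			return err;
	}

	ser = fail_alloc ? NULL : calloc(1, sizeof(*ser));
	if (!ser) {
		if (attached)
			mock_unpreserve(ictx, token);
		return -ENOMEM;
	}

	ser->token = token;
	*out = ser;
	return 0;
}
```

In this sketch, calling preserve_device() with fail_alloc set returns
-ENOMEM and leaves the preserved count at zero, while a successful call
leaves one outstanding preservation whose token is stored in the record.
The record field is deliberately as wide as the token it stores.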