From nobody Sun Jun 14 11:28:32 2026
Received: from DB3PR0202CU003.outbound.protection.outlook.com (mail-northeuropeazon11010047.outbound.protection.outlook.com [52.101.84.47])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 735113EDADB;
	Thu, 2 Apr 2026 16:13:20 +0000 (UTC)
From: tugrul.kukul@est.tech
To: gregkh@linuxfoundation.org, sashal@kernel.org, stable@vger.kernel.org
Cc: alex.williamson@redhat.com, kevin.tian@intel.com, jgg@ziepe.ca,
	lorenzo.stoakes@oracle.com, david@redhat.com, akpm@linux-foundation.org,
	mike.kravetz@oracle.com, linmiaohe@huawei.com, yi.l.liu@intel.com,
	axelrasmussen@google.com, leah.rumancik@gmail.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, david.nystrom@est.tech
Subject: [PATCH 6.6.y 1/4] vfio: Create vfio_fs_type with inode per device
Date: Thu, 2 Apr 2026 18:13:08 +0200
Message-Id: <20260402161311.63484-2-tugrul.kukul@est.tech>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20260402161311.63484-1-tugrul.kukul@est.tech>
References: <20260402161311.63484-1-tugrul.kukul@est.tech>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alex Williamson

commit b7c5e64fecfa88764791679cca4786ac65de739e upstream.

By linking all the device fds we provide to userspace to an
address space through a new pseudo fs, we can use tools like
unmap_mapping_range() to zap all vmas associated with a device.

Suggested-by: Jason Gunthorpe
Reviewed-by: Jason Gunthorpe
Reviewed-by: Kevin Tian
Link: https://lore.kernel.org/r/20240530045236.1005864-2-alex.williamson@redhat.com
Signed-off-by: Alex Williamson
Signed-off-by: Axel Rasmussen
Signed-off-by: Tugrul Kukul
Acked-by: Alex Williamson
---
 drivers/vfio/device_cdev.c |  7 ++++++
 drivers/vfio/group.c       |  7 ++++++
 drivers/vfio/vfio_main.c   | 44 ++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h       |  1 +
 4 files changed, 59 insertions(+)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index e75da0a70d1f8..bb1817bd4ff31 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -39,6 +39,13 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 
 	filep->private_data = df;
 
+	/*
+	 * Use the pseudo fs inode on the device to link all mmaps
+	 * to the same address space, allowing us to unmap all vmas
+	 * associated to this device using unmap_mapping_range().
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	return 0;
 
 err_put_registration:
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 54c3079031e16..4cd857ff0259b 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -285,6 +285,13 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	 */
 	filep->f_mode |= (FMODE_PREAD | FMODE_PWRITE);
 
+	/*
+	 * Use the pseudo fs inode on the device to link all mmaps
+	 * to the same address space, allowing us to unmap all vmas
+	 * associated to this device using unmap_mapping_range().
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	if (device->group->type == VFIO_NO_IOMMU)
 		dev_warn(device->dev, "vfio-noiommu device opened by user "
 			 "(%s:%d)\n", current->comm, task_pid_nr(current));
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6dfb290c339f9..ec4fbd993bf00 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -22,8 +22,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -43,9 +45,13 @@
 #define DRIVER_AUTHOR	"Alex Williamson "
 #define DRIVER_DESC	"VFIO - User Level meta-driver"
 
+#define VFIO_MAGIC 0x5646494f /* "VFIO" */
+
 static struct vfio {
 	struct class			*device_class;
 	struct ida			device_ida;
+	struct vfsmount			*vfs_mount;
+	int				fs_count;
 } vfio;
 
 #ifdef CONFIG_VFIO_NOIOMMU
@@ -186,6 +192,8 @@ static void vfio_device_release(struct device *dev)
 	if (device->ops->release)
 		device->ops->release(device);
 
+	iput(device->inode);
+	simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);
 	kvfree(device);
 }
 
@@ -228,6 +236,34 @@ struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(_vfio_alloc_device);
 
+static int vfio_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type vfio_fs_type = {
+	.name = "vfio",
+	.owner = THIS_MODULE,
+	.init_fs_context = vfio_fs_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+
+static struct inode *vfio_fs_inode_new(void)
+{
+	struct inode *inode;
+	int ret;
+
+	ret = simple_pin_fs(&vfio_fs_type, &vfio.vfs_mount, &vfio.fs_count);
+	if (ret)
+		return ERR_PTR(ret);
+
+	inode = alloc_anon_inode(vfio.vfs_mount->mnt_sb);
+	if (IS_ERR(inode))
+		simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);
+
+	return inode;
+}
+
 /*
  * Initialize a vfio_device so it can be registered to vfio core.
  */
@@ -246,6 +282,11 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	init_completion(&device->comp);
 	device->dev = dev;
 	device->ops = ops;
+	device->inode = vfio_fs_inode_new();
+	if (IS_ERR(device->inode)) {
+		ret = PTR_ERR(device->inode);
+		goto out_inode;
+	}
 
 	if (ops->init) {
 		ret = ops->init(device);
@@ -260,6 +301,9 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	return 0;
 
 out_uninit:
+	iput(device->inode);
+	simple_release_fs(&vfio.vfs_mount, &vfio.fs_count);
+out_inode:
 	vfio_release_device_set(device);
 	ida_free(&vfio.device_ida, device->index);
 	return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5ac5f182ce0bb..514a7f9b3ef4b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -64,6 +64,7 @@ struct vfio_device {
 	struct completion comp;
 	struct iommufd_access *iommufd_access;
 	void (*put_kvm)(struct kvm *kvm);
+	struct inode *inode;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
 	u8 iommufd_attached:1;
-- 
2.34.1

From nobody Sun Jun 14 11:28:32 2026
Received: from DB3PR0202CU003.outbound.protection.outlook.com (mail-northeuropeazon11010047.outbound.protection.outlook.com [52.101.84.47])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org
	(Postfix) with ESMTPS id 880B83EF0B3;
	Thu, 2 Apr 2026 16:13:22 +0000 (UTC)
From: tugrul.kukul@est.tech
To: gregkh@linuxfoundation.org, sashal@kernel.org, stable@vger.kernel.org
Cc: alex.williamson@redhat.com, kevin.tian@intel.com, jgg@ziepe.ca,
	lorenzo.stoakes@oracle.com, david@redhat.com, akpm@linux-foundation.org,
	mike.kravetz@oracle.com, linmiaohe@huawei.com, yi.l.liu@intel.com,
	axelrasmussen@google.com, leah.rumancik@gmail.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, david.nystrom@est.tech
Subject: [PATCH 6.6.y 2/4] vfio/pci: Use unmap_mapping_range()
Date: Thu, 2 Apr 2026 18:13:09 +0200
Message-Id: <20260402161311.63484-3-tugrul.kukul@est.tech>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20260402161311.63484-1-tugrul.kukul@est.tech>
References: <20260402161311.63484-1-tugrul.kukul@est.tech>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alex Williamson

commit aac6db75a9fc2c7a6f73e152df8f15101dda38e6 upstream.

With the vfio device fd tied to the address space of the pseudo fs
inode, we can use the mm to track all vmas that might be mmap'ing
device BARs, which removes our vma_list and all the complicated lock
ordering necessary to manually zap each related vma.

Note that we can no longer store the pfn in vm_pgoff if we want to use
unmap_mapping_range() to zap a selective portion of the device fd
corresponding to BAR mappings.

This also converts our mmap fault handler to use vmf_insert_pfn()
because we no longer have a vma_list to avoid the concurrency problem
with io_remap_pfn_range(). The goal is to eventually use the vm_ops
huge_fault handler to avoid the additional faulting overhead, but
vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first.

Also, Jason notes that a race exists between unmap_mapping_range() and
the fops mmap callback if we were to call io_remap_pfn_range() to
populate the vma on mmap. Specifically, mmap_region() does call_mmap()
before it does vma_link_file() which gives a window where the vma is
populated but invisible to unmap_mapping_range().
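
Editorial illustration, not part of the upstream commit: since the pfn
no longer lives in vm_pgoff, the fault path has to rebuild it from the
device-fd offset. The sketch below is a userspace model of that
arithmetic, assuming 4 KiB pages (PAGE_SHIFT of 12), a
VFIO_PCI_OFFSET_SHIFT of 40 and BAR0/ROM region indexes of 0 and 6.
None of these values appear in this patch, so treat them as
assumptions. The helper names and the bar_start parameter (standing in
for pci_resource_start()) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and the VFIO PCI region layout. */
#define PAGE_SHIFT                 12
#define VFIO_PCI_OFFSET_SHIFT      40
#define VFIO_PCI_BAR0_REGION_INDEX 0
#define VFIO_PCI_ROM_REGION_INDEX  6
#define VFIO_PCI_INDEX_TO_OFFSET(index) \
	((uint64_t)(index) << VFIO_PCI_OFFSET_SHIFT)

/* Region index held in the upper bits of the device-fd page offset. */
static inline unsigned int pgoff_to_index(uint64_t vm_pgoff)
{
	return (unsigned int)(vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT));
}

/* Page offset inside that region (the low bits of vm_pgoff). */
static inline uint64_t pgoff_in_region(uint64_t vm_pgoff)
{
	return vm_pgoff &
	       ((UINT64_C(1) << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
}

/*
 * Userspace model of vma_to_pfn(). bar_start is a hypothetical stand-in
 * for pci_resource_start(); the model assumes the offset names a BAR.
 */
static inline uint64_t model_vma_to_pfn(uint64_t bar_start, uint64_t vm_pgoff)
{
	assert(pgoff_to_index(vm_pgoff) < VFIO_PCI_ROM_REGION_INDEX);
	return (bar_start >> PAGE_SHIFT) + pgoff_in_region(vm_pgoff);
}

/* Length of the file-offset range that vfio_pci_zap_bars() unmaps. */
static inline uint64_t bar_zap_len(void)
{
	return VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
	       VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
}
```

Under these assumptions, a mapping that starts at page 3 of BAR2 has a
vm_pgoff of (2 << 28) + 3, and the model returns the BAR's base pfn
plus 3. Leaving vm_pgoff as a plain file offset is what lets a single
unmap_mapping_range() call cover every BAR mapping of the device.
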

Suggested-by: Jason Gunthorpe
Reviewed-by: Jason Gunthorpe
Reviewed-by: Kevin Tian
Link: https://lore.kernel.org/r/20240530045236.1005864-3-alex.williamson@redhat.com
Signed-off-by: Alex Williamson
Signed-off-by: Axel Rasmussen
Signed-off-by: Tugrul Kukul
Acked-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci_core.c | 264 +++++++------------------------
 include/linux/vfio_pci_core.h    |   2 -
 2 files changed, 55 insertions(+), 211 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f139360752e2..e05d6ee9d4cab 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1599,100 +1599,20 @@ ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *bu
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_write);
 
-/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
-static int vfio_pci_zap_and_vma_lock(struct vfio_pci_core_device *vdev, bool try)
+static void vfio_pci_zap_bars(struct vfio_pci_core_device *vdev)
 {
-	struct vfio_pci_mmap_vma *mmap_vma, *tmp;
+	struct vfio_device *core_vdev = &vdev->vdev;
+	loff_t start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
+	loff_t end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);
+	loff_t len = end - start;
 
-	/*
-	 * Lock ordering:
-	 * vma_lock is nested under mmap_lock for vm_ops callback paths.
-	 * The memory_lock semaphore is used by both code paths calling
-	 * into this function to zap vmas and the vm_ops.fault callback
-	 * to protect the memory enable state of the device.
-	 *
-	 * When zapping vmas we need to maintain the mmap_lock => vma_lock
-	 * ordering, which requires using vma_lock to walk vma_list to
-	 * acquire an mm, then dropping vma_lock to get the mmap_lock and
-	 * reacquiring vma_lock. This logic is derived from similar
-	 * requirements in uverbs_user_mmap_disassociate().
-	 *
-	 * mmap_lock must always be the top-level lock when it is taken.
-	 * Therefore we can only hold the memory_lock write lock when
-	 * vma_list is empty, as we'd need to take mmap_lock to clear
-	 * entries. vma_list can only be guaranteed empty when holding
-	 * vma_lock, thus memory_lock is nested under vma_lock.
-	 *
-	 * This enables the vm_ops.fault callback to acquire vma_lock,
-	 * followed by memory_lock read lock, while already holding
-	 * mmap_lock without risk of deadlock.
-	 */
-	while (1) {
-		struct mm_struct *mm = NULL;
-
-		if (try) {
-			if (!mutex_trylock(&vdev->vma_lock))
-				return 0;
-		} else {
-			mutex_lock(&vdev->vma_lock);
-		}
-		while (!list_empty(&vdev->vma_list)) {
-			mmap_vma = list_first_entry(&vdev->vma_list,
-						    struct vfio_pci_mmap_vma,
-						    vma_next);
-			mm = mmap_vma->vma->vm_mm;
-			if (mmget_not_zero(mm))
-				break;
-
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-			mm = NULL;
-		}
-		if (!mm)
-			return 1;
-		mutex_unlock(&vdev->vma_lock);
-
-		if (try) {
-			if (!mmap_read_trylock(mm)) {
-				mmput(mm);
-				return 0;
-			}
-		} else {
-			mmap_read_lock(mm);
-		}
-		if (try) {
-			if (!mutex_trylock(&vdev->vma_lock)) {
-				mmap_read_unlock(mm);
-				mmput(mm);
-				return 0;
-			}
-		} else {
-			mutex_lock(&vdev->vma_lock);
-		}
-		list_for_each_entry_safe(mmap_vma, tmp,
-					 &vdev->vma_list, vma_next) {
-			struct vm_area_struct *vma = mmap_vma->vma;
-
-			if (vma->vm_mm != mm)
-				continue;
-
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-
-			zap_vma_ptes(vma, vma->vm_start,
-				     vma->vm_end - vma->vm_start);
-		}
-		mutex_unlock(&vdev->vma_lock);
-		mmap_read_unlock(mm);
-		mmput(mm);
-	}
+	unmap_mapping_range(core_vdev->inode->i_mapping, start, len, true);
 }
 
 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev)
 {
-	vfio_pci_zap_and_vma_lock(vdev, false);
 	down_write(&vdev->memory_lock);
-	mutex_unlock(&vdev->vma_lock);
+	vfio_pci_zap_bars(vdev);
 }
 
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev)
@@ -1714,99 +1634,41 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, u16 c
 	up_write(&vdev->memory_lock);
 }
 
-/* Caller holds vma_lock */
-static int __vfio_pci_add_vma(struct vfio_pci_core_device *vdev,
-			      struct vm_area_struct *vma)
-{
-	struct vfio_pci_mmap_vma *mmap_vma;
-
-	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL_ACCOUNT);
-	if (!mmap_vma)
-		return -ENOMEM;
-
-	mmap_vma->vma = vma;
-	list_add(&mmap_vma->vma_next, &vdev->vma_list);
-
-	return 0;
-}
-
-/*
- * Zap mmaps on open so that we can fault them in on access and therefore
- * our vma_list only tracks mappings accessed since last zap.
- */
-static void vfio_pci_mmap_open(struct vm_area_struct *vma)
-{
-	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-}
-
-static void vfio_pci_mmap_close(struct vm_area_struct *vma)
+static unsigned long vma_to_pfn(struct vm_area_struct *vma)
 {
 	struct vfio_pci_core_device *vdev = vma->vm_private_data;
-	struct vfio_pci_mmap_vma *mmap_vma;
+	int index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+	u64 pgoff;
 
-	mutex_lock(&vdev->vma_lock);
-	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
-		if (mmap_vma->vma == vma) {
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-			break;
-		}
-	}
-	mutex_unlock(&vdev->vma_lock);
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+
+	return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
 }
 
 static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vfio_pci_core_device *vdev = vma->vm_private_data;
-	struct vfio_pci_mmap_vma *mmap_vma;
-	vm_fault_t ret = VM_FAULT_NOPAGE;
+	unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff;
+	vm_fault_t ret = VM_FAULT_SIGBUS;
 
-	mutex_lock(&vdev->vma_lock);
-	down_read(&vdev->memory_lock);
+	pfn = vma_to_pfn(vma);
 
-	/*
-	 * Memory region cannot be accessed if the low power feature is engaged
-	 * or memory access is disabled.
-	 */
-	if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) {
-		ret = VM_FAULT_SIGBUS;
-		goto up_out;
-	}
+	down_read(&vdev->memory_lock);
 
-	/*
-	 * We populate the whole vma on fault, so we need to test whether
-	 * the vma has already been mapped, such as for concurrent faults
-	 * to the same vma. io_remap_pfn_range() will trigger a BUG_ON if
-	 * we ask it to fill the same range again.
-	 */
-	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
-		if (mmap_vma->vma == vma)
-			goto up_out;
-	}
+	if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev))
+		goto out_disabled;
 
-	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       vma->vm_end - vma->vm_start,
-			       vma->vm_page_prot)) {
-		ret = VM_FAULT_SIGBUS;
-		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-		goto up_out;
-	}
+	ret = vmf_insert_pfn(vma, vmf->address, pfn + pgoff);
 
-	if (__vfio_pci_add_vma(vdev, vma)) {
-		ret = VM_FAULT_OOM;
-		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-	}
-
-up_out:
+out_disabled:
 	up_read(&vdev->memory_lock);
-	mutex_unlock(&vdev->vma_lock);
+
 	return ret;
 }
 
 static const struct vm_operations_struct vfio_pci_mmap_ops = {
-	.open = vfio_pci_mmap_open,
-	.close = vfio_pci_mmap_close,
 	.fault = vfio_pci_mmap_fault,
 };
 
@@ -1869,11 +1731,12 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 
 	vma->vm_private_data = vdev;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
 	/*
-	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
-	 * change vm_flags within the fault handler. Set them now.
+	 * Set vm_flags now, they should not be changed in the fault handler.
+	 * We want the same flags and page protection (decrypted above) as
+	 * io_remap_pfn_range() would set.
 	 */
 	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
 	vma->vm_ops = &vfio_pci_mmap_ops;
@@ -2173,8 +2036,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
-	mutex_init(&vdev->vma_lock);
-	INIT_LIST_HEAD(&vdev->vma_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
@@ -2190,7 +2051,6 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
 
 	mutex_destroy(&vdev->igate);
 	mutex_destroy(&vdev->ioeventfds_lock);
-	mutex_destroy(&vdev->vma_lock);
 	kfree(vdev->region);
 	kfree(vdev->pm_save);
 }
@@ -2468,26 +2328,15 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
 	return ret;
 }
 
-/*
- * We need to get memory_lock for each device, but devices can share mmap_lock,
- * therefore we need to zap and hold the vma_lock for each device, and only then
- * get each memory_lock.
- */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 				      struct vfio_pci_group_info *groups,
 				      struct iommufd_ctx *iommufd_ctx)
 {
-	struct vfio_pci_core_device *cur_mem;
-	struct vfio_pci_core_device *cur_vma;
-	struct vfio_pci_core_device *cur;
+	struct vfio_pci_core_device *vdev;
 	struct pci_dev *pdev;
-	bool is_mem = true;
 	int ret;
 
 	mutex_lock(&dev_set->lock);
-	cur_mem = list_first_entry(&dev_set->device_list,
-				   struct vfio_pci_core_device,
-				   vdev.dev_set_list);
 
 	pdev = vfio_pci_dev_set_resettable(dev_set);
 	if (!pdev) {
@@ -2504,7 +2353,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	if (ret)
 		goto err_unlock;
 
-	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) {
 		bool owned;
 
 		/*
@@ -2528,38 +2377,38 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		 * Otherwise, reset is not allowed.
 		 */
 		if (iommufd_ctx) {
-			int devid = vfio_iommufd_get_dev_id(&cur_vma->vdev,
+			int devid = vfio_iommufd_get_dev_id(&vdev->vdev,
 							    iommufd_ctx);
 
 			owned = (devid > 0 || devid == -ENOENT);
 		} else {
-			owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
+			owned = vfio_dev_in_groups(&vdev->vdev, groups);
 		}
 
 		if (!owned) {
 			ret = -EINVAL;
-			goto err_undo;
+			break;
 		}
 
 		/*
-		 * Locking multiple devices is prone to deadlock, runaway and
-		 * unwind if we hit contention.
+		 * Take the memory write lock for each device and zap BAR
+		 * mappings to prevent the user accessing the device while in
+		 * reset. Locking multiple devices is prone to deadlock,
+		 * runaway and unwind if we hit contention.
*/ - if (!vfio_pci_zap_and_vma_lock(cur_vma, true)) { + if (!down_write_trylock(&vdev->memory_lock)) { ret =3D -EBUSY; - goto err_undo; + break; } + + vfio_pci_zap_bars(vdev); } - cur_vma =3D NULL; =20 - list_for_each_entry(cur_mem, &dev_set->device_list, vdev.dev_set_list) { - if (!down_write_trylock(&cur_mem->memory_lock)) { - ret =3D -EBUSY; - goto err_undo; - } - mutex_unlock(&cur_mem->vma_lock); + if (!list_entry_is_head(vdev, + &dev_set->device_list, vdev.dev_set_list)) { + vdev =3D list_prev_entry(vdev, vdev.dev_set_list); + goto err_undo; } - cur_mem =3D NULL; =20 /* * The pci_reset_bus() will reset all the devices in the bus. @@ -2570,25 +2419,22 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_d= evice_set *dev_set, * cause the PCI config space reset without restoring the original * state (saved locally in 'vdev->pm_save'). */ - list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) - vfio_pci_set_power_state(cur, PCI_D0); + list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) + vfio_pci_set_power_state(vdev, PCI_D0); =20 ret =3D pci_reset_bus(pdev); =20 + vdev =3D list_last_entry(&dev_set->device_list, + struct vfio_pci_core_device, vdev.dev_set_list); + err_undo: - list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { - if (cur =3D=3D cur_mem) - is_mem =3D false; - if (cur =3D=3D cur_vma) - break; - if (is_mem) - up_write(&cur->memory_lock); - else - mutex_unlock(&cur->vma_lock); - } + list_for_each_entry_from_reverse(vdev, &dev_set->device_list, + vdev.dev_set_list) + up_write(&vdev->memory_lock); + + list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) + pm_runtime_put(&vdev->pdev->dev); =20 - list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) - pm_runtime_put(&cur->pdev->dev); err_unlock: mutex_unlock(&dev_set->lock); return ret; diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 562e8754869da..4f283514a1ed6 100644 --- 
a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -93,8 +93,6 @@ struct vfio_pci_core_device { struct list_head sriov_pfs_item; struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; - struct mutex vma_lock; - struct list_head vma_list; struct rw_semaphore memory_lock; }; =20 --=20 2.34.1 From nobody Sun Jun 14 11:28:32 2026 Received: from DB3PR0202CU003.outbound.protection.outlook.com (mail-northeuropeazon11010047.outbound.protection.outlook.com [52.101.84.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E6E643EE1DD; Thu, 2 Apr 2026 16:13:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.84.47 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775146407; cv=fail; b=fN400nRohEO3hTM2QhLLir4BDVhc4aQLN0bw38e0wBtMO/RpS0LGfR+dyomG5NddauRT+BvEEiqNas1lic9dfcD0cKrIk1aH6//sGiTEN+S8Fuk8mnzkoMh2I2a2tFtUlUNJE+dw5/xzajc6UlEea8CAluh+ALsa+fSKKgiTaf0= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775146407; c=relaxed/simple; bh=oBzArlmjR4usYswws4I/x63xAxch4WVnn/1SaHxOZ0Y=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: Content-Type:MIME-Version; b=o9GIk7ltLCYfnJANz0lTVmpVvnk+bSsyP1InfH+Z9+axtZ/3cxTmxEaJGiDybPqq2L5sVPcaUKy0lovtwNhxrUgB6aKwh+jDw/O8sBtbpcAXlS7K/nng7x7Iqj7TijQtiW7H5hk5Uhiti6CW+eNdr0bIP/pkF/RCkQ+jZtTH1JU= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=est.tech; spf=pass smtp.mailfrom=est.tech; dkim=pass (2048-bit key) header.d=est.tech header.i=@est.tech header.b=b6XH1mHR; arc=fail smtp.client-ip=52.101.84.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=est.tech Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=est.tech Authentication-Results: smtp.subspace.kernel.org; dkim=pass 
header.from=est.tech;
Received: from DB8P189MB0966.EURP189.PROD.OUTLOOK.COM (2603:10a6:10:16b::8) by PAXP189MB1952.EURP189.PROD.OUTLOOK.COM (2603:10a6:102:28c::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.20; Thu, 2 Apr 2026 16:13:17 +0000
Received: from DB8P189MB0966.EURP189.PROD.OUTLOOK.COM ([fe80::48c:33b2:d870:d0ca]) by DB8P189MB0966.EURP189.PROD.OUTLOOK.COM ([fe80::48c:33b2:d870:d0ca%4]) with mapi id 15.20.9769.016; Thu, 2 Apr 2026 16:13:17 +0000
From: tugrul.kukul@est.tech
To: gregkh@linuxfoundation.org, sashal@kernel.org, stable@vger.kernel.org
Cc: alex.williamson@redhat.com, kevin.tian@intel.com, jgg@ziepe.ca, lorenzo.stoakes@oracle.com, david@redhat.com, akpm@linux-foundation.org, mike.kravetz@oracle.com, linmiaohe@huawei.com, yi.l.liu@intel.com, axelrasmussen@google.com, leah.rumancik@gmail.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, david.nystrom@est.tech
Subject: [PATCH 6.6.y 3/4] vfio/pci: Insert full vma on mmap'd MMIO fault
Date: Thu, 2 Apr 2026 18:13:10 +0200
Message-Id: <20260402161311.63484-4-tugrul.kukul@est.tech>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20260402161311.63484-1-tugrul.kukul@est.tech>
References: <20260402161311.63484-1-tugrul.kukul@est.tech>
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alex Williamson

commit d71a989cf5d961989c273093cdff2550acdde314 upstream.

In order to improve performance of typical scenarios we can try to insert
the entire vma on fault. This accelerates typical cases, such as when
the MMIO region is DMA mapped by QEMU. The vfio_iommu_type1 driver will
fault in the entire DMA mapped range through fixup_user_fault().

In synthetic testing, this improves the time required to walk a PCI BAR
mapping from userspace by roughly 1/3rd.

This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain
support for pfnmaps.

Suggested-by: Yan Zhao
Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/
Reviewed-by: Yan Zhao
Link: https://lore.kernel.org/r/20240607035213.2054226-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson
Signed-off-by: Axel Rasmussen
Signed-off-by: Tugrul Kukul
Acked-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci_core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index e05d6ee9d4cab..55e28feba475e 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1651,6 +1651,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault= *vmf) struct vm_area_struct *vma =3D vmf->vma; struct vfio_pci_core_device *vdev =3D vma->vm_private_data; unsigned long pfn, pgoff =3D vmf->pgoff - vma->vm_pgoff; + unsigned long addr =3D vma->vm_start; vm_fault_t ret =3D VM_FAULT_SIGBUS; =20 pfn =3D vma_to_pfn(vma); @@ -1658,11 +1659,25 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fau= lt *vmf) down_read(&vdev->memory_lock); =20 if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) - goto out_disabled; + goto out_unlock; =20 ret =3D vmf_insert_pfn(vma, vmf->address, pfn + pgoff); + if (ret & VM_FAULT_ERROR) + goto out_unlock; =20 -out_disabled: + /* + * Pre-fault the remainder of the vma,
abort further insertions and + * supress error if fault is encountered during pre-fault. + */ + for (; addr < vma->vm_end; addr +=3D PAGE_SIZE, pfn++) { + if (addr =3D=3D vmf->address) + continue; + + if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR) + break; + } + +out_unlock: up_read(&vdev->memory_lock); =20 return ret; --=20 2.34.1 From nobody Sun Jun 14 11:28:32 2026 Received: from DB3PR0202CU003.outbound.protection.outlook.com (mail-northeuropeazon11010047.outbound.protection.outlook.com [52.101.84.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8DD143EE1F8; Thu, 2 Apr 2026 16:13:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.84.47 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775146409; cv=fail; b=NBFZHC9DWKYeeJiGb7ydsjZtvp7Aksce0d0hWDoTRsJ7FWG1oQlZlylKcwnzEJcXYuIPgLS26NP8JVjQpzntCe2ezUNzCHMlA9AhTI+XKMAg+hAJkIkOoT9IiH150A9FxxFph7P7x3t62o97Rxp7k2OKlm1zFLq0Mcx78mZu2mA= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775146409; c=relaxed/simple; bh=9RDPJUb0S+jQHNEhRef7VeiFmmcUIv4Ruyzn6SnHdww=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: Content-Type:MIME-Version; b=bDAslWf2LvXISCMcq0GoivhLE9ZhTAMQNyU9nQrHD5aBM/zyEdZn6p7pJMrI3JCuOMsQaEOEXkr270YSmnIjTtRXhrkfrZYo46R4P9f4Mon46zftAhdlCT5ckYypN4VnZIlRSyULcrRvInYeF5PCKCkPuzIN6Br3SPq0NefD5SU= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=est.tech; spf=pass smtp.mailfrom=est.tech; dkim=pass (2048-bit key) header.d=est.tech header.i=@est.tech header.b=RyBnpRRn; arc=fail smtp.client-ip=52.101.84.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=est.tech Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=est.tech Authentication-Results: smtp.subspace.kernel.org; 
header.from=est.tech;
Received: from DB8P189MB0966.EURP189.PROD.OUTLOOK.COM (2603:10a6:10:16b::8) by PAXP189MB1952.EURP189.PROD.OUTLOOK.COM (2603:10a6:102:28c::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.20; Thu, 2 Apr 2026 16:13:20 +0000
Received: from DB8P189MB0966.EURP189.PROD.OUTLOOK.COM ([fe80::48c:33b2:d870:d0ca]) by DB8P189MB0966.EURP189.PROD.OUTLOOK.COM ([fe80::48c:33b2:d870:d0ca%4]) with mapi id 15.20.9769.016; Thu, 2 Apr 2026 16:13:19 +0000
From: tugrul.kukul@est.tech
To: gregkh@linuxfoundation.org, sashal@kernel.org, stable@vger.kernel.org
Cc: alex.williamson@redhat.com, kevin.tian@intel.com, jgg@ziepe.ca, lorenzo.stoakes@oracle.com, david@redhat.com, akpm@linux-foundation.org, mike.kravetz@oracle.com, linmiaohe@huawei.com, yi.l.liu@intel.com, axelrasmussen@google.com, leah.rumancik@gmail.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, david.nystrom@est.tech
Subject: [PATCH 6.6.y 4/4] fork: defer linking file vma until vma is fully initialized
Date: Thu, 2 Apr 2026 18:13:11 +0200
Message-Id: <20260402161311.63484-5-tugrul.kukul@est.tech>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20260402161311.63484-1-tugrul.kukul@est.tech>
References: <20260402161311.63484-1-tugrul.kukul@est.tech>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0

From: Miaohe Lin

[ Upstream commit 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 ]

Thorvald reported a WARNING [1]. And the root cause is below race:

 CPU 1                                  CPU 2
 fork                                   hugetlbfs_fallocate
  dup_mmap                               hugetlbfs_punch_hole
   i_mmap_lock_write(mapping);
   vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
   i_mmap_unlock_write(mapping);
   hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
                                         i_mmap_lock_write(mapping);
                                         hugetlb_vmdelete_list
                                          vma_interval_tree_foreach
                                           hugetlb_vma_trylock_write -- Vma_lock is cleared.
   tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
                                           hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
                                         i_mmap_unlock_write(mapping);

hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
by deferring linking file vma until vma is fully initialized. Those vmas
should be initialized first before they can be used.

[tk: Adapted to 6.6 stable where vma_iter_bulk_store() can fail (unlike
 mainline which uses __mt_dup() for pre-allocation). Preserved error
 handling via goto fail_nomem_vmi_store. Previous backport (cec11fa2eb512)
 was reverted (dd782da470761) due to xfstests failures.]

Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin
Reported-by: Thorvald Natvig
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu
Cc: Christian Brauner
Cc: Heiko Carstens
Cc: Kent Overstreet
Cc: Liam R. Howlett
Cc: Mateusz Guzik
Cc: Matthew Wilcox (Oracle)
Cc: Miaohe Lin
Cc: Muchun Song
Cc: Oleg Nesterov
Cc: Peng Zhang
Cc: Tycho Andersen
Cc:
Signed-off-by: Andrew Morton
Assisted-by: Claude:claude-opus-4.6
Suggested-by: David Nyström
Signed-off-by: Tugrul Kukul
Acked-by: Alex Williamson
---
 kernel/fork.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c index ce6f6e1e39057..5b60692b1a4ea 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -733,6 +733,21 @@ static __latent_entropy int dup_mmap(struct mm_struct = *mm, } else if (anon_vma_fork(tmp, mpnt)) goto fail_nomem_anon_vma_fork; vm_flags_clear(tmp, VM_LOCKED_MASK); + /* + * Copy/update hugetlb private vma information. + */ + if (is_vm_hugetlb_page(tmp)) + hugetlb_dup_vma_private(tmp); + + /* Link the vma into the MT */ + if (vma_iter_bulk_store(&vmi, tmp)) + goto fail_nomem_vmi_store; + + mm->map_count++; + + if (tmp->vm_ops && tmp->vm_ops->open) + tmp->vm_ops->open(tmp); + file =3D tmp->vm_file; if (file) { struct address_space *mapping =3D file->f_mapping; @@ -749,23 +764,9 @@ static __latent_entropy int dup_mmap(struct mm_struct = *mm, i_mmap_unlock_write(mapping); } =20 - /* - * Copy/update hugetlb private vma information.
- */ - if (is_vm_hugetlb_page(tmp)) - hugetlb_dup_vma_private(tmp); - - /* Link the vma into the MT */ - if (vma_iter_bulk_store(&vmi, tmp)) - goto fail_nomem_vmi_store; - - mm->map_count++; if (!(tmp->vm_flags & VM_WIPEONFORK)) retval =3D copy_page_range(tmp, mpnt); =20 - if (tmp->vm_ops && tmp->vm_ops->open) - tmp->vm_ops->open(tmp); - if (retval) goto loop_out; } --=20 2.34.1
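
Reader's note on patch 3/4 (not part of the series): the control flow of the
pre-fault loop added to vfio_pci_mmap_fault() can be sketched in userspace C.
This is only an illustration under stated assumptions. Every name prefixed
with sketch_ or SKETCH_ is hypothetical: the "vma" is a plain struct, the
page table is an array, and sketch_insert_pfn() is a mock standing in for
vmf_insert_pfn(). The sketch also derives the page offset from the fault
address rather than from vmf->pgoff, and omits memory_lock entirely. What it
tries to show is the ordering in the patch: the faulting page is inserted
first and its result is what gets returned, then the rest of the vma is
filled best-effort, and a failure there stops the loop without changing the
return value.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_PAGES 64

/* Hypothetical result codes standing in for vm_fault_t values. */
enum sketch_ret { SKETCH_NOPAGE = 0, SKETCH_SIGBUS = 1 };

/* Hypothetical stand-in for a vma: a page-aligned range and a mock page table. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long pte[SKETCH_MAX_PAGES];	/* 0 means "not mapped" */
	size_t fail_at;	/* page index whose insertion fails; SKETCH_MAX_PAGES = never */
};

/* Mock of vmf_insert_pfn(): record pfn for the page at addr, or fail. */
static enum sketch_ret sketch_insert_pfn(struct sketch_vma *vma,
					 unsigned long addr, unsigned long pfn)
{
	size_t idx = (addr - vma->vm_start) / SKETCH_PAGE_SIZE;

	if (idx == vma->fail_at)
		return SKETCH_SIGBUS;
	vma->pte[idx] = pfn;
	return SKETCH_NOPAGE;
}

/* Mirrors the shape of the patched fault handler, minus locking. */
static enum sketch_ret sketch_mmap_fault(struct sketch_vma *vma,
					 unsigned long fault_addr,
					 unsigned long base_pfn)
{
	unsigned long addr = vma->vm_start;
	unsigned long pfn = base_pfn;
	unsigned long pgoff;
	enum sketch_ret ret;

	assert(fault_addr >= vma->vm_start && fault_addr < vma->vm_end);
	assert((fault_addr - vma->vm_start) % SKETCH_PAGE_SIZE == 0);
	assert((vma->vm_end - vma->vm_start) / SKETCH_PAGE_SIZE <= SKETCH_MAX_PAGES);

	pgoff = (fault_addr - vma->vm_start) / SKETCH_PAGE_SIZE;

	/* The faulting page decides the return value. */
	ret = sketch_insert_pfn(vma, fault_addr, base_pfn + pgoff);
	if (ret != SKETCH_NOPAGE)
		return ret;

	/* Pre-fault the remainder; stop on error but do not report it. */
	for (; addr < vma->vm_end; addr += SKETCH_PAGE_SIZE, pfn++) {
		if (addr == fault_addr)
			continue;	/* already inserted above; pfn still advances */
		if (sketch_insert_pfn(vma, addr, pfn) != SKETCH_NOPAGE)
			break;
	}

	return ret;
}
```

Note that the loop increment runs on the continue path as well, so pfn stays
in step with addr when the faulting page is skipped. The same holds for the
for loop in the patch.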