From nobody Wed Apr 1 20:37:31 2026
Received: from PH7PR06CU001.outbound.protection.outlook.com
 (mail-westus3azon11010017.outbound.protection.outlook.com [52.101.201.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42195334C33;
 Wed, 1 Apr 2026 14:40:10 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="MqgRsil5"
Subject: [PATCH v2 01/20] cxl: Add cxl_get_hdm_info() for HDM decoder metadata
Date: Wed, 1 Apr 2026 20:08:58 +0530
Message-ID: <20260401143917.108413-2-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

cxl_probe_component_regs() finds the HDM decoder block during device
probe and caches its location, but does not record the decoder count
and does not expose the result outside drivers/cxl/. vfio-cxl needs
the decoder count and the byte offset and size of the HDM block
without re-running the probe sequence.

Record decoder_cnt in rmap->count when parsing the HDM capability in
cxl_probe_component_regs(), extend struct cxl_reg_map with a count
member, and add cxl_get_hdm_info() to return offset, size, and count
from the cached map. Export under the CXL namespace; stub to
-EOPNOTSUPP when CONFIG_CXL_BUS is off.
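
A minimal userspace sketch of the behaviour described above, for
illustration only. It assumes the cached size equals the length that
cxl_probe_component_regs() computes (0x20 * decoder_cnt + 0x10). The
names struct mock_reg_map, mock_hdm_block_len() and mock_get_hdm_info()
are hypothetical stand-ins, not kernel symbols, and the kernel's
WARN_ON() is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cxl_reg_map: only the fields read here. */
struct mock_reg_map {
	bool valid;
	unsigned long offset;
	unsigned long size;
	uint8_t count;		/* the member this patch adds */
};

/* Block length formula taken from the regs.c hunk context. */
static unsigned long mock_hdm_block_len(uint8_t decoder_cnt)
{
	return 0x20UL * decoder_cnt + 0x10;
}

/* Same checks as cxl_get_hdm_info(): NULL outputs, then map validity. */
static int mock_get_hdm_info(const struct mock_reg_map *hdm, uint8_t *count,
			     unsigned long *offset, unsigned long *size)
{
	if (!count || !offset || !size)
		return -EINVAL;

	if (!hdm->valid)
		return -ENODEV;

	*count = hdm->count;
	*offset = hdm->offset;
	*size = hdm->size;

	return 0;
}
```

With four decoders the block spans 0x20 * 4 + 0x10 = 0x90 bytes. A
caller gets -EINVAL for a NULL output pointer and -ENODEV when the map
was never marked valid.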
Co-developed-by: Zhi Wang
Signed-off-by: Zhi Wang
Signed-off-by: Manish Honap
---
 drivers/cxl/core/pci.c  | 29 +++++++++++++++++++++++++++++
 drivers/cxl/core/regs.c |  1 +
 include/cxl/cxl.h       | 16 ++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index ba2d393c540a..a5147602f91f 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -449,6 +449,35 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, "CXL");
 
+/**
+ * cxl_get_hdm_info - Get HDM decoder register block location and count
+ * @cxlds: CXL device state (must have component regs enumerated via
+ *         cxl_probe_component_regs())
+ * @count: number of HDM decoders in the block (from HDM Capability bits [3:0])
+ * @offset: byte offset of HDM decoder block within the component register BAR
+ * @size: size in bytes of the HDM decoder block
+ *
+ * Return: 0 on success. -ENODEV if the HDM decoder block is not present.
+ */
+int cxl_get_hdm_info(struct cxl_dev_state *cxlds, u8 *count,
+		     resource_size_t *offset, resource_size_t *size)
+{
+	struct cxl_reg_map *hdm = &cxlds->reg_map.component_map.hdm_decoder;
+
+	if (WARN_ON(!count || !offset || !size))
+		return -EINVAL;
+
+	if (!hdm->valid)
+		return -ENODEV;
+
+	*count = hdm->count;
+	*offset = hdm->offset;
+	*size = hdm->size;
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_hdm_info, "CXL");
+
 #define CXL_DOE_TABLE_ACCESS_REQ_CODE 0x000000ff
 #define CXL_DOE_TABLE_ACCESS_REQ_CODE_READ 0
 #define CXL_DOE_TABLE_ACCESS_TABLE_TYPE 0x0000ff00
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 20c2d9fbcfe7..e828df0629d0 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -85,6 +85,7 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			decoder_cnt = cxl_hdm_decoder_count(hdr);
 			length = 0x20 * decoder_cnt + 0x10;
 			rmap = &map->hdm_decoder;
+			rmap->count = decoder_cnt;
 			break;
 		}
 		case CXL_CM_CAP_CAP_ID_RAS:
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 50acbd13bcf8..d86faebb99b7 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -80,6 +80,7 @@ struct cxl_reg_map {
 	int id;
 	unsigned long offset;
 	unsigned long size;
+	u8 count;
 };
 
 struct cxl_component_reg_map {
@@ -284,4 +285,19 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
 struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
 				     struct cxl_endpoint_decoder **cxled,
 				     int ways);
+
+#ifdef CONFIG_CXL_BUS
+
+int cxl_get_hdm_info(struct cxl_dev_state *cxlds, u8 *count,
+		     resource_size_t *offset, resource_size_t *size);
+
+#else
+
+static inline
+int cxl_get_hdm_info(struct cxl_dev_state *cxlds, u8 *count,
+		     resource_size_t *offset, resource_size_t *size)
+{ return -EOPNOTSUPP; }
+
+#endif /* CONFIG_CXL_BUS */
+
 #endif /* __CXL_CXL_H__ */
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from CH4PR04CU002.outbound.protection.outlook.com
 (mail-northcentralusazon11013044.outbound.protection.outlook.com [40.107.201.44])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id 03D2C46AEC6;
 Wed, 1 Apr 2026 14:40:21 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="VNHSVYJo"
Subject: [PATCH v2 02/20] cxl: Declare cxl_find_regblock and cxl_probe_component_regs in public header
Date: Wed, 1 Apr 2026 20:08:59 +0530
Message-ID: <20260401143917.108413-3-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

vfio-cxl lives outside drivers/cxl/ but still needs to locate the
component register block and fill cxl_component_reg_map. The
prototypes for cxl_find_regblock() and cxl_probe_component_regs()
were only available in the internal drivers/cxl/cxl.h.

Move the declarations to include/cxl/cxl.h next to the other
vfio-facing hooks, with stubs when CXL bus support is disabled. Drop
the now-duplicate prototypes from the private header.
Signed-off-by: Manish Honap
---
 drivers/cxl/cxl.h |  4 ----
 include/cxl/cxl.h | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 2b1f7d687a0e..10ddab3949ee 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -198,8 +198,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
 #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
 
-void cxl_probe_component_regs(struct device *dev, void __iomem *base,
-			      struct cxl_component_reg_map *map);
 void cxl_probe_device_regs(struct device *dev, void __iomem *base,
 			   struct cxl_device_reg_map *map);
 int cxl_map_device_regs(const struct cxl_register_map *map,
@@ -211,8 +209,6 @@ enum cxl_regloc_type;
 int cxl_count_regblock(struct pci_dev *pdev, enum cxl_regloc_type type);
 int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
 			       struct cxl_register_map *map, unsigned int index);
-int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
-		      struct cxl_register_map *map);
 int cxl_setup_regs(struct cxl_register_map *map);
 struct cxl_dport;
 int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index d86faebb99b7..8ef7915a51f7 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -286,17 +286,33 @@ struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
 				     struct cxl_endpoint_decoder **cxled,
 				     int ways);
 
+struct pci_dev;
+enum cxl_regloc_type;
+
 #ifdef CONFIG_CXL_BUS
 
 int cxl_get_hdm_info(struct cxl_dev_state *cxlds, u8 *count,
 		     resource_size_t *offset, resource_size_t *size);
 
+int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
+		      struct cxl_register_map *map);
+void cxl_probe_component_regs(struct device *dev, void __iomem *base,
+			      struct cxl_component_reg_map *map);
+
 #else
 
 static inline
 int cxl_get_hdm_info(struct cxl_dev_state *cxlds, u8 *count,
 		     resource_size_t *offset, resource_size_t *size)
 { return -EOPNOTSUPP; }
+static inline int
+cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
+		  struct cxl_register_map *map)
+{ return -EOPNOTSUPP; }
+static inline void
+cxl_probe_component_regs(struct device *dev, void __iomem *base,
+			 struct cxl_component_reg_map *map)
+{ }
 
 #endif /* CONFIG_CXL_BUS */
 
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from BL2PR02CU003.outbound.protection.outlook.com
 (mail-eastusazon11011061.outbound.protection.outlook.com [52.101.52.61])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.subspace.kernel.org (Postfix) with ESMTPS id BDF8046AED8;
 Wed, 1 Apr 2026 14:40:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="B/LSvh5U" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=m+H9dxXlBklILO8kUR/uim6HYDqVjvU0MQXkCh4fo0TGVZ/sRPIV2dw1oZKlOHIouzXC7mzg69W/53WvFpObCz2yA9mdiLEB1CIsJMCgojNq5jUh0t8stQhYiiqhfpFNvCC4MGTiCrshHcs7rY95fBFSICd454tFbz47ZVXrH4YG16lsVtKeJVJ7ErgtZbNAcQfiMbc0iY9qNx/oYC5iKGZhslk663fHLWxGwEK29IqnjWinwlKFuFR4lqwy3MZz41fRWtMbQxhIwBtkgRl1NApD15tnsKZZmqXzhWgzhHXA+rJXjBgiHyQoe/Z6XfWTaXZEoS6vgjb95oWauvm5gg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=W33yyWvx4735N0IT9MZcNIjK0Y/PjuH37OLW0pmeWHM=; b=RnbNfBYprZsg+Rk40ghfwzeDmjQcUOiEjvNGzV7cMMAXre/+oFfQGynUEwvlrtdaC1nr/on5nqt2knT9QNK8atDvEkpDgOM8t80FoE/F+t07HPP5aEGnjcqm1rk7mxoJ9yfDv+/tKV8YDWpNakCnBvYv+wy0/18yONmJ9SnkOptqUEZuc+hGGChJUDeQ+BnnX0j5K2NUYVxkMjZ5E8kYZeMonp5fXQzxQUzsiNwDbrS0pYq3t6H3RE/cyG0vWg027nlZdkFFuN1+5KoIAczWf2+LaAOZqyuVsIhsDXTVNqvMpeFek1YyqD369r9WxZC8i0QwAgM6qKMsjZNbmfulyw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.160) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=W33yyWvx4735N0IT9MZcNIjK0Y/PjuH37OLW0pmeWHM=; 
b=B/LSvh5Uofpm8we5YrdmubA0g60M+RVJzp+CinTqQa1Mj1xd5pQBrvJPnXWp8aJNbXXnfIBd7IbgRLPvbp/hy8SLTQvb8k7d7oWEIc8XQtRycOxgusM66La+vZlgiy4ergm4WRT6VV55qCJtWCQUOl0cDOVNjBMv5oqOReQoDDyADu7ueD9Wo4bQ8WbkPGkKNfLJOyMuNDlZLA1SFicRuIGEyIJ4ZXi3mCajp3aMzH0jmOmYTnjDW/aca8RtuM46osBmp39rYTPR3QY5c2f4d8vh7gnuWPZTl8wSMhhgk7b7AShKXaQ7iwiRzz8e/keOiCY5K4ebv0/jCC3suhKoFA== Received: from SJ0PR05CA0070.namprd05.prod.outlook.com (2603:10b6:a03:332::15) by DS0PR12MB7678.namprd12.prod.outlook.com (2603:10b6:8:135::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17; Wed, 1 Apr 2026 14:40:15 +0000 Received: from SJ5PEPF000001D5.namprd05.prod.outlook.com (2603:10b6:a03:332:cafe::de) by SJ0PR05CA0070.outlook.office365.com (2603:10b6:a03:332::15) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.28 via Frontend Transport; Wed, 1 Apr 2026 14:40:15 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by SJ5PEPF000001D5.mail.protection.outlook.com (10.167.242.57) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 14:40:15 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:40:01 -0700 Received: from nvidia-4028GR-scsim.nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:39:54 -0700 From: To: , , , , , , , , , , , , , , , CC: , , , , , , , , , Subject: [PATCH v2 03/20] cxl: Move component/HDM register defines to uapi/cxl/cxl_regs.h Date: Wed, 1 Apr 2026 20:09:00 +0530 Message-ID: <20260401143917.108413-4-mhonap@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com> References: <20260401143917.108413-1-mhonap@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: rnnvmail202.nvidia.com (10.129.68.7) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SJ5PEPF000001D5:EE_|DS0PR12MB7678:EE_ X-MS-Office365-Filtering-Correlation-Id: e86ed2bc-7e62-4c7e-e55c-08de8ffc9681 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|36860700016|82310400026|1800799024|7416014|376014|921020|18002099003|56012099003|22082099003; X-Microsoft-Antispam-Message-Info: 
SgGyIeBdOi/hpABe+Ih0GS4aFKq1Pw2yEP4AQc+SAj7S7fhfeVjULHPlqxhyZ3oWC/x/xf/Cw/4bcrAYrQhtfXubGi2LpPQfNPbXHAmkx/EIekwhgqy4OsTYBGi4EA/nZ3ZY7L0cHG9d2TSBwydmCX/HmK0WUBjE+rmo3H6mTvLmaf17xpuc0uh3iZmiMHEWz+cxUPvw1C4xiSGkl2R+uUHZJsjiSQEuE6Gejc34B2Wu4dKLvvRnoHYEnQ3jxP0UTNDJMHL7RdU30/HjyK8ve6do9cvyK0WdbCMteRK3MlOfsKj/54sSka5/Dj6/TzqPgguLI7t2ouRHO/aWuBPVbXhmbYvgcEWbK+nx/eIlyvvVnF9DOSgoHm3KD2E0dsphJ+vGhd8QYAs7U5DQ2AM/1wE0aH8X+qaauUq2DSmotx+aQkQ1jBSnBztkYpxl7Lt8gkV47yBI8dhmxPGmos7nhi3YWvd8vjNlbZxHqtl9dE6Cd90yiNvz5HVLUpIfsQa1WfRb0pdBup57AVDpUdKOWBxBCteSMOzKyOtJZV6v3GtN6FIE9sC5m+kwpPws3dJTvZLN586qRdEakQqzc3jW6tqc8TLaVnQnW9de3IeeeUINitSWbGSog/7TU9r9n0+aiO+tpyVZMQG6+49uHqCKmpm/VCujmdx3Q8fNpMAlokVyftpnH3tQLPtLr8x0DuHHKzWAhh0e5mVuLgYGMrA36CnCqvGSFexxjdDF9dBGPc6eEMxtezrf7NE9IsTzQhTnKl7bif60qgjPXkOkCQDzRNH/F7489Cixq8tvhQI7pr3o7pMq/KP+UrIaHiTlY4pJ X-Forefront-Antispam-Report: CIP:216.228.117.160;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge1.nvidia.com;CAT:NONE;SFS:(13230040)(36860700016)(82310400026)(1800799024)(7416014)(376014)(921020)(18002099003)(56012099003)(22082099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: V6bKbM5DYHOjJ5/I7uIjWFiui/R3HZ8fDLeYyv8h90PaubD9EUcP3tBPpXl3Aq4ydQVYVKBsFOhnt8HG2n4LpwgR4lzeQ26b3mq8R56ybnew1pRfUQLl5sPG7aH/leLwsNjr3HrkGW/tPCgUnF9j6LJPOqKkbo/mS/t23GP1YZhG8BFmTi9w29pk/mWeqKdbuoD4vI0dSpTdfsOnj1usAGPLBPOacYDhEQzMZkjExKclZInw1MjcGRgOPh0BNKxE0/ruVJyLKeoYU8rLle6t8ZduAHI6x8UGuOgyYKdpqbx1CVo+c58/YQ0fZl6lcv5XbTU5QzLi17Km0eIpcSCWGn4C16Y1NIcbSpMVgDZyb/Lb8CrKQeRyCOE3r1oCKGR7ZOrG6QHy1eJLSiUMHnYeFRtiff82k9K3x6hfiqSjbtS+8GFCDv+i+aRKjmnmNW2i X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:40:15.1525 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e86ed2bc-7e62-4c7e-e55c-08de8ffc9681 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com]
X-MS-Exchange-CrossTenant-AuthSource: SJ5PEPF000001D5.namprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR12MB7678
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

VFIO and other code outside the CXL core need the same offset/mask
constants that the core uses for the component register block and HDM
decoders.

Move them into a new header, include/uapi/cxl/cxl_regs.h (GPL-2.0 WITH
Linux-syscall-note), and include it from include/cxl/cxl.h. Use the
uapi-friendly __GENMASK() and _BITUL() helpers in place of GENMASK() and
BIT(). Section comments in the new file follow CXL spec r4.0 numbering.

SZ_64K is not available to userspace programs, so the UAPI header defines
CXL_COMPONENT_REG_BLOCK_SIZE as the literal 0x00010000.

Signed-off-by: Manish Honap
---
 drivers/cxl/cxl.h           | 42 ---------------------------
 include/cxl/cxl.h           |  1 +
 include/uapi/cxl/cxl_regs.h | 57 +++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 42 deletions(-)
 create mode 100644 include/uapi/cxl/cxl_regs.h

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 10ddab3949ee..172e38d58c50 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -24,48 +24,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
  * (port-driver, region-driver, nvdimm object-drivers... etc).
  */
 
-/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
-#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
-
-/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
-#define CXL_CM_OFFSET 0x1000
-#define CXL_CM_CAP_HDR_OFFSET 0x0
-#define CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
-#define CM_CAP_HDR_CAP_ID 1
-#define CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
-#define CM_CAP_HDR_CAP_VERSION 1
-#define CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
-#define CM_CAP_HDR_CACHE_MEM_VERSION 1
-#define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
-#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
-
-/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
-#define CXL_HDM_DECODER_CAP_OFFSET 0x0
-#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
-#define CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
-#define CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
-#define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
-#define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
-#define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
-#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
-#define CXL_HDM_DECODER_ENABLE BIT(1)
-#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
-#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
-#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
-#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
-#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
-#define CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
-#define CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
-#define CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
-#define CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
-#define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
-#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
-#define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
-#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
-#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
-#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
-#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
-
 /* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
 #define CXL_DECODER_MIN_GRANULARITY 256
 #define CXL_DECODER_MAX_ENCODED_IG 6
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 8ef7915a51f7..f48274673b1b 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 
 /**
  * enum cxl_devtype - delineate type-2 from a generic type-3 device
diff --git a/include/uapi/cxl/cxl_regs.h b/include/uapi/cxl/cxl_regs.h
new file mode 100644
index 000000000000..1a48a3805f52
--- /dev/null
+++ b/include/uapi/cxl/cxl_regs.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * CXL Standard defines
+ *
+ * Hardware register offsets and bit-field masks for the CXL Component
+ * Register block, as defined by the CXL Specification r4.0.
+ */
+
+#ifndef _UAPI_CXL_REGS_H_
+#define _UAPI_CXL_REGS_H_
+
+#include /* _BITUL(), _BITULL() */
+#include /* __GENMASK() */
+
+/* CXL 4.0 8.2.3 CXL Component Register Layout and Definition */
+#define CXL_COMPONENT_REG_BLOCK_SIZE 0x00010000
+
+/* CXL 4.0 8.2.4 CXL.cache and CXL.mem Registers*/
+#define CXL_CM_OFFSET 0x1000
+#define CXL_CM_CAP_HDR_OFFSET 0x0
+#define CXL_CM_CAP_HDR_ID_MASK __GENMASK(15, 0)
+#define CM_CAP_HDR_CAP_ID 1
+#define CXL_CM_CAP_HDR_VERSION_MASK __GENMASK(19, 16)
+#define CM_CAP_HDR_CAP_VERSION 1
+#define CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK __GENMASK(23, 20)
+#define CM_CAP_HDR_CACHE_MEM_VERSION 1
+#define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK __GENMASK(31, 24)
+#define CXL_CM_CAP_PTR_MASK __GENMASK(31, 20)
+
+/* HDM decoders CXL 4.0 8.2.4.20 CXL HDM Decoder Capability Structure */
+#define CXL_HDM_DECODER_CAP_OFFSET 0x0
+#define CXL_HDM_DECODER_COUNT_MASK __GENMASK(3, 0)
+#define CXL_HDM_DECODER_TARGET_COUNT_MASK __GENMASK(7, 4)
+#define CXL_HDM_DECODER_INTERLEAVE_11_8 _BITUL(8)
+#define CXL_HDM_DECODER_INTERLEAVE_14_12 _BITUL(9)
+#define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY _BITUL(11)
+#define CXL_HDM_DECODER_INTERLEAVE_16_WAY _BITUL(12)
+#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
+#define CXL_HDM_DECODER_ENABLE _BITUL(1)
+#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
+#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
+#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
+#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
+#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
+#define CXL_HDM_DECODER0_CTRL_IG_MASK __GENMASK(3, 0)
+#define CXL_HDM_DECODER0_CTRL_IW_MASK __GENMASK(7, 4)
+#define CXL_HDM_DECODER0_CTRL_LOCK _BITUL(8)
+#define CXL_HDM_DECODER0_CTRL_COMMIT _BITUL(9)
+#define CXL_HDM_DECODER0_CTRL_COMMITTED _BITUL(10)
+#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR _BITUL(11)
+#define CXL_HDM_DECODER0_CTRL_HOSTONLY _BITUL(12)
+#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
+#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
+#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
+#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
+
+#endif /* _UAPI_CXL_REGS_H_ */
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from PH7PR06CU001.outbound.protection.outlook.com (mail-westus3azon11010067.outbound.protection.outlook.com [52.101.201.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6109A2DAFB0; Wed, 1 Apr 2026 14:40:43 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.201.67
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054444; cv=fail; b=kYIGuGoYrRfE2KbMECT9XD4umMh+cnhYzihl8mUHYvByZ/XsaHMTbBw3JqXTrRfnWeKWkhLg/vNU5pfN3JPDgEEsWceIROwjIfNKo1Wh4SzMhShtT0yuIRfjLmHSc+jK0P15PnhhYZj9zTQ0s28OpyNHGZykPl7XolAidP6IstU=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054444; c=relaxed/simple; bh=+4VMsaGbdAGIflmjQrJ4//NBcja1D5ttQmBKNuw20TM=;
h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=CMfnCMcsCr6q+Hs7euNkst1X/70BjBTc3+oVjGfkhNqCVlA7rf2bH5TjXAH3qd3NbzAPH8g8+lXeF3TuDCLiLgsfhs/16yqjUA/n/i/uIzoWgPzGX9CJqqCyrV8REh65TD5FU1nw7oSwacYNMfRPVAtNRx8Ipo/fccr0BQejoB8= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=ChrW8dyj; arc=fail smtp.client-ip=52.101.201.67 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="ChrW8dyj" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=GIGOUgromeEuNCXLtNCOcgQv++hJJmeoyTfSBd3Z+MHrtNz7aYr10hmxS3u52v67fZSYWN7IgmzlZPJCoh70lx2vksFC1Gj+n4/5N2YVaX3vgF5VEulVeIgbR8Bw7iYq0xvNfuAkPSuk9vI5HNcPT89lIzwCEK9ByJWegTklF+Xkd2JjxNO+Q0SkkAf73zKw95zryOcyreo4yhOiDQLqf3ygd+I5k50ug6KT7gfT7lBRRw6dC26+IDqQiuoMKGyFvcoXBIIEx8JjfK5ARPCFuNkIS9IBrNsq+EI6RAF6ptiG8LpNPY4m+g6fqQz5BariJEm1A3UZHpMMeqUv7TfZuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=6aXLmxkS0OPtOM3vbtEXezb677ZOXe+bFO3IFD3xBFw=; b=GRDAO2mHHPFaj4oF0FUIkwD2RTXw7y9ztE1fvVWqXJAlmuJyYTjkFWEHcVOu9k43oQPTUkrrHXKuGtVD9ujS7MOND1R6MuoIEAiCI/lKqXArkkVLPz09WSOmPw2g3sagB7u3PQ8PKav4i1fayDBt5agiGqIlbAN5jUYpFQLHw7SFSTwjpiUtYTYv41K4GEMBJ8YaqB7LrZUfBFY7gJg53PqxSFRnXPEtKjaA+CjDMwhOu/+Dst/f7I7bI6PhakK0zKL4ze5Y2wiYBS/Ui3NATvpk6ftCFsDuZW81Zubx+IlBIZiEAsY0kwGiZmQSAUfEYUBResBoBtPfHd4W7PqfuQ== ARC-Authentication-Results: i=1; 
mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=6aXLmxkS0OPtOM3vbtEXezb677ZOXe+bFO3IFD3xBFw=; b=ChrW8dyjE+npcFv+E/g/m7BcrVHdaa4r6O9VaEoufQMZq6DMfGeB3dFCQ+LX/hGVcucb8AtCJR2TIqfKohcrKqQiR+l9NFgUCTreRrfZlClE8FQkPSGjFKcheqEzTtjF/BdwUPq4RJUyRfhSOLd4m/dcPNQpemFxHZiFQIGOI1LwIPOvcurSVt74F3ieSnTh/SgTVV6zzmPtIwV1kNHWmID66RA1YNVW/kwxg7UAVKvU6qikVx9FHrfOS4frH5MrAJDzQKz590G7Fd2RA4ZI/mY1UhkVwH7NrQTO8fSNvAIte+EqxFlLB5RNCWrCYFT+ZP+gEOmbJvCxdzP9dhhG+g== Received: from MN2PR05CA0062.namprd05.prod.outlook.com (2603:10b6:208:236::31) by MW4PR12MB6803.namprd12.prod.outlook.com (2603:10b6:303:20e::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.15; Wed, 1 Apr 2026 14:40:32 +0000 Received: from BL02EPF0001A0FF.namprd03.prod.outlook.com (2603:10b6:208:236:cafe::10) by MN2PR05CA0062.outlook.office365.com (2603:10b6:208:236::31) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9769.16 via Frontend Transport; Wed, 1 Apr 2026 14:40:32 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by BL02EPF0001A0FF.mail.protection.outlook.com (10.167.242.106) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 14:40:32 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:40:08 -0700 Received: from nvidia-4028GR-scsim.nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:40:01 -0700 From: To: , , , , , , , , , , , , , , , CC: , , , , , , , , , Subject: [PATCH v2 04/20] cxl: Split cxl_await_range_active() from media-ready wait Date: Wed, 1 Apr 2026 20:09:01 +0530 Message-ID: <20260401143917.108413-5-mhonap@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com> References: <20260401143917.108413-1-mhonap@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: rnnvmail202.nvidia.com (10.129.68.7) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BL02EPF0001A0FF:EE_|MW4PR12MB6803:EE_ X-MS-Office365-Filtering-Correlation-Id: ab8fe79c-3268-4c25-902d-08de8ffca0d6 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|7416014|376014|36860700016|82310400026|921020|56012099003|18002099003|22082099003; X-Microsoft-Antispam-Message-Info: 
UmAdL43qGw6hfhs+AoxULDZCHbWE8HVKwY0erEOijqO4Hndvw3r1d/SvSfLW9iaNMY5s+lJhBIWJD0wlA6bGKhBC+fjq37C1BDvheXHQL1gFYQ5r2LWOvem7ad2FRAqfLF8UXsWIHPcMYnNwmwPq4IrvI0HdSLUqc2O9Zgt/u3gcXsN+n/3hdVpeY0yazEsivcGVIKyvGwfEcmrEyuAHcx2sggCNsTUQeUe6d9roAflkJznilGfuZcbKqaVm5lsH6DkeW0yMQS+Nzb9S5JJyumS6S6gKt/ooIXF3g+0cjfGnS40/y2Pf6rikxkxpVxm16/YXif3yEV09LTXzDuS6KGu6J3FR672g5VRjKpQsvSGcHwYL0igtmIBf122RB1Z+yJcPO2bJdp2iBngyQfoamlXXjqXcXA0RBlovHWWQMsXyiZ3iMGILz+3Ds9XJJccrz2dpxJEKG4VLbIq52xq8YTyS+p4spdbJC1kVOJVV7qvhSbF+fRwZJ8SsxbkR1nzRBAf3X2OE3GcMf0bRp/FsJpnkHgroW4dwu7G4JzH7NDfkVg+gMaUhq2gB2HDS0xUSeSWDY7qfmjitBdavSvSCNS/YhJDqwHkR0fNKgzxTe0EjEVgPaSbp5tEbC/9/0O+X5eAPTCVbS5tdeR7YTccnNT0TNyfjjDHW9hRi23BBM54WZvWIYfBzyb/jfGP6VfM2izPzm/dPT2MHe3kqQCrSur54z2BPEF2syebaOSRS5bN6TiD6emJgb0T28LEjcJKXPmrCKa4FlHrosYaPCTVgsLpwMQym3Gj3c7RBXtiP7UGmsR0Ge0UjOHTRbP/fLEQR X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(1800799024)(7416014)(376014)(36860700016)(82310400026)(921020)(56012099003)(18002099003)(22082099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: KH0OA3QeoFp/Y5SBqIK1wJCUKoonq0YDo3v7q0rqZ6v6i9smRe8to3u5KdpD1miy6lXh9yaitA5355j6wJt1NlKd1c3jkbB51X+rpfKXNC5bxu/HXCTeCOZ4YQmRgU3sqlSrBdB57n5bZcZlFyCsM7v3WAQjAt6DHEn8V120avD8FtalQWAhLHPPWN1aSjE68vsFGgsc9kbFvTWwYxKKjHvn9oxEypCtyR36m/qPGpGZijMUuuPUYW9RWPDEWT728qyv/yl3mhkyjlY9Egg7Yli/hBvZd+i8QJJ3Ad2dyzdAPfVI+rikrLnyY2qCG3xewjUE6XeuYEo2Afi00XurbCX+Nk2meIxcRHlIA2cG4Ih1RNaYhXs2hqwcIM5XKFsa+Z8K3ul3IrakDcZ0ktIq4tO3+a/97m3r1mDDCpE9ejWM8LZufNE5LTr6pvfYdotV X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:40:32.4291 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: ab8fe79c-3268-4c25-902d-08de8ffca0d6 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com]
X-MS-Exchange-CrossTenant-AuthSource: BL02EPF0001A0FF.namprd03.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB6803
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

Before accessing CXL device memory after reset or power-on, the driver
must ensure that the media is ready. cxl_await_media_ready() does this in
two steps: it polls the HDM DVSEC ranges until they are active, then reads
the memory device status through cxlds->regs.memdev.

Not every CXL device implements the CXL Memory Device register group;
many Type-2 devices do not. Accessing the memory device registers on such
a device may result in a kernel panic.

Split the HDM DVSEC range-active poll out of cxl_await_media_ready() into
a new exported function, cxl_await_range_active(), so that Type-2 drivers
can wait for range readiness without the memdev read.
cxl_await_media_ready() now calls cxl_await_range_active() for the DVSEC
poll and then reads the memory device status as before.

Co-developed-by: Zhi Wang
Signed-off-by: Zhi Wang
Signed-off-by: Manish Honap
Reviewed-by: Dave Jiang
---
 drivers/cxl/core/pci.c | 35 ++++++++++++++++++++++++++++++-----
 include/cxl/cxl.h      |  3 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a5147602f91f..1fbe3338a0da 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -142,16 +142,24 @@ static int cxl_dvsec_mem_range_active(struct cxl_dev_state *cxlds, int id)
 	return 0;
 }
 
-/*
- * Wait up to @media_ready_timeout for the device to report memory
- * active.
+/**
+ * cxl_await_range_active - Wait for all HDM DVSEC memory ranges to be active
+ * @cxlds: CXL device state (DVSEC and HDM count must be valid)
+ *
+ * For each HDM decoder range reported in the CXL DVSEC capability, waits for
+ * the range to report MEM INFO VALID (up to 1s per range), then MEM ACTIVE
+ * (up to media_ready_timeout seconds per range, default 60s). Used by
+ * cxl_await_media_ready() and by callers that only need range readiness
+ * without checking the memory device status register.
+ *
+ * Return: 0 if all ranges become valid and active, -ETIMEDOUT if a timeout
+ * occurs, or a negative errno from config read on failure.
  */
-int cxl_await_media_ready(struct cxl_dev_state *cxlds)
+int cxl_await_range_active(struct cxl_dev_state *cxlds)
 {
 	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
 	int d = cxlds->cxl_dvsec;
 	int rc, i, hdm_count;
-	u64 md_status;
 	u16 cap;
 
 	rc = pci_read_config_word(pdev,
@@ -172,6 +180,23 @@ int cxl_await_media_ready(struct cxl_dev_state *cxlds)
 			return rc;
 	}
 
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_await_range_active, "CXL");
+
+/*
+ * Wait up to @media_ready_timeout for the device to report memory
+ * active.
+ */
+int cxl_await_media_ready(struct cxl_dev_state *cxlds)
+{
+	u64 md_status;
+	int rc;
+
+	rc = cxl_await_range_active(cxlds);
+	if (rc)
+		return rc;
+
 	md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
 	if (!CXLMDEV_READY(md_status))
 		return -EIO;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index f48274673b1b..45d911735883 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -299,6 +299,7 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
 		      struct cxl_register_map *map);
 void cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			      struct cxl_component_reg_map *map);
+int cxl_await_range_active(struct cxl_dev_state *cxlds);
 
 #else
 
@@ -314,6 +315,8 @@ static inline void
 cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			 struct cxl_component_reg_map *map)
 { }
+static inline int cxl_await_range_active(struct cxl_dev_state *cxlds)
+{ return -EOPNOTSUPP; }
 
 #endif /* CONFIG_CXL_BUS */
 
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from SJ2PR03CU001.outbound.protection.outlook.com (mail-westusazon11012054.outbound.protection.outlook.com [52.101.43.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B651A43E49C; Wed, 1 Apr 2026 14:40:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.43.54
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054449; cv=fail; b=oBfbyxJPKhW7nKuKoHJCtYC+67TAMRDY+OyouIYXK+b5I6H+zPhk/aPWzNV6KzEGdzv60YKzixXRPI2OCNYG0l92jEpSF/mul5LWUQU25vxdXKTCMCUL8P/1OMgAtf5OdpKyjHhvrZjEbjbGBCwsvEC80DtClbkEfHpdwa6UCew=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054449; c=relaxed/simple; bh=oG2lVXc/ClWnCjW8dqwpOxhd8OQDyg9ukOFHsktgHeU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type;
b=MoIvrjHxxC4sxTCyGVkK0p0sQWEWOu8xYvJmi5a2m4quuYKzLuxtXzLGj27tZEracFKfJuHIomb5VteCrcRF93NVrT1cl88kMHM8V58KFaVBas6vdKtKNvbFnKj/MUIX9furciUGOYSFeE4k3irsRvUrwL1qctA32fyWhpzi/3M= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=HGmVmyYF; arc=fail smtp.client-ip=52.101.43.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="HGmVmyYF" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=GKPln78/umSlP0F8IM2XeVmz5W3SGDMm9kr8OghxHndoO3ayDKzfKnQ70nAVLp7YEjMuvC/sIyDAxJ8N3HxGFunkGdB1YcZRLuGDvHRAk8eteRSIUP/61b8QOaYnhv3nbw8bn15+G24ZZt8PQxENl7XECEHlL3PWwAKZ+ONxV8T4tAYUc7HQQnOtYIB0I+GEqsyVqq/i5Y7EKYoH1yd4h43xe4ZbUBa9kDaprF8huX9qLV7FGEpqY9NCgycPlqO/3sWIrv9b3yBN/vr2fx/vKtQ6PkhHoV8RCIIW9zFiw24h+jJabxYjA5jotPu/DgQ1ZMeZ6aRfQNkLCERvdSQRVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Pp1qho77JqVUGahNWVpqszcz11EQnYca0A6DadrkGq4=; b=Bj26mwN3b8jSjPhmX2JElkRqAhtjVBiX2RsvZaul++1YIG2MM8ZQ6UZ/rcqxpopxBeW5jcSAVmkoQmTYmXqwlfTWrKyjqaGFf8hM8GrgDj83Pz7JM/cWupMerYui7QT1j0m+ZLvwPHca7l1MAvffen4hUuEWQTsLMeZDPX5z9aTiyNhPugBb25Chn1n/QF+4GEaiBDC8n9JJKEPT3Zo3c0hGHcTBBxf3Z9tbz6vdsWLvPwVzPXTpSvc0D0zPvGDh/vzSvfhY3n44CBBwrak/IKnpKIjbGtgC7jSiXK5OdYuWLlCYl3a+jkcha4kqf48JxGZljBY+seg0qWG7f00glA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=vger.kernel.org 
smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Pp1qho77JqVUGahNWVpqszcz11EQnYca0A6DadrkGq4=; b=HGmVmyYF7uQvSt08JG3uJ2yg9d1OBKI3h/1P53+Of8W+N+tAQFspX/stGHgdwCNiDxWtbHpbKL8lv4q3YyO37VXT0/RwXttpv5q/S6+1nK1BytyM2xs59Sxp+qdYJqmD5sspKEYS4W++t+aN763S6JQckTIZOa5+0NaG62q1UqGCiOLf07hmCBS567kbgM3WVX6F1mczchJ6nnCt3Mi5XfrlyzGfNJX8jS29ufCUj4S5TCY022duqjysSlt/GLUe90lrvuJLP5AEUM/WUPKknl8f56a/cyO9DXWC9gTqzkmL58UsEotYB6KNJXP2lPZly9nYZItIZtdSdebE9QjcLw== Received: from BL1PR13CA0385.namprd13.prod.outlook.com (2603:10b6:208:2c0::30) by SJ2PR12MB9163.namprd12.prod.outlook.com (2603:10b6:a03:559::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.16; Wed, 1 Apr 2026 14:40:42 +0000 Received: from BN1PEPF00005FFE.namprd05.prod.outlook.com (2603:10b6:208:2c0:cafe::d8) by BL1PR13CA0385.outlook.office365.com (2603:10b6:208:2c0::30) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.28 via Frontend Transport; Wed, 1 Apr 2026 14:40:42 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by BN1PEPF00005FFE.mail.protection.outlook.com (10.167.243.230) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 14:40:42 +0000 Received: from rnnvmail201.nvidia.com 
(10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:40:16 -0700 Received: from nvidia-4028GR-scsim.nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:40:09 -0700 From: To: , , , , , , , , , , , , , , , CC: , , , , , , , , , Subject: [PATCH v2 05/20] cxl: Record BIR and BAR offset in cxl_register_map Date: Wed, 1 Apr 2026 20:09:02 +0530 Message-ID: <20260401143917.108413-6-mhonap@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com> References: <20260401143917.108413-1-mhonap@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: rnnvmail202.nvidia.com (10.129.68.7) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN1PEPF00005FFE:EE_|SJ2PR12MB9163:EE_ X-MS-Office365-Filtering-Correlation-Id: d4e69089-b754-49d9-072b-08de8ffca6a4 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|36860700016|376014|1800799024|7416014|82310400026|56012099003|22082099003|18002099003|921020; X-Microsoft-Antispam-Message-Info: 
baXme/g2fRzEpuv1T7/iRWKSybjw5w9zpfdqE4SXRTjmLyPYPHNaaMVbuPXeQuJBGgk6F6WYdWYHZltIlsSHiDDnjnFzjPhQ2blqQIzjAkb1KXIWEeFGacLQzNYKNISB/58xZPtPAAS2668YkDrddTvu4vlkEAWuxSlssdK9qKa3LozXKrLViYHL378DBykPObbyymD4Cpbwk/WjkBYEOmdoIo53TobWA+t08eNOL+qE9F6kYI9hIi38kh+U5QLPYo4pImlg/6GvmRJxqEbUzpJJ1K2ZHY0HSD3Ch9F16Sg8l11eIS/4aeIRFJqNDpNzJsi97Jgm7AWFuankJmo5Nc8/dfTSNq9nnee3zWfcEptzQ/9FOiVUEJp3lxhmD7PzM/bwViqw+WFnG+qQ4/lj8TX0MrR82bw2FlTeOVDqVhB6gBI55prR5XMxhtDGrvDNAt9jlISQBZ3d+WtB+uLVRqJ6+Pqo0lLXjGjtyD4IZW5HwdSdSPQ+BX0XFD1QPFpS27avTxsbpmKEckkXufKs20yjK+1w+7XswqBLVr4YYFGEoh+v9sygiDdw0BYj/Q/Ldr8x+FNee0c6rzwoRUyIuvvdlz4u77VEupSCkV6XtLhIXBamZ5iY1YX+dAeDRtvew9D0leVCQmvAchcrneyLMcgtoVIOmfsEI/NHT/BDJUIDI/8N3vCgFMr8IVCv1vFQF7saysiXGxk008PzAwwZIJOk8Io7LKojN2EYhbI9xNI0EPjbtU10FxUGWUvfyPBl4ieCdiiwL8LJuK5eJlhZEAt6HpwX3IdxRFRiy4abDwkhop5PHR/Py98xt65OoOrr X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(36860700016)(376014)(1800799024)(7416014)(82310400026)(56012099003)(22082099003)(18002099003)(921020);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: M0SRV6N3XTk/sU3td5sRyWIqZi3GGhfyhU+XrG074imCUhyaSfT1CKzU2a4ZbsjJKcHdGnd6Amk3MJ5Z7zmKwxIeayopkylH8R4/APaNAk+XfET8T7yVG2o2TyWbZgcuguMLIjQDt1u3xOzEGIug33mQltdAfG3MiXiHdJ4x3bxkaNkdy9p7T6a0lhjdtfCds3QPLd9/4K77dGXtlTYCZ0DbqEjNx/GJEgEfi218SmzcpD0jYv2Aa/bxtgMB75r8MD/AMY+B/dehZ56kyVnUe2FBSyxSgVXIxGpE6VEpmoRXscU3PyGj5RRGL9gclf8L5s+GPq1tvjFWfJXhdDYS4CrDZpy6bq3Uf3acXaw19YsXSOaAJhl9/6w9mT5vBaKBFuNyqeN5HY/a9HcW1MRUYhIrZXNiZBOnNjNsSGAOPr41eLiYlEXwiPzFOvow4UAz X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:40:42.1543 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: d4e69089-b754-49d9-072b-08de8ffca6a4 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com]
X-MS-Exchange-CrossTenant-AuthSource: BN1PEPF00005FFE.namprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ2PR12MB9163
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

The Register Locator DVSEC (CXL 4.0 8.1.9) describes each register block
by a BAR indicator (BIR) and an offset within that BAR. The CXL core
currently stores only the resolved HPA (BAR resource start + offset) in
struct cxl_register_map, so callers that need pci_iomap(), or that report
the BAR to userspace, must reverse-engineer the BAR from the HPA.

Add bar_index and bar_offset to struct cxl_register_map and fill them in
cxl_decode_regblock() when the regblock is BAR-backed (BIR 0-5);
bar_index is initialized to 0xFF otherwise.

Add cxl_regblock_get_bar_info() so callers (e.g. vfio-cxl) can get the BAR
index and offset directly and use pci_iomap() instead of ioremap(HPA). It
returns -EINVAL if the map is not BAR-backed.

Signed-off-by: Manish Honap
---
 drivers/cxl/core/regs.c | 29 +++++++++++++++++++++++++++++
 include/cxl/cxl.h       | 15 +++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index e828df0629d0..43661e51230a 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -288,9 +288,37 @@ static bool cxl_decode_regblock(struct pci_dev *pdev, u32 reg_lo, u32 reg_hi,
 	map->reg_type = reg_type;
 	map->resource = pci_resource_start(pdev, bar) + offset;
 	map->max_size = pci_resource_len(pdev, bar) - offset;
+	map->bar_index = bar;
+	map->bar_offset = offset;
 	return true;
 }
 
+/**
+ * cxl_regblock_get_bar_info() - Get BAR index and offset for a BAR-backed
+ * regblock
+ * @map: Register map from cxl_find_regblock() or cxl_find_regblock_instance()
+ * @bar_index: Output BAR index (0-5). Optional, may be NULL.
+ * @bar_offset: Output offset within the BAR. Optional, may be NULL.
+ *
+ * When the register block was found via the Register Locator DVSEC and
+ * lives in a PCI BAR (BIR 0-5), this returns the BAR index and the offset
+ * within that BAR.
+ *
+ * Return: 0 if the regblock is BAR-backed (bar_index <= 5), -EINVAL otherwise.
+ */
+int cxl_regblock_get_bar_info(const struct cxl_register_map *map, u8 *bar_index,
+			      resource_size_t *bar_offset)
+{
+	if (!map || map->bar_index == 0xff)
+		return -EINVAL;
+	if (bar_index)
+		*bar_index = map->bar_index;
+	if (bar_offset)
+		*bar_offset = map->bar_offset;
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_regblock_get_bar_info, "CXL");
+
 /*
  * __cxl_find_regblock_instance() - Locate a register block or count instances by type / index
  * Use CXL_INSTANCES_COUNT for @index if counting instances.
@@ -309,6 +337,7 @@ static int __cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_ty
 
 	*map = (struct cxl_register_map) {
 		.host = &pdev->dev,
+		.bar_index = 0xFF,
 		.resource = CXL_RESOURCE_NONE,
 	};
 
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 45d911735883..52eb40352edc 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -106,9 +106,16 @@ struct cxl_pmu_reg_map {
  * @resource: physical resource base of the register block
  * @max_size: maximum mapping size to perform register search
  * @reg_type: see enum cxl_regloc_type
+ * @bar_index: PCI BAR index (0-5) when regblock is BAR-backed; 0xFF otherwise
+ * @bar_offset: offset within the BAR; only valid when bar_index <= 5
  * @component_map: cxl_reg_map for component registers
  * @device_map: cxl_reg_maps for device registers
  * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
+ *
+ * When the register block is described by the Register Locator DVSEC with
+ * a BAR Indicator (BIR 0-5), bar_index and bar_offset are set so callers can
+ * use pci_iomap(pdev, bar_index, size) and base + bar_offset instead of
+ * ioremap(resource).
  */
 struct cxl_register_map {
 	struct device *host;
@@ -116,6 +123,8 @@ struct cxl_register_map {
 	resource_size_t resource;
 	resource_size_t max_size;
 	u8 reg_type;
+	u8 bar_index;
+	resource_size_t bar_offset;
 	union {
 		struct cxl_component_reg_map component_map;
 		struct cxl_device_reg_map device_map;
@@ -300,6 +309,8 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
 void cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			      struct cxl_component_reg_map *map);
 int cxl_await_range_active(struct cxl_dev_state *cxlds);
+int cxl_regblock_get_bar_info(const struct cxl_register_map *map, u8 *bar_index,
+			      resource_size_t *bar_offset);
 
 #else
 
@@ -317,6 +328,10 @@ cxl_probe_component_regs(struct device *dev, void __iomem *base,
 { }
 static inline int cxl_await_range_active(struct cxl_dev_state *cxlds)
 { return -EOPNOTSUPP; }
+static inline int
+cxl_regblock_get_bar_info(const struct cxl_register_map *map, u8 *bar_index,
+			  resource_size_t *bar_offset)
+{ return -EINVAL; }
 
 #endif /* CONFIG_CXL_BUS */
 
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from CH5PR02CU005.outbound.protection.outlook.com (mail-northcentralusazon11012035.outbound.protection.outlook.com [40.107.200.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C40E7477E4C; Wed, 1 Apr 2026 14:40:52 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.200.35
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054454; cv=fail; b=ngHeeVPaEh6ELLQ8fk51xKq+5SPPr5C7B0u2CYcbfxqHqhYIQMI6TOPd/8oJ8wshIf7Z61fJQrJz0XmGY8aldVCxQFKfY2nJHzuGF9/7GH5BipFeUdSgw/RY+T6lCy3p+D2eltoKo0a23dY00wIlNcshTeEgUMkoJOWZQVwEEdo=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054454; c=relaxed/simple; bh=W3rZoJ2lO33AOu+TLx+oZZ0eHeMZHeal4GGgxmfKgzw=;
Subject: [PATCH v2 06/20] vfio: UAPI for CXL-capable PCI device assignment
Date: Wed, 1 Apr 2026 20:09:03 +0530
Message-ID: <20260401143917.108413-7-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
From: Manish Honap

Vendor GPUs and accelerators can expose CXL.mem (HDM-D or HDM-DB) without
using PCI class code 0x0502. VMMs need a stable way to learn DPA sizing,
firmware commit state, and where the extra VFIO regions live.

Add VFIO_DEVICE_FLAGS_CXL (bit 9) and VFIO_DEVICE_INFO_CAP_CXL (cap ID 6).
The capability struct carries:

  hdm_regs_bar_index      PCI BAR containing the component register block
  hdm_regs_offset         byte offset within that BAR to the CXL.mem area
                          (comp_reg_offset + CXL_CM_OFFSET)
  dpa_region_index        VFIO region index for the DPA window
  comp_regs_region_index  VFIO region index for the emulated COMP_REGS

HDM decoder count and the HDM block offset within COMP_REGS are
intentionally absent; both are derivable from the CXL Capability Array at
COMP_REGS offset 0. Locate cap ID 0x5 (HDM) and read bits[31:20] of its
entry for the byte offset. Then read bits[3:0] of the HDM Decoder
Capability register for the count: count = (field == 0) ? 1 : field * 2.

Two flags accompany the capability:

  VFIO_CXL_CAP_FIRMWARE_COMMITTED
    A decoder covering @dpa_size bytes was programmed and committed by
    platform firmware before device open. The VMM can use the DPA region
    immediately without re-committing.

  VFIO_CXL_CAP_CACHE_CAPABLE
    The device is HDM-DB (CXL.mem + CXL.cache). HDM-DB requires a
    Write-Back Invalidation sequence before FLR to flush dirty cache
    lines; HDM-D (CXL.mem only) does not. QEMU uses this flag to schedule
    WBI and to report Back-Invalidation capability accurately in the
    virtual CXL topology. Mirrors the Cache_Capable bit from the CXL DVSEC
    Capability register.
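
For reviewers who want to see the derivation spelled out, here is a minimal userspace sketch of the Capability Array walk and the count formula described above. It is not part of the patch. It assumes the COMP_REGS region has already been read into a dword buffer (a VMM would pread() it from the VFIO region), and every DEMO_/demo_ name is hypothetical; DEMO_HDM_CAP_REG_OFFSET stands in for CXL_HDM_DECODER_CAP_OFFSET and is assumed to be 0 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions as stated in the commit message; names are hypothetical. */
#define DEMO_CAP_ARRAY_SIZE(hdr) (((hdr) >> 24) & 0xffu)    /* dword 0, bits[31:24] */
#define DEMO_CAP_ID(entry)       ((entry) & 0xffffu)        /* bits[15:0] */
#define DEMO_CAP_PTR(entry)      (((entry) >> 20) & 0xfffu) /* bits[31:20], byte offset */
#define DEMO_CAP_ID_HDM          0x5u
#define DEMO_HDM_CAP_REG_OFFSET  0x0u /* assumption, see lead-in */

/*
 * Walk the capability array in a snapshot of COMP_REGS and return the byte
 * offset of the HDM decoder block, or -1 if no HDM entry is present.
 */
static long demo_find_hdm_offset(const uint32_t *regs, size_t ndwords)
{
	assert(regs != NULL);
	if (ndwords == 0)
		return -1;

	uint32_t entries = DEMO_CAP_ARRAY_SIZE(regs[0]);

	/* Entry N lives at dword N, i.e. byte offset N * 4. */
	for (uint32_t cap = 1; cap <= entries && cap < ndwords; cap++) {
		if (DEMO_CAP_ID(regs[cap]) == DEMO_CAP_ID_HDM)
			return (long)DEMO_CAP_PTR(regs[cap]);
	}
	return -1;
}

/* Decode bits[3:0] of the HDM Decoder Capability register into a count. */
static unsigned int demo_hdm_decoder_count(uint32_t hdm_cap_reg)
{
	uint32_t field = hdm_cap_reg & 0xfu;

	return field == 0 ? 1u : field * 2u;
}
```

A caller would pass the value found at (offset + DEMO_HDM_CAP_REG_OFFSET) / 4 in the same buffer to demo_hdm_decoder_count().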
Signed-off-by: Manish Honap
---
 include/uapi/linux/vfio.h | 86 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ac2329f24141..fc07fc50b2e5 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -215,6 +215,16 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6)	/* vfio-fsl-mc device */
 #define VFIO_DEVICE_FLAGS_CAPS	(1 << 7)	/* Info supports caps */
 #define VFIO_DEVICE_FLAGS_CDX	(1 << 8)	/* vfio-cdx device */
+/*
+ * Vendor-specific CXL device with CXL.mem capability (HDM-D or HDM-DB
+ * decoder, PCI class code != PCI_CLASS_MEMORY_CXL). Covers CXL Type-2
+ * accelerators and non-class-code Type-3 variants. When set,
+ * VFIO_DEVICE_FLAGS_PCI is also set (same device is a PCI device). The
+ * capability chain (VFIO_DEVICE_FLAGS_CAPS) contains VFIO_DEVICE_INFO_CAP_CXL
+ * describing HDM decoders, region indices, decoder layout, and CXL-specific
+ * options.
+ */
+#define VFIO_DEVICE_FLAGS_CXL	(1 << 9)	/* Device supports CXL */
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 	__u32	cap_offset;	/* Offset within info struct of first cap */
@@ -257,6 +267,70 @@ struct vfio_device_info_cap_pci_atomic_comp {
 	__u32 reserved;
 };
 
+/*
+ * VFIO_DEVICE_INFO_CAP_CXL - CXL Type-2 device capability
+ *
+ * Present in the device info capability chain when VFIO_DEVICE_FLAGS_CXL
+ * is set. Describes Host Managed Device Memory (HDM) layout and CXL
+ * memory options so that userspace (e.g. QEMU) can expose the CXL region
+ * and component registers correctly to the guest.
+ *
+ * The HDM decoder count and HDM decoder block offset within the COMP_REGS
+ * region are derivable from the COMP_REGS region itself.
+ *
+ * To find the HDM decoder block offset (hdm_decoder_offset), traverse the CXL
+ * Capability Array starting at COMP_REGS region offset 0:
+ *   - Dword 0 bits[31:24] (CXL_CM_CAP_HDR_ARRAY_SIZE_MASK): number of
+ *     capability entries.
+ *   - Each subsequent dword at offset (cap * 4): bits[15:0] = cap ID
+ *     (CXL_CM_CAP_HDR_ID_MASK), bits[31:20] = byte offset from COMP_REGS
+ *     start to that capability's register block (CXL_CM_CAP_PTR_MASK).
+ *   - Locate the entry with cap ID == CXL_CM_CAP_CAP_ID_HDM (0x5); the
+ *     extracted bits[31:20] value is directly the byte offset
+ *     hdm_decoder_offset (no further scaling required).
+ *
+ * To find the HDM decoder count, pread the HDM Decoder Capability register
+ * at hdm_decoder_offset + CXL_HDM_DECODER_CAP_OFFSET within the
+ * COMP_REGS region; bits[3:0] (CXL_HDM_DECODER_COUNT_MASK) encode the count
+ * using the formula: count = (field == 0) ? 1 : field * 2.
+ */
+#define VFIO_DEVICE_INFO_CAP_CXL 6
+struct vfio_device_info_cap_cxl {
+	struct vfio_info_cap_header header;
+	__u8	hdm_regs_bar_index; /* PCI BAR containing HDM registers */
+	__u8	reserved[3];
+	__u32	flags;
+/* Decoder was committed by host firmware/BIOS */
+#define VFIO_CXL_CAP_FIRMWARE_COMMITTED	(1 << 0)
+/*
+ * Device implements an HDM-DB decoder (CXL.cache + CXL.mem). Reflects
+ * the Cache_Capable bit (bit 0) in the CXL DVSEC Capability register.
+ *
+ * When clear: HDM-D decoder (CXL.mem only, no CXL.cache). FLR does not
+ * require a Write-Back Invalidation (WBI) sequence; the device holds no
+ * coherent copies of host memory.
+ *
+ * When set: HDM-DB decoder (CXL 3.0+). The kernel driver does not
+ * perform Write-Back Invalidation (WBI) automatically. The VMM must
+ * issue a WBI sequence before asserting FLR to flush dirty device cache
+ * lines and prevent coherency violations, and should advertise
+ * Back-Invalidation support in the virtual CXL topology.
+ */
+#define VFIO_CXL_CAP_CACHE_CAPABLE	(1 << 1)
+	/*
+	 * Byte offset within the BAR to the CXL.mem register area start
+	 * (= comp_reg_offset + CXL_CM_OFFSET). This is where the CXL
+	 * Capability Array Header lives.
+	 */
+	__u64	hdm_regs_offset;
+	/*
+	 * Region indices for the two CXL VFIO device regions.
+	 * Avoids forcing userspace to scan all regions by type/subtype.
+	 */
+	__u32	dpa_region_index;	/* VFIO_REGION_SUBTYPE_CXL */
+	__u32	comp_regs_region_index;	/* VFIO_REGION_SUBTYPE_CXL_COMP_REGS */
+};
+
 /**
  * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
  *				       struct vfio_region_info)
@@ -370,6 +444,18 @@ struct vfio_region_info_cap_type {
  */
 #define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD	(1)
 
+/* 1e98 vendor PCI sub-types (CXL Consortium) */
+/*
+ * CXL memory region. Use with region type
+ * (PCI_VENDOR_ID_CXL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE).
+ * DPA memory region (fault+zap mmap)
+ */
+#define VFIO_REGION_SUBTYPE_CXL	(1)
+/*
+ * HDM decoder register emulation region (read/write only, no mmap).
+ */
+#define VFIO_REGION_SUBTYPE_CXL_COMP_REGS	(2)
+
 /* sub-types for VFIO_REGION_TYPE_GFX */
 #define VFIO_REGION_SUBTYPE_GFX_EDID	(1)
 
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 07/20] vfio/pci: Add CXL state to vfio_pci_core_device
Date: Wed, 1 Apr 2026 20:09:04 +0530
Message-ID: <20260401143917.108413-8-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
From: Manish Honap

Add CXL-specific state to the vfio_pci_core_device structure to support
CXL Type-2 device passthrough.

The new vfio_pci_cxl_state structure embeds CXL core objects:
- struct cxl_dev_state: CXL device state (from CXL core)
- struct cxl_memdev: CXL memory device
- struct cxl_region: CXL region object
- Root and endpoint decoders

Key design point: The CXL state pointer is NULL for non-CXL devices,
allowing vfio-pci-core to handle both CXL and standard PCI devices with
minimal overhead.

This follows the approach where vfio-pci-core itself gains CXL awareness,
rather than requiring a separate variant driver.
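
The NULL-pointer convention described above can be illustrated with a small standalone sketch. It is not taken from the patch: the demo_ structures are hypothetical stand-ins for vfio_pci_core_device and vfio_pci_cxl_state, reduced to the one field needed to show the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CXL state; only one field is modelled. */
struct demo_cxl_state {
	unsigned char hdm_count;
};

/* Hypothetical stand-in for the core device; cxl stays NULL for plain PCI. */
struct demo_core_device {
	struct demo_cxl_state *cxl;
};

/* A device is treated as CXL-capable only when the state pointer is set. */
static bool demo_is_cxl(const struct demo_core_device *vdev)
{
	assert(vdev != NULL);
	return vdev->cxl != NULL;
}

/* CXL-only paths check the pointer first, so non-CXL devices pay one test. */
static unsigned int demo_hdm_count(const struct demo_core_device *vdev)
{
	return demo_is_cxl(vdev) ? vdev->cxl->hdm_count : 0u;
}
```

The point of the design is that common code paths need a single pointer test and no per-device allocation when the device is not CXL-capable.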
Signed-off-by: Manish Honap
---
 drivers/vfio/pci/cxl/vfio_cxl_priv.h | 28 ++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h        |  3 +++
 2 files changed, 31 insertions(+)
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_priv.h

diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
new file mode 100644
index 000000000000..4cecc25db410
--- /dev/null
+++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Common infrastructure for CXL Type-2 device variant drivers
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#ifndef __LINUX_VFIO_CXL_PRIV_H
+#define __LINUX_VFIO_CXL_PRIV_H
+
+#include
+#include
+
+/* CXL device state embedded in vfio_pci_core_device */
+struct vfio_pci_cxl_state {
+	struct cxl_dev_state cxlds;
+	struct cxl_memdev *cxlmd;
+	struct cxl_root_decoder *cxlrd;
+	struct cxl_endpoint_decoder *cxled;
+	resource_size_t hdm_reg_offset;
+	size_t hdm_reg_size;
+	resource_size_t comp_reg_offset;
+	size_t comp_reg_size;
+	u8 hdm_count;
+	u8 comp_reg_bar;
+};
+
+#endif /* __LINUX_VFIO_CXL_PRIV_H */
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 1ac86896875c..cd8ed98a82a3 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -30,6 +30,8 @@ struct vfio_pci_region;
 struct p2pdma_provider;
 struct dma_buf_phys_vec;
 struct dma_buf_attachment;
+struct vfio_pci_cxl_state;
+
 
 struct vfio_pci_eventfd {
 	struct eventfd_ctx *ctx;
@@ -138,6 +140,7 @@ struct vfio_pci_core_device {
 	struct mutex ioeventfds_lock;
 	struct list_head ioeventfds_list;
 	struct vfio_pci_vf_token *vf_token;
+	struct vfio_pci_cxl_state *cxl;
 	struct list_head sriov_pfs_item;
 	struct vfio_pci_core_device *sriov_pf_core_dev;
 	struct notifier_block nb;
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 08/20] vfio/pci: Add CONFIG_VFIO_CXL_CORE and stub CXL hooks
Date: Wed, 1 Apr 2026 20:09:05 +0530
Message-ID: <20260401143917.108413-9-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
TNs6FrUbhzoQr5HeUIMxr9fkRLs+6Eh/GV2WCP4NJ4rRRLjvwhODHSXhorK4yO0uqmFi9I8ewFQltm0C5C2EWSyJ2iJy214cwLE54NPCDif0To8G0qPLNSlGX4xwDo27sDdiuvrQOz6jrHHsOewlNAhR4m6DJFs0jFUn7AQYGua7Vatj2jpP4v2k/+W3AwYxmSzyf139HvjgZgr2+3lAvRo4ij4zfkQnwqfwQA8GvRIhH/+0ouxlMGntXRzrZgX6t2koexCyLcKQnYp2v7C8p35K+iObW1s2GMRLpOC63Znxe9VxEMLG6xh7J3JHg7PitH3I97SK5zPkMVYW1fhVDTTadLGz+vf2NhH/AfwquHQEepVEWnQedFOCxN4ca0s1+QT96LoGxaJhcMDbeDcVgHTb+OviStodT95I2tYieS3C7kjhnO/SAFO4CW81zRIn X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:41:01.6239 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 008f60bb-96f6-473e-0417-08de8ffcb235 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: SJ5PEPF000001D5.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ5PPFDDE56F72B Content-Type: text/plain; charset="utf-8" From: Manish Honap Introduce the Kconfig option CONFIG_VFIO_CXL_CORE and the necessary build rules to compile CXL.mem passthrough infrastructure for vendor-specific CXL devices into the vfio-pci-core module. The new option depends on VFIO_PCI_CORE, CXL_BUS and CXL_MEM. Wire up the detection and cleanup entry-point stubs in vfio_pci_core_register_device() and vfio_pci_core_unregister_device() so that subsequent patches can fill in the CXL-specific logic without touching the vfio-pci-core flow again. The vfio_cxl_core.c file added here is an empty skeleton; the actual CXL detection and initialisation code is introduced in the following patch to keep this build-system patch reviewable on its own. 
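
The stub arrangement described above can be illustrated outside the kernel. What follows is a minimal user-space sketch, not code from this series: DEMO_CXL_CORE is a hypothetical stand-in for CONFIG_VFIO_CXL_CORE, and struct demo_device and the demo_* functions are invented names for illustration only. It shows why the register and unregister paths can call the hooks unconditionally: when the option is off, the hooks collapse to empty inline functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for CONFIG_VFIO_CXL_CORE.
 * Build with -DDEMO_CXL_CORE=1 to select the non-stub hooks.
 */
#ifndef DEMO_CXL_CORE
#define DEMO_CXL_CORE 0
#endif

/* Hypothetical, simplified device; not struct vfio_pci_core_device. */
struct demo_device {
	bool registered;
	bool cxl_probed;	/* set only by the non-stub hook */
};

#if DEMO_CXL_CORE
/* Non-stub hooks; in a real tree these would live in their own file. */
static void demo_cxl_detect_and_init(struct demo_device *dev)
{
	dev->cxl_probed = true;
}

static void demo_cxl_cleanup(struct demo_device *dev)
{
	dev->cxl_probed = false;
}
#else
/* Stubs: empty inline bodies, so the callers below need no #ifdef. */
static inline void demo_cxl_detect_and_init(struct demo_device *dev)
{
	(void)dev;
}

static inline void demo_cxl_cleanup(struct demo_device *dev)
{
	(void)dev;
}
#endif

/* The core paths call the hooks unconditionally. */
static int demo_register(struct demo_device *dev)
{
	assert(dev != NULL);
	demo_cxl_detect_and_init(dev);
	dev->registered = true;
	return 0;
}

static void demo_unregister(struct demo_device *dev)
{
	assert(dev != NULL);
	demo_cxl_cleanup(dev);
	dev->registered = false;
}
```

With the option off, demo_register() still succeeds and cxl_probed stays false, which mirrors the "device operates as a standard PCI device" fallback that the patch documents.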
Signed-off-by: Manish Honap
---
 drivers/vfio/pci/Kconfig             |  2 ++
 drivers/vfio/pci/Makefile            |  1 +
 drivers/vfio/pci/cxl/Kconfig         |  9 ++++++
 drivers/vfio/pci/cxl/vfio_cxl_core.c | 41 ++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c     |  4 +++
 drivers/vfio/pci/vfio_pci_priv.h     | 14 ++++++++++
 6 files changed, 71 insertions(+)
 create mode 100644 drivers/vfio/pci/cxl/Kconfig
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_core.c

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 1e82b44bda1a..b981a7c164ca 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -68,6 +68,8 @@ source "drivers/vfio/pci/virtio/Kconfig"
 
 source "drivers/vfio/pci/nvgrace-gpu/Kconfig"
 
+source "drivers/vfio/pci/cxl/Kconfig"
+
 source "drivers/vfio/pci/qat/Kconfig"
 
 source "drivers/vfio/pci/xe/Kconfig"
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index e0a0757dd1d2..ecb0eacbc089 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-core-$(CONFIG_VFIO_CXL_CORE) += cxl/vfio_cxl_core.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) += vfio_pci_dmabuf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
diff --git a/drivers/vfio/pci/cxl/Kconfig b/drivers/vfio/pci/cxl/Kconfig
new file mode 100644
index 000000000000..fad53300fecf
--- /dev/null
+++ b/drivers/vfio/pci/cxl/Kconfig
@@ -0,0 +1,9 @@
+config VFIO_CXL_CORE
+	bool "VFIO CXL core"
+	depends on VFIO_PCI_CORE && CXL_BUS && CXL_MEM
+	help
+	  Extends vfio-pci-core with CXL.mem passthrough for vendor-specific
+	  CXL devices (CXL_DEVTYPE_DEVMEM) that implement HDM-D or HDM-DB
+	  decoders without the standard CXL memory expander class code
+	  (PCI_CLASS_MEMORY_CXL). Covers CXL Type-2 accelerators and
+	  non-class-code Type-3 variants (e.g. compressed memory devices).
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
new file mode 100644
index 000000000000..d12afec82ecd
--- /dev/null
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VFIO CXL Core - CXL.mem passthrough for vendor-specific CXL devices
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ *
+ * This module extends vfio-pci-core to pass through CXL.mem regions for
+ * vendor-specific CXL devices (CXL_DEVTYPE_DEVMEM) that implement HDM-D or
+ * HDM-DB decoders but do not report the standard CXL memory expander class
+ * code (PCI_CLASS_MEMORY_CXL, 0x0502). This covers both CXL Type-2
+ * accelerators (with CXL.cache) and non-class-code Type-3 variants (e.g.
+ * compressed memory devices) which cannot be paravirtualized by the host
+ * CXL subsystem and require direct DPA region access from the guest.
+ */
+
+#include
+#include
+#include
+#include
+
+#include "../vfio_pci_priv.h"
+#include "vfio_cxl_priv.h"
+
+/**
+ * vfio_pci_cxl_detect_and_init - Detect and initialize a vendor-specific
+ * CXL.mem device
+ * @vdev: VFIO PCI device
+ *
+ * Called from vfio_pci_core_register_device(). Detects CXL DVSEC capability
+ * and initializes CXL features. On failure vdev->cxl remains NULL and the
+ * device operates as a standard PCI device.
+ */
+void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
+{
+}
+
+void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+
+MODULE_IMPORT_NS("CXL");
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3a11e6f450f7..b7364178e23d 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2181,6 +2181,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 	if (ret)
 		goto out_vf;
 
+	vfio_pci_cxl_detect_and_init(vdev);
+
 	vfio_pci_probe_power_state(vdev);
 
 	/*
@@ -2224,6 +2226,8 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
 	vfio_pci_vf_uninit(vdev);
 	vfio_pci_vga_uninit(vdev);
 
+	vfio_pci_cxl_cleanup(vdev);
+
 	if (!disable_idle_d3)
 		pm_runtime_get_noresume(&vdev->pdev->dev);
 
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 27ac280f00b9..d7df5538dcde 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -133,4 +133,18 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_CXL_CORE)
+
+void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev);
+void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev);
+
+#else
+
+static inline void
+vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev) { }
+static inline void
+vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev) { }
+
+#endif /* CONFIG_VFIO_CXL_CORE */
+
 #endif
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from MW6PR02CU001.outbound.protection.outlook.com (mail-westus2azon11012048.outbound.protection.outlook.com [52.101.48.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 521B4472784; Wed, 1 Apr 2026 14:41:22 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail
Subject: [PATCH v2 09/20] vfio/cxl: Detect CXL DVSEC and probe HDM block
Date: Wed, 1 Apr 2026 20:09:06 +0530
Message-ID: <20260401143917.108413-10-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

Detect a vendor-specific CXL device at vfio-pci bind time and probe its
HDM decoder register block.

vfio_cxl_create_device_state() allocates per-device state via devm and
reads MEM_CAPABLE and CACHE_CAPABLE from the CXL DVSEC.
vfio_cxl_setup_regs() locates the component register block, temporarily
maps it, calls cxl_probe_component_regs() to find the HDM block, then
releases the mapping.

vfio_pci_cxl_detect_and_init() chains these two steps. If either fails,
vdev->cxl stays NULL and the device falls back to plain vfio-pci.

Signed-off-by: Manish Honap
---
 drivers/vfio/pci/cxl/vfio_cxl_core.c | 217 +++++++++++++++++++++++++++
 drivers/vfio/pci/cxl/vfio_cxl_priv.h |  12 ++
 2 files changed, 229 insertions(+)

diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
index d12afec82ecd..b1c7603590b5 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -21,6 +21,158 @@
 #include "../vfio_pci_priv.h"
 #include "vfio_cxl_priv.h"
 
+/*
+ * vfio_cxl_create_device_state - Allocate and validate CXL device state
+ *
+ * Returns a pointer to the allocated vfio_pci_cxl_state on success, or
+ * ERR_PTR on failure. The allocation uses devm; the caller must call
+ * devm_kfree(&pdev->dev, cxl) on any subsequent setup failure to release
+ * the resource before device unbind. Using devm_kfree() to undo a devm
+ * allocation early is explicitly supported by the devres API.
+ *
+ * The caller assigns vdev->cxl only after all setup steps succeed, preventing
+ * partially-initialised state from being visible through vdev->cxl on any
+ * failure path.
+ */
+static struct vfio_pci_cxl_state *
+vfio_cxl_create_device_state(struct pci_dev *pdev, u16 dvsec)
+{
+	struct vfio_pci_cxl_state *cxl;
+	u16 cap_word;
+	u32 hdr1;
+
+	/* Freed automatically when pdev->dev is released. */
+	cxl = devm_cxl_dev_state_create(&pdev->dev,
+					CXL_DEVTYPE_DEVMEM,
+					pdev->dev.id, dvsec,
+					struct vfio_pci_cxl_state,
+					cxlds, false);
+	if (!cxl)
+		return ERR_PTR(-ENOMEM);
+
+	pci_read_config_dword(pdev, dvsec + PCI_DVSEC_HEADER1, &hdr1);
+	cxl->dvsec_len = PCI_DVSEC_HEADER1_LEN(hdr1);
+
+	pci_read_config_word(pdev, dvsec + CXL_DVSEC_CAPABILITY_OFFSET,
+			     &cap_word);
+
+	/*
+	 * Only handle vendor devices (class != 0x0502) with Mem_Capable set.
+	 * CACHE_CAPABLE is forwarded to the VMM so it knows whether a WBI
+	 * sequence is needed before FLR.
+	 */
+	if (!FIELD_GET(CXL_DVSEC_MEM_CAPABLE, cap_word) ||
+	    (pdev->class >> 8) == PCI_CLASS_MEMORY_CXL) {
+		devm_kfree(&pdev->dev, cxl);
+		return ERR_PTR(-ENODEV);
+	}
+
+	cxl->cache_capable = FIELD_GET(CXL_DVSEC_CACHE_CAPABLE, cap_word);
+
+	return cxl;
+}
+
+static int vfio_cxl_setup_regs(struct vfio_pci_core_device *vdev,
+			       struct vfio_pci_cxl_state *cxl)
+{
+	struct cxl_register_map *map = &cxl->cxlds.reg_map;
+	resource_size_t offset, bar_offset, size;
+	struct pci_dev *pdev = vdev->pdev;
+	void __iomem *base;
+	int ret;
+	u8 count;
+	u8 bar;
+
+	if (WARN_ON_ONCE(!pci_is_enabled(pdev)))
+		return -EINVAL;
+
+	/* Find component register block via Register Locator DVSEC */
+	ret = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, map);
+	if (ret)
+		return ret;
+
+	/*
+	 * Request the region and map. This is a transient mapping
+	 * used only to probe register capabilities; released immediately
+	 * after cxl_probe_component_regs() returns.
+	 */
+	if (!request_mem_region(map->resource, map->max_size, "vfio-cxl-probe"))
+		return -EBUSY;
+
+	base = ioremap(map->resource, map->max_size);
+	if (!base) {
+		ret = -ENOMEM;
+		goto failed_release;
+	}
+
+	/* Probe component register capabilities */
+	cxl_probe_component_regs(&pdev->dev, base, &map->component_map);
+
+	/* Check if HDM decoder was found */
+	if (!map->component_map.hdm_decoder.valid) {
+		ret = -ENODEV;
+		goto failed_unmap;
+	}
+
+	pci_dbg(pdev, "vfio_cxl: HDM decoder at offset=0x%lx, size=0x%lx\n",
+		map->component_map.hdm_decoder.offset,
+		map->component_map.hdm_decoder.size);
+
+	/* Get HDM register info */
+	ret = cxl_get_hdm_info(&cxl->cxlds, &count, &offset, &size);
+	if (ret)
+		goto failed_unmap;
+
+	if (!count || !size) {
+		ret = -ENODEV;
+		goto failed_unmap;
+	}
+
+	cxl->hdm_count = count;
+	/*
+	 * cxl_get_hdm_info() returns rmap->offset = CXL_CM_OFFSET +
+	 * (see cxl_probe_component_regs() which does base += CXL_CM_OFFSET before
+	 * reading caps and stores CXL_CM_OFFSET + cap_ptr as the offset).
+	 * Subtract CXL_CM_OFFSET so hdm_reg_offset is relative to the CXL.mem
+	 * register area start, which is where comp_reg_virt[0] is anchored.
+	 * The physical BAR address for hdm_iobase is recovered by adding
+	 * CXL_CM_OFFSET back in vfio_cxl_setup_virt_regs().
+	 */
+	cxl->hdm_reg_offset = offset - CXL_CM_OFFSET;
+	cxl->hdm_reg_size = size;
+
+	ret = cxl_regblock_get_bar_info(map, &bar, &bar_offset);
+	if (ret)
+		goto failed_unmap;
+
+	cxl->comp_reg_bar = bar;
+	cxl->comp_reg_offset = bar_offset;
+	cxl->comp_reg_size = CXL_COMPONENT_REG_BLOCK_SIZE;
+
+	iounmap(base);
+	release_mem_region(map->resource, map->max_size);
+
+	return 0;
+
+failed_unmap:
+	iounmap(base);
+failed_release:
+	release_mem_region(map->resource, map->max_size);
+
+	return ret;
+}
+
+/*
+ * Free CXL state early on probe failure. devm_kfree() on a live devres
+ * allocation removes it from the list immediately, so the normal devres
+ * teardown at unbind time won't double-free it.
+ */
+static void vfio_cxl_dev_state_free(struct pci_dev *pdev,
+				    struct vfio_pci_cxl_state *cxl)
+{
+	devm_kfree(&pdev->dev, cxl);
+}
+
 /**
  * vfio_pci_cxl_detect_and_init - Detect and initialize a vendor-specific
  * CXL.mem device
@@ -32,10 +184,75 @@
  */
 void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 {
+	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_pci_cxl_state *cxl;
+	u16 dvsec;
+	int ret;
+
+	if (!pcie_is_cxl(pdev))
+		return;
+
+	dvsec = pci_find_dvsec_capability(pdev,
+					  PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_DEVICE);
+	if (!dvsec)
+		return;
+
+	/*
+	 * CXL DVSEC found: any failure from here is a hard probe error on
+	 * a confirmed CXL-capable device, not a silent non-CXL fallback.
+	 * Warn the operator so misconfiguration is visible.
+	 */
+	cxl = vfio_cxl_create_device_state(pdev, dvsec);
+	if (IS_ERR(cxl)) {
+		if (PTR_ERR(cxl) != -ENODEV)
+			pci_warn(pdev,
+				 "vfio-cxl: CXL device state allocation failed: %ld\n",
+				 PTR_ERR(cxl));
+		return;
+	}
+
+	/*
+	 * Required for ioremap of the component register block and
+	 * calls to cxl_probe_component_regs().
+	 */
+	ret = pci_enable_device_mem(pdev);
+	if (ret) {
+		pci_warn(pdev,
+			 "vfio-cxl: pci_enable_device_mem failed: %d\n", ret);
+		goto free_cxl;
+	}
+
+	ret = vfio_cxl_setup_regs(vdev, cxl);
+	if (ret) {
+		pci_warn(pdev,
+			 "vfio-cxl: HDM register probing failed: %d\n", ret);
+		pci_disable_device(pdev);
+		goto free_cxl;
+	}
+
+	pci_disable_device(pdev);
+
+	/*
+	 * Register probing succeeded. Assign vdev->cxl now so that
+	 * all subsequent helpers can access state via vdev->cxl.
+	 * All failure paths below clear vdev->cxl before calling
+	 * vfio_cxl_dev_state_free().
+	 */
+	vdev->cxl = cxl;
+
+	return;
+
+free_cxl:
+	vfio_cxl_dev_state_free(pdev, cxl);
 }
 
 void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev)
 {
+	struct vfio_pci_cxl_state *cxl = vdev->cxl;
+
+	if (!cxl)
+		return;
 }
 
 MODULE_IMPORT_NS("CXL");
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
index 4cecc25db410..54b1f6d885aa 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h
+++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
@@ -21,8 +21,20 @@ struct vfio_pci_cxl_state {
 	size_t hdm_reg_size;
 	resource_size_t comp_reg_offset;
 	size_t comp_reg_size;
+	u16 dvsec_len;
 	u8 hdm_count;
 	u8 comp_reg_bar;
+	bool cache_capable;
 };
 
+/*
+ * CXL DVSEC for CXL Devices - register offsets within the DVSEC
+ * (CXL 4.0 8.1.3).
+ * Offsets are relative to the DVSEC capability base (cxl->dvsec).
+ */
+#define CXL_DVSEC_CAPABILITY_OFFSET 0xa
+#define CXL_DVSEC_MEM_CAPABLE BIT(2)
+/* CXL DVSEC Capability register bit 0: device supports CXL.cache (HDM-DB) */
+#define CXL_DVSEC_CACHE_CAPABLE BIT(0)
+
 #endif /* __LINUX_VFIO_CXL_PRIV_H */
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from CH5PR02CU005.outbound.protection.outlook.com (mail-northcentralusazon11012046.outbound.protection.outlook.com [40.107.200.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5BFC447A0D8; Wed, 1 Apr 2026 14:41:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.200.46
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
Subject: [PATCH v2 10/20] vfio/pci: Export config access helpers
Date: Wed, 1 Apr 2026 20:09:07 +0530
Message-ID: <20260401143917.108413-11-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

Promote vfio_raw_config_write() and vfio_raw_config_read() to non-static
so that the CXL DVSEC write handler in the next patch can call them.
Signed-off-by: Manish Honap
---
 drivers/vfio/pci/vfio_pci_config.c | 12 ++++++------
 drivers/vfio/pci/vfio_pci_priv.h   |  8 ++++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index dc4e510e6e1b..79aaf270adb2 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -270,9 +270,9 @@ static int vfio_direct_config_read(struct vfio_pci_core_device *vdev, int pos,
 }
 
 /* Raw access skips any kind of virtualization */
-static int vfio_raw_config_write(struct vfio_pci_core_device *vdev, int pos,
-				 int count, struct perm_bits *perm,
-				 int offset, __le32 val)
+int vfio_raw_config_write(struct vfio_pci_core_device *vdev, int pos,
+			  int count, struct perm_bits *perm,
+			  int offset, __le32 val)
 {
 	int ret;
 
@@ -283,9 +283,9 @@ static int vfio_raw_config_write(struct vfio_pci_core_device *vdev, int pos,
 	return count;
 }
 
-static int vfio_raw_config_read(struct vfio_pci_core_device *vdev, int pos,
-				int count, struct perm_bits *perm,
-				int offset, __le32 *val)
+int vfio_raw_config_read(struct vfio_pci_core_device *vdev, int pos,
+			 int count, struct perm_bits *perm,
+			 int offset, __le32 *val)
 {
 	int ret;
 
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index d7df5538dcde..1082ba43bafe 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -37,6 +37,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			   size_t count, loff_t *ppos, bool iswrite);
 
+int vfio_raw_config_write(struct vfio_pci_core_device *vdev, int pos,
+			  int count, struct perm_bits *perm,
+			  int offset, __le32 val);
+
+int vfio_raw_config_read(struct vfio_pci_core_device *vdev, int pos,
+			 int count, struct perm_bits *perm,
+			 int offset, __le32 *val);
+
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite);
 
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from BYAPR05CU005.outbound.protection.outlook.com (mail-westusazon11010015.outbound.protection.outlook.com [52.101.85.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E90AC4779AA; Wed, 1 Apr 2026 14:41:36 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.85.15
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054498; cv=fail; b=oWqzBfQrx/8/13FNyCvfOSBbaJLKr0EJo3PSugWXCnLAFrrcDw91PQmufoKYkykYrYsuFdvprgc5U9ZdvhUiTQpCFPlhDxu7YZglZm7X3atWsx9wfy3Wy0oFtfVFgyQ3AoowbWmNAoMRbxD/npj9IuYCLpYZqAtQ5sg8EXU58Yk=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054498; c=relaxed/simple; bh=y0LAzn6wmWEIPHqx8OHW05IXPw1vuAPcBEfCrxlLrOk=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=TN7kqKzeIUW+ocO8pxvF7dunboSxEFIxG/XCVyxhtgMOv2kkq6/0Lgn75CTM13npAnCWw4JHLnnKjM/1s1fB0iibsfb67Uj4RwArVHAKIdZkca6yiiJMqK/9hb+P2OGfdiPD3Rcr9Rnxr3pm9aRMoR151zqfgDLnDV7dD0RVINk=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=gRH/Fklg; arc=fail smtp.client-ip=52.101.85.15
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="gRH/Fklg"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
b=DQ6Sp0/LgKsvI7JCU78iTDkpLKNkpvq5xJby/l3Fl9joUyLIpX29nL7DhxluNTIu/zFh8fPduMerkSWWZYBjmVuFzw33FAwR7MRE37wOPHokMpeY88h0Q9XY5vZXJcGtM3GBHA9GsKZgHPHDU99n/66zra11Co+byXEpeVxdHPeVe/8egc+Xvo0PtJHheeR3oem2YQW+2L6HXUY3fw3LpOPnTQ+FSD9P+uRmJaI5Vka7YLzCEGlnwsem9hhyl3k+UBQsWcuWTH6bpJ8kRn7UYCLpZS1kVMcBzLxxnmPk44nKhUiJGykF53cCfIxEZy+So+qlHjwEyUfDd3pPH1kiLw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=FaDxrt6cd4fGPeUAxMAqZoGE4LyafNrpCYF347VFJlA=; b=QcW/HkqVzo04u8SUQuaLnYJNCaL3DvqZ+raWUFhQtmPkfzDn/EP/bX/yKjRD34RsNNv8e9jZHKfNKAq+soh+ZGMXh1Ofl4Xfx4A3d4NKrNpWiDFhdJi0CQ9SDgP/wKFB5e8fzylpp9AwCB61rz7NBBsXn47iPNCnc9ZQqb3pifAGEnIcQu42gWMtFbtA4LsxsRdpUz10ecQ9qZWKI8xKstNgMX7945C2iOtoO2Wezk4N8iZFeFoqh6yPNx9sK0WATeTnGcQWThyAkwRTlIExx3FT5xcFNTDBc/XNg7b+jDKTtwP4gczwLAeVuxfgRvTVXFjjCeJmyBWPrEwkU2Kynw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=FaDxrt6cd4fGPeUAxMAqZoGE4LyafNrpCYF347VFJlA=; b=gRH/Fklgp14SiaiAIcH57XhKlWzi1lzgghvfMArlnia2kz35qyYa05TvkmpLSje3DxanJmo1hfGqD9jS2wl9S9+i/SW+w5ZfnyRTEv1ZygQxpFUG++Ysepfgj7QFTuuZufZuesTkIZ87e6p0sDoXH7V3LPGhqrf1kbCBnwSkAs7OFj9riVsweP9IAz+Log6e0aFGUt9VhNNsCXjDBcqxU/8qoqrdOguyru1UVf3R16ehgpP9oBfBiQamWv7wwH7bILxMT7HoJk8ML4VX77F2Twp+1yk5UG2lg1URM78bfcUi4xvcM0expFY9hckEw5lXlELx/TNXn1dhL6zmY8OLqg== Received: from BN9PR03CA0258.namprd03.prod.outlook.com (2603:10b6:408:ff::23) by DS0PR12MB8041.namprd12.prod.outlook.com (2603:10b6:8:147::18) 
with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17; Wed, 1 Apr 2026 14:41:21 +0000 Received: from BN1PEPF00006003.namprd05.prod.outlook.com (2603:10b6:408:ff:cafe::c0) by BN9PR03CA0258.outlook.office365.com (2603:10b6:408:ff::23) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.28 via Frontend Transport; Wed, 1 Apr 2026 14:41:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by BN1PEPF00006003.mail.protection.outlook.com (10.167.243.235) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 14:41:21 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:41:00 -0700 Received: from nvidia-4028GR-scsim.nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:40:53 -0700 From: To: , , , , , , , , , , , , , , , CC: , , , , , , , , , Subject: [PATCH v2 11/20] vfio/cxl: Introduce HDM decoder register emulation framework Date: Wed, 1 Apr 2026 20:09:08 +0530 Message-ID: <20260401143917.108413-12-mhonap@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com> References: <20260401143917.108413-1-mhonap@nvidia.com> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: rnnvmail202.nvidia.com (10.129.68.7) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN1PEPF00006003:EE_|DS0PR12MB8041:EE_ X-MS-Office365-Filtering-Correlation-Id: 12ed4a35-dddb-4a06-79e9-08de8ffcbe1a X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|82310400026|1800799024|376014|7416014|36860700016|921020|56012099003|18002099003|22082099003; X-Microsoft-Antispam-Message-Info: SFsjgg3sGCwiB6jk0cYRRqfjW+IcyMXQsXJC8ZJBRVxrgOLLaRu84vDesIFvVIsNah9G7TqoFd6hPXiEPIvoy+ZJ1dV4UvJRuXrGNhLuKXsH/Qu2+kwlVEhkLfpQsZu/x8VpV3MCcGHtch0GRMJQOdBTFm4BSX8pL+LmjqYzt/32pZf7Eyn5PQHHknSmX3p+65COmm7bY3av5cd2JJPwPz3BvLigkluj65yyfSGrM5/k6mXyBTtS3z1k+DPZhAJL/59u++rDjhGvEpU2rdDO/q8JIZ7XY9+y82vpjPV+YKlAwh8ZjIMDVLRkW1TpoLHI1hkmXvRn6R+J//KkGU7qyYEJX5YSOsW3YivNWO7+eCYPgJzobNBlkp2HEWFABBXCHBhy2P/KiE5x14oZpSqJsfZ9EL5h7Oyc7nylQtD6KG4IDXDgfR2RWwN1aizpQ8OKoq1XTLRyjcc+vIL3+ok4HitPy8flnrKYemgvzgtqudUkWV7ILfjIhldLxqj+Zi4Px24KstcwTIaMuJUt39iymE1DrDvIi9FaXEtKdVRYXraIaINZL6/6BxJV+ead2wZ3fwMqSzCSUEsljh/XWrqzqEhQAd0B/5IvYCcMbuwTLVqx7dQ0+nw9s0aBqSYn0klqeHBOlPWrxNuxfuG+EDm7U6GD7+8YlSfxpkBYSTObef2Qtr9ZGpPh7HGjAtCIQQlDHX3MFtKwRgyHbnlPEjXx5cXjmm9Tgcvi5eSi1amFAdoEaOYUloAPe986HI5hKTdUZhHlmYu24HP70zYIMPvcWIrUmlGb3lGtiD+glkZanMF5IvykoXFeGq0+60nLMZlY X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(82310400026)(1800799024)(376014)(7416014)(36860700016)(921020)(56012099003)(18002099003)(22082099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
8A19Ad2QYiDp+s8Ea2iXTKA/z522kDI+mB+Bk3T9y02d7Fb+9Ycbyaor+QJTC20MJzL2V/x+KvCzG1OM20Y9asb6n+bVAlHRQ47WUhrQZDJf2dDcurLIhnwCToKhT1BLktTqHZESFOylzwo20Kw5XbOW9jXGhuHOrcLDs5P/GDlwiJzH1b6IG+759lqngPm9j9kX9YOkvpB4RFFy1JXl7k7nUmSYkQudIrJTEDgoMD3AW1UQU8ynNHu+fqPelfw3TMMOq2dYi6oeLD8hnykhkt8AXYYh+tZXL7A02pBeEXO2CqHrRBbEJeullobDMvpZV1OicygSxLRX4UZ2uAHOzZin6gFPnNoArPSix5HJZkmKsKM07O6S3QXspOtvD/vgwoFcVJXHLNTDkbDYlhybEWkrcsJzxGePiA8cC04j8wOB/EZvksKHVMxrqig/uJc6 X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:41:21.5221 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 12ed4a35-dddb-4a06-79e9-08de8ffcbe1a X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN1PEPF00006003.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR12MB8041 Content-Type: text/plain; charset="utf-8" From: Manish Honap Add HDM decoder register emulation for CXL devices assigned to a guest. New file vfio_cxl_emu.c allocates comp_reg_virt[] covering the full component register block (CXL_COMPONENT_REG_BLOCK_SIZE), snapshots it from MMIO after probe, and registers a VFIO device region (VFIO_REGION_SUBTYPE_CXL_COMP_REGS) with read/write ops but no mmap, so every access hits the emulated buffer and write dispatchers. vfio_cxl_setup_virt_regs() is called from the tail of vfio_cxl_setup_regs(); vfio_cxl_clean_virt_regs() runs on cleanup. HDM decoder register defines come from include/uapi/cxl/cxl_regs.h. Bits with no hardware equivalent stay in vfio_cxl_priv.h. hdm_decoder_n_ctrl_write() allows the guest to clear the LOCK bit. 
A firmware-committed decoder arrives with LOCK=1; the guest driver must
clear it before reprogramming BASE and SIZE with the VM's GPA. Such a
write clears the bit in the shadow while preserving all other fields.

Co-developed-by: Zhi Wang
Signed-off-by: Zhi Wang
Signed-off-by: Manish Honap
---
 drivers/vfio/pci/Makefile            |   2 +-
 drivers/vfio/pci/cxl/vfio_cxl_core.c |   5 +
 drivers/vfio/pci/cxl/vfio_cxl_emu.c  | 433 +++++++++++++++++++++++++++
 drivers/vfio/pci/cxl/vfio_cxl_priv.h |  47 +++
 include/uapi/cxl/cxl_regs.h          |   5 +
 5 files changed, 491 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_emu.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index ecb0eacbc089..bef916495eae 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
-vfio-pci-core-$(CONFIG_VFIO_CXL_CORE) += cxl/vfio_cxl_core.o
+vfio-pci-core-$(CONFIG_VFIO_CXL_CORE) += cxl/vfio_cxl_core.o cxl/vfio_cxl_emu.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) += vfio_pci_dmabuf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
index b1c7603590b5..0b9e4419cd47 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -149,8 +149,11 @@ static int vfio_cxl_setup_regs(struct vfio_pci_core_device *vdev,
 	cxl->comp_reg_offset = bar_offset;
 	cxl->comp_reg_size = CXL_COMPONENT_REG_BLOCK_SIZE;
 
+	ret = vfio_cxl_setup_virt_regs(vdev, cxl, base);
 	iounmap(base);
 	release_mem_region(map->resource, map->max_size);
+	if (ret)
+		return ret;
 
 	return 0;
 
@@ -253,6 +256,8 @@ void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev)
 
 	if (!cxl)
 		return;
+
+	vfio_cxl_clean_virt_regs(cxl);
 }
 
 MODULE_IMPORT_NS("CXL");
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_emu.c b/drivers/vfio/pci/cxl/vfio_cxl_emu.c
new file mode 100644
index 000000000000..6fb02253e631
--- /dev/null
+++ b/drivers/vfio/pci/cxl/vfio_cxl_emu.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include
+#include
+
+#include "../vfio_pci_priv.h"
+#include "vfio_cxl_priv.h"
+
+/*
+ * comp_reg_virt[] shadow layout:
+ *   Covers the full CXL.mem register area (starting at CXL_CM_OFFSET
+ *   within the component register block). Index 0 is the CXL Capability
+ *   Array Header; the HDM decoder block starts at index
+ *   hdm_reg_offset / sizeof(__le32).
+ *
+ * Register layout within the HDM block (CXL spec 4.0 8.2.4.20 CXL HDM Decoder
+ * Capability Structure):
+ *   0x00: HDM Decoder Capability
+ *   0x04: HDM Decoder Global Control
+ *   0x08: (reserved)
+ *   0x0c: (reserved)
+ *   For each decoder N (N=0..hdm_count-1), at base 0x10 + N*0x20:
+ *     +0x00: BASE_LO
+ *     +0x04: BASE_HI
+ *     +0x08: SIZE_LO
+ *     +0x0c: SIZE_HI
+ *     +0x10: CTRL
+ *     +0x14: TARGET_LIST_LO
+ *     +0x18: TARGET_LIST_HI
+ *     +0x1c: (reserved)
+ */
+
+static inline __le32 *hdm_reg_ptr(struct vfio_pci_cxl_state *cxl, u32 hdm_off)
+{
+	/*
+	 * hdm_off is a byte offset within the HDM decoder block.
+	 * comp_reg_virt covers the CXL.mem register area starting at
+	 * CXL_CM_OFFSET within the component register block.
+	 * hdm_reg_offset is CXL.mem-relative, so adding hdm_reg_offset
+	 * gives the correct index into comp_reg_virt[].
+	 */
+	return &cxl->comp_reg_virt[(cxl->hdm_reg_offset + hdm_off) /
+				   sizeof(__le32)];
+}
+
+static ssize_t virt_hdm_rev_reg_write(struct vfio_pci_core_device *vdev,
+				      const __le32 *val32, u64 offset, u64 size)
+{
+	/* Discard writes on reserved registers. */
+	return size;
+}
+
+static ssize_t hdm_decoder_n_lo_write(struct vfio_pci_core_device *vdev,
+				      const __le32 *val32, u64 offset, u64 size)
+{
+	u32 new_val = le32_to_cpu(*val32);
+
+	if (WARN_ON_ONCE(size != CXL_REG_SIZE_DWORD))
+		return -EINVAL;
+
+	/* Bits [27:0] are reserved. */
+	new_val &= ~CXL_HDM_DECODER_BASE_LO_RESERVED_MASK;
+
+	*hdm_reg_ptr(vdev->cxl, offset) = cpu_to_le32(new_val);
+
+	return size;
+}
+
+static ssize_t hdm_decoder_global_ctrl_write(struct vfio_pci_core_device *vdev,
+					     const __le32 *val32, u64 size)
+{
+	u32 hdm_gcap;
+	u32 new_val = le32_to_cpu(*val32);
+
+	if (WARN_ON_ONCE(size != CXL_REG_SIZE_DWORD))
+		return -EINVAL;
+
+	/* Bits [31:2] are reserved. */
+	new_val &= ~CXL_HDM_DECODER_GLOBAL_CTRL_RESERVED_MASK;
+
+	/* Poison On Decode Error Enable (bit 0) is RO=0 if not supported. */
+	hdm_gcap = le32_to_cpu(*hdm_reg_ptr(vdev->cxl,
+					    CXL_HDM_DECODER_CAP_OFFSET));
+	if (!(hdm_gcap & CXL_HDM_DECODER_POISON_ON_DECODE_ERR))
+		new_val &= ~CXL_HDM_DECODER_GLOBAL_CTRL_POISON_EN_BIT;
+
+	*hdm_reg_ptr(vdev->cxl, CXL_HDM_DECODER_CTRL_OFFSET) =
+		cpu_to_le32(new_val);
+
+	return size;
+}
+
+/**
+ * hdm_decoder_n_ctrl_write - Write handler for HDM decoder CTRL register.
+ * @vdev: VFIO PCI core device
+ * @val32: New register value supplied by userspace (little-endian)
+ * @offset: Byte offset within the HDM block for this decoder's CTRL register
+ * @size: Access size in bytes; must equal CXL_REG_SIZE_DWORD
+ *
+ * The COMMIT bit (bit 9) is the key: setting it requests the hardware to
+ * lock the decoder. The emulated COMMITTED bit (bit 10) mirrors COMMIT
+ * immediately to allow QEMU's notify_change to detect the transition and
+ * map/unmap the DPA MemoryRegion in the guest address space.
+ *
+ * Note: the actual hardware HDM decoder programming (writing the real
+ * BASE/SIZE with host physical addresses) happens in the QEMU notify_change
+ * callback BEFORE this write reaches the hardware. This ordering is
+ * correct because vfio_region_write() calls notify_change() first.
+ *
+ * Return: @size on success, %-EINVAL if @size is not %CXL_REG_SIZE_DWORD.
+ */
+static ssize_t hdm_decoder_n_ctrl_write(struct vfio_pci_core_device *vdev,
+					const __le32 *val32, u64 offset, u64 size)
+{
+	u32 hdm_gcap;
+	u32 ro_mask = CXL_HDM_DECODER_CTRL_RO_BITS_MASK;
+	u32 rev_mask = CXL_HDM_DECODER_CTRL_RESERVED_MASK;
+	u32 new_val = le32_to_cpu(*val32);
+	u32 cur_val;
+
+	if (WARN_ON_ONCE(size != CXL_REG_SIZE_DWORD))
+		return -EINVAL;
+
+	cur_val = le32_to_cpu(*hdm_reg_ptr(vdev->cxl, offset));
+	if (cur_val & CXL_HDM_DECODER0_CTRL_LOCK) {
+		if (new_val & CXL_HDM_DECODER0_CTRL_LOCK)
+			return size;
+
+		/* LOCK_0 only: preserve all other bits, clear LOCK */
+		*hdm_reg_ptr(vdev->cxl, offset) = cpu_to_le32(
+			cur_val & ~CXL_HDM_DECODER0_CTRL_LOCK);
+		return size;
+	}
+
+	hdm_gcap = le32_to_cpu(*hdm_reg_ptr(vdev->cxl,
+					    CXL_HDM_DECODER_CAP_OFFSET));
+	ro_mask |= CXL_HDM_DECODER_CTRL_DEVICE_BITS_RO;
+	rev_mask |= CXL_HDM_DECODER_CTRL_DEVICE_RESERVED;
+
+	if (!(hdm_gcap & CXL_HDM_DECODER_UIO_CAPABLE))
+		rev_mask |= CXL_HDM_DECODER_CTRL_UIO_RESERVED;
+
+	new_val &= ~rev_mask;
+	cur_val &= ro_mask;
+	new_val = (new_val & ~ro_mask) | cur_val;
+
+	/*
+	 * Mirror COMMIT to COMMITTED immediately in the emulated state.
+	 */
+	if (new_val & CXL_HDM_DECODER0_CTRL_COMMIT)
+		new_val |= CXL_HDM_DECODER0_CTRL_COMMITTED;
+	else
+		new_val &= ~CXL_HDM_DECODER0_CTRL_COMMITTED;
+
+	*hdm_reg_ptr(vdev->cxl, offset) = cpu_to_le32(new_val);
+
+	return size;
+}
+
+/*
+ * Dispatch table for COMP_REGS region writes. Indexed by byte offset within
+ * the HDM decoder block. Returns the appropriate write handler.
+ *
+ * Layout:
+ *   0x00       HDM Decoder Capability  (RO)
+ *   0x04       HDM Global Control      (RW with reserved masking)
+ *   0x08-0x0f  (reserved)              (ignored)
+ *   Per decoder N, base = 0x10 + N*0x20:
+ *     base+0x00  BASE_LO         (RW, [27:0] reserved)
+ *     base+0x04  BASE_HI         (RW)
+ *     base+0x08  SIZE_LO         (RW, [27:0] reserved)
+ *     base+0x0c  SIZE_HI         (RW)
+ *     base+0x10  CTRL            (RW, complex rules)
+ *     base+0x14  TARGET_LIST_LO  (ignored for Type-2)
+ *     base+0x18  TARGET_LIST_HI  (ignored for Type-2)
+ *     base+0x1c  (reserved)      (ignored)
+ */
+static ssize_t comp_regs_dispatch_write(struct vfio_pci_core_device *vdev,
+					u32 off, const __le32 *val32, u32 size)
+{
+	struct vfio_pci_cxl_state *cxl = vdev->cxl;
+	u32 dec_base, dec_off;
+
+	/* HDM Decoder Capability (0x00): RO */
+	if (off == CXL_HDM_DECODER_CAP_OFFSET)
+		return size;
+
+	/* HDM Global Control (0x04) */
+	if (off == CXL_HDM_DECODER_CTRL_OFFSET)
+		return hdm_decoder_global_ctrl_write(vdev, val32, size);
+
+	/*
+	 * Offsets 0x08-0x0f are reserved per CXL 4.0 Table 8-115.
+	 * Per-decoder registers start at 0x10, stride 0x20
+	 */
+	if (off < CXL_HDM_DECODER_FIRST_BLOCK_OFFSET)
+		return size; /* reserved gap */
+
+	dec_base = CXL_HDM_DECODER_FIRST_BLOCK_OFFSET;
+	/*
+	 * Reject accesses beyond the last implemented HDM decoder.
+	 * Without this check an out-of-bounds offset would silently
+	 * corrupt comp_reg_virt[] memory past the end of the allocation.
+	 */
+	if ((off - dec_base) / CXL_HDM_DECODER_BLOCK_STRIDE >= cxl->hdm_count)
+		return size;
+
+	dec_off = (off - dec_base) % CXL_HDM_DECODER_BLOCK_STRIDE;
+
+	switch (dec_off) {
+	case CXL_HDM_DECODER_N_BASE_LOW_OFFSET: /* BASE_LO */
+	case CXL_HDM_DECODER_N_SIZE_LOW_OFFSET: /* SIZE_LO */
+		return hdm_decoder_n_lo_write(vdev, val32, off, size);
+	case CXL_HDM_DECODER_N_BASE_HIGH_OFFSET: /* BASE_HI */
+	case CXL_HDM_DECODER_N_SIZE_HIGH_OFFSET: /* SIZE_HI */
+	{
+		/* Full 32-bit write, no reserved bits; frozen when COMMIT_LOCK set */
+		u32 ctrl_off = off - dec_off + CXL_HDM_DECODER_N_CTRL_OFFSET;
+		u32 ctrl = le32_to_cpu(*hdm_reg_ptr(cxl, ctrl_off));
+
+		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
+			return size;
+		*hdm_reg_ptr(cxl, off) = *val32;
+		return size;
+	}
+	case CXL_HDM_DECODER_N_CTRL_OFFSET: /* CTRL */
+		return hdm_decoder_n_ctrl_write(vdev, val32, off, size);
+	case CXL_HDM_DECODER_N_TARGET_LIST_LOW_OFFSET:
+	case CXL_HDM_DECODER_N_TARGET_LIST_HIGH_OFFSET:
+	case CXL_HDM_DECODER_N_REV_OFFSET:
+		return virt_hdm_rev_reg_write(vdev, val32, off, size);
+	default:
+		return size;
+	}
+}
+
+/*
+ * vfio_cxl_comp_regs_rw - regops rw handler for
+ * VFIO_REGION_SUBTYPE_CXL_COMP_REGS.
+ *
+ * Reads return the emulated HDM state (comp_reg_virt[]).
+ * Writes go through comp_regs_dispatch_write() for bit-field enforcement.
+ * Only 4-byte aligned 4-byte accesses are supported (hardware requirement).
+ */
+static ssize_t vfio_cxl_comp_regs_rw(struct vfio_pci_core_device *vdev,
+				     char __user *buf, size_t count,
+				     loff_t *ppos, bool iswrite)
+{
+	struct vfio_pci_cxl_state *cxl = vdev->cxl;
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	size_t done = 0;
+
+	if (!count)
+		return 0;
+
+	/* Clamp to total region size: cap array prefix + HDM block */
+	if (pos >= cxl->hdm_reg_offset + cxl->hdm_reg_size)
+		return -EINVAL;
+	count = min(count,
+		    (size_t)(cxl->hdm_reg_offset + cxl->hdm_reg_size - pos));
+
+	while (done < count) {
+		u32 sz = count - done;
+		u32 off = pos + done;
+		__le32 v;
+
+		/* Enforce exactly 4-byte, 4-byte-aligned accesses */
+		if (sz != CXL_REG_SIZE_DWORD || (off & 0x3))
+			return done ? (ssize_t)done : -EINVAL;
+
+		if (iswrite) {
+			if (off < cxl->hdm_reg_offset) {
+				/* Cap array area is read-only; discard writes */
+				done += sizeof(v);
+				continue;
+			}
+			if (copy_from_user(&v, buf + done, sizeof(v)))
+				return done ? (ssize_t)done : -EFAULT;
+			comp_regs_dispatch_write(vdev,
+						 off - cxl->hdm_reg_offset,
+						 &v, sizeof(v));
+		} else {
+			/* Read from extended buffer - covers cap array and HDM */
+			v = cxl->comp_reg_virt[off / sizeof(__le32)];
+			if (copy_to_user(buf + done, &v, sizeof(v)))
+				return done ? (ssize_t)done : -EFAULT;
+		}
+		done += sizeof(v);
+	}
+
+	*ppos += done;
+	return done;
+}
+
+static void vfio_cxl_comp_regs_release(struct vfio_pci_core_device *vdev,
+				       struct vfio_pci_region *region)
+{
+	/* comp_reg_virt is freed in vfio_cxl_clean_virt_regs() */
+}
+
+static const struct vfio_pci_regops vfio_cxl_comp_regs_ops = {
+	.rw = vfio_cxl_comp_regs_rw,
+	.release = vfio_cxl_comp_regs_release,
+};
+
+/*
+ * vfio_cxl_setup_virt_regs - Allocate emulated HDM register state.
+ *
+ * Allocates comp_reg_virt as a compact __le32 array covering only
+ * hdm_reg_size bytes of HDM decoder registers. The initial values
+ * are read from hardware via the BAR ioremap established by the caller.
+ *
+ * DVSEC state is accessed via vdev->vconfig (see the following patch).
+ */
+int vfio_cxl_setup_virt_regs(struct vfio_pci_core_device *vdev,
+			     struct vfio_pci_cxl_state *cxl,
+			     void __iomem *cap_base)
+{
+	size_t total_size, nregs, i;
+
+	if (WARN_ON(!cxl->hdm_reg_size))
+		return -EINVAL;
+
+	total_size = cxl->hdm_reg_offset + cxl->hdm_reg_size;
+
+	if (pci_resource_len(vdev->pdev, cxl->comp_reg_bar) <
+	    cxl->comp_reg_offset + CXL_CM_OFFSET + total_size)
+		return -ENODEV;
+
+	nregs = total_size / sizeof(__le32);
+	cxl->comp_reg_virt = kcalloc(nregs, sizeof(__le32), GFP_KERNEL);
+	if (!cxl->comp_reg_virt)
+		return -ENOMEM;
+
+	/*
+	 * Snapshot the CXL.mem register area from the caller's mapping.
+	 * cap_base maps the component register block from comp_reg_offset.
+	 * The CXL.mem registers start at CXL_CM_OFFSET (= 0x1000) within that
+	 * block; reading from cap_base + CXL_CM_OFFSET ensures comp_reg_virt[0]
+	 * holds the CXL Capability Array Header required by guest drivers.
+	 */
+	for (i = 0; i < nregs; i++)
+		cxl->comp_reg_virt[i] =
+			cpu_to_le32(readl(cap_base + CXL_CM_OFFSET +
+					  i * sizeof(__le32)));
+
+	/*
+	 * Establish persistent mapping; kept alive until
+	 * vfio_cxl_clean_virt_regs().
+	 */
+	cxl->hdm_iobase = ioremap(pci_resource_start(vdev->pdev,
+						     cxl->comp_reg_bar) +
+				  cxl->comp_reg_offset + CXL_CM_OFFSET +
+				  cxl->hdm_reg_offset,
+				  cxl->hdm_reg_size);
+	if (!cxl->hdm_iobase) {
+		kfree(cxl->comp_reg_virt);
+		cxl->comp_reg_virt = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/*
+ * Called with memory_lock write side held (from vfio_cxl_reactivate_region).
+ * Uses the pre-established hdm_iobase, no ioremap() under the lock,
+ * which would deadlock on PREEMPT_RT where ioremap() can sleep.
+ */
+void vfio_cxl_reinit_comp_regs(struct vfio_pci_cxl_state *cxl)
+{
+	size_t i, nregs;
+	u32 n;
+
+	if (!cxl || !cxl->comp_reg_virt || !cxl->hdm_iobase)
+		return;
+
+	nregs = cxl->hdm_reg_size / sizeof(__le32);
+
+	for (i = 0; i < nregs; i++)
+		*hdm_reg_ptr(cxl, i * sizeof(__le32)) =
+			cpu_to_le32(readl(cxl->hdm_iobase +
+					  i * sizeof(__le32)));
+
+	/*
+	 * For firmware-committed decoders, clear COMMIT_LOCK (bit 8) and zero
+	 * BASE in comp_reg_virt[] so QEMU can write the correct guest GPA via
+	 * setup_locked_hdm() before guest DPA access begins.
+	 *
+	 * Check the COMMITTED bit (bit 10) directly from the freshly-snapshotted
+	 * ctrl register rather than relying on cxl->precommitted. At probe time
+	 * this function is called before cxl->precommitted is set (it is set
+	 * after vfio_cxl_read_committed_decoder_size() succeeds), so using
+	 * cxl->precommitted here would silently skip the LOCK clearing and leave
+	 * the hardware HPA in comp_reg_virt[].
+	 */
+	for (n = 0; n < cxl->hdm_count; n++) {
+		u32 ctrl_off = CXL_HDM_DECODER_FIRST_BLOCK_OFFSET +
+			       n * CXL_HDM_DECODER_BLOCK_STRIDE +
+			       CXL_HDM_DECODER_N_CTRL_OFFSET;
+		u32 base_lo_off = CXL_HDM_DECODER_FIRST_BLOCK_OFFSET +
+				  n * CXL_HDM_DECODER_BLOCK_STRIDE +
+				  CXL_HDM_DECODER_N_BASE_LOW_OFFSET;
+		u32 base_hi_off = CXL_HDM_DECODER_FIRST_BLOCK_OFFSET +
+				  n * CXL_HDM_DECODER_BLOCK_STRIDE +
+				  CXL_HDM_DECODER_N_BASE_HIGH_OFFSET;
+		u32 ctrl = le32_to_cpu(*hdm_reg_ptr(cxl, ctrl_off));
+
+		if (!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED))
+			continue;
+
+		if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK) {
+			*hdm_reg_ptr(cxl, ctrl_off) =
+				cpu_to_le32(ctrl &
+					    ~CXL_HDM_DECODER0_CTRL_LOCK);
+			*hdm_reg_ptr(cxl, base_lo_off) = 0;
+			*hdm_reg_ptr(cxl, base_hi_off) = 0;
+		}
+	}
+}
+
+void vfio_cxl_clean_virt_regs(struct vfio_pci_cxl_state *cxl)
+{
+	if (cxl->hdm_iobase) {
+		iounmap(cxl->hdm_iobase);
+		cxl->hdm_iobase = NULL;
+	}
+	kfree(cxl->comp_reg_virt);
+	cxl->comp_reg_virt = NULL;
+}
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
index 54b1f6d885aa..463a55062144 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h
+++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
@@ -21,12 +21,53 @@ struct vfio_pci_cxl_state {
 	size_t hdm_reg_size;
 	resource_size_t comp_reg_offset;
 	size_t comp_reg_size;
+	__le32 *comp_reg_virt;
+	void __iomem *hdm_iobase;
 	u16 dvsec_len;
 	u8 hdm_count;
 	u8 comp_reg_bar;
 	bool cache_capable;
 };
 
+/* Register access sizes */
+#define CXL_REG_SIZE_WORD	2
+#define CXL_REG_SIZE_DWORD	4
+
+/* HDM Decoder - register offsets (CXL 4.0 Table 8-115) */
+#define CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET	0x4
+#define CXL_HDM_DECODER_FIRST_BLOCK_OFFSET	0x10
+#define CXL_HDM_DECODER_BLOCK_STRIDE	0x20
+#define CXL_HDM_DECODER_N_BASE_LOW_OFFSET	0x0
+#define CXL_HDM_DECODER_N_BASE_HIGH_OFFSET	0x4
+#define CXL_HDM_DECODER_N_SIZE_LOW_OFFSET	0x8
+#define CXL_HDM_DECODER_N_SIZE_HIGH_OFFSET	0xc
+#define CXL_HDM_DECODER_N_CTRL_OFFSET	0x10
+#define CXL_HDM_DECODER_N_TARGET_LIST_LOW_OFFSET	0x14
+#define CXL_HDM_DECODER_N_TARGET_LIST_HIGH_OFFSET	0x18
+#define CXL_HDM_DECODER_N_REV_OFFSET	0x1c
+
+/*
+ * HDM Decoder N Control emulation masks.
+ *
+ * Single-bit hardware definitions are in as
+ * CXL_HDM_DECODER0_CTRL_* (bits 0-14) and CXL_HDM_DECODER_*_CAP.
+ * The masks below express emulation policy for a CXL.mem device.
+ */
+#define CXL_HDM_DECODER_CTRL_RO_BITS_MASK	(BIT(10) | BIT(11))
+#define CXL_HDM_DECODER_CTRL_RESERVED_MASK	(BIT(15) | GENMASK(31, 28))
+#define CXL_HDM_DECODER_CTRL_DEVICE_BITS_RO	BIT(12)
+#define CXL_HDM_DECODER_CTRL_DEVICE_RESERVED	(GENMASK(19, 16) | GENMASK(23, 20))
+#define CXL_HDM_DECODER_CTRL_UIO_RESERVED	(BIT(14) | GENMASK(27, 24))
+/*
+ * bit 13 (BI) is RsvdP for devices without CXL.cache (Cache_Capable=0).
+ * HDM-D (CXL.mem only) decoders must not have BI set by the guest.
+ */
+#define CXL_HDM_DECODER_CTRL_BI_RESERVED	BIT(13)
+#define CXL_HDM_DECODER_BASE_LO_RESERVED_MASK	GENMASK(27, 0)
+
+#define CXL_HDM_DECODER_GLOBAL_CTRL_RESERVED_MASK	GENMASK(31, 2)
+#define CXL_HDM_DECODER_GLOBAL_CTRL_POISON_EN_BIT	BIT(0)
+
 /*
  * CXL DVSEC for CXL Devices - register offsets within the DVSEC
  * (CXL 4.0 8.1.3).
@@ -37,4 +78,10 @@ struct vfio_pci_cxl_state {
 /* CXL DVSEC Capability register bit 0: device supports CXL.cache (HDM-DB) */
 #define CXL_DVSEC_CACHE_CAPABLE BIT(0)
 
+int vfio_cxl_setup_virt_regs(struct vfio_pci_core_device *vdev,
+			     struct vfio_pci_cxl_state *cxl,
+			     void __iomem *cap_base);
+void vfio_cxl_clean_virt_regs(struct vfio_pci_cxl_state *cxl);
+void vfio_cxl_reinit_comp_regs(struct vfio_pci_cxl_state *cxl);
+
 #endif /* __LINUX_VFIO_CXL_PRIV_H */
diff --git a/include/uapi/cxl/cxl_regs.h b/include/uapi/cxl/cxl_regs.h
index 1a48a3805f52..b6fcae91d216 100644
--- a/include/uapi/cxl/cxl_regs.h
+++ b/include/uapi/cxl/cxl_regs.h
@@ -33,8 +33,13 @@
 #define CXL_HDM_DECODER_TARGET_COUNT_MASK __GENMASK(7, 4)
 #define CXL_HDM_DECODER_INTERLEAVE_11_8 _BITUL(8)
 #define CXL_HDM_DECODER_INTERLEAVE_14_12 _BITUL(9)
+#define CXL_HDM_DECODER_POISON_ON_DECODE_ERR _BITUL(10)
 #define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY _BITUL(11)
 #define CXL_HDM_DECODER_INTERLEAVE_16_WAY _BITUL(12)
+#define CXL_HDM_DECODER_UIO_CAPABLE _BITUL(13)
+#define CXL_HDM_DECODER_UIO_COUNT_MASK __GENMASK(19, 16)
+#define CXL_HDM_DECODER_MEMDATA_NXM _BITUL(20)
+#define CXL_HDM_DECODER_COHERENCY_MODELS_MASK __GENMASK(22, 21)
 #define CXL_HDM_DECODER_CTRL_OFFSET 0x4
 #define CXL_HDM_DECODER_ENABLE _BITUL(1)
 #define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from BN8PR05CU002.outbound.protection.outlook.com (mail-eastus2azon11011011.outbound.protection.outlook.com [52.101.57.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org
Subject: [PATCH v2 12/20] vfio/cxl: Wait for HDM ranges and create memdev
Date: Wed, 1 Apr 2026 20:09:09 +0530
Message-ID: <20260401143917.108413-13-mhonap@nvidia.com>
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
From: Manish Honap

After the HDM registers are mapped, call cxl_await_range_active() so that
probe proceeds only once the DVSEC ranges report active, without touching
the memdev register group that Type-2 devices may lack.

Re-snapshot the component registers with vfio_cxl_reinit_comp_regs() once
MEM_ACTIVE is set, so that the final firmware-written values (SIZE_HIGH
etc.) land in comp_reg_virt.

Read the committed decoder size, set the capacity via cxl_set_capacity(),
and create the memdev with devm_cxl_add_memdev().

Signed-off-by: Manish Honap
---
 drivers/vfio/pci/cxl/vfio_cxl_core.c | 56 ++++++++++++++++++++++++++++
 drivers/vfio/pci/cxl/vfio_cxl_emu.c  | 42 +++++++++++++++++++++
 drivers/vfio/pci/cxl/vfio_cxl_priv.h |  4 ++
 3 files changed, 102 insertions(+)

diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
index 0b9e4419cd47..02755265d530 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -165,6 +165,22 @@ static int vfio_cxl_setup_regs(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_cxl_create_memdev(struct vfio_pci_cxl_state *cxl,
+				  resource_size_t capacity)
+{
+	int ret;
+
+	ret = cxl_set_capacity(&cxl->cxlds, capacity);
+	if (ret)
+		return ret;
+
+	cxl->cxlmd = devm_cxl_add_memdev(&cxl->cxlds, NULL);
+	if (IS_ERR(cxl->cxlmd))
+		return PTR_ERR(cxl->cxlmd);
+
+	return 0;
+}
+
 /*
  * Free CXL state early on probe failure.
  * devm_kfree() on a live devres
  * allocation removes it from the list immediately, so the normal devres
@@ -189,6 +205,7 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_cxl_state *cxl;
+	resource_size_t capacity = 0;
 	u16 dvsec;
 	int ret;
 
@@ -234,8 +251,44 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 		goto free_cxl;
 	}
 
+	cxl->cxlds.media_ready = !cxl_await_range_active(&cxl->cxlds);
+	if (!cxl->cxlds.media_ready) {
+		pci_warn(pdev, "CXL media not ready\n");
+		pci_disable_device(pdev);
+		goto regs_failed;
+	}
+
+	/*
+	 * Take the single authoritative HDM decoder snapshot now that
+	 * MEM_ACTIVE is confirmed and BAR memory is still enabled. Using
+	 * readl() per-dword ensures correct MMIO serialisation and captures
+	 * the final firmware-written values for all fields including SIZE_HIGH,
+	 * which firmware commits to the BAR at MEM_ACTIVE time.
+	 */
+	vfio_cxl_reinit_comp_regs(cxl);
+
 	pci_disable_device(pdev);
 
+	capacity = vfio_cxl_read_committed_decoder_size(vdev, cxl);
+	if (capacity == 0) {
+		/*
+		 * TODO: Add handling for devices which do not have
+		 * firmware pre-committed decoders
+		 */
+		pci_info(pdev, "Uncommitted region size must be configured via sysfs before bind\n");
+		goto regs_failed;
+	}
+
+	cxl->dpa_size = capacity;
+
+	pci_dbg(pdev, "Device capacity: %llu MB\n", capacity >> 20);
+
+	ret = vfio_cxl_create_memdev(cxl, capacity);
+	if (ret) {
+		pci_warn(pdev, "Failed to create memdev\n");
+		goto regs_failed;
+	}
+
 	/*
 	 * Register probing succeeded. Assign vdev->cxl now so that
 	 * all subsequent helpers can access state via vdev->cxl.
@@ -246,6 +299,9 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 
 	return;
 
+regs_failed:
+	vfio_cxl_clean_virt_regs(cxl);
+
 free_cxl:
 	vfio_cxl_dev_state_free(pdev, cxl);
 }
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_emu.c b/drivers/vfio/pci/cxl/vfio_cxl_emu.c
index 6fb02253e631..11195e8c21d7 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_emu.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_emu.c
@@ -365,6 +365,48 @@ int vfio_cxl_setup_virt_regs(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
+/*
+ * vfio_cxl_read_committed_decoder_size - Extract committed DPA capacity from
+ * comp_reg_virt[].
+ *
+ * Called from probe context after vfio_cxl_reinit_comp_regs() has taken the
+ * post-MEM_ACTIVE readl() snapshot and patched SIZE_HIGH/SIZE_LOW from DVSEC.
+ * comp_reg_virt[] is already correct at this point; no hardware access needed.
+ *
+ * Returns the committed DPA capacity in bytes, or 0 if the decoder is not
+ * committed.
+ */
+resource_size_t
+vfio_cxl_read_committed_decoder_size(struct vfio_pci_core_device *vdev,
+				     struct vfio_pci_cxl_state *cxl)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	resource_size_t capacity;
+	u32 ctrl, sz_hi, sz_lo;
+
+	if (WARN_ON(!cxl || !cxl->comp_reg_virt))
+		return 0;
+
+	ctrl = le32_to_cpu(*hdm_reg_ptr(cxl, CXL_HDM_DECODER0_CTRL_OFFSET(0)));
+	sz_hi = le32_to_cpu(*hdm_reg_ptr(cxl, CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(0)));
+	sz_lo = le32_to_cpu(*hdm_reg_ptr(cxl, CXL_HDM_DECODER0_SIZE_LOW_OFFSET(0)));
+
+	if (!(ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)) {
+		pci_dbg(pdev,
+			"vfio_cxl: decoder0 not committed: ctrl=0x%08x\n",
+			ctrl);
+		return 0;
+	}
+
+	capacity = ((resource_size_t)sz_hi << 32) | (sz_lo & GENMASK(31, 28));
+
+	pci_dbg(pdev,
+		"vfio_cxl: decoder0 committed: sz_hi=0x%08x sz_lo=0x%08x capacity=0x%llx\n",
+		sz_hi, sz_lo, (unsigned long long)capacity);
+
+	return capacity;
+}
+
 /*
  * Called with memory_lock write side held (from vfio_cxl_reactivate_region).
  * Uses the pre-established hdm_iobase, no ioremap() under the lock,
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
index 463a55062144..6359ad260bde 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h
+++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
@@ -22,6 +22,7 @@ struct vfio_pci_cxl_state {
 	resource_size_t comp_reg_offset;
 	size_t comp_reg_size;
 	__le32 *comp_reg_virt;
+	size_t dpa_size;
 	void __iomem *hdm_iobase;
 	u16 dvsec_len;
 	u8 hdm_count;
@@ -83,5 +84,8 @@ int vfio_cxl_setup_virt_regs(struct vfio_pci_core_device *vdev,
 			     void __iomem *cap_base);
 void vfio_cxl_clean_virt_regs(struct vfio_pci_cxl_state *cxl);
 void vfio_cxl_reinit_comp_regs(struct vfio_pci_cxl_state *cxl);
+resource_size_t
+vfio_cxl_read_committed_decoder_size(struct vfio_pci_core_device *vdev,
+				     struct vfio_pci_cxl_state *cxl);
 
 #endif /* __LINUX_VFIO_CXL_PRIV_H */
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 13/20] vfio/cxl: CXL region management support
Date: Wed, 1 Apr 2026 20:09:10 +0530
Message-ID: <20260401143917.108413-14-mhonap@nvidia.com>
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
From: Manish Honap

Region management uses the following APIs provided by CXL_CORE.

CREATE_REGION flow:
  1. Validate the request (size, decoder availability)
  2. Allocate HPA via cxl_get_hpa_freespace()
  3. Allocate DPA via cxl_request_dpa()
  4. Create the region via cxl_create_region(), which commits the HDM decoder
  5. Get the HPA range via cxl_get_region_range()

DESTROY_REGION flow:
  1. Detach the decoder via cxl_decoder_detach()
  2. Free DPA via cxl_dpa_free()
  3. Release the root decoder via cxl_put_root_decoder()

Use DEFINE_FREE scope helpers so that error paths unwind cleanly.

Signed-off-by: Manish Honap
---
 drivers/vfio/pci/cxl/vfio_cxl_core.c | 119 +++++++++++++++++++++++++++
 drivers/vfio/pci/cxl/vfio_cxl_priv.h |   8 ++
 2 files changed, 127 insertions(+)

diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
index 02755265d530..30b365b91903 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -21,6 +21,13 @@
 #include "../vfio_pci_priv.h"
 #include "vfio_cxl_priv.h"
 
+/*
+ * Scope-based cleanup wrappers for the CXL resource APIs
+ */
+DEFINE_FREE(cxl_put_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) cxl_put_root_decoder(_T))
+DEFINE_FREE(cxl_dpa_free, struct cxl_endpoint_decoder *, if (!IS_ERR_OR_NULL(_T)) cxl_dpa_free(_T))
+DEFINE_FREE(cxl_unregister_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) cxl_unregister_region(_T))
+
 /*
  * vfio_cxl_create_device_state - Allocate and validate CXL device state
  *
@@ -165,6 +172,112 @@ static int vfio_cxl_setup_regs(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+int
+vfio_cxl_create_cxl_region(struct vfio_pci_cxl_state *cxl,
+			   resource_size_t size)
+{
+	resource_size_t max_size;
+
+	WARN_ON(cxl->precommitted);
+
+	struct cxl_root_decoder *cxlrd __free(cxl_put_root_decoder) =
+		cxl_get_hpa_freespace(cxl->cxlmd, 1,
+				      CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
+				      &max_size);
+	if (IS_ERR(cxlrd))
+		return PTR_ERR(cxlrd);
+
+	/* Insufficient HPA space; cxlrd freed automatically by __free() */
+	if (max_size < size)
+		return -ENOSPC;
+
+	struct cxl_endpoint_decoder *cxled __free(cxl_dpa_free) =
+		cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM, size);
+	if (IS_ERR(cxled))
+		return PTR_ERR(cxled);
+
+	struct cxl_region *region __free(cxl_unregister_region) =
+		cxl_create_region(cxlrd, &cxled, 1);
+	if (IS_ERR(region))
+		return PTR_ERR(region);
+
+	/* All operations succeeded; transfer ownership to cxl state */
+	cxl->cxlrd = no_free_ptr(cxlrd);
+	cxl->cxled = no_free_ptr(cxled);
+	cxl->region = no_free_ptr(region);
+
+	return 0;
+}
+
+void vfio_cxl_destroy_cxl_region(struct vfio_pci_cxl_state *cxl)
+{
+	if (!cxl->region)
+		return;
+
+	cxl_unregister_region(cxl->region);
+	cxl->region = NULL;
+
+	if (!cxl->precommitted) {
+		cxl_dpa_free(cxl->cxled);
+		cxl_put_root_decoder(cxl->cxlrd);
+	}
+
+	cxl->cxled = NULL;
+	cxl->cxlrd = NULL;
+}
+
+static int vfio_cxl_create_region_helper(struct vfio_pci_core_device *vdev,
+					 struct vfio_pci_cxl_state *cxl,
+					 resource_size_t capacity)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	struct range range;
+	int ret;
+
+	if (cxl->precommitted) {
+		struct cxl_endpoint_decoder *cxled;
+		struct cxl_region *region;
+
+		cxled = cxl_get_committed_decoder(cxl->cxlmd, &region);
+		if (IS_ERR(cxled))
+			return PTR_ERR(cxled);
+		cxl->cxled = cxled;
+		cxl->region = region;
+	} else {
+		ret = vfio_cxl_create_cxl_region(cxl, capacity);
+		if (ret)
+			return ret;
+	}
+
+	if (!cxl->region) {
+		pci_err(pdev, "Failed to create CXL region\n");
+		ret = -ENODEV;
+		goto failed;
+	}
+
+	ret =
+		cxl_get_region_range(cxl->region, &range);
+	if (ret)
+		goto failed;
+
+	cxl->region_hpa = range.start;
+	cxl->region_size = range_len(&range);
+
+	pci_dbg(pdev, "CXL region: HPA 0x%llx size %lu MB\n",
+		cxl->region_hpa, cxl->region_size >> 20);
+
+	return 0;
+
+failed:
+	if (cxl->region) {
+		cxl_unregister_region(cxl->region);
+		cxl->region = NULL;
+	}
+
+	cxl->cxled = NULL;
+	cxl->cxlrd = NULL;
+
+	return ret;
+}
+
 static int vfio_cxl_create_memdev(struct vfio_pci_cxl_state *cxl,
 				  resource_size_t capacity)
 {
@@ -279,6 +392,7 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 		goto regs_failed;
 	}
 
+	cxl->precommitted = true;
 	cxl->dpa_size = capacity;
 
 	pci_dbg(pdev, "Device capacity: %llu MB\n", capacity >> 20);
@@ -289,6 +403,10 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 		goto regs_failed;
 	}
 
+	ret = vfio_cxl_create_region_helper(vdev, cxl, capacity);
+	if (ret)
+		goto regs_failed;
+
 	/*
 	 * Register probing succeeded. Assign vdev->cxl now so that
 	 * all subsequent helpers can access state via vdev->cxl.
@@ -314,6 +432,7 @@ void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev)
 		return;
 
 	vfio_cxl_clean_virt_regs(cxl);
+	vfio_cxl_destroy_cxl_region(cxl);
 }
 
 MODULE_IMPORT_NS("CXL");
diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
index 6359ad260bde..72a0d7d7e183 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h
+++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h
@@ -17,6 +17,10 @@ struct vfio_pci_cxl_state {
 	struct cxl_memdev *cxlmd;
 	struct cxl_root_decoder *cxlrd;
 	struct cxl_endpoint_decoder *cxled;
+	struct cxl_region *region;
+	resource_size_t region_hpa;
+	size_t region_size;
+	void *region_vaddr;
 	resource_size_t hdm_reg_offset;
 	size_t hdm_reg_size;
 	resource_size_t comp_reg_offset;
@@ -28,6 +32,7 @@ struct vfio_pci_cxl_state {
 	u8 hdm_count;
 	u8 comp_reg_bar;
 	bool cache_capable;
+	bool precommitted;
 };
 
 /* Register access sizes */
@@ -87,5 +92,8 @@ void vfio_cxl_reinit_comp_regs(struct vfio_pci_cxl_state *cxl);
 resource_size_t
 vfio_cxl_read_committed_decoder_size(struct vfio_pci_core_device *vdev,
 				     struct vfio_pci_cxl_state *cxl);
+int vfio_cxl_create_cxl_region(struct vfio_pci_cxl_state *cxl,
+			       resource_size_t size);
+void vfio_cxl_destroy_cxl_region(struct vfio_pci_cxl_state *cxl);
 
 #endif /* __LINUX_VFIO_CXL_PRIV_H */
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 14/20] vfio/cxl: DPA VFIO region with demand fault mmap and reset zap
Date: Wed, 1 Apr 2026 20:09:11 +0530
Message-ID: <20260401143917.108413-15-mhonap@nvidia.com>
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-Microsoft-Antispam-Message-Info: URksfL8oDi+sAyMqU14iBGca39IdMIJH/DuAEIQki8oyOV3HvIPfOXqYZefLP7q3KcFEYs+3Bh9qG24OftiUOl6bDgIfc2hfbtmWFL3VsfaLpviIeMW5/oGMfNvdGc6aqfH+rRAE4UlS/JiqG8kMpM+JR/1PCxvZEmCc3+00RSCE3su2pLDPZf/1P486/IPsOyjLYba5Z4XLDXp6z3OGLezDoR2I6dkmyhEgLfCPhGc7wrKLD2ECmz9czaFWsmA03+kpa0E+GfWUGebDiXwvvyQmP2gzpBPzZHWyoVq7NdC9wQF6scgFr37a/nLOM3gd1nnKofW7kxLu2AhlM4Th5XKls9J6It00WqoaHTtTKquKnPkLvjy3AJm1TGH+vVNOS/OCNjKm2Sa1GMBxmVDJk+Pqr+1ikPkbmFK5gMcqHsXxwQdqwHVs6T5YExjhOg/Ucrhd8TIDbFfwyENrsgKHMqWHHCM5QYsNOb2UgE6Vqt52+lOIgST5juxBdXiAZirSBjGCanUPwJZlcmPSW64U1UcORXh8W0m36NnabUrxd/JY8P0iYOj0i4Xa7XSn3imk2NplOVT0ooKkC/z8KPANnL/eeVljrrjn73qJ9zjPqFuF85VXQl20d+TSeqUJLA5jvxNSMUXHXiVvF29lCQFAFN6/1qdHANmXyJPVcWSufkn9K6C67G5Od6aCfUfIbjeH+TeXz5qzMXxY4b1xqXYjnBEBKrJfsT4W3ancnTXrvyyfZiUTSOcsK7Wwq79xHL7RKYJb/bDGusYe5zVqbox2UGdRciY1z4hhtzXuxrR5Lxsjflh2iQVCsd9KLs6xztNL X-Forefront-Antispam-Report: CIP:216.228.117.160;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge1.nvidia.com;CAT:NONE;SFS:(13230040)(82310400026)(36860700016)(7416014)(1800799024)(376014)(921020)(56012099003)(18002099003)(22082099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: Er6GdcLkmU+D+f64rfemfXDCrXQc9Bd7Gf22eRDWzL7jnAAHZBeSW4XbXomYSE+iQgv1IwlFyp7Mso0CthD68bFppgxyQ0xzQ8rcWWokbCzMfA+MkUdmofaYszPFYPBddwQEn6166tiHlMJ2JFTmqf+1TL+K5t+Nf/XKobs2Yl3SpEcsnwCgRcgTiaSwK8O8M8wj9TFpCnXiL86twGeYHamLjTwcfKfL6WYXszJtLR0vsi4fgQ3BjajbmNVNcLWLiUMAGoNaA66gX5PjN5A4vWQ/O/c3Y1mXNFu5w+RJEZwbelmbtqdXrcSSmQzGv15COZ/qwnkTCVpqgbLy3xiWUUHp4fpbFsN4ZvT/TfPF+K8RzP4+6hFBCBdNSfUgNDO4cay2QoLlrColUiNdjQF6GeB/fJKMiruo8Gn4fnAr8NdeNEcJRvRlU86x/goWYXJY X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:41:44.9745 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 66589cee-41c3-4c19-f225-08de8ffccc16 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a 
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: SJ5PEPF000001D6.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN0PR12MB6128 Content-Type: text/plain; charset="utf-8" From: Manish Honap Wire the CXL DPA range up as a VFIO demand-paged region so QEMU can mmap guest device memory directly. Faults call vmf_insert_pfn() to insert one PFN at a time rather than mapping the full range upfront. CXL region lifecycle: - The CXL memory region is registered with VFIO layer during vfio_pci_open_device - mmap() establishes the VMA with vm_ops but inserts no PTEs - Each guest page fault calls vfio_cxl_region_page_fault() which inserts a single PFN under the memory_lock read side - On device reset, vfio_cxl_zap_region_locked() sets region_active=3Dfalse and calls unmap_mapping_range() to invalidate all DPA PTEs atomically while holding memory_lock for writing - Faults racing with reset see region_active=3D=3Dfalse and return VM_FAULT_SIGBUS - vfio_cxl_reactivate_region() restores region_active after successful hardware reset Also integrate the zap/reactivate calls into vfio_pci_ioctl_reset() so that FLR correctly invalidates DPA mappings and restores them on success. 
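For illustration only, the region_active lifecycle described above can be modelled in plain userspace C. This is a sketch under stated assumptions, not the code in this patch: struct region_model, model_fault(), model_zap(), model_reactivate() and the MODEL_* constants are hypothetical names, a 4 KiB page size is assumed, and neither memory_lock nor the actual PTE invalidation is modelled. It only shows which faults are refused before and after a reset.

```c
/* Userspace model only: hypothetical names, not kernel code. */
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT   12	/* assumption: 4 KiB pages */
#define MODEL_FAULT_OK     0	/* stands in for a successful PFN insert */
#define MODEL_FAULT_SIGBUS 1	/* stands in for VM_FAULT_SIGBUS */

struct region_model {
	bool     active;	/* stands in for cxl->region_active */
	uint64_t hpa;		/* stands in for cxl->region_hpa */
	uint64_t size;		/* stands in for cxl->region_size */
};

/*
 * Fault path: refuse the fault if the region is inactive or the page
 * offset lies beyond the region, otherwise compute the single PFN that
 * would be inserted for this fault.
 */
static int model_fault(const struct region_model *r, uint64_t pgoff,
		       uint64_t *pfn)
{
	if (!r->active)
		return MODEL_FAULT_SIGBUS;
	if (pgoff >= (r->size >> MODEL_PAGE_SHIFT))
		return MODEL_FAULT_SIGBUS;

	*pfn = (r->hpa >> MODEL_PAGE_SHIFT) + pgoff;
	return MODEL_FAULT_OK;
}

/* Reset entry: mark the region inactive so later faults are refused. */
static void model_zap(struct region_model *r)
{
	r->active = false;
}

/*
 * Reset exit: re-enable the region only if the decoder reports
 * COMMITTED again; otherwise the region stays inactive.
 */
static void model_reactivate(struct region_model *r, bool committed)
{
	if (committed)
		r->active = true;
}
```

In this model a fault at page offset 2 of a four-page region succeeds, any fault after model_zap() is refused, and faults succeed again only after model_reactivate() is called with committed set to true.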
Co-developed-by: Zhi Wang Signed-off-by: Zhi Wang Signed-off-by: Manish Honap --- drivers/vfio/pci/cxl/vfio_cxl_core.c | 187 +++++++++++++++++++++++++++ drivers/vfio/pci/cxl/vfio_cxl_emu.c | 2 +- drivers/vfio/pci/cxl/vfio_cxl_priv.h | 3 + drivers/vfio/pci/vfio_pci_core.c | 11 ++ drivers/vfio/pci/vfio_pci_priv.h | 6 + 5 files changed, 208 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vf= io_cxl_core.c index 30b365b91903..19d3dc205f99 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_core.c +++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c @@ -435,4 +435,191 @@ void vfio_pci_cxl_cleanup(struct vfio_pci_core_device= *vdev) vfio_cxl_destroy_cxl_region(cxl); } =20 +static vm_fault_t vfio_cxl_region_vm_fault(struct vm_fault *vmf) +{ + struct vfio_pci_region *region =3D vmf->vma->vm_private_data; + struct vfio_pci_cxl_state *cxl =3D region->data; + unsigned long pgoff; + unsigned long pfn; + + if (!READ_ONCE(cxl->region_active)) + return VM_FAULT_SIGBUS; + + pgoff =3D vmf->pgoff & + ((1UL << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); + + if (pgoff >=3D (cxl->region_size >> PAGE_SHIFT)) + return VM_FAULT_SIGBUS; + + pfn =3D PHYS_PFN(cxl->region_hpa) + pgoff; + + return vmf_insert_pfn(vmf->vma, vmf->address, pfn); +} + +static const struct vm_operations_struct vfio_cxl_region_vm_ops =3D { + .fault =3D vfio_cxl_region_vm_fault, +}; + +static int vfio_cxl_region_mmap(struct vfio_pci_core_device *vdev, + struct vfio_pci_region *region, + struct vm_area_struct *vma) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + u64 req_len, pgoff, end; + + if (!(region->flags & VFIO_REGION_INFO_FLAG_MMAP)) + return -EINVAL; + + if (!(region->flags & VFIO_REGION_INFO_FLAG_READ) && + (vma->vm_flags & VM_READ)) + return -EPERM; + + if (!(region->flags & VFIO_REGION_INFO_FLAG_WRITE) && + (vma->vm_flags & VM_WRITE)) + return -EPERM; + + pgoff =3D vma->vm_pgoff & + ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); + + if 
(check_sub_overflow(vma->vm_end, vma->vm_start, &req_len) || + check_add_overflow(PFN_PHYS(pgoff), req_len, &end)) + return -EOVERFLOW; + + if (end > cxl->region_size) + return -EINVAL; + + vma->vm_page_prot =3D pgprot_decrypted(vma->vm_page_prot); + + vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP | + VM_DONTEXPAND | VM_DONTDUMP); + + vma->vm_ops =3D &vfio_cxl_region_vm_ops; + vma->vm_private_data =3D region; + + return 0; +} + +/* + * vfio_cxl_zap_region_locked - Invalidate all DPA region PTEs. + * + * Must be called with vdev->memory_lock held for writing. Sets + * region_active=3Dfalse before zapping so any subsequent I/O to the region + * sees the inactive state and returns an error rather than accessing + * stale mappings. + */ +void vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + + lockdep_assert_held_write(&vdev->memory_lock); + + if (!cxl) + return; + + WRITE_ONCE(cxl->region_active, false); +} + +/* + * vfio_cxl_reactivate_region - Re-enable DPA region after successful rese= t. + * + * Must be called with vdev->memory_lock held for writing. Re-reads the + * HDM decoder state from hardware (FLR cleared it) and sets region_active + * so that subsequent I/O to the region is permitted again. + */ +void vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + + lockdep_assert_held_write(&vdev->memory_lock); + + if (!cxl) + return; + /* + * Re-initialise the emulated HDM comp_reg_virt[] from hardware. + * After FLR the decoder registers read as zero; mirror that in + * the emulated state so QEMU sees a clean slate. + */ + vfio_cxl_reinit_comp_regs(cxl); + + /* + * Only re-enable the DPA mmap if the hardware has actually + * re-committed decoder 0 after FLR. Read the COMMITTED bit from the + * freshly-re-snapshotted comp_reg_virt[] so we check the post-FLR + * hardware state, not stale pre-reset state. 
+ * + * If COMMITTED is 0 (slow firmware re-commit path), leave + * region_active=3Dfalse. Guest faults will return VM_FAULT_SIGBUS + * until the decoder is re-committed and the region is re-enabled. + */ + if (cxl->precommitted && cxl->comp_reg_virt) { + /* + * Read CTRL via the full CXL.mem-relative index: hdm_reg_offset + * (now CXL.mem-relative) plus the within-HDM-block offset. + */ + u32 ctrl =3D le32_to_cpu(*hdm_reg_ptr(cxl, + CXL_HDM_DECODER0_CTRL_OFFSET(0))); + + if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED) + WRITE_ONCE(cxl->region_active, true); + } +} + +static ssize_t vfio_cxl_region_rw(struct vfio_pci_core_device *core_dev, + char __user *buf, size_t count, loff_t *ppos, + bool iswrite) +{ + unsigned int i =3D VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS; + struct vfio_pci_cxl_state *cxl =3D core_dev->region[i].data; + loff_t pos =3D *ppos & VFIO_PCI_OFFSET_MASK; + + if (!count || pos >=3D cxl->region_size) + return 0; + + /* + * Guard against access after a failed reset (region_active=3Dfalse) + * or a release race (region_vaddr=3DNULL). Either condition means + * the memremap'd window is no longer valid; touching it would produce + * a Synchronous External Abort. Return -EIO so the caller gets a + * clean error rather than a kernel oops. + */ + if (!READ_ONCE(cxl->region_active) || !cxl->region_vaddr) + return -EIO; + + count =3D min(count, (size_t)(cxl->region_size - pos)); + + if (iswrite) { + if (copy_from_user(cxl->region_vaddr + pos, buf, count)) + return -EFAULT; + } else { + if (copy_to_user(buf, cxl->region_vaddr + pos, count)) + return -EFAULT; + } + + return count; +} + +static void vfio_cxl_region_release(struct vfio_pci_core_device *vdev, + struct vfio_pci_region *region) +{ + struct vfio_pci_cxl_state *cxl =3D region->data; + + /* + * Deactivate the region before removing user mappings so that any + * fault handler racing the release returns VM_FAULT_SIGBUS rather + * than inserting a PFN into an unmapped region. 
+ */ + WRITE_ONCE(cxl->region_active, false); + + if (cxl->region_vaddr) { + memunmap(cxl->region_vaddr); + cxl->region_vaddr =3D NULL; + } +} + +static const struct vfio_pci_regops vfio_cxl_regops =3D { + .rw =3D vfio_cxl_region_rw, + .mmap =3D vfio_cxl_region_mmap, + .release =3D vfio_cxl_region_release, +}; + MODULE_IMPORT_NS("CXL"); diff --git a/drivers/vfio/pci/cxl/vfio_cxl_emu.c b/drivers/vfio/pci/cxl/vfi= o_cxl_emu.c index 11195e8c21d7..781328a79b43 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_emu.c +++ b/drivers/vfio/pci/cxl/vfio_cxl_emu.c @@ -33,7 +33,7 @@ * +0x1c: (reserved) */ =20 -static inline __le32 *hdm_reg_ptr(struct vfio_pci_cxl_state *cxl, u32 hdm_= off) +__le32 *hdm_reg_ptr(struct vfio_pci_cxl_state *cxl, u32 hdm_off) { /* * hdm_off is a byte offset within the HDM decoder block. diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vf= io_cxl_priv.h index 72a0d7d7e183..3458768445af 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h +++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h @@ -33,6 +33,7 @@ struct vfio_pci_cxl_state { u8 comp_reg_bar; bool cache_capable; bool precommitted; + bool region_active; }; =20 /* Register access sizes */ @@ -96,4 +97,6 @@ int vfio_cxl_create_cxl_region(struct vfio_pci_cxl_state = *cxl, resource_size_t size); void vfio_cxl_destroy_cxl_region(struct vfio_pci_cxl_state *cxl); =20 +__le32 *hdm_reg_ptr(struct vfio_pci_cxl_state *cxl, u32 hdm_off); + #endif /* __LINUX_VFIO_CXL_PRIV_H */ diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_c= ore.c index b7364178e23d..48e0274c19aa 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1223,6 +1223,9 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_= device *vdev, =20 vfio_pci_zap_and_down_write_memory_lock(vdev); =20 + /* Zap CXL DPA region PTEs before hardware reset clears HDM state */ + vfio_cxl_zap_region_locked(vdev); + /* * This function can be invoked while the power state is non-D0. 
If * pci_try_reset_function() has been called while the power state is @@ -1236,6 +1239,14 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core= _device *vdev, =20 vfio_pci_dma_buf_move(vdev, true); ret =3D pci_try_reset_function(vdev->pdev); + + /* + * Re-enable DPA region if reset succeeded; fault handler will + * re-insert PFNs on next access without requiring a new mmap. + */ + if (!ret) + vfio_cxl_reactivate_region(vdev); + if (__vfio_pci_memory_enabled(vdev)) vfio_pci_dma_buf_move(vdev, false); up_write(&vdev->memory_lock); diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 1082ba43bafe..726063b6ff70 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -145,6 +145,8 @@ static inline void vfio_pci_dma_buf_move(struct vfio_pc= i_core_device *vdev, =20 void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev); void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev); +void vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev); +void vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev); =20 #else =20 @@ -152,6 +154,10 @@ static inline void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev) { } static inline void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev) { } +static inline void +vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev) { } +static inline void +vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev) { } =20 #endif /* CONFIG_VFIO_CXL_CORE */ =20 --=20 2.25.1 From nobody Wed Apr 1 20:37:31 2026 Received: from CH1PR05CU001.outbound.protection.outlook.com (mail-northcentralusazon11010007.outbound.protection.outlook.com [52.101.193.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 26457477E34; Wed, 1 Apr 2026 14:42:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail 
smtp.client-ip=52.101.193.7 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054530; cv=fail; b=ML8L79ESP+mF4mubduJQUgWVkTPHjQrfUO999qqYhngqKst/Hel/LdFmnvYQDTpQLlK90jtbMAWJxGt30pziTrpEPqejlR8JBQMs7/cEcljDTs5OzNRteX6aAQclJkh2lWQ7ykR0fdxC4wH7f7ryR/iBVtEOPDNFxX42X7TxHLU= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054530; c=relaxed/simple; bh=vlHyOqj2oALziFiemj5f4qtRYPTJrWCxZBMbjXUyXU4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=TQgxMpjPiqCQLvK4dxNPEBd7jtyXm8JawfZI1df6jhCN/Fm9JUBc4TsLMVS1PI8bbiV+oq2HUIQ+8TW8sqExtdpDhQ11BSvsRsgawDu46l2Hg19BPeEnqzwam29NUN//AGskwpGkobaGYPXyz82y0P3tXt6Jgv/KlkX2E2usFdQ= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=Vd+DEjDy; arc=fail smtp.client-ip=52.101.193.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="Vd+DEjDy" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=KcEKDYdsWc3qLoYnh6HRIzEv3LqQJcYrrB5TltIWeEfJkTN4CJNazeTw239uBuYKayDhMV6RdUU5k1LXTkFjpAnwT8AHODdfd2sarlHsO//pKqBZlVGb2d+SeUVoNlc28bPonFtru4+J14EWX+h+N3SD9OjYNKYJaKnu3iqfV3qxEqFYs7ZrxxHqeaCGmdySd4eAOgg8u1XmRoGmmtdjUjzjckraZzOsr6CfjK2g4Pto0h+VFLZSs1s91fDFAcx9UBvD5xHKiz+DFMQDc/gijB4R33+zn+uMA/A0188PQYYrcTeYIyEACwqA/6LwFpg1lfxtHmjAyimyJrAvprcdag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=kqSpA2tf/j9DTpFbcIXKWLvhaVJ6eyht6RldoNX1ejs=; b=Ud17n3QBlvRNLvP4J4McNjIsNV0UMoOE8G9H5+any/SzFUMETlEBPBVUcyR48CbNXUx+afseNVQvi0f6vWyCunpJZPzH8s3tWRwOro4OeTb7hI/z60Ev+yspugZWZUo3DhfyS6PnQd2yUjMHcYDuWGpGh/vb9OhwRa2Rzv8szW7jOiSal7O9Il/CJBduqW1+l/AlxTofcHt0Fa2UR28LXa7jAFM0Q8gFZoMub+mkfU71Og/LZsgHeC9NLALQN/O8dTvAcaq9xMASkcoRkpzqZv3afvXQV7r45hs6aRO4HGfX7uu2yQwimWV3yrUiXaswDPUuVsMTfsx957j7mIPWag== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=kqSpA2tf/j9DTpFbcIXKWLvhaVJ6eyht6RldoNX1ejs=; b=Vd+DEjDyicQcA41+Yo/0jwonWCIuBcBZlNGtJsvR7Nn3BXjjI6DnGao+ej2UjqXryxbPdPR8mhgW4fnp5THSby9rXcW9K3VLSze2to6SiDbTZLc7v3eoDH4eo3f9r26BTw4kOSYQ1tn2SLDtSxbFDNnyO9L9nRrpy9IjtaL+mNw/MYlLQVnO/eN2zhQrSF6VjsoMb5xdjIbOLhaAawEMdBFyJeGqt/9I1xRPuRNw1iTIud+sZRIkRfVxJt25GEdEP9Aq9SbZnG7GT0UJsEEn0t4kTh0DrkryIEYNutJY3lTEYnnbDeHDwXJR1q0gDmGTR8Qp80RwWF5Mb9S1Y1e/RA== Received: from BN9PR03CA0255.namprd03.prod.outlook.com (2603:10b6:408:ff::20) by SJ0PR12MB6853.namprd12.prod.outlook.com (2603:10b6:a03:47b::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.15; Wed, 1 Apr 2026 14:41:54 +0000 Received: from BN1PEPF00006001.namprd05.prod.outlook.com (2603:10b6:408:ff:cafe::61) by BN9PR03CA0255.outlook.office365.com (2603:10b6:408:ff::20) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.28 via Frontend Transport; Wed, 1 Apr 2026 14:41:54 +0000 
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by BN1PEPF00006001.mail.protection.outlook.com (10.167.243.233) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 14:41:53 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:41:30 -0700 Received: from nvidia-4028GR-scsim.nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:41:22 -0700 From: To: , , , , , , , , , , , , , , , CC: , , , , , , , , , Subject: [PATCH v2 15/20] vfio/cxl: Virtualize CXL DVSEC config writes Date: Wed, 1 Apr 2026 20:09:12 +0530 Message-ID: <20260401143917.108413-16-mhonap@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com> References: <20260401143917.108413-1-mhonap@nvidia.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: rnnvmail202.nvidia.com (10.129.68.7) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN1PEPF00006001:EE_|SJ0PR12MB6853:EE_ X-MS-Office365-Filtering-Correlation-Id: fe5e03a1-e50d-4489-25e9-08de8ffcd161 X-MS-Exchange-SenderADCheck: 
1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|82310400026|36860700016|376014|1800799024|7416014|13003099007|921020|22082099003|56012099003|18002099003; X-Microsoft-Antispam-Message-Info: hEn+3IPfyEgUqbyA2pWBioKAv3c40NStkDLzgIhtj0I8S4p5Ye6tfd2R52ECaPthzliam2siPGrA6e3eCQ3jqn6+YxcqW8+vbyddZmNkYimLHlbPEk0uzan55x7zqNf5IMMo5TEQ2s8KtqpzTs0Q62PvKeSREmoeSX6YMdn2RKYgYzX08kLMD6REAAWSQBqO2gGMrFJRxzr/sR9xvvuuoy/8vRRGtz6jS1AcDR67b4txWHIe7ppN3pMAoyxGBnkCKT/0Nm4VWJKjmU8VwMxsxQs0SzGWiv/rmhZ0mbgJXr3+vWTRmcLYkvdPEOfZ+H1nf+kKym2APIcC5Yon1/hHLcZpsb92j/dyR9tEopp576BeQacO4S8uJac0h9lCq/cdc3Ec3BZKk1zxxcGbf1Tv8f913y91sBgZ3mq1qRlGgpUurq37blpEskzY8pDCTKIMbx6Hv1NfY09NEiNfCJSGKCo6b29vpCcA0+alBPtPfO//dkT+ov7vlls5Y295fwjrlczGwySOcWHJ150XhMWGioT5aZBIL8nYVv6DwONfRShROCr2rtCNO+jeu0FIrNuFzvGPbMaU8criq6oWe7g63hhzTjoc8f3Y8cnAz4/jJp55Ya6JTgAuMO3xxp5kjuT8P20CjsJyHPkZ75pwRj4v+lIpvc6sMMO4TuwJTnlNVukt4cGaQIOk5jxR5WPP9bcQw/eUMYKWsAnDdwj+8CwtvkVdzUCd8UzlX+J8YvclP6izarmFz6ZCAEe+YjtQ+zeAdsbSS/jpk4nSwPQsvCMAXgdjVoxnUeQHxygUY7SxuZ4bawVesdyhqmgz5WfpdCcY X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(82310400026)(36860700016)(376014)(1800799024)(7416014)(13003099007)(921020)(22082099003)(56012099003)(18002099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: FORePgYakvJhCJ8TH5029Rhruv3b8wDQlxrCAySeax5gADoLjwMyTAbi3i1nkJcW4PSngT2jBAKq/1L081twvKBeyq4vWWo6O5UMsDzzcMB3F/XKvymv1SBBV4EXBlpfC7Y2Ib8yiKelS466kh+JmF1Q5JCf836YKU4wONvF/df7/ap2dVvPpMDdmleoay9EKHHAzXzF6mHL+1OhefPs544Zw2J22DOrEf+fNR4ubrKt7Etob4KBwL2Tt2mtgJNxjUBhi8j/FGgb7kXAcpkn/KYM9GMc2ORJ6693YfQVv28HZnX/bbsW6IMoFUo9lQQQYdN43u7iCy30U3fa6N7g1fzFyvIEdjrZyGwrEHSM39zdTciuPsDP6ZZoIF70MjqKUYG0bSdB7hTdgPVFvkIQpcfuuu1OKxW6ZzfI6EJmSZGRr2y4HkAa9kT+ZnzRiAan X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:41:53.8633 (UTC) 
X-MS-Exchange-CrossTenant-Network-Message-Id: fe5e03a1-e50d-4489-25e9-08de8ffcd161 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN1PEPF00006001.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ0PR12MB6853 Content-Type: text/plain; charset="utf-8" From: Manish Honap CXL devices have CXL DVSEC registers in the configuration space. Many of them affect the behaviors of the devices, e.g. enabling CXL.io/CXL.mem/CXL.cache. However, these configurations are owned by the host and a virtualization policy should be applied when handling the access from the guest. Introduce the emulation of CXL configuration space to handle the access of the virtual CXL configuration space from the guest. vfio-pci-core already allocates vdev->vconfig as the authoritative virtual config space shadow. 
Directly use vdev->vconfig: - DVSEC reads return data from vdev->vconfig (already populated by vfio_config_init() via vfio_ecap_init()) - DVSEC writes go through new CXL-aware write handlers that update vdev->vconfig in place - The writable DVSEC registers are marked virtual in vdev->pci_config_map Signed-off-by: Zhi Wang Signed-off-by: Manish Honap --- drivers/vfio/pci/Makefile | 2 +- drivers/vfio/pci/cxl/vfio_cxl_config.c | 306 +++++++++++++++++++++++++ drivers/vfio/pci/cxl/vfio_cxl_core.c | 4 +- drivers/vfio/pci/cxl/vfio_cxl_priv.h | 43 +++- drivers/vfio/pci/vfio_pci_config.c | 46 +++- drivers/vfio/pci/vfio_pci_priv.h | 3 + include/linux/vfio_pci_core.h | 8 +- include/uapi/cxl/cxl_regs.h | 98 ++++++++ 8 files changed, 498 insertions(+), 12 deletions(-) create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_config.c diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index bef916495eae..7c86b7845e8f 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only =20 vfio-pci-core-y :=3D vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio= _pci_config.o -vfio-pci-core-$(CONFIG_VFIO_CXL_CORE) +=3D cxl/vfio_cxl_core.o cxl/vfio_cx= l_emu.o +vfio-pci-core-$(CONFIG_VFIO_CXL_CORE) +=3D cxl/vfio_cxl_core.o cxl/vfio_cx= l_emu.o cxl/vfio_cxl_config.o vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) +=3D vfio_pci_zdev.o vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) +=3D vfio_pci_dmabuf.o obj-$(CONFIG_VFIO_PCI_CORE) +=3D vfio-pci-core.o diff --git a/drivers/vfio/pci/cxl/vfio_cxl_config.c b/drivers/vfio/pci/cxl/= vfio_cxl_config.c new file mode 100644 index 000000000000..dee521118dd4 --- /dev/null +++ b/drivers/vfio/pci/cxl/vfio_cxl_config.c @@ -0,0 +1,306 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * CXL DVSEC configuration space emulation for vfio-pci. + * + * Integrates into the existing vfio-pci-core ecap_perms[] framework using + * vdev->vconfig as the sole shadow buffer for DVSEC registers. 
+ * + * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved + */ + +#include +#include + +#include "../vfio_pci_priv.h" +#include "vfio_cxl_priv.h" + +static inline u16 _cxlds_get_dvsec(struct vfio_pci_cxl_state *cxl) +{ + return (u16)cxl->cxlds.cxl_dvsec; +} + +/* Helpers to access vdev->vconfig at a DVSEC-relative offset */ +static inline u16 dvsec_virt_read16(struct vfio_pci_core_device *vdev, + u16 off) +{ + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + + return le16_to_cpu(*(u16 *)(vdev->vconfig + dvsec + off)); +} + +static inline void dvsec_virt_write16(struct vfio_pci_core_device *vdev, + u16 off, u16 val) +{ + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + + *(u16 *)(vdev->vconfig + dvsec + off) =3D cpu_to_le16(val); +} + +static inline u32 dvsec_virt_read32(struct vfio_pci_core_device *vdev, + u16 off) +{ + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + + return le32_to_cpu(*(u32 *)(vdev->vconfig + dvsec + off)); +} + +static inline void dvsec_virt_write32(struct vfio_pci_core_device *vdev, + u16 off, u32 val) +{ + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + + *(u32 *)(vdev->vconfig + dvsec + off) =3D cpu_to_le32(val); +} + +/* Individual DVSEC register write handlers */ + +static void cxl_dvsec_control_write(struct vfio_pci_core_device *vdev, + u16 new_val) +{ + u16 lock =3D dvsec_virt_read16(vdev, CXL_DVSEC_LOCK_OFFSET); + u16 cap3 =3D dvsec_virt_read16(vdev, CXL_DVSEC_CAPABILITY3_OFFSET); + u16 rev_mask =3D CXL_CTRL_RESERVED_MASK; + + if (lock & CXL_DVSEC_LOCK_CONFIG_LOCK) + return; /* register is locked after first write */ + + if (!(cap3 & CXL_DVSEC_CAP3_P2P_MEM_CAPABLE)) + rev_mask |=3D CXL_CTRL_P2P_REV_MASK; + + new_val &=3D ~rev_mask; + new_val |=3D CXL_DVSEC_CTRL_IO_ENABLE; /* IO_Enable always returns 1 */ + + dvsec_virt_write16(vdev, CXL_DVSEC_CONTROL_OFFSET, new_val); +} + +static void cxl_dvsec_status_write(struct vfio_pci_core_device *vdev, + u16 new_val) +{ + u16 cur_val =3D dvsec_virt_read16(vdev, CXL_DVSEC_STATUS_OFFSET); 
+ + /* + * VIRAL_STATUS (bit 14) is the only writable bit; all others are + * reserved and always zero. + */ + new_val =3D cur_val & ~(new_val & CXL_DVSEC_STATUS_VIRAL_STATUS); + dvsec_virt_write16(vdev, CXL_DVSEC_STATUS_OFFSET, new_val); +} + +static void cxl_dvsec_control2_write(struct vfio_pci_core_device *vdev, + u16 new_val) +{ + struct pci_dev *pdev =3D vdev->pdev; + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + u16 abs_off =3D dvsec + CXL_DVSEC_CONTROL2_OFFSET; + u16 cap2 =3D dvsec_virt_read16(vdev, CXL_DVSEC_CAPABILITY2_OFFSET); + u16 cap3 =3D dvsec_virt_read16(vdev, CXL_DVSEC_CAPABILITY3_OFFSET); + u16 rev_mask =3D CXL_CTRL2_RESERVED_MASK; + + if (!(cap3 & CXL_DVSEC_CAP3_VOLATILE_HDM_CONFIGURABILITY)) + rev_mask |=3D CXL_CTRL2_VOLATILE_HDM_REV_MASK; + if (!(cap2 & CXL_DVSEC_CAP2_MOD_COMPLETION_CAPABLE)) + rev_mask |=3D CXL_CTRL2_MODIFIED_COMP_REV_MASK; + + new_val &=3D ~rev_mask; + + /* Cache WBI: forward to hardware. */ + if (new_val & CXL_DVSEC_CTRL2_INITIATE_CACHE_WBI) + pci_write_config_word(pdev, abs_off, + CXL_DVSEC_CTRL2_INITIATE_CACHE_WBI); + + /* + * CXL Reset: not yet supported - do not forward to HW. 
+ * TODO: invoke CXL protocol reset via cxl subsystem + */ + if (new_val & CXL_DVSEC_CTRL2_INITIATE_CXL_RESET) + pci_warn(pdev, "vfio-cxl: CXL reset requested but not yet supported\n"); + + dvsec_virt_write16(vdev, CXL_DVSEC_CONTROL2_OFFSET, + new_val & ~CXL_CTRL2_HW_BITS_MASK); +} + +static void cxl_dvsec_status2_write(struct vfio_pci_core_device *vdev, + u16 new_val) +{ + u16 cap3 =3D dvsec_virt_read16(vdev, CXL_DVSEC_CAPABILITY3_OFFSET); + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + u16 abs_off =3D dvsec + CXL_DVSEC_STATUS2_OFFSET; + + /* RW1CS: write 1 to clear, but only if the capability is supported */ + if ((cap3 & CXL_DVSEC_CAP3_VOLATILE_HDM_CONFIGURABILITY) && + (new_val & CXL_DVSEC_STATUS2_VOLATILE_HDM_PRES_ERROR)) + pci_write_config_word(vdev->pdev, abs_off, + CXL_DVSEC_STATUS2_VOLATILE_HDM_PRES_ERROR); + /* STATUS2 is not mirrored in vconfig - reads go to hardware */ +} + +static void cxl_dvsec_lock_write(struct vfio_pci_core_device *vdev, + u16 new_val) +{ + u16 cur_val =3D dvsec_virt_read16(vdev, CXL_DVSEC_LOCK_OFFSET); + + /* Once the LOCK bit is set it can only be cleared by conventional reset = */ + if (cur_val & CXL_DVSEC_LOCK_CONFIG_LOCK) + return; + + new_val &=3D ~CXL_LOCK_RESERVED_MASK; + dvsec_virt_write16(vdev, CXL_DVSEC_LOCK_OFFSET, new_val); +} + +static void cxl_range_base_lo_write(struct vfio_pci_core_device *vdev, + u16 dvsec_off, u32 new_val) +{ + new_val &=3D ~CXL_BASE_LO_RESERVED_MASK; + dvsec_virt_write32(vdev, dvsec_off, new_val); +} + +/** + * vfio_cxl_dvsec_readfn - Per-device DVSEC read handler for CXL capable d= evices. + * @vdev: VFIO PCI core device + * @pos: Absolute byte position in PCI config space + * @count: Number of bytes to read + * @perm: Permission bits for this capability (passed through to fallbac= k) + * @offset: Byte offset within the capability structure (passed through) + * @val: Output buffer for the read value (little-endian) + * + * Called via vfio_pci_dvsec_dispatch_read() for CXL devices. 
Returns sha= dow + * vconfig values for virtualized DVSEC registers (CONTROL, STATUS, CONTRO= L2, + * LOCK) so that userspace reads reflect emulated state rather than raw + * hardware. All other DVSEC bytes pass through to vfio_raw_config_read(). + * + * Return: @count on success, or negative error code from the fallback rea= d. + */ +static int vfio_cxl_dvsec_readfn(struct vfio_pci_core_device *vdev, + int pos, int count, + struct perm_bits *perm, + int offset, __le32 *val) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + u16 dvsec_off; + + if (!cxl || (u16)pos < dvsec || + (u16)pos >=3D dvsec + cxl->dvsec_len) + return vfio_raw_config_read(vdev, pos, count, perm, offset, val); + + dvsec_off =3D (u16)pos - dvsec; + + switch (dvsec_off) { + case CXL_DVSEC_CONTROL_OFFSET: + case CXL_DVSEC_STATUS_OFFSET: + case CXL_DVSEC_CONTROL2_OFFSET: + case CXL_DVSEC_LOCK_OFFSET: + /* Return shadow vconfig value for virtualized registers */ + memcpy(val, vdev->vconfig + pos, count); + return count; + default: + return vfio_raw_config_read(vdev, pos, count, + perm, offset, val); + } +} + +/** + * vfio_cxl_dvsec_writefn - ecap_perms write handler for PCI_EXT_CAP_ID_DV= SEC. + * + * Installed once into ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn by + * vfio_pci_init_perm_bits() when CONFIG_VFIO_CXL_CORE=3Dy. Applies to ev= ery + * device opened under vfio-pci; the vdev->cxl NULL check distinguishes CXL + * devices from non-CXL devices that happen to expose a DVSEC capability. + * + * @vdev: VFIO PCI core device + * @pos: Absolute byte position in PCI config space + * @count: Number of bytes to write + * @perm: Permission bits for this capability (passed through to fallbac= k) + * @offset: Byte offset within the capability structure (passed through) + * @val: Value to write (little-endian) + * + * Return: @count on success; non-CXL devices continue to + * vfio_raw_config_write() which also returns @count or negative e= rror. 
+ */ +static int vfio_cxl_dvsec_writefn(struct vfio_pci_core_device *vdev, + int pos, int count, + struct perm_bits *perm, + int offset, __le32 val) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + u16 dvsec =3D _cxlds_get_dvsec(vdev->cxl); + u16 abs_off =3D (u16)pos; + u16 dvsec_off; + u16 wval16; + u32 wval32; + + if (!cxl || (u16)pos < dvsec || + (u16)pos >=3D dvsec + cxl->dvsec_len) + return vfio_raw_config_write(vdev, pos, count, perm, + offset, val); + + pci_dbg(vdev->pdev, + "vfio_cxl: DVSEC write: abs=3D0x%04x dvsec_off=3D0x%04x count=3D%d raw_v= al=3D0x%08x\n", + abs_off, abs_off - dvsec, count, le32_to_cpu(val)); + + dvsec_off =3D abs_off - dvsec; + + /* Route to the appropriate per-register handler */ + switch (dvsec_off) { + case CXL_DVSEC_CONTROL_OFFSET: + wval16 =3D (u16)le32_to_cpu(val); + cxl_dvsec_control_write(vdev, wval16); + break; + case CXL_DVSEC_STATUS_OFFSET: + wval16 =3D (u16)le32_to_cpu(val); + cxl_dvsec_status_write(vdev, wval16); + break; + case CXL_DVSEC_CONTROL2_OFFSET: + wval16 =3D (u16)le32_to_cpu(val); + cxl_dvsec_control2_write(vdev, wval16); + break; + case CXL_DVSEC_STATUS2_OFFSET: + wval16 =3D (u16)le32_to_cpu(val); + cxl_dvsec_status2_write(vdev, wval16); + break; + case CXL_DVSEC_LOCK_OFFSET: + wval16 =3D (u16)le32_to_cpu(val); + cxl_dvsec_lock_write(vdev, wval16); + break; + case CXL_DVSEC_RANGE1_BASE_HIGH_OFFSET: + case CXL_DVSEC_RANGE2_BASE_HIGH_OFFSET: + wval32 =3D le32_to_cpu(val); + dvsec_virt_write32(vdev, dvsec_off, wval32); + break; + case CXL_DVSEC_RANGE1_BASE_LOW_OFFSET: + case CXL_DVSEC_RANGE2_BASE_LOW_OFFSET: + wval32 =3D le32_to_cpu(val); + cxl_range_base_lo_write(vdev, dvsec_off, wval32); + break; + default: + /* RO registers: header, capability, range sizes - discard */ + break; + } + + return count; +} + +/** + * vfio_cxl_setup_dvsec_perms - Install per-device CXL DVSEC read/write ho= oks. 
+ * @vdev: VFIO PCI core device + * + * Called once per device open after vfio_config_init() has seeded vdev->v= config + * from hardware. Installs vfio_cxl_dvsec_readfn and vfio_cxl_dvsec_write= fn + * as per-device DVSEC handlers so that the global ecap_perms[DVSEC] dispa= tcher + * routes reads and writes through CXL-aware emulation. + * + * Forces CXL.io IO_ENABLE in the CONTROL vconfig shadow at init time so t= he + * initial guest read returns the correct value before the first write. + */ +void vfio_cxl_setup_dvsec_perms(struct vfio_pci_core_device *vdev) +{ + u16 ctrl =3D dvsec_virt_read16(vdev, CXL_DVSEC_CONTROL_OFFSET); + + vdev->dvsec_readfn =3D vfio_cxl_dvsec_readfn; + vdev->dvsec_writefn =3D vfio_cxl_dvsec_writefn; + + /* Force IO_ENABLE; cxl_dvsec_control_write() maintains this invariant. */ + ctrl |=3D CXL_DVSEC_CTRL_IO_ENABLE; + dvsec_virt_write16(vdev, CXL_DVSEC_CONTROL_OFFSET, ctrl); +} +EXPORT_SYMBOL_GPL(vfio_cxl_setup_dvsec_perms); diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vf= io_cxl_core.c index 19d3dc205f99..a3ff90b7a22c 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_core.c +++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c @@ -68,13 +68,13 @@ vfio_cxl_create_device_state(struct pci_dev *pdev, u16 = dvsec) * CACHE_CAPABLE is forwarded to the VMM so it knows whether a WBI * sequence is needed before FLR. 
*/ - if (!FIELD_GET(CXL_DVSEC_MEM_CAPABLE, cap_word) || + if (!FIELD_GET(CXL_DVSEC_CAP_MEM_CAPABLE, cap_word) || (pdev->class >> 8) =3D=3D PCI_CLASS_MEMORY_CXL) { devm_kfree(&pdev->dev, cxl); return ERR_PTR(-ENODEV); } =20 - cxl->cache_capable =3D FIELD_GET(CXL_DVSEC_CACHE_CAPABLE, cap_word); + cxl->cache_capable =3D FIELD_GET(CXL_DVSEC_CAP_CACHE_CAPABLE, cap_word); =20 return cxl; } diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vf= io_cxl_priv.h index 3458768445af..b86ee691d050 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h +++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h @@ -76,14 +76,43 @@ struct vfio_pci_cxl_state { #define CXL_HDM_DECODER_GLOBAL_CTRL_POISON_EN_BIT BIT(0) =20 /* - * CXL DVSEC for CXL Devices - register offsets within the DVSEC - * (CXL 4.0 8.1.3). - * Offsets are relative to the DVSEC capability base (cxl->dvsec). + * DVSEC register offsets and per-bit hardware definitions are in + * include/uapi/cxl/cxl_regs.h as CXL_DVSEC_*. The masks below encode + * emulation policy: which bits to ignore, which to preserve separately + * from their raw hardware state. */ -#define CXL_DVSEC_CAPABILITY_OFFSET 0xa -#define CXL_DVSEC_MEM_CAPABLE BIT(2) -/* CXL DVSEC Capability register bit 0: device supports CXL.cache (HDM-DB)= */ -#define CXL_DVSEC_CACHE_CAPABLE BIT(0) +/* DVSEC Control (0x0C): bits 13 (RsvdP) and 15 (RsvdP) are always discard= ed */ +#define CXL_CTRL_RESERVED_MASK (BIT(13) | BIT(15)) +/* bit 12 (P2P_Mem_Enable) treated as reserved if Cap3.P2P_Mem_Capable=3D0= */ +#define CXL_CTRL_P2P_REV_MASK CXL_DVSEC_CTRL_P2P_MEM_ENABLE + +/* DVSEC Status (0x0E): bits 13:0 and 15 are RsvdZ */ +#define CXL_STATUS_RESERVED_MASK (GENMASK(13, 0) | BIT(15)) + +/* + * DVSEC Control2 (0x10) emulation masks. + * + * CXL_CTRL2_HW_BITS_MASK: bits 1 (Initiate_Cache_WBI) and 2 + * (Initiate_CXL_Reset) always read 0 from hardware; they are write-only + * action triggers per CXL 4.0 8.1.3.4 Table 8-8.
Forward these to the + * device to trigger the hardware action; clear them from vconfig shadow so + * that subsequent guest reads return 0 as hardware requires. + * + * NOTE: bit 0 (Disable_Caching) and bit 3 (CXL_Reset_Mem_Clr_Enable) are + * ordinary RW fields; they must be preserved in vconfig, not forwarded. + */ +#define CXL_CTRL2_RESERVED_MASK GENMASK(15, 6) +#define CXL_CTRL2_HW_BITS_MASK (BIT(1) | BIT(2)) +/* bit 4 is RsvdP if Cap3.Volatile_HDM_Configurability=3D0 */ +#define CXL_CTRL2_VOLATILE_HDM_REV_MASK CXL_DVSEC_CTRL2_DESIRED_VOLATILE_= HDM +/* bit 5 is RsvdP if Cap2.Mod_Completion_Capable=3D0 */ +#define CXL_CTRL2_MODIFIED_COMP_REV_MASK CXL_DVSEC_CTRL2_MOD_COMPLETION_EN= ABLE + +/* DVSEC Lock (0x14): bits 15:1 are RsvdP */ +#define CXL_LOCK_RESERVED_MASK GENMASK(15, 1) + +/* DVSEC Range Base Low: bits 27:0 are reserved per Tables 8-15/8-19 */ +#define CXL_BASE_LO_RESERVED_MASK CXL_DVSEC_RANGE_BASE_LOW_RSVD_MASK =20 int vfio_cxl_setup_virt_regs(struct vfio_pci_core_device *vdev, struct vfio_pci_cxl_state *cxl, diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci= _config.c index 79aaf270adb2..5708837a6c99 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -1085,6 +1085,49 @@ static int __init init_pci_ext_cap_pwr_perm(struct p= erm_bits *perm) return 0; } =20 +/* + * vfio_pci_dvsec_dispatch_read - per-device DVSEC read dispatcher. + * + * Installed as ecap_perms[PCI_EXT_CAP_ID_DVSEC].readfn at module init. + * Calls vdev->dvsec_readfn when a shadow-read handler has been registered + * (e.g. by vfio_cxl_setup_dvsec_perms() for CXL Type-2 devices), otherwise + * continue to vfio_raw_config_read for hardware pass-through. + * + * This indirection allows per-device DVSEC reads from vconfig shadow + * without touching the global ecap_perms[] table.
+ */ +static int vfio_pci_dvsec_dispatch_read(struct vfio_pci_core_device *vdev, + int pos, int count, + struct perm_bits *perm, + int offset, __le32 *val) +{ + if (vdev->dvsec_readfn) + return vdev->dvsec_readfn(vdev, pos, count, perm, offset, val); + return vfio_raw_config_read(vdev, pos, count, perm, offset, val); +} + +/* + * vfio_pci_dvsec_dispatch_write - per-device DVSEC write dispatcher. + * + * Installed as ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn at module init. + * Calls vdev->dvsec_writefn when a handler has been registered for this + * device (e.g. by vfio_cxl_setup_dvsec_perms() for CXL Type-2 devices), + * otherwise proceed to vfio_raw_config_write so that non-CXL devices + * with a DVSEC capability continue to pass writes to hardware. + * + * This indirection allows per-device DVSEC handlers to be registered + * without touching the global ecap_perms[] table. + */ +static int vfio_pci_dvsec_dispatch_write(struct vfio_pci_core_device *vdev, + int pos, int count, + struct perm_bits *perm, + int offset, __le32 val) +{ + if (vdev->dvsec_writefn) + return vdev->dvsec_writefn(vdev, pos, count, perm, offset, val); + return vfio_raw_config_write(vdev, pos, count, perm, offset, val); +} + /* * Initialize the shared permission tables */ @@ -1121,7 +1164,8 @@ int __init vfio_pci_init_perm_bits(void) ret |=3D init_pci_ext_cap_err_perm(&ecap_perms[PCI_EXT_CAP_ID_ERR]); ret |=3D init_pci_ext_cap_pwr_perm(&ecap_perms[PCI_EXT_CAP_ID_PWR]); ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn =3D vfio_raw_config_write; - ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn =3D vfio_raw_config_write; + ecap_perms[PCI_EXT_CAP_ID_DVSEC].readfn =3D vfio_pci_dvsec_dispatch_read; + ecap_perms[PCI_EXT_CAP_ID_DVSEC].writefn =3D vfio_pci_dvsec_dispatch_writ= e; =20 if (ret) vfio_pci_uninit_perm_bits(); diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 726063b6ff70..96f8361ce6f3 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h 
@@ -147,6 +147,7 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_= device *vdev); void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *vdev); void vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev); void vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev); +void vfio_cxl_setup_dvsec_perms(struct vfio_pci_core_device *vdev); =20 #else =20 @@ -158,6 +159,8 @@ static inline void vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev) { } static inline void vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev) { } +static inline void +vfio_cxl_setup_dvsec_perms(struct vfio_pci_core_device *vdev) { } =20 #endif /* CONFIG_VFIO_CXL_CORE */ =20 diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index cd8ed98a82a3..aa159d0c8da7 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -31,7 +31,7 @@ struct p2pdma_provider; struct dma_buf_phys_vec; struct dma_buf_attachment; struct vfio_pci_cxl_state; - +struct perm_bits; =20 struct vfio_pci_eventfd { struct eventfd_ctx *ctx; @@ -141,6 +141,12 @@ struct vfio_pci_core_device { struct list_head ioeventfds_list; struct vfio_pci_vf_token *vf_token; struct vfio_pci_cxl_state *cxl; + int (*dvsec_readfn)(struct vfio_pci_core_device *vdev, int pos, + int count, struct perm_bits *perm, + int offset, __le32 *val); + int (*dvsec_writefn)(struct vfio_pci_core_device *vdev, int pos, + int count, struct perm_bits *perm, + int offset, __le32 val); struct list_head sriov_pfs_item; struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; diff --git a/include/uapi/cxl/cxl_regs.h b/include/uapi/cxl/cxl_regs.h index b6fcae91d216..e9746e75e09a 100644 --- a/include/uapi/cxl/cxl_regs.h +++ b/include/uapi/cxl/cxl_regs.h @@ -59,4 +59,102 @@ #define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i) #define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i) =20 +/* + * CXL r4.0 8.1.3: DVSEC for CXL Devices + * + * Register 
offsets are relative to the DVSEC capability base address, + * as discovered via PCI_EXT_CAP_ID_DVSEC with DVSEC ID 0x0. + * All registers in this section are 16-bit wide. + */ + +/* DVSEC register offsets */ +#define CXL_DVSEC_CAPABILITY_OFFSET 0x0a +#define CXL_DVSEC_CONTROL_OFFSET 0x0c +#define CXL_DVSEC_STATUS_OFFSET 0x0e +#define CXL_DVSEC_CONTROL2_OFFSET 0x10 +#define CXL_DVSEC_STATUS2_OFFSET 0x12 +#define CXL_DVSEC_LOCK_OFFSET 0x14 +#define CXL_DVSEC_CAPABILITY2_OFFSET 0x16 +#define CXL_DVSEC_RANGE1_SIZE_HIGH_OFFSET 0x18 +#define CXL_DVSEC_RANGE1_SIZE_LOW_OFFSET 0x1c +#define CXL_DVSEC_RANGE1_BASE_HIGH_OFFSET 0x20 +#define CXL_DVSEC_RANGE1_BASE_LOW_OFFSET 0x24 +#define CXL_DVSEC_RANGE2_SIZE_HIGH_OFFSET 0x28 +#define CXL_DVSEC_RANGE2_SIZE_LOW_OFFSET 0x2c +#define CXL_DVSEC_RANGE2_BASE_HIGH_OFFSET 0x30 +#define CXL_DVSEC_RANGE2_BASE_LOW_OFFSET 0x34 +#define CXL_DVSEC_CAPABILITY3_OFFSET 0x38 + +/* DVSEC Range Base Low registers: bits [27:0] are reserved */ +#define CXL_DVSEC_RANGE_BASE_LOW_RSVD_MASK __GENMASK(27, 0) + +/* CXL r4.0 8.1.3.1 Table 8-5 DVSEC CXL Capability (offset 0x0A) */ +#define CXL_DVSEC_CAP_CACHE_CAPABLE _BITUL(0) +#define CXL_DVSEC_CAP_IO_CAPABLE _BITUL(1) +#define CXL_DVSEC_CAP_MEM_CAPABLE _BITUL(2) +#define CXL_DVSEC_CAP_MEM_HW_INIT_MODE _BITUL(3) +#define CXL_DVSEC_CAP_HDM_COUNT_MASK __GENMASK(5, 4) +#define CXL_DVSEC_CAP_CACHE_WBI_CAPABLE _BITUL(6) +#define CXL_DVSEC_CAP_CXL_RESET_CAPABLE _BITUL(7) +#define CXL_DVSEC_CAP_CXL_RESET_TIMEOUT_MASK __GENMASK(10, 8) +#define CXL_DVSEC_CAP_CXL_RESET_MEM_CLR_CAPABLE _BITUL(11) +#define CXL_DVSEC_CAP_TSP_CAPABLE _BITUL(12) +#define CXL_DVSEC_CAP_MLD_CAPABLE _BITUL(13) +#define CXL_DVSEC_CAP_VIRAL_CAPABLE _BITUL(14) +#define CXL_DVSEC_CAP_PM_INIT_REPORTING_CAPABLE _BITUL(15) + +/* CXL r4.0 8.1.3.2 Table 8-6 DVSEC CXL Control (offset 0x0C) */ +#define CXL_DVSEC_CTRL_CACHE_ENABLE _BITUL(0) +#define CXL_DVSEC_CTRL_IO_ENABLE _BITUL(1) +#define CXL_DVSEC_CTRL_MEM_ENABLE _BITUL(2) +#define 
CXL_DVSEC_CTRL_CACHE_SF_COVERAGE_MASK __GENMASK(7, 3) +#define CXL_DVSEC_CTRL_CACHE_SF_GRANULARITY_MASK __GENMASK(10, 8) +#define CXL_DVSEC_CTRL_CACHE_CLEAN_EVICTION _BITUL(11) +#define CXL_DVSEC_CTRL_P2P_MEM_ENABLE _BITUL(12) +/* bit 13: RsvdP */ +#define CXL_DVSEC_CTRL_VIRAL_ENABLE _BITUL(14) +/* bit 15: RsvdP */ + +/* CXL r4.0 8.1.3.3 Table 8-7 DVSEC CXL Status (offset 0x0E) */ +/* bits 13:0 =3D RsvdZ */ +#define CXL_DVSEC_STATUS_VIRAL_STATUS _BITUL(14) +/* bit 15 =3D RsvdZ */ + +/* CXL r4.0 8.1.3.4 Table 8-8 DVSEC CXL Control2 (offset 0x10) */ +#define CXL_DVSEC_CTRL2_DISABLE_CACHING _BITUL(0) +#define CXL_DVSEC_CTRL2_INITIATE_CACHE_WBI _BITUL(1) +#define CXL_DVSEC_CTRL2_INITIATE_CXL_RESET _BITUL(2) +#define CXL_DVSEC_CTRL2_CXL_RESET_MEM_CLR_ENABLE _BITUL(3) +#define CXL_DVSEC_CTRL2_DESIRED_VOLATILE_HDM _BITUL(4) +#define CXL_DVSEC_CTRL2_MOD_COMPLETION_ENABLE _BITUL(5) +/* bits 15:6 =3D RsvdP */ + +/* CXL r4.0 8.1.3.5 Table 8-9 DVSEC CXL Status2 (offset 0x12) */ +#define CXL_DVSEC_STATUS2_CACHE_INVALID _BITUL(0) +#define CXL_DVSEC_STATUS2_CXL_RESET_COMPLETE _BITUL(1) +#define CXL_DVSEC_STATUS2_CXL_RESET_ERROR _BITUL(2) +/* RW1CS; RsvdZ if Cap3.Volatile_HDM_Configurability=3D0 */ +#define CXL_DVSEC_STATUS2_VOLATILE_HDM_PRES_ERROR _BITUL(3) +/* bits 14:4 =3D RsvdZ */ +#define CXL_DVSEC_STATUS2_PM_INIT_COMPLETION _BITUL(15) + +/* CXL r4.0 8.1.3.6 Table 8-10 DVSEC CXL Lock (offset 0x14) */ +#define CXL_DVSEC_LOCK_CONFIG_LOCK _BITUL(0) +/* bits 15:1 =3D RsvdP */ + +/* CXL r4.0 8.1.3.7 Table 8-11 DVSEC CXL Capability2 (offset 0x16) */ +#define CXL_DVSEC_CAP2_CACHE_SIZE_UNIT_MASK __GENMASK(3, 0) +#define CXL_DVSEC_CAP2_FALLBACK_CAPABILITY_MASK __GENMASK(5, 4) +#define CXL_DVSEC_CAP2_MOD_COMPLETION_CAPABLE _BITUL(6) +#define CXL_DVSEC_CAP2_NO_CLEAN_WRITEBACK _BITUL(7) +#define CXL_DVSEC_CAP2_CACHE_SIZE_MASK __GENMASK(15, 8) + +/* CXL r4.0 8.1.3.14 Table 8-20 DVSEC CXL Capability3 (offset 0x38) */ +#define CXL_DVSEC_CAP3_DEFAULT_VOLATILE_HDM_COLD_RESET _BITUL(0)
+#define CXL_DVSEC_CAP3_DEFAULT_VOLATILE_HDM_WARM_RESET _BITUL(1) +#define CXL_DVSEC_CAP3_DEFAULT_VOLATILE_HDM_HOT_RESET _BITUL(2) +#define CXL_DVSEC_CAP3_VOLATILE_HDM_CONFIGURABILITY _BITUL(3) +#define CXL_DVSEC_CAP3_P2P_MEM_CAPABLE _BITUL(4) +/* bits 15:5 =3D RsvdP */ + #endif /* _UAPI_CXL_REGS_H_ */ --=20 2.25.1
From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 16/20] vfio/cxl: Register regions with VFIO layer
Date: Wed, 1 Apr 2026 20:09:13 +0530
Message-ID: <20260401143917.108413-17-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

Register the DPA and component register regions with the VFIO layer.
The indices of both regions are cached for quick lookup.

vfio_cxl_register_cxl_region()
- Maps the region HPA with memremap(MEMREMAP_WB), treating CXL.mem as
  RAM rather than MMIO
- Registers VFIO_REGION_SUBTYPE_CXL
- Records dpa_region_idx.

vfio_cxl_register_comp_regs_region()
- Registers VFIO_REGION_SUBTYPE_CXL_COMP_REGS with size
  hdm_reg_offset + hdm_reg_size
- Records comp_reg_region_idx.
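A minimal userspace sketch of the index caching described above, not the patch's implementation: the `sketch_*` names and the trimmed-down struct are hypothetical, and the two constants are assumed to mirror VFIO_PCI_OFFSET_SHIFT and VFIO_PCI_NUM_REGIONS from the kernel headers. It shows why the cached index is enough to compute the region's file offset later without scanning the region array.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-ins for kernel definitions. The values are assumed to
 * match VFIO_PCI_OFFSET_SHIFT and VFIO_PCI_NUM_REGIONS.
 */
#define SKETCH_OFFSET_SHIFT      40
#define SKETCH_NUM_FIXED_REGIONS 9 /* BAR0-5, ROM, CONFIG, VGA */

/* Hypothetical, trimmed-down model of the per-device state. */
struct sketch_dev {
	int num_regions;         /* device-specific regions registered so far */
	int dpa_region_idx;      /* -1 until the DPA region is registered */
	int comp_reg_region_idx; /* -1 until the COMP_REGS region is registered */
};

static void sketch_init(struct sketch_dev *dev)
{
	dev->num_regions = 0;
	dev->dpa_region_idx = -1;
	dev->comp_reg_region_idx = -1;
}

/* Models appending one entry to the device-specific region array. */
static int sketch_register_region(struct sketch_dev *dev)
{
	dev->num_regions++;
	return dev->num_regions - 1; /* index of the entry just added */
}

/* Registers DPA first, then COMP_REGS, caching each index once. */
static void sketch_register_cxl_regions(struct sketch_dev *dev)
{
	dev->dpa_region_idx = sketch_register_region(dev);
	dev->comp_reg_region_idx = sketch_register_region(dev);
}

/* File offset of a device-specific region, derived from its cached index. */
static uint64_t sketch_region_offset(int dev_region_idx)
{
	assert(dev_region_idx >= 0);
	return (uint64_t)(SKETCH_NUM_FIXED_REGIONS + dev_region_idx)
		<< SKETCH_OFFSET_SHIFT;
}
```

In this model the offset passed to an unmap call is a pure function of the cached index, so a reset path needs no lookup; the -1 sentinel lets callers skip the work when the region was never registered.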
Signed-off-by: Manish Honap --- drivers/vfio/pci/cxl/vfio_cxl_core.c | 98 +++++++++++++++++++++++++++- drivers/vfio/pci/cxl/vfio_cxl_emu.c | 34 ++++++++++ drivers/vfio/pci/cxl/vfio_cxl_priv.h | 2 + drivers/vfio/pci/vfio_pci.c | 23 +++++++ drivers/vfio/pci/vfio_pci_priv.h | 11 ++++ 5 files changed, 167 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vf= io_cxl_core.c index a3ff90b7a22c..b38a04301660 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_core.c +++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c @@ -75,6 +75,8 @@ vfio_cxl_create_device_state(struct pci_dev *pdev, u16 dv= sec) } =20 cxl->cache_capable =3D FIELD_GET(CXL_DVSEC_CAP_CACHE_CAPABLE, cap_word); + cxl->dpa_region_idx =3D -1; + cxl->comp_reg_region_idx =3D -1; =20 return cxl; } @@ -509,14 +511,19 @@ static int vfio_cxl_region_mmap(struct vfio_pci_core_= device *vdev, */ void vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev) { + struct vfio_device *core_vdev =3D &vdev->vdev; struct vfio_pci_cxl_state *cxl =3D vdev->cxl; =20 lockdep_assert_held_write(&vdev->memory_lock); =20 - if (!cxl) + if (!cxl || cxl->dpa_region_idx < 0) return; =20 WRITE_ONCE(cxl->region_active, false); + unmap_mapping_range(core_vdev->inode->i_mapping, + VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_NUM_REGIONS + + cxl->dpa_region_idx), + cxl->region_size, true); } =20 /* @@ -601,6 +608,7 @@ static ssize_t vfio_cxl_region_rw(struct vfio_pci_core_= device *core_dev, static void vfio_cxl_region_release(struct vfio_pci_core_device *vdev, struct vfio_pci_region *region) { + struct vfio_device *core_vdev =3D &vdev->vdev; struct vfio_pci_cxl_state *cxl =3D region->data; =20 /* @@ -610,6 +618,16 @@ static void vfio_cxl_region_release(struct vfio_pci_co= re_device *vdev, */ WRITE_ONCE(cxl->region_active, false); =20 + /* + * Remove all user mappings of the DPA region while the device is + * still alive. 
+ */ + if (cxl->dpa_region_idx >=3D 0) + unmap_mapping_range(core_vdev->inode->i_mapping, + VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_NUM_REGIONS + + cxl->dpa_region_idx), + cxl->region_size, true); + if (cxl->region_vaddr) { memunmap(cxl->region_vaddr); cxl->region_vaddr =3D NULL; @@ -622,4 +640,82 @@ static const struct vfio_pci_regops vfio_cxl_regops = =3D { .release =3D vfio_cxl_region_release, }; =20 +int vfio_cxl_register_cxl_region(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + u32 flags; + int ret; + + if (!cxl) + return -ENODEV; + + if (!cxl->region || cxl->region_vaddr) + return -ENODEV; + + /* + * CXL device memory is RAM, not MMIO. Use memremap() rather than + * ioremap_cache() so the correct memory-mapping API is used. + * The WB attribute matches the cache-coherent nature of CXL.mem. + */ + cxl->region_vaddr =3D memremap(cxl->region_hpa, cxl->region_size, + MEMREMAP_WB); + if (!cxl->region_vaddr) + return -ENOMEM; + + flags =3D VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE | + VFIO_REGION_INFO_FLAG_MMAP; + + ret =3D vfio_pci_core_register_dev_region(vdev, + PCI_VENDOR_ID_CXL | + VFIO_REGION_TYPE_PCI_VENDOR_TYPE, + VFIO_REGION_SUBTYPE_CXL, + &vfio_cxl_regops, + cxl->region_size, flags, + cxl); + if (ret) { + memunmap(cxl->region_vaddr); + cxl->region_vaddr =3D NULL; + return ret; + } + + /* + * Cache the vdev->region[] index before activating the region. + * vfio_pci_core_register_dev_region() placed the new entry at + * vdev->region[num_regions - 1] and incremented num_regions. + * vfio_cxl_zap_region_locked() uses this to avoid scanning + * vdev->region[] on every FLR. 
+ */ + cxl->dpa_region_idx =3D vdev->num_regions - 1; + + vfio_cxl_reinit_comp_regs(cxl); + + WRITE_ONCE(cxl->region_active, true); + + return 0; +} +EXPORT_SYMBOL_GPL(vfio_cxl_register_cxl_region); + +/** + * vfio_cxl_unregister_cxl_region - Undo vfio_cxl_register_cxl_region() + * @vdev: VFIO PCI device + * + * Marks the DPA region inactive and resets dpa_region_idx. + * Does NOT touch CXL subsystem state (cxl->region, cxl->cxled, cxl->cxlrd= ). + * The caller must call vfio_cxl_destroy_cxl_region() separately to release + * those objects. + */ +void vfio_cxl_unregister_cxl_region(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + + if (!cxl || cxl->dpa_region_idx < 0) + return; + + WRITE_ONCE(cxl->region_active, false); + + cxl->dpa_region_idx =3D -1; +} +EXPORT_SYMBOL_GPL(vfio_cxl_unregister_cxl_region); + MODULE_IMPORT_NS("CXL"); diff --git a/drivers/vfio/pci/cxl/vfio_cxl_emu.c b/drivers/vfio/pci/cxl/vfi= o_cxl_emu.c index 781328a79b43..50d3718b101d 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_emu.c +++ b/drivers/vfio/pci/cxl/vfio_cxl_emu.c @@ -473,3 +473,37 @@ void vfio_cxl_clean_virt_regs(struct vfio_pci_cxl_stat= e *cxl) kfree(cxl->comp_reg_virt); cxl->comp_reg_virt =3D NULL; } + +/* + * vfio_cxl_register_comp_regs_region - Register the COMP_REGS device regi= on. + * + * Exposes the emulated HDM decoder register state as a VFIO device region + * with type VFIO_REGION_SUBTYPE_CXL_COMP_REGS. QEMU attaches a + * notify_change callback to this region to intercept HDM COMMIT writes + * and map the DPA MemoryRegion at the appropriate GPA. + * + * The region is read+write only (no mmap) to ensure all accesses pass + * through comp_regs_dispatch_write() for proper bit-field enforcement. 
+ */ +int vfio_cxl_register_comp_regs_region(struct vfio_pci_core_device *vdev) +{ + struct vfio_pci_cxl_state *cxl =3D vdev->cxl; + u32 flags =3D VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + int ret; + + if (!cxl || !cxl->comp_reg_virt) + return -ENODEV; + + ret =3D vfio_pci_core_register_dev_region(vdev, + PCI_VENDOR_ID_CXL | + VFIO_REGION_TYPE_PCI_VENDOR_TYPE, + VFIO_REGION_SUBTYPE_CXL_COMP_REGS, + &vfio_cxl_comp_regs_ops, + cxl->hdm_reg_offset + + cxl->hdm_reg_size, flags, cxl); + if (!ret) + cxl->comp_reg_region_idx =3D vdev->num_regions - 1; + + return ret; +} +EXPORT_SYMBOL_GPL(vfio_cxl_register_comp_regs_region); diff --git a/drivers/vfio/pci/cxl/vfio_cxl_priv.h b/drivers/vfio/pci/cxl/vf= io_cxl_priv.h index b86ee691d050..b884689a1226 100644 --- a/drivers/vfio/pci/cxl/vfio_cxl_priv.h +++ b/drivers/vfio/pci/cxl/vfio_cxl_priv.h @@ -28,6 +28,8 @@ struct vfio_pci_cxl_state { __le32 *comp_reg_virt; size_t dpa_size; void __iomem *hdm_iobase; + int dpa_region_idx; + int comp_reg_region_idx; u16 dvsec_len; u8 hdm_count; u8 comp_reg_bar; diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 0c771064c0b8..22cf9ea831f9 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -120,6 +120,29 @@ static int vfio_pci_open_device(struct vfio_device *co= re_vdev) } } =20 + if (vdev->cxl) { + /* + * pci_config_map and vconfig are valid now (allocated by + * vfio_config_init() inside vfio_pci_core_enable() above). 
+ */ + vfio_cxl_setup_dvsec_perms(vdev); + + ret =3D vfio_cxl_register_cxl_region(vdev); + if (ret) { + pci_warn(pdev, "Failed to setup CXL region\n"); + vfio_pci_core_disable(vdev); + return ret; + } + + ret =3D vfio_cxl_register_comp_regs_region(vdev); + if (ret) { + pci_warn(pdev, "Failed to register COMP_REGS region\n"); + vfio_cxl_unregister_cxl_region(vdev); + vfio_pci_core_disable(vdev); + return ret; + } + } + vfio_pci_core_finish_enable(vdev); =20 return 0; diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_p= riv.h index 96f8361ce6f3..ae0091d5096c 100644 --- a/drivers/vfio/pci/vfio_pci_priv.h +++ b/drivers/vfio/pci/vfio_pci_priv.h @@ -148,6 +148,9 @@ void vfio_pci_cxl_cleanup(struct vfio_pci_core_device *= vdev); void vfio_cxl_zap_region_locked(struct vfio_pci_core_device *vdev); void vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev); void vfio_cxl_setup_dvsec_perms(struct vfio_pci_core_device *vdev); +int vfio_cxl_register_cxl_region(struct vfio_pci_core_device *vdev); +void vfio_cxl_unregister_cxl_region(struct vfio_pci_core_device *vdev); +int vfio_cxl_register_comp_regs_region(struct vfio_pci_core_device *vdev); =20 #else =20 @@ -161,6 +164,14 @@ static inline void vfio_cxl_reactivate_region(struct vfio_pci_core_device *vdev) { } static inline void vfio_cxl_setup_dvsec_perms(struct vfio_pci_core_device *vdev) { } +static inline int +vfio_cxl_register_cxl_region(struct vfio_pci_core_device *vdev) +{ return 0; } +static inline void +vfio_cxl_unregister_cxl_region(struct vfio_pci_core_device *vdev) { } +static inline int +vfio_cxl_register_comp_regs_region(struct vfio_pci_core_device *vdev) +{ return 0; } =20 #endif /* CONFIG_VFIO_CXL_CORE */ =20 --=20 2.25.1
From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 17/20] vfio/pci: Advertise CXL cap and sparse component BAR to userspace
Date: Wed, 1 Apr 2026 20:09:14 +0530
Message-ID: <20260401143917.108413-18-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|82310400026|376014|7416014|36860700016|1800799024|921020|56012099003|18002099003|22082099003; X-Microsoft-Antispam-Message-Info: dSbq/qmbreBdsh/X/osRac92Q4e0oAz6WLqv6aDiEaCpaO9wHccvJ80S9Yv5YE6KsnC/1S6CKPoKgoWkhqqU9BZ91XwjcEFJ++Y5O7hrZgcmEqoPPrxPq9bKkU1bCS/XT2/4TQhGABsMfcZ63ydYF1BnJSlZLyg7pKhcdsORQEAKRq4o8QXrtBjT8ZWgHG2OckF4wCRb+RnBkJtMW2aHBMdX2hOlr3KFOmTdW3dVR/X/EqQ2zKWT5RDCl31jQi2cvfjek0g3DUJvoJT43QTgec7sKcN3RGckFvrQk/88hf56grBgkHV80g/B2RpZjXSdUkzNQ88mRu7RhK76J5Nuo9q7sKe1oCAdU+gLlwUSkxM9+XxzAy9LH6P+6ed2YkU83gsFSmvsJ7Ma2ipV7q4iqXWaW8Au1iuOIP0UN+Nq5Fky7Y1LdnxbYBkSnMYxiSIUDrRLq4dfbYHxoXTu/VPtEANio/7StrsEABfcjGLnZ6Dmp4wPYst1CIkYBe31ruqfJAaB8l3v0b7L7hCs5wbuqbdzeR2ie3KQNsNtqsL8h+FX8833S/VRJFRFLgOzvw5Hh3NTFv06NqaFLTdUnbqKjFHSLLk6coejV1Zy0NUm+slZG5obzn/knBZz9rIESVi8i3UTprxofBrc/nFEJ7AyezNwaffkTAYuWMkz59pvawP2xWKkv0bHC07Z8tzPj8v+uG5er7S6BWnB1nlbd6UNENd8F70pUlfVvnwyVJ9SHqpQYUgOUaXRz5pWMhwq1cYOKR4EOgnyAOtNSc8sE2YamNx11RP8uXBKsNFMoEn6O1dgOeb7LqMHGm2vTtqaawDP X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(82310400026)(376014)(7416014)(36860700016)(1800799024)(921020)(56012099003)(18002099003)(22082099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: HqQx3YCHbZVLCr22rV7ElC7F7baeXHwpIqfuyDPtjeV0gACERuCip8Wzyjuwyi9DJP8kN+PZ9Cv/M5Us7B62j1Yynsa9HqV0Aqh6flN+eGqAW/Ad4F/xwgpu/fRlurZiJqdkD86Kx9ZcifN6CQyYhKo7s7MSFsZnGZqjZs6zLjjdVtiUE0AwwnxDCZ3TMhfLHrWrM8NsS0PVBwK3z+CDp1KUdOZrTFlf3N1IpAp80XHqXVAv35exD9/iwc4V69ahrDzwmLy+d2NH9Vdh+/5sGwI23nb6qEdWlogtooUHdzA6I7a+45JnSFKIN7YhZhs1Mv0Wod10zerjXma/mlcr7xzTx2UzJaFVzc9Myx/6x1hc8qmTPk7rL/i7MyWL5UGnG1fApYGQ41AlZqjNcfcZty81SOFUrGfux5dng6yrA5ySIDVxuLhk76tBt0gGuYvu X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:42:07.7752 (UTC) 
X-MS-Exchange-CrossTenant-Network-Message-Id: 3a526783-c39f-4a3b-7cdd-08de8ffcd9ab X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN1PEPF00006001.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM4PR12MB7693 Content-Type: text/plain; charset="utf-8" From: Manish Honap Expose CXL device capability through the VFIO device info ioctl and give userspace access to the GPU/accelerator register windows in the component BAR while protecting the CXL component register block. vfio_cxl_get_info() fills VFIO_DEVICE_INFO_CAP_CXL with the HDM register BAR index and byte offset, commit flags, and VFIO region indices for the DPA and COMP_REGS regions. HDM decoder count and the HDM block offset within COMP_REGS are not populated; both are derivable from the CXL Capability Array in the COMP_REGS region itself. vfio_cxl_get_region_info() handles VFIO_DEVICE_GET_REGION_INFO for the component register BAR. It builds a sparse-mmap capability that advertises only the GPU/accelerator register windows, carving out the CXL component register block. Three physical layouts are handled: Topology A comp block at BAR end: one area [0, comp_reg_offset) Topology B comp block at BAR start: one area [comp_end, bar_len) Topology C comp block in the middle: two areas, one on each side vfio_cxl_mmap_overlaps_comp_regs() checks whether an mmap request overlaps [comp_reg_offset, comp_reg_offset + comp_reg_size). vfio_pci_core_mmap() calls it to reject access to the component register block while allowing mmap of the GPU register windows in the sparse capability. This replaces the earlier blanket rejection of any mmap on the component BAR index. 
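
The carve-out and the overlap test described above can be tried outside
the kernel. The block below is only an illustrative userspace sketch, not
code from this series: struct sketch_area, sketch_carve() and
sketch_overlaps() are hypothetical names, and plain uint64_t values stand
in for resource_size_t. Unlike the kernel helper, which always reports at
least one area, the sketch returns zero areas when the component block
covers the whole BAR; that choice is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of one sparse-mmap area. */
struct sketch_area {
	uint64_t offset;
	uint64_t size;
};

/*
 * Carve [comp_off, comp_off + comp_size) out of [0, bar_len) and store
 * the remaining mmappable windows in out[]. Returns the number of
 * windows written (0, 1 or 2).
 */
static unsigned int sketch_carve(uint64_t bar_len, uint64_t comp_off,
				 uint64_t comp_size,
				 struct sketch_area out[2])
{
	uint64_t comp_end = comp_off + comp_size;
	unsigned int n = 0;

	/* The component block must be non-empty and lie inside the BAR. */
	assert(comp_size > 0 && comp_end <= bar_len);

	if (comp_off > 0) {		/* window before the block (A, C) */
		out[n].offset = 0;
		out[n].size = comp_off;
		n++;
	}
	if (comp_end < bar_len) {	/* window after the block (B, C) */
		out[n].offset = comp_end;
		out[n].size = bar_len - comp_end;
		n++;
	}
	return n;
}

/* Half-open interval overlap test, same shape as the check in the patch. */
static bool sketch_overlaps(uint64_t comp_off, uint64_t comp_size,
			    uint64_t req_start, uint64_t req_len)
{
	if (!comp_size)
		return false;

	return req_start < comp_off + comp_size &&
	       req_start + req_len > comp_off;
}
```

With a 64 KiB BAR and a 4 KiB component block at offset 0x8000
(Topology C), sketch_carve() yields the two windows [0, 0x8000) and
[0x9000, 0x10000). A request for [0x7000, 0x9000) overlaps the block,
while a request for [0, 0x8000) does not.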

Hook both helpers into vfio_pci_ioctl_get_info() and
vfio_pci_ioctl_get_region_info() in vfio_pci_core.c.

The component BAR cannot be claimed exclusively since the CXL subsystem
holds persistent sub-range iomem claims during HDM decoder setup.
pci_request_selected_regions() returns EBUSY; pass bars=0 to skip the
request and map directly via pci_iomap(). Physical ownership is assured
by driver binding.

Signed-off-by: Zhi Wang
Signed-off-by: Manish Honap
---
 drivers/vfio/pci/cxl/vfio_cxl_core.c | 155 +++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c     |  31 +++++-
 drivers/vfio/pci/vfio_pci_priv.h     |  24 +++++
 drivers/vfio/pci/vfio_pci_rdwr.c     |  16 ++-
 4 files changed, 221 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
index b38a04301660..46430cbfa962 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -21,6 +21,161 @@
 #include "../vfio_pci_priv.h"
 #include "vfio_cxl_priv.h"
 
+u8 vfio_cxl_get_component_reg_bar(struct vfio_pci_core_device *vdev)
+{
+	return vdev->cxl->comp_reg_bar;
+}
+
+int vfio_cxl_get_region_info(struct vfio_pci_core_device *vdev,
+			     struct vfio_region_info *info,
+			     struct vfio_info_cap *caps)
+{
+	unsigned long minsz = offsetofend(struct vfio_region_info, offset);
+	struct vfio_region_info_cap_sparse_mmap *sparse;
+	struct vfio_pci_cxl_state *cxl = vdev->cxl;
+	resource_size_t bar_len, comp_end;
+	u32 nr_areas, cap_size;
+	int ret;
+
+	if (!cxl)
+		return -ENOTTY;
+
+	if (!info)
+		return -ENOTTY;
+
+	if (info->argsz < minsz)
+		return -EINVAL;
+
+	if (info->index != cxl->comp_reg_bar)
+		return -ENOTTY;
+
+	/*
+	 * The device state is not fully initialised;
+	 * fall through to the default BAR handler.
+	 */
+	if (!cxl->comp_reg_size)
+		return -ENOTTY;
+
+	bar_len = pci_resource_len(vdev->pdev, info->index);
+	comp_end = cxl->comp_reg_offset + cxl->comp_reg_size;
+
+	/*
+	 * Advertise the GPU/accelerator register windows as mmappable by
+	 * carving the CXL component register block out of the BAR. The
+	 * number of sparse areas depends on where the block sits:
+	 *
+	 * [A] comp block at BAR end [gpu_regs | comp_regs]:
+	 *     comp_reg_offset > 0 && comp_end == bar_len
+	 *     = 1 area: [0, comp_reg_offset)
+	 *
+	 * [B] comp block at BAR start [comp_regs | gpu_regs]:
+	 *     comp_reg_offset == 0 && comp_end < bar_len
+	 *     = 1 area: [comp_end, bar_len)
+	 *
+	 * [C] comp block in middle [gpu_regs | comp_regs | gpu_regs]:
+	 *     comp_reg_offset > 0 && comp_end < bar_len
+	 *     = 2 areas: [0, comp_reg_offset) and [comp_end, bar_len)
+	 */
+	if (cxl->comp_reg_offset > 0 && comp_end < bar_len)
+		nr_areas = 2;
+	else
+		nr_areas = 1;
+
+	cap_size = struct_size(sparse, areas, nr_areas);
+	sparse = kzalloc(cap_size, GFP_KERNEL);
+	if (!sparse)
+		return -ENOMEM;
+
+	sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+	sparse->header.version = 1;
+	sparse->nr_areas = nr_areas;
+
+	if (nr_areas == 2) {
+		/* [C]: window before and after comp block */
+		sparse->areas[0].offset = 0;
+		sparse->areas[0].size = cxl->comp_reg_offset;
+		sparse->areas[1].offset = comp_end;
+		sparse->areas[1].size = bar_len - comp_end;
+	} else if (cxl->comp_reg_offset == 0) {
+		/* [B]: comp block at BAR start, window follows */
+		sparse->areas[0].offset = comp_end;
+		sparse->areas[0].size = bar_len - comp_end;
+	} else {
+		/* [A]: comp block at BAR end, window precedes */
+		sparse->areas[0].offset = 0;
+		sparse->areas[0].size = cxl->comp_reg_offset;
+	}
+
+	ret = vfio_info_add_capability(caps, &sparse->header, cap_size);
+	kfree(sparse);
+	if (ret)
+		return ret;
+
+	info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index);
+	info->size = bar_len;
+	info->flags = VFIO_REGION_INFO_FLAG_READ |
+		      VFIO_REGION_INFO_FLAG_WRITE |
+		      VFIO_REGION_INFO_FLAG_MMAP;
+
+	return 0;
+}
+
+bool vfio_cxl_mmap_overlaps_comp_regs(struct vfio_pci_core_device *vdev,
+				      u64 req_start, u64 req_len)
+{
+	struct vfio_pci_cxl_state *cxl = vdev->cxl;
+
+	if (!cxl->comp_reg_size)
+		return false;
+
+	return req_start < cxl->comp_reg_offset + cxl->comp_reg_size &&
+	       req_start + req_len > cxl->comp_reg_offset;
+}
+
+int vfio_cxl_get_info(struct vfio_pci_core_device *vdev,
+		      struct vfio_info_cap *caps)
+{
+	struct vfio_pci_cxl_state *cxl = vdev->cxl;
+	struct vfio_device_info_cap_cxl cxl_cap = {0};
+
+	if (!cxl)
+		return 0;
+
+	/*
+	 * Device is not fully initialised?
+	 */
+	if (WARN_ON(cxl->dpa_region_idx < 0 || cxl->comp_reg_region_idx < 0))
+		return -ENODEV;
+
+	/* Fill in from CXL device structure */
+	cxl_cap.header.id = VFIO_DEVICE_INFO_CAP_CXL;
+	cxl_cap.header.version = 1;
+	/*
+	 * COMP_REGS region starts at comp_reg_offset + CXL_CM_OFFSET within
+	 * the BAR. This is the byte offset of the CXL.mem register area (where
+	 * the CXL Capability Array Header lives) within the component register
+	 * block. Userspace derives hdm_decoder_offset and hdm_count from the
+	 * COMP_REGS region itself (CXL Capability Array traversal + HDMC read).
+	 */
+	cxl_cap.hdm_regs_offset = cxl->comp_reg_offset + CXL_CM_OFFSET;
+	cxl_cap.hdm_regs_bar_index = cxl->comp_reg_bar;
+
+	if (cxl->precommitted)
+		cxl_cap.flags |= VFIO_CXL_CAP_FIRMWARE_COMMITTED;
+	if (cxl->cache_capable)
+		cxl_cap.flags |= VFIO_CXL_CAP_CACHE_CAPABLE;
+
+	/*
+	 * Populate absolute VFIO region indices so userspace can query them
+	 * directly with VFIO_DEVICE_GET_REGION_INFO.
+	 */
+	cxl_cap.dpa_region_index = VFIO_PCI_NUM_REGIONS + cxl->dpa_region_idx;
+	cxl_cap.comp_regs_region_index =
+		VFIO_PCI_NUM_REGIONS + cxl->comp_reg_region_idx;
+
+	return vfio_info_add_capability(caps, &cxl_cap.header, sizeof(cxl_cap));
+}
+
 /*
  * Scope-based cleanup wrappers for the CXL resource APIs
  */
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 48e0274c19aa..570775cc8711 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -591,7 +591,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_dummy_resource *dummy_res, *tmp;
 	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
-	int i, bar;
+	int i, bar, bars;
 
 	/* For needs_reset */
 	lockdep_assert_held(&vdev->vdev.dev_set->lock);
@@ -650,8 +650,10 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 		bar = i + PCI_STD_RESOURCES;
 		if (!vdev->barmap[bar])
 			continue;
+		bars = (vdev->cxl && i == vfio_cxl_get_component_reg_bar(vdev)) ?
+			0 : (1 << bar);
 		pci_iounmap(pdev, vdev->barmap[bar]);
-		pci_release_selected_regions(pdev, 1 << bar);
+		pci_release_selected_regions(pdev, bars);
 		vdev->barmap[bar] = NULL;
 	}
 
@@ -989,6 +991,13 @@ static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev,
 	if (vdev->reset_works)
 		info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
+	if (vdev->cxl) {
+		ret = vfio_cxl_get_info(vdev, &caps);
+		if (ret)
+			return ret;
+		info.flags |= VFIO_DEVICE_FLAGS_CXL;
+	}
+
 	info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
 	info.num_irqs = VFIO_PCI_NUM_IRQS;
 
@@ -1034,6 +1043,12 @@ int vfio_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
 	struct pci_dev *pdev = vdev->pdev;
 	int i, ret;
 
+	if (vdev->cxl) {
+		ret = vfio_cxl_get_region_info(vdev, info, caps);
+		if (ret != -ENOTTY)
+			return ret;
+	}
+
 	switch (info->index) {
 	case VFIO_PCI_CONFIG_REGION_INDEX:
 		info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index);
@@ -1768,6 +1783,18 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 	if (req_start + req_len > phys_len)
 		return -EINVAL;
 
+	/*
+	 * CXL devices: mmap is permitted for the GPU/accelerator register
+	 * windows listed in the sparse-mmap capability. Block any request
+	 * that overlaps the CXL component register block
+	 * [comp_reg_offset, comp_reg_offset + comp_reg_size); those registers
+	 * must be accessed exclusively through the COMP_REGS device region so
+	 * that the emulation layer (notify_change) intercepts every write.
+	 */
+	if (vdev->cxl && index == vfio_cxl_get_component_reg_bar(vdev) &&
+	    vfio_cxl_mmap_overlaps_comp_regs(vdev, req_start, req_len))
+		return -EINVAL;
+
 	/*
 	 * Even though we don't make use of the barmap for the mmap,
 	 * we need to request the region and the barmap tracks that.
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index ae0091d5096c..2d4aadd1b35a 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -151,6 +151,14 @@ void vfio_cxl_setup_dvsec_perms(struct vfio_pci_core_device *vdev);
 int vfio_cxl_register_cxl_region(struct vfio_pci_core_device *vdev);
 void vfio_cxl_unregister_cxl_region(struct vfio_pci_core_device *vdev);
 int vfio_cxl_register_comp_regs_region(struct vfio_pci_core_device *vdev);
+int vfio_cxl_get_info(struct vfio_pci_core_device *vdev,
+		      struct vfio_info_cap *caps);
+int vfio_cxl_get_region_info(struct vfio_pci_core_device *vdev,
+			     struct vfio_region_info *info,
+			     struct vfio_info_cap *caps);
+u8 vfio_cxl_get_component_reg_bar(struct vfio_pci_core_device *vdev);
+bool vfio_cxl_mmap_overlaps_comp_regs(struct vfio_pci_core_device *vdev,
+				      u64 req_start, u64 req_len);
 
 #else
 
@@ -172,6 +180,22 @@ vfio_cxl_unregister_cxl_region(struct vfio_pci_core_device *vdev) { }
 static inline int
 vfio_cxl_register_comp_regs_region(struct vfio_pci_core_device *vdev)
 { return 0; }
+static inline int
+vfio_cxl_get_info(struct vfio_pci_core_device *vdev,
+		  struct vfio_info_cap *caps)
+{ return -ENOTTY; }
+static inline int
+vfio_cxl_get_region_info(struct vfio_pci_core_device *vdev,
+			 struct vfio_region_info *info,
+			 struct vfio_info_cap *caps)
+{ return -ENOTTY; }
+static inline u8
+vfio_cxl_get_component_reg_bar(struct vfio_pci_core_device *vdev)
+{ return U8_MAX; }
+static inline bool
+vfio_cxl_mmap_overlaps_comp_regs(struct vfio_pci_core_device *vdev,
+				 u64 req_start, u64 req_len)
+{ return false; }
 
 #endif /* CONFIG_VFIO_CXL_CORE */
 
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index b38627b35c35..e95bdbdbcdb2 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -201,19 +201,29 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
 int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	int ret;
+	int ret, bars;
 	void __iomem *io;
 
 	if (vdev->barmap[bar])
 		return 0;
 
-	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+	/*
+	 * The CXL component register BAR cannot be claimed exclusively: the
+	 * CXL subsystem holds persistent sub-range iomem claims during HDM
+	 * decoder setup. pci_request_selected_regions() for the full BAR
+	 * fails with EBUSY. Pass bars=0 to make the request a no-op and map
+	 * directly via pci_iomap().
+	 */
+	bars = (vdev->cxl && bar == vfio_cxl_get_component_reg_bar(vdev)) ?
+		0 : (1 << bar);
+
+	ret = pci_request_selected_regions(pdev, bars, "vfio");
 	if (ret)
 		return ret;
 
 	io = pci_iomap(pdev, bar, 0);
 	if (!io) {
-		pci_release_selected_regions(pdev, 1 << bar);
+		pci_release_selected_regions(pdev, bars);
 		return -ENOMEM;
 	}
 
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 18/20] vfio/cxl: Provide opt-out for CXL feature
Date: Wed, 1 Apr 2026 20:09:15 +0530
Message-ID: <20260401143917.108413-19-mhonap@nvidia.com>
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>

From: Manish Honap

Provide an opt-out mechanism to disable CXL support in the vfio-pci
module. The opt-out is available at both build time and module load
time.

The build-time option CONFIG_VFIO_CXL_CORE enables or disables CXL
support in the vfio-pci module.

To disable CXL support at runtime, use the module parameter disable_cxl.
The parameter applies to every device bound to vfio-pci; the driver
copies it into the per-device disable_cxl flag on the core device before
registration. Variant drivers may instead set that flag in their probe
for per-device control.

Signed-off-by: Manish Honap
---
 drivers/vfio/pci/cxl/vfio_cxl_core.c | 4 ++++
 drivers/vfio/pci/vfio_pci.c          | 9 +++++++++
 include/linux/vfio_pci_core.h        | 1 +
 3 files changed, 14 insertions(+)

diff --git a/drivers/vfio/pci/cxl/vfio_cxl_core.c b/drivers/vfio/pci/cxl/vfio_cxl_core.c
index 46430cbfa962..3ffc3e593d04 100644
--- a/drivers/vfio/pci/cxl/vfio_cxl_core.c
+++ b/drivers/vfio/pci/cxl/vfio_cxl_core.c
@@ -479,6 +479,10 @@ void vfio_pci_cxl_detect_and_init(struct vfio_pci_core_device *vdev)
 	u16 dvsec;
 	int ret;
 
+	/* Honor the user opt-out decision */
+	if (vdev->disable_cxl)
+		return;
+
 	if (!pcie_is_cxl(pdev))
 		return;
 
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 22cf9ea831f9..a6b0fb882b9f 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -60,6 +60,12 @@ static bool disable_denylist;
 module_param(disable_denylist, bool, 0444);
 MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
 
+#if IS_ENABLED(CONFIG_VFIO_CXL_CORE)
+static bool disable_cxl;
+module_param(disable_cxl, bool, 0444);
+MODULE_PARM_DESC(disable_cxl, "Disable CXL Type-2 extensions for all devices bound to vfio-pci. Variant drivers may instead set vdev->disable_cxl in their probe for per-device control without needing this parameter.");
+#endif
+
 static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
 {
 	switch (pdev->vendor) {
@@ -189,6 +195,9 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return PTR_ERR(vdev);
 
 	dev_set_drvdata(&pdev->dev, vdev);
+#if IS_ENABLED(CONFIG_VFIO_CXL_CORE)
+	vdev->disable_cxl = disable_cxl;
+#endif
 	vdev->pci_ops = &vfio_pci_dev_ops;
 	ret = vfio_pci_core_register_device(vdev);
 	if (ret)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index aa159d0c8da7..48dc69df52fa 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -130,6 +130,7 @@ struct vfio_pci_core_device {
 	bool needs_pm_restore:1;
 	bool pm_intx_masked:1;
 	bool pm_runtime_engaged:1;
+	bool disable_cxl:1;
 	struct pci_saved_state *pci_saved_state;
 	struct pci_saved_state *pm_save;
 	int ioeventfds_nr;
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Subject: [PATCH v2 19/20] docs: vfio-pci: Document CXL Type-2 device passthrough
Date: Wed, 1 Apr 2026 20:09:16 +0530
Message-ID: <20260401143917.108413-20-mhonap@nvidia.com>
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
BCL:0;ARA:13230040|82310400026|1800799024|7416014|376014|36860700016|921020|22082099003|18002099003|56012099003; X-Microsoft-Antispam-Message-Info: 4IW30JanQsu+9dyVSYchqm01qkYMHUb1Tj2DUNZ8Jeb3vwUGzRWoU6eL1rCSO/+0Tc//ejl0fqB0oB2S+uqXGMv9WnGZfO+K6S35dEKb4NNpUsE359mbswtINqM5g+WQiIVWzulCx8FeTiD3iDCrN3dVw9g7RFPfi0NHSLIHCTJddfwSUnXkRHvVLRJG/e74mFFn4SVhyBKe+qj+LSvwx4HyuiUimz/TZyKevWXjzCAmsxxiLUMK18Zt/aeLMFBFNFfHYuvh2DzXXgy+LWtxU1rcWGQR6bWEz3+MdNcsuScWF8ktr6yLun8YTwlV2cKR5MB9KUvelsdtBakRFbsXboe2hjh4j1vvrjG42tZjAgOImqQEJ913gn8dE2KQ/blCk+gyVdV12IACmP/NdpaFmC6llr41yPzQpT/SnM8/ofvqABFsyMjcB/ui5cC2TP7q58ilzckb34bJaPXj6wh9fpQfkJgO+Sys6mDscKtnQhwZIiSKEjPG5FIW2AVdnpYFMOlyhGPsPFyAz+xITV0GgZQC2H1+3b67jQhsOPG9mgndv8P+hFZ8nQxNwxbK782qHEihNuwq7BZwI+alNb+yntt67T8+qaCoaSBsjsHP3VRdrC1o4BZkC6ADhvwr7D4h4GL2f2MEWriAtAAt7HqZDNdFkXpoaR3E0WGTTFSB+R8ZnTOM2y1MTiehFjcITH0EvP2I8EIC2mCMEz1eHJ5FYcOiz+f9njtSWJXY5ZA9Mu1O5LC7MkXK7YOgaBVdndKncIxEQERcmW804Gg/uAkSvC9SSe/4ROsEzxU6ACEwpTo2WVqr8ASASvqUnfW6I2Jn X-Forefront-Antispam-Report: CIP:216.228.117.161;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc6edge2.nvidia.com;CAT:NONE;SFS:(13230040)(82310400026)(1800799024)(7416014)(376014)(36860700016)(921020)(22082099003)(18002099003)(56012099003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: wYk/A6RPYXCMUbFZUY8Ktuo+AgYdEwcWt1y830daRmU39TczkjVblOMifL1QVZp7qNT48wMaSTtDU4H9hDC5UV6br+Wyo45AXlbYzRhHVZBFble/S2wWsDGt7yWk4kbNhtm7kNbaFTAgziCM/430Vyxp8YSrQmUJqfFKz5AxIyHLRWDxmqckGtxtYf22fuSy8Lu49QWnjJUmu/erC9m1NbpBqFrDI7zuAb7r86vMBWlUanX9Dg3ArvgFYJZ4019vcht+jCXdl7LEKNXs+SWoyBWf9YCJ72n4RfNaZMQWxKqyFX0NgT3PMf5wQWI6JpUQ+GcoX9udIVb5E+0XnFwZK3kc9MQIwYRTeGmEztz43MKm4miClOTeXoTWWppbqLAVVsKV0SAZIjXeNMSBRPjBh8EiBO7SGNAYOs5KVpPt1HYmPy5EDLL+JP3dj1S3S+6w X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 14:42:21.9209 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: cf8dd821-a9ad-4668-7aec-08de8ffce21a 
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.161];Helo=[mail.nvidia.com]
X-MS-Exchange-CrossTenant-AuthSource: BN1PEPF00006003.namprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: LV2PR12MB5750

From: Manish Honap

Add Documentation/driver-api/vfio-pci-cxl.rst describing the architecture,
VFIO interfaces, and operational constraints for CXL Type-2 (cache-coherent
accelerator) passthrough via vfio-pci-core, and link it from the driver-api
index.

The document covers:
- VFIO_DEVICE_FLAGS_CXL and VFIO_DEVICE_INFO_CAP_CXL: what the capability
  struct contains and what the FIRMWARE_COMMITTED and CACHE_CAPABLE flags mean
- How to derive hdm_decoder_offset and hdm_count from the COMP_REGS region by
  traversing the CXL Capability Array to find cap ID 0x5 and reading the HDM
  Decoder Capability register
- Topology-aware sparse mmap on the component BAR (topologies A, B, C covering
  comp block at end, start, or middle of the BAR)
- Two extra VFIO device regions: COMP_REGS for the emulated HDM register state
  and the DPA memory window
- DVSEC config write virtualization: what the guest sees vs. hardware
- FLR coordination: DPA PTEs zapped before reset, restored after

Signed-off-by: Manish Honap
---
 Documentation/driver-api/index.rst        |   1 +
 Documentation/driver-api/vfio-pci-cxl.rst | 382 ++++++++++++++++++++++
 2 files changed, 383 insertions(+)
 create mode 100644 Documentation/driver-api/vfio-pci-cxl.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 1833e6a0687e..7ec661846f6b 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -47,6 +47,7 @@ of interest to most developers working on device drivers.
    vfio-mediated-device
    vfio
    vfio-pci-device-specific-driver-acceptance
+   vfio-pci-cxl
 
 Bus-level documentation
 =======================
diff --git a/Documentation/driver-api/vfio-pci-cxl.rst b/Documentation/driver-api/vfio-pci-cxl.rst
new file mode 100644
index 000000000000..1256e4d33fc6
--- /dev/null
+++ b/Documentation/driver-api/vfio-pci-cxl.rst
@@ -0,0 +1,382 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+VFIO PCI CXL Type-2 device passthrough
+=======================================
+
+Overview
+--------
+
+Type-2 CXL devices are PCIe accelerators (GPUs, compute ASICs, and similar)
+with coherent device memory on CXL.mem. DPA is mapped into host physical
+address space through HDM decoders that the kernel's CXL subsystem owns. A
+guest cannot program that hardware directly.
+
+This ``vfio-pci`` mode hands a VMM:
+
+- A read/write VFIO device region (COMP_REGS) that emulates the HDM decoder
+  register block with CXL register rules enforced in kernel code.
+- An mmappable VFIO device region (DPA) backed by the kernel-chosen host physical
+  range for device memory.
+- DVSEC config-space emulation so the guest cannot change host-owned CXL.io /
+  CXL.mem enable bits.
+
+Build with ``CONFIG_VFIO_CXL_CORE=y``. At runtime you can turn it off with::
+
+    modprobe vfio-pci disable_cxl=1
+
+or, in a variant driver, set ``vdev->disable_cxl = true`` before registration.
+
+
+Device detection
+----------------
+
+At ``vfio_pci_core_register_device()`` the driver checks for a Type-2 style
+setup. All of the following must hold:
+
+1. CXL Device DVSEC present (PCIe DVSEC Vendor ID ``0x1E98``, DVSEC ID
+   ``0x0000``).
+2. ``Mem_Capable`` (bit 2) set in the CXL Capability register inside that DVSEC.
+3. PCI class code is **not** ``0x050210`` (CXL Type-3 memory expander).
+4. An HDM Decoder capability block reachable through the Register Locator DVSEC.
+5. At least one HDM decoder committed by firmware with non-zero size.
+
+The CXL spec labels "Type-2" as devices with both ``Mem_Capable`` and
+``Cache_Capable``. This driver also takes ``Mem_Capable``-only devices
+(``Cache_Capable=0``), which behave like Type-3 style accelerators without the
+usual class code. ``VFIO_CXL_CAP_CACHE_CAPABLE`` exposes the cache bit to
+userspace so a VMM can treat FLR differently when needed.
+
+When detection succeeds, ``VFIO_DEVICE_FLAGS_CXL`` is ORed into
+``vfio_device_info.flags`` together with ``VFIO_DEVICE_FLAGS_PCI``.
+
+.. note::
+
+   **Firmware must commit an HDM decoder before open.** The driver only
+   discovers DPA range and size from a decoder that firmware already committed.
+   Devices without that, or hot-plugged setups that never get it, are out of
+   scope for now.
+
+   Follow-up options under discussion include CXL range registers in the
+   Device DVSEC (often enough on single-decoder parts), CDAT over DOE, mailbox
+   Get Partition Info, or a future DVSEC field from the consortium for
+   base/size/NUMA without extra side channels. There is also talk of a sysfs
+   path, modeled on resizable BAR, where an orchestrator fixes the DPA window
+   before vfio-pci binds so the driver still sees a committed range.
+
+
+UAPI: VFIO_DEVICE_INFO_CAP_CXL
+------------------------------
+
+When ``VFIO_DEVICE_FLAGS_CXL`` is set, the device info capability chain
+includes a ``vfio_device_info_cap_cxl`` structure (cap ID 6, version 1)::
+
+    struct vfio_device_info_cap_cxl {
+        struct vfio_info_cap_header header; /* id=6, version=1 */
+        __u8  hdm_regs_bar_index;     /* BAR index containing component regs */
+        __u8  reserved[3];
+        __u32 flags;                  /* VFIO_CXL_CAP_* flags */
+        __u64 hdm_regs_offset;        /* byte offset within the BAR to the
+                                       * CXL.mem register area start. This
+                                       * equals comp_reg_offset + CXL_CM_OFFSET
+                                       * where CXL_CM_OFFSET = 0x1000. */
+        __u32 dpa_region_index;       /* VFIO region index for DPA memory */
+        __u32 comp_regs_region_index; /* VFIO region index for COMP_REGS */
+    };
+    /*
+     * hdm_count and hdm_decoder_offset are intentionally absent from this
+     * struct. Both are derivable from the COMP_REGS region. See the
+     * "Deriving HDM info from COMP_REGS" section below.
+     */
+
+    #define VFIO_CXL_CAP_FIRMWARE_COMMITTED (1 << 0)
+    #define VFIO_CXL_CAP_CACHE_CAPABLE      (1 << 1)
+
+``VFIO_CXL_CAP_FIRMWARE_COMMITTED``
+    At least one HDM decoder was pre-committed by firmware. The DPA region
+    is live at device open; the VMM can map it without waiting for a guest
+    COMMIT cycle.
+
+``VFIO_CXL_CAP_CACHE_CAPABLE``
+    The device has an HDM-DB decoder (CXL.mem + CXL.cache). This mirrors the
+    ``Cache_Capable`` bit from the CXL DVSEC Capability register. The kernel
+    does not run Write-Back Invalidation (WBI) before FLR; with this flag set
+    that stays the VMM's job.
+
+DPA region size comes from ``VFIO_DEVICE_GET_REGION_INFO`` on
+``dpa_region_index``, not from this struct.
+
+
+VFIO regions
+------------
+
+A CXL device adds two device regions on top of the usual BARs. Their indices
+are in ``dpa_region_index`` and ``comp_regs_region_index``.
+
+DPA region (``VFIO_REGION_SUBTYPE_CXL``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Flags: ``READ | WRITE | MMAP``.
+
+The backing store is the host physical range the kernel assigned for DPA. The
+kernel maps it with ``memremap(MEMREMAP_WB)`` because CXL device memory on a
+coherent link sits in the CPU cache hierarchy. That mapping is normal cached
+memory, so ``copy_to/from_user`` works without extra barriers.
+
+Page faults are lazy: PFNs are installed per page on first touch via
+``vmf_insert_pfn``. ``mmap()`` does not populate the whole region up front.
+
+Region read/write through the fd uses the same ``MEMREMAP_WB`` mapping with
+``copy_to/from_user``. ``ioread``/``iowrite`` MMIO helpers are not used on
+this path.
+
+During FLR, ``unmap_mapping_range()`` drops user PTEs and ``region_active``
+clears before the reset runs. Ongoing faults or region I/O then error instead
+of touching a dead mapping. IOMMU ATC invalidation from the zap has to finish
+before the device resets; doing it the other way around can leave an SMMU
+waiting on a device that no longer responds.
+
+After reset, the region comes back once ``COMMITTED`` shows up again in fresh
+HDM hardware state. The VMM can fault pages in again without a new ``mmap()``.
+
+COMP_REGS region (``VFIO_REGION_SUBTYPE_CXL_COMP_REGS``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Flags: ``READ | WRITE`` (no mmap).
+
+Emulated registers for the CXL.mem slice of the component register block: the
+CXL Capability Array header at offset 0, then the HDM Decoder capability
+starting at ``hdm_decoder_offset`` (the byte offset derived by traversing the
+CXL Capability Array; see "Deriving HDM info from COMP_REGS" below).
+Region size from ``VFIO_DEVICE_GET_REGION_INFO`` covers the full capability
+array prefix plus all HDM decoder blocks.
+
+Only 32-bit, 32-bit-aligned accesses are allowed. 8- and 16-bit attempts get
+``-EINVAL``.
+
+Offsets below ``hdm_decoder_offset`` return the snapshot from device open.
+Writes there are dropped (with a WARN); the capability array stays read-only.
+
+From ``hdm_decoder_offset`` upward the kernel keeps a shadow
+(``comp_reg_virt[]``) and applies field rules:
+
+- At open, hardware HDM state is snapshotted. For firmware-committed decoders
+  the LOCK bit is cleared and BASE_HI/BASE_LO are zeroed in the shadow so the
+  VMM can program guest GPA; the host HPA is not carried in the shadow after
+  that.
+- ``COMMIT`` (bit 9 of CTRL): writing 1 sets ``COMMITTED`` (bit 10) in the
+  shadow immediately. Real hardware stays committed; the shadow tracks what
+  the guest should see.
+- When LOCK is set, writes to BASE_HI and SIZE_HI are ignored so
+  firmware-committed values survive.
+
+Region type identifiers::
+
+    /* type = PCI_VENDOR_ID_CXL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE */
+    #define VFIO_REGION_SUBTYPE_CXL            1  /* DPA memory region */
+    #define VFIO_REGION_SUBTYPE_CXL_COMP_REGS  2  /* HDM register shadow */
+
+
+BAR access
+----------
+
+``VFIO_DEVICE_GET_REGION_INFO`` for ``hdm_regs_bar_index`` reports the full
+BAR size with ``READ | WRITE | MMAP`` flags and a
+``VFIO_REGION_INFO_CAP_SPARSE_MMAP`` capability listing the GPU or
+accelerator register windows, that is, the mmappable parts of the BAR that do
+**not** contain CXL component registers.
+
+The number of sparse areas depends on where the CXL component register block
+``[comp_reg_offset, comp_reg_offset + comp_reg_size)`` sits within the BAR:
+
+* **Topology A** - component block at BAR end:
+  ``[gpu_regs | comp_regs]`` → 1 area: ``[0, comp_reg_offset)``
+
+* **Topology B** - component block at BAR start:
+  ``[comp_regs | gpu_regs]`` → 1 area: ``[comp_reg_size, bar_len)``
+
+* **Topology C** - component block in middle:
+  ``[gpu_regs | comp_regs | gpu_regs]`` → 2 areas:
+  ``[0, comp_reg_offset)`` and ``[comp_reg_offset + comp_reg_size, bar_len)``
+
+VMMs **must** iterate all ``nr_areas`` entries; do not assume a single area or
+that the first area starts at offset zero.
+
+The GPU/accelerator register windows listed in the sparse capability **are**
+physically mmappable: ``mmap()`` on the VFIO device fd at the corresponding
+BAR offset succeeds and yields a host-physical-backed mapping suitable for
+KVM stage-2 installation.
+
+The CXL component register block itself **is not** mmappable. Any ``mmap()``
+request whose range overlaps ``[comp_reg_offset, comp_reg_offset +
+comp_reg_size)`` returns ``-EINVAL``; those registers must be accessed through
+the ``COMP_REGS`` device region.
+
+
+DVSEC configuration space emulation
+-----------------------------------
+
+With ``CONFIG_VFIO_CXL_CORE=y``, vfio-pci installs a handler for
+``PCI_EXT_CAP_ID_DVSEC`` (``0x23``) in the config access table. Non-CXL
+devices fall through as before.
+
+On CXL devices, writes to these DVSEC registers are caught and reflected in
+``vdev->vconfig`` (shadow config space):
+
++--------------------+--------+--------------------------------------------------+
+| Register           | Offset | Emulation                                        |
++====================+========+==================================================+
+| CXL Control        | +0x0c  | RWL; IO_Enable held at 1; locked when Lock       |
+|                    |        | bit 0 is set.                                    |
++--------------------+--------+--------------------------------------------------+
+| CXL Status         | +0x0e  | Bit 14 (Viral_Status) is RW1CS.                  |
++--------------------+--------+--------------------------------------------------+
+| CXL Control2       | +0x10  | Bits 1 and 2 forwarded to hardware.              |
++--------------------+--------+--------------------------------------------------+
+| CXL Status2        | +0x12  | Bit 3 forwarded when Capability3 bit 3 is set.   |
++--------------------+--------+--------------------------------------------------+
+| CXL Lock           | +0x14  | RWO; once set, Control becomes read-only until   |
+|                    |        | conventional reset.                              |
++--------------------+--------+--------------------------------------------------+
+| Range Base Hi/Lo   | varies | Stored in vconfig; Base Low [27:0] reserved bits |
+|                    |        | cleared on write.                                |
++--------------------+--------+--------------------------------------------------+
+
+Reads return the shadow. Read-only registers (Capability, Size High/Low) are
+filled from hardware at open.
+
+
+FLR and reset
+-------------
+
+FLR goes through ``vfio_pci_ioctl_reset()``. The CXL-specific part is:
+
+1. ``vfio_cxl_zap_region_locked()`` runs under the write side of
+   ``memory_lock``. It clears ``region_active`` and calls
+   ``unmap_mapping_range()`` on the DPA inode mapping so user PTEs go away.
+   Concurrent faults or fd I/O hit the inactive flag and error. IOMMU ATC must
+   drain before reset (see the DPA region notes above).
+
+2. After FLR, ``vfio_cxl_reactivate_region()`` reads HDM hardware again into
+   ``comp_reg_virt[]``. If ``COMMITTED`` is set (common when firmware left the
+   decoder committed), ``region_active`` turns back on and the VMM can refault
+   without remapping.
+
+
+Known limitations
+-----------------
+
+**Pre-committed HDM decoder required**
+    See `Device detection`_ and the note there.
+
+**CXL hot-plug not supported**
+    Slots need to be present and programmed by firmware at boot.
+
+**CXL.cache Write-Back Invalidation not implemented**
+    For HDM-DB devices (``VFIO_CXL_CAP_CACHE_CAPABLE``), the kernel does not
+    run WBI before FLR. The VMM must do it and expose Back-Invalidation in the
+    guest topology where required.
+
+
+VMM integration notes
+---------------------
+
+For a ``VFIO_CXL_CAP_FIRMWARE_COMMITTED`` device (what works today)::
+
+    /* 1. Get device info and locate the CXL cap */
+    vfio_device_get_info(vfio_fd, &dinfo);
+    assert(dinfo.flags & VFIO_DEVICE_FLAGS_CXL);
+    cxl = find_cap(&dinfo, VFIO_DEVICE_INFO_CAP_CXL);
+
+    /* 2. Get DPA and COMP_REGS region sizes */
+    get_region_info(vfio_fd, cxl->dpa_region_index, &dpa_ri);
+    get_region_info(vfio_fd, cxl->comp_regs_region_index, &comp_ri);
+
+    /* 3. Map DPA region at a guest physical address */
+    gpa_base = allocate_guest_phys(dpa_ri.size);
+    mmap(gpa_base, dpa_ri.size, PROT_READ|PROT_WRITE,
+         MAP_SHARED|MAP_FIXED, vfio_fd,
+         (off_t)cxl->dpa_region_index << VFIO_PCI_OFFSET_SHIFT);
+
+    /* 4. Derive hdm_decoder_offset from COMP_REGS (see section below) */
+    uint64_t hdm_decoder_offset = derive_hdm_offset(vfio_fd, comp_ri);
+
+    /* 5. Write guest GPA into HDM Decoder 0 BASE via COMP_REGS pwrite */
+    u32 base_hi = gpa_base >> 32;
+    comp_off = (off_t)cxl->comp_regs_region_index << VFIO_PCI_OFFSET_SHIFT;
+    pwrite(vfio_fd, &base_hi, 4,
+           comp_off + hdm_decoder_offset + CXL_HDM_DECODER0_BASE_HIGH_OFFSET);
+
+    /* 6. Build guest CXL topology using gpa_base and dpa_ri.size */
+    build_cfmws(gpa_base, dpa_ri.size);
+
+    /* 7. If CACHE_CAPABLE: issue WBI before any guest FLR */
+
+Extra detail:
+
+- DPA size is ``dpa_ri.size`` from region info.
+- ``CXL_HDM_DECODER0_BASE_HIGH_OFFSET`` lives in ``include/uapi/cxl/cxl_regs.h``.
+- On the BAR, the areas listed in the sparse-mmap cap on
+  ``hdm_regs_bar_index`` split GPU MMIO (BAR fd) from the CXL block (COMP_REGS
+  region); check every area, not only the first.
+- If ``VFIO_CXL_CAP_CACHE_CAPABLE`` is set, the guest CXL topology should
+  advertise Back-Invalidation and the VMM should run WBI before FLR.
+
+
+Deriving HDM info from COMP_REGS
+---------------------------------
+
+``hdm_decoder_offset`` and ``hdm_count`` are not in ``vfio_device_info_cap_cxl``
+because both are directly readable from the ``COMP_REGS`` region.
+
+**Finding hdm_decoder_offset:**
+
+Read dwords from the COMP_REGS region starting at offset 0 (the CXL Capability
+Array). ``comp_off`` is the VFIO file offset for the COMP_REGS region:
+``(off_t)cxl->comp_regs_region_index << VFIO_PCI_OFFSET_SHIFT``::
+
+    /* Dword 0: CXL Capability Array Header */
+    pread(fd, &hdr, 4, comp_off + 0);
+    /* bits[15:0] must be 1 (CM_CAP_HDR_CAP_ID) */
+    /* bits[31:24] = number of capability entries */
+    num_caps = (hdr >> 24) & 0xff; /* CXL_CM_CAP_HDR_ARRAY_SIZE_MASK */
+
+    /* Walk entries at dword 1..num_caps */
+    for (i = 1; i <= num_caps; i++) {
+        pread(fd, &entry, 4, comp_off + i * 4);
+        cap_id = entry & 0xffff; /* CXL_CM_CAP_HDR_ID_MASK */
+        if (cap_id == 0x5) { /* CXL_CM_CAP_CAP_ID_HDM */
+            hdm_decoder_offset = (entry >> 20) & 0xfff; /* CXL_CM_CAP_PTR_MASK */
+            break;
+        }
+    }
+
+**Finding hdm_count:**
+
+Read the HDM Decoder Capability register (HDMC) at ``hdm_decoder_offset + 0``::
+
+    pread(fd, &hdmc, 4, comp_off + hdm_decoder_offset);
+    field = hdmc & 0xf; /* CXL_HDM_DECODER_COUNT_MASK bits[3:0] */
+    hdm_count = field ? field * 2 : 1; /* 0→1, N→N*2 decoders */
+
+All constants are in ``include/uapi/cxl/cxl_regs.h``.
+
+
+Kernel configuration
+--------------------
+
+``CONFIG_VFIO_CXL_CORE`` (bool)
+    CXL Type-2 passthrough in ``vfio-pci-core``. Needs ``CONFIG_VFIO_PCI_CORE``,
+    ``CONFIG_CXL_BUS``, and ``CONFIG_CXL_MEM``.
+
+References
+----------
+
+* CXL Specification 4.0, 8.1.3 - PCIe DVSEC for CXL Devices
+* CXL Specification 4.0, 8.2.4.20 - CXL HDM Decoder Capability Structure
+* ``include/uapi/linux/vfio.h`` - ``VFIO_DEVICE_INFO_CAP_CXL``,
+  ``VFIO_REGION_SUBTYPE_CXL``, ``VFIO_REGION_SUBTYPE_CXL_COMP_REGS``
+* ``include/uapi/cxl/cxl_regs.h`` - ``CXL_CM_OFFSET``,
+  ``CXL_CM_CAP_HDR_ARRAY_SIZE_MASK``, ``CXL_CM_CAP_HDR_ID_MASK``,
+  ``CXL_CM_CAP_PTR_MASK``, ``CXL_HDM_DECODER_COUNT_MASK``,
+  ``CXL_HDM_DECODER0_BASE_HIGH_OFFSET``
-- 
2.25.1

From nobody Wed Apr 1 20:37:31 2026
Received: from DM5PR21CU001.outbound.protection.outlook.com (mail-centralusazon11011005.outbound.protection.outlook.com [52.101.62.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0922347DF92; Wed, 1 Apr 2026 14:42:45 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.62.5
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054571; cv=fail; b=SlSSS/4MuP/InaxiapSpSTSbAJElVm8etQAj4tv62XM38ssPhu9gV2dzEV6F/9LP1ud796h/8Flcmhktp/Bh2xH6KhEJSdo5QB3bSSaEaZzMHyFBYZZQZ3XZdnDxIFhT4qorBLh+joypJD03SnT5cqptOBe0aaN7otLWVdkOb5k=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775054571; c=relaxed/simple; bh=2rr7y30pUE1DoIZLeKSK4BDDV3Me8otLx+M2LBJxUO0=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=uFGvWgZDiDgiSkE2gTmXU+HU8HDvzo7tf1YBUbtuduEq7NwSAw2N/Wnh3SZA5uvOGAceZz93zMTq6zPmi0KRj1buNMxj+oIRO65Pvpx9UQM+X2FNYJyptlQPosPIKGYXKNm0UkpqMTflnTWHTPAxlOulWspdx30zKiICoe1JBX0=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=CJdANlb8; arc=fail smtp.client-ip=52.101.62.5
Authentication-Results:
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="CJdANlb8" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=gOxVWFeH8Cr5QnfwXM59CVMUwc+GqLCZbHcUP19Fl3iZZxYlI5JBJ6gemNOaJuUGuCeNVXaoaBG9qKWZ7ltr63km0R3/Jb2qC9I5y6Y4nPJfmEcOAvcgIfOw1HJk1GNo8f3TRumxHe8nMfafQWzx2XuOr2YKcLU4qAtuWelF9xGUqB3qLzEHWXz/+XCznx4greHY/n1MuOVhCIjw3e2Qex5oiuOa2bf6RGC1nj9X7KzroJPpIzTyz3M+vJxiiNX67pr6yjc+LRJdMil4LqeOjCeZtdpogT58XnOSQ+If6CQCIe+c03gPxfdImLd3Sp5uGJhehHGC7YWymV58/AtjBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=vjCsG53TfBRVec16a3ioXovRaqDjw+chXjBxWlfAVf0=; b=Vf9esN3Z6mot6VG9PMhJzbA1K40ZnybB7VpbV7LlrLmVGEvL5a29CIQcBkcFBcSNPcZESQk8rLHSDtNMiHkaqtOhVGO2H+bEd0l6iJvvToJf0/IZmHKnUNl/tEjc/XmqiRRZ3TiQloaAt+YCfT/Sf4wwayrvGJd6rbVDp/axqRG3aZwLbP8dGhDeXnv9tSOhc1/6Mri9NRnYDmxmnhbU5FLI27nFCjK5Gr2QhlRocRDmRRG6N1CC715lWHtD1LTCcSWVd8gUWG3tcXs6hf/zIBLiZgy43BG/nOEAnx+7kgV/js8U6W8hzHSWdw84hb1SJhVw97vNA1q0lrrJ4qYsWg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=vjCsG53TfBRVec16a3ioXovRaqDjw+chXjBxWlfAVf0=; 
b=CJdANlb8V2dofl/T59hEIjBo74pY7Glo2/6FNfAS/LvdockPvAujhD4wZRkqgfdVcf82uLFSa965ViMhmCBla3yR5m5HIHvPHcpP0N91QQMuC3TmldTi/X34J+b92VdP3hXHD0v+IBiq5PrORu+TVKQeGW3SpwkZ2JtdyTbKnpKMXZCquIPdYP9URiVplpyA/XqtWe21x6sVZw+vVED9/5v1lYbUw4F6ALMwOwTIwx/MXC02pBe9hci+ECHlY+C3jvkHfkfXCwNrpZa18qSDlLAgq3ValSkzNKmKxko3olPooFpPk8q13cAkG/qTTZcvN0hplYjbvgLJhLGPKWjgaw== Received: from BL1P222CA0005.NAMP222.PROD.OUTLOOK.COM (2603:10b6:208:2c7::10) by IA0PR12MB8255.namprd12.prod.outlook.com (2603:10b6:208:404::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17; Wed, 1 Apr 2026 14:42:30 +0000 Received: from BN1PEPF00005FFF.namprd05.prod.outlook.com (2603:10b6:208:2c7:cafe::cb) by BL1P222CA0005.outlook.office365.com (2603:10b6:208:2c7::10) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.30 via Frontend Transport; Wed, 1 Apr 2026 14:42:30 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by BN1PEPF00005FFF.mail.protection.outlook.com (10.167.243.231) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 14:42:30 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:42:06 -0700 Received: from nvidia-4028GR-scsim.nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 07:41:59 -0700
Subject: [PATCH v2 20/20] selftests/vfio: Add CXL Type-2 VFIO assignment test
Date: Wed, 1 Apr 2026 20:09:17 +0530
Message-ID: <20260401143917.108413-21-mhonap@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20260401143917.108413-1-mhonap@nvidia.com>
References: <20260401143917.108413-1-mhonap@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Manish Honap

Add vfio_cxl_type2_test and build it from the vfio selftest Makefile.

The binary expects a PCI BDF (argv or VFIO_SELFTESTS_BDF) with the device
already bound to vfio-pci and CONFIG_VFIO_CXL_CORE enabled. It exercises:

- VFIO_DEVICE_GET_INFO,
- VFIO_DEVICE_GET_REGION_INFO,
- the VFIO_DEVICE_INFO_CAP_CXL capability list,
- sparse component-BAR vs DPA/COMP_REGS regions,
- HDM decoder emulation (masks, commit, lock),
- DVSEC-backed config where the driver exposes it.

Large region read/write loops and FLR-heavy test cases are still pending
and will be revisited in the next version of the series.

vfio_pci_device_setup() now skips auto-mmap for BARs whose region info
carries capabilities (VFIO_REGION_INFO_FLAG_CAPS), such as sparse-mmap;
those require the caller to mmap only the windows advertised by the
capability.

Signed-off-by: Manish Honap
---
 tools/testing/selftests/vfio/Makefile         |   1 +
 .../selftests/vfio/lib/vfio_pci_device.c      |   3 +-
 .../selftests/vfio/vfio_cxl_type2_test.c      | 920 ++++++++++++++++++
 3 files changed, 923 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vfio/vfio_cxl_type2_test.c

diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 3c796ca99a50..2cac98302609 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -4,6 +4,7 @@ TEST_GEN_PROGS += vfio_iommufd_setup_test
 TEST_GEN_PROGS += vfio_pci_device_test
 TEST_GEN_PROGS += vfio_pci_device_init_perf_test
 TEST_GEN_PROGS += vfio_pci_driver_test
+TEST_GEN_PROGS += vfio_cxl_type2_test
 
 TEST_FILES += scripts/cleanup.sh
 TEST_FILES += scripts/lib.sh
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index fac4c0ecadef..98832acc31ac 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -257,7 +257,8 @@ static void vfio_pci_device_setup(struct vfio_pci_device *device)
 		struct vfio_pci_bar *bar = device->bars + i;
 
 		vfio_pci_region_get(device, i, &bar->info);
-		if (bar->info.flags & VFIO_REGION_INFO_FLAG_MMAP)
+		if ((bar->info.flags & VFIO_REGION_INFO_FLAG_MMAP) &&
+		    !(bar->info.flags & VFIO_REGION_INFO_FLAG_CAPS))
 			vfio_pci_bar_map(device, i);
 	}
 
diff --git a/tools/testing/selftests/vfio/vfio_cxl_type2_test.c b/tools/testing/selftests/vfio/vfio_cxl_type2_test.c
new file mode 100644
index 000000000000..272412a7b22f
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_cxl_type2_test.c
@@ -0,0 +1,920 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vfio_cxl_type2_test - selftests for CXL Type-2 device passthrough via vfio-pci
+ *
+ * Tests the UAPI and emulation layer introduced by CONFIG_VFIO_CXL_CORE
+ *
+ * Usage:
+ *   ./vfio_cxl_type2_test
+ *   or set the environment variable VFIO_SELFTESTS_BDF before running.
+ *
+ * The device must be a CXL Type-2 device (e.g. a GPU with coherent memory).
+ * Tests adapt automatically to firmware-committed (COMMITTED/COMMIT_LOCK set)
+ * and CONFIG_LOCK-set hardware states instead of skipping.
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+#include
+
+#include "kselftest_harness.h"
+
+/* Userspace equivalents of kernel helpers not available in user headers */
+#ifndef BIT
+#define BIT(n) (1u << (n))
+#endif
+#ifndef GENMASK
+#define GENMASK(h, l) (((~0u) >> (31 - (h))) & ((~0u) << (l)))
+#endif
+#define VFIO_PCI_INDEX_TO_OFFSET(idx) ((uint64_t)(idx) << 40)
+
+static const char *device_bdf;
+
+/* ------------------------------------------------------------------ */
+/* CXL UAPI constants (mirrors include/uapi/linux/vfio.h)             */
+/* ------------------------------------------------------------------ */
+
+#define VFIO_DEVICE_INFO_CAP_CXL	6
+
+#define PCI_VENDOR_ID_CXL	0x1e98
+
+#ifndef VFIO_REGION_SUBTYPE_CXL
+#define VFIO_REGION_SUBTYPE_CXL	1
+#endif
+#ifndef VFIO_REGION_SUBTYPE_CXL_COMP_REGS
+#define VFIO_REGION_SUBTYPE_CXL_COMP_REGS	2
+#endif
+
+/*
+ * HDM Decoder register layout within the component register block.
+ * Offsets relative to the start of the HDM decoder capability block.
+ * The HDM decoder block begins at hdm_decoder_offset within the COMP_REGS
+ * region; add hdm_decoder_offset before indexing into the region.
+ */
+#define HDM_GLOBAL_CTRL_OFFSET		0x04
+#define HDM_DECODER_FIRST_OFFSET	0x10
+#define HDM_DECODER_STRIDE		0x20
+#define HDM_DECODER_BASE_LO		0x00
+#define HDM_DECODER_BASE_HI		0x04
+#define HDM_DECODER_SIZE_LO		0x08
+#define HDM_DECODER_SIZE_HI		0x0c
+#define HDM_DECODER_CTRL		0x10
+
+#define HDM_CTRL_COMMIT			BIT(9)
+#define HDM_CTRL_COMMITTED		BIT(10)
+#define HDM_CTRL_RESERVED_MASK		(BIT(15) | GENMASK(31, 28))
+
+#define CXL_LOCK_RESERVED_MASK		GENMASK(15, 1)
+
+/* ------------------------------------------------------------------ */
+/* Helpers                                                            */
+/* ------------------------------------------------------------------ */
+
+/*
+ * Walk the vfio_device_info capability chain embedded in @buf.
+ * Returns a pointer to the capability with the given @id, or NULL.
+ */
+static const struct vfio_info_cap_header *
+find_device_cap(const void *buf, size_t bufsz, uint16_t id)
+{
+	const struct vfio_device_info *info = buf;
+	const struct vfio_info_cap_header *cap;
+
+	if (!(info->flags & VFIO_DEVICE_FLAGS_CAPS) || !info->cap_offset)
+		return NULL;
+
+	cap = (const struct vfio_info_cap_header *)
+		((const char *)buf + info->cap_offset);
+
+	while (true) {
+		if (cap->id == id)
+			return cap;
+		if (!cap->next)
+			return NULL;
+		cap = (const struct vfio_info_cap_header *)
+			((const char *)buf + cap->next);
+		if ((const char *)cap + sizeof(*cap) > (const char *)buf + bufsz)
+			return NULL;
+	}
+}
+
+/*
+ * Walk the vfio_region_info capability chain embedded in @buf.
+ * Returns a pointer to the capability with the given @id, or NULL.
+ * @buf must have been obtained from VFIO_DEVICE_GET_REGION_INFO with
+ * argsz large enough to hold the full capability chain.
+ */
+static const struct vfio_info_cap_header *
+find_region_cap(const void *buf, size_t bufsz, uint16_t id)
+{
+	const struct vfio_region_info *info = buf;
+	const struct vfio_info_cap_header *cap;
+
+	if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS) || !info->cap_offset)
+		return NULL;
+
+	cap = (const struct vfio_info_cap_header *)
+		((const char *)buf + info->cap_offset);
+
+	while (true) {
+		if (cap->id == id)
+			return cap;
+		if (!cap->next)
+			return NULL;
+		cap = (const struct vfio_info_cap_header *)
+			((const char *)buf + cap->next);
+		if ((const char *)cap + sizeof(*cap) > (const char *)buf + bufsz)
+			return NULL;
+	}
+}
+
+/*
+ * Read a 32-bit value from the COMP_REGS region at @offset (HDM-relative).
+ */
+static uint32_t comp_regs_read32(struct vfio_pci_device *dev,
+				 uint32_t region_idx, uint64_t offset)
+{
+	uint32_t val;
+	loff_t pos = (loff_t)VFIO_PCI_INDEX_TO_OFFSET(region_idx) + offset;
+	ssize_t r;
+
+	r = pread(dev->fd, &val, sizeof(val), pos);
+	if (r != sizeof(val))
+		return ~0u;
+	return val;
+}
+
+/*
+ * Write a 32-bit value to the COMP_REGS region at @offset.
+ * Mirrors the error-propagation contract of comp_regs_read32() which returns
+ * ~0u on a short or failed pread.
+ */
+static ssize_t comp_regs_write32(struct vfio_pci_device *dev,
+				 uint32_t region_idx, uint64_t offset,
+				 uint32_t val)
+{
+	loff_t pos = (loff_t)VFIO_PCI_INDEX_TO_OFFSET(region_idx) + offset;
+
+	return pwrite(dev->fd, &val, sizeof(val), pos);
+}
+
+/*
+ * HDM register accessors.
+ *
+ * The COMP_REGS region starts at the CXL component register block
+ * start (comp_reg_offset). The HDM decoder capability block begins at
+ * hdm_decoder_offset within this region. These helpers add
+ * hdm_decoder_offset so that callers can continue to use the HDM-relative
+ * offsets defined by the macros above.
+ */
+static uint32_t hdm_regs_read32(struct vfio_pci_device *dev,
+				uint32_t region_idx,
+				uint64_t hdm_decoder_offset,
+				uint64_t hdm_off)
+{
+	return comp_regs_read32(dev, region_idx, hdm_decoder_offset + hdm_off);
+}
+
+static ssize_t hdm_regs_write32(struct vfio_pci_device *dev,
+				uint32_t region_idx,
+				uint64_t hdm_decoder_offset,
+				uint64_t hdm_off,
+				uint32_t val)
+{
+	return comp_regs_write32(dev, region_idx, hdm_decoder_offset + hdm_off, val);
+}
+
+/*
+ * Traverse the CXL Capability Array at COMP_REGS region offset 0 to find the
+ * HDM Decoder capability block offset and decoder count.
+ *
+ * COMP_REGS region layout at offset 0 (CXL Capability Array):
+ *   Dword 0 bits[31:24] (CXL_CM_CAP_HDR_ARRAY_SIZE_MASK): entry count N.
+ *   Dwords 1..N at offset (cap*4): bits[15:0] = cap ID (CXL_CM_CAP_HDR_ID_MASK),
+ *     bits[31:20] = byte offset from COMP_REGS start (CXL_CM_CAP_PTR_MASK).
+ *
+ * HDM Decoder cap ID = 0x5 (CXL_CM_CAP_CAP_ID_HDM).
+ * HDMC at hdm_decoder_offset+0 bits[3:0]: count = (field==0) ? 1 : field*2.
+ *
+ * Returns true on success; sets *hdm_off and *hdm_cnt.
+ */
+static bool find_hdm_decoder_info(struct vfio_pci_device *dev,
+				  uint32_t comp_regs_idx,
+				  uint64_t *hdm_off, uint8_t *hdm_cnt)
+{
+	uint32_t hdr, num_caps, i;
+
+	/* Read CXL Capability Array Header (dword 0) */
+	hdr = comp_regs_read32(dev, comp_regs_idx, 0);
+	if (hdr == ~0u)
+		return false;
+
+	/* Validate: bits[15:0] must be CM_CAP_HDR_CAP_ID (1) */
+	if ((hdr & 0xffff) != 1)
+		return false;
+
+	/* bits[31:24] = number of capability entries */
+	num_caps = (hdr >> 24) & 0xff;
+
+	for (i = 1; i <= num_caps; i++) {
+		uint32_t entry = comp_regs_read32(dev, comp_regs_idx, i * 4);
+		uint32_t cap_id = entry & 0xffff; /* CXL_CM_CAP_HDR_ID_MASK */
+
+		if (cap_id == 0x5) { /* CXL_CM_CAP_CAP_ID_HDM */
+			uint32_t hdmc;
+			uint32_t field;
+
+			/* bits[31:20]: byte offset from COMP_REGS start */
+			*hdm_off = (entry >> 20) & 0xfff;
+
+			/* Read HDMC register at hdm_decoder_offset + 0 */
+			hdmc = comp_regs_read32(dev, comp_regs_idx, *hdm_off);
+			if (hdmc == ~0u)
+				return false;
+
+			/* bits[3:0]: 0 = 1 decoder, N = N*2 decoders */
+			field = hdmc & 0xf;
+			*hdm_cnt = field ? (uint8_t)(field * 2) : 1;
+			return true;
+		}
+	}
+	return false;
+}
+
+/*
+ * Find the CXL DVSEC capability base in config space.
+ */
+#define PCI_DVSEC_VENDOR_ID_CXL		0x1e98
+#define PCI_DVSEC_ID_CXL_DEVICE		0x0000
+#define PCI_EXT_CAP_ID_DVSEC		0x23
+
+static uint16_t find_cxl_dvsec(struct vfio_pci_device *dev)
+{
+	uint16_t pos = PCI_CFG_SPACE_SIZE; /* 0x100 */
+	int iter = 0;
+
+	while (pos && iter++ < 64) {
+		uint32_t hdr = vfio_pci_config_readl(dev, pos);
+		uint32_t hdr1, hdr2;
+		uint16_t cap_id = hdr & 0xffff;
+		uint16_t next = (hdr >> 20) & 0xffc;
+
+		if (cap_id == PCI_EXT_CAP_ID_DVSEC) {
+			hdr1 = vfio_pci_config_readl(dev, pos + 4);
+			hdr2 = vfio_pci_config_readl(dev, pos + 8);
+			/*
+			 * PCIe DVSEC Header 1 layout (Table 9-16):
+			 *   Bits [15: 0] = DVSEC Vendor ID
+			 *   Bits [19:16] = DVSEC Revision
+			 *   Bits [31:20] = DVSEC Length
+			 * DVSEC Header 2 layout:
+			 *   Bits [15: 0] = DVSEC ID
+			 */
+			if ((hdr1 & 0xffff) == PCI_DVSEC_VENDOR_ID_CXL &&
+			    (hdr2 & 0xffff) == PCI_DVSEC_ID_CXL_DEVICE)
+				return pos;
+		}
+		pos = next;
+	}
+	return 0;
+}
+
+/* ------------------------------------------------------------------ */
+/* Fixture                                                            */
+/* ------------------------------------------------------------------ */
+
+FIXTURE(cxl_type2) {
+	struct iommu *iommu;
+	struct vfio_pci_device *dev;
+
+	/* Filled in during FIXTURE_SETUP from the CXL cap */
+	struct vfio_device_info_cap_cxl cxl_cap;
+	uint16_t dvsec_base;
+
+	/*
+	 * Sizes derived from VFIO_DEVICE_GET_REGION_INFO at setup time.
+	 * These are not in the CXL cap struct; query the region directly.
+	 */
+	uint64_t dpa_size;	/* size of the DPA region */
+	uint64_t hdm_regs_size;	/* size of the COMP_REGS region */
+
+	/*
+	 * HDM decoder info derived from the COMP_REGS region at setup time.
+	 * hdm_count and hdm_decoder_offset are no longer in the UAPI cap struct;
+	 * they are derived by traversing the CXL Capability Array and reading
+	 * the HDM Decoder Capability register (HDMC).
+	 */
+	uint64_t hdm_decoder_offset;	/* byte offset in COMP_REGS to HDM block */
+	uint8_t hdm_count;		/* number of HDM decoders */
+
+	/* DPA mmap pointer (may be NULL if test skips mmap sub-tests) */
+	void *dpa_mmap;
+	size_t dpa_mmap_size;
+};
+
+FIXTURE_SETUP(cxl_type2)
+{
+	uint8_t infobuf[512] = {};
+	struct vfio_device_info *info = (void *)infobuf;
+	const struct vfio_device_info_cap_cxl *cap;
+
+	self->iommu = iommu_init(default_iommu_mode);
+	self->dev = vfio_pci_device_init(device_bdf, self->iommu);
+
+	/* Query device info with space for capability chain */
+	info->argsz = sizeof(infobuf);
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_INFO, info));
+
+	if (!(info->flags & VFIO_DEVICE_FLAGS_CXL)) {
+		printf("Device %s is not a CXL Type-2 device — skipping\n",
+		       device_bdf);
+		SKIP(return, "not a CXL Type-2 device");
+	}
+
+	cap = (const struct vfio_device_info_cap_cxl *)
+		find_device_cap(infobuf, sizeof(infobuf),
+				VFIO_DEVICE_INFO_CAP_CXL);
+	ASSERT_NE(NULL, cap);
+	memcpy(&self->cxl_cap, cap, sizeof(*cap));
+
+	/*
+	 * Populate dpa_size and hdm_regs_size from region queries.
+	 */
+	{
+		struct vfio_region_info ri = { .argsz = sizeof(ri) };
+
+		ri.index = cap->dpa_region_index;
+		ASSERT_EQ(0, ioctl(self->dev->fd,
+				   VFIO_DEVICE_GET_REGION_INFO, &ri));
+		self->dpa_size = ri.size;
+
+		ri.index = cap->comp_regs_region_index;
+		ASSERT_EQ(0, ioctl(self->dev->fd,
+				   VFIO_DEVICE_GET_REGION_INFO, &ri));
+		self->hdm_regs_size = ri.size;
+	}
+
+	/*
+	 * Derive hdm_decoder_offset and hdm_count from the COMP_REGS region.
+	 * These fields were removed from vfio_device_info_cap_cxl to keep the
+	 * UAPI minimal; userspace derives them via the CXL Capability Array.
+	 */
+	ASSERT_TRUE(find_hdm_decoder_info(self->dev,
+					  cap->comp_regs_region_index,
+					  &self->hdm_decoder_offset,
+					  &self->hdm_count));
+
+	self->dvsec_base = find_cxl_dvsec(self->dev);
+	self->dpa_mmap = MAP_FAILED;
+	self->dpa_mmap_size = 0;
+}
+
+FIXTURE_TEARDOWN(cxl_type2)
+{
+	if (self->dpa_mmap != MAP_FAILED && self->dpa_mmap_size)
+		munmap(self->dpa_mmap, self->dpa_mmap_size);
+	vfio_pci_device_cleanup(self->dev);
+	iommu_cleanup(self->iommu);
+}
+
+/* ------------------------------------------------------------------ */
+/* Tests: VFIO_DEVICE_GET_INFO                                        */
+/* ------------------------------------------------------------------ */
+
+/*
+ * CXL and PCI flags must both be set; CAPS must be set since we have a cap.
+ */
+TEST_F(cxl_type2, device_flags)
+{
+	uint8_t infobuf[512] = {};
+	struct vfio_device_info *info = (void *)infobuf;
+
+	info->argsz = sizeof(infobuf);
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_INFO, info));
+
+	ASSERT_TRUE(info->flags & VFIO_DEVICE_FLAGS_CXL);
+	ASSERT_TRUE(info->flags & VFIO_DEVICE_FLAGS_PCI);
+	ASSERT_TRUE(info->flags & VFIO_DEVICE_FLAGS_CAPS);
+
+	printf("device flags: 0x%x num_regions: %u\n",
+	       info->flags, info->num_regions);
+}
+
+/*
+ * The CXL capability must report sane HDM and DPA values.
+ * hdm_count and hdm_decoder_offset are no longer in the cap struct; they
+ * are derived from the COMP_REGS region in FIXTURE_SETUP and stored in
+ * self->hdm_count and self->hdm_decoder_offset.
+ */
+TEST_F(cxl_type2, cxl_cap_fields)
+{
+	const struct vfio_device_info_cap_cxl *c = &self->cxl_cap;
+
+	ASSERT_EQ(VFIO_DEVICE_INFO_CAP_CXL, c->header.id);
+	ASSERT_EQ(1, c->header.version);
+
+	/* Must have at least one HDM decoder (derived from HDMC bits[3:0]) */
+	ASSERT_GT(self->hdm_count, 0);
+
+	/* COMP_REGS region size must be non-zero and 4-byte aligned */
+	ASSERT_GT(self->hdm_regs_size, 0ULL);
+	ASSERT_EQ(0ULL, self->hdm_regs_size % 4);
+
+	/*
+	 * hdm_decoder_offset is derived from the CXL Capability Array.
+	 * It must be:
+	 *   - non-zero (the CXL Capability Array Header precedes the HDM block)
+	 *   - dword-aligned
+	 *   - strictly less than hdm_regs_size (HDM block fits in the region)
+	 */
+	ASSERT_GT(self->hdm_decoder_offset, 0ULL);
+	ASSERT_EQ(0ULL, self->hdm_decoder_offset % 4);
+	ASSERT_LT(self->hdm_decoder_offset, self->hdm_regs_size);
+
+	/* Region indices must not be ~0U (sentinel for "not found") */
+	ASSERT_NE(~0U, c->dpa_region_index);
+	ASSERT_NE(~0U, c->comp_regs_region_index);
+
+	/* The two regions must be distinct */
+	ASSERT_NE(c->dpa_region_index, c->comp_regs_region_index);
+
+	/*
+	 * FIRMWARE_COMMITTED: decoder was pre-programmed by firmware; DPA
+	 * region is immediately live. dpa_size must be non-zero in this case.
+	 */
+	if (c->flags & VFIO_CXL_CAP_FIRMWARE_COMMITTED)
+		ASSERT_GT(self->dpa_size, 0ULL);
+
+	printf("hdm_count=%u dpa_size=0x%llx hdm_regs_size=0x%llx "
+	       "hdm_decoder_offset=0x%llx "
+	       "dpa_idx=%u comp_regs_idx=%u flags=0x%x "
+	       "(firmware_committed=%d cache_capable=%d)\n",
+	       self->hdm_count, (unsigned long long)self->dpa_size,
+	       (unsigned long long)self->hdm_regs_size,
+	       (unsigned long long)self->hdm_decoder_offset,
+	       c->dpa_region_index, c->comp_regs_region_index, c->flags,
+	       !!(c->flags & VFIO_CXL_CAP_FIRMWARE_COMMITTED),
+	       !!(c->flags & VFIO_CXL_CAP_CACHE_CAPABLE));
+}
+
+/* ------------------------------------------------------------------ */
+/* Tests: VFIO_DEVICE_GET_REGION_INFO                                 */
+/* ------------------------------------------------------------------ */
+
+/*
+ * The component register BAR must report its real (non-zero) size with
+ * READ/WRITE/MMAP flags and a VFIO_REGION_INFO_CAP_SPARSE_MMAP capability.
+ * The sparse areas advertise the GPU/accelerator register windows — the
+ * mmappable parts of the BAR that do NOT contain CXL component registers.
+ *
+ * Three topologies are possible depending on where comp_regs sits in the BAR:
+ *   Topology A [gpu_regs | comp_regs]            → 1 area: [0, comp_reg_offset)
+ *   Topology B [comp_regs | gpu_regs]            → 1 area: [comp_end, bar_len)
+ *   Topology C [gpu_regs | comp_regs | gpu_regs] → 2 areas
+ *
+ * In all cases each sparse area is a GPU register window; no area may overlap
+ * the CXL component register block at [comp_reg_offset, comp_reg_offset +
+ * comp_reg_size).
+ */
+TEST_F(cxl_type2, component_bar_sparse_mmap)
+{
+	struct vfio_region_info probe = { .argsz = sizeof(probe) };
+	struct vfio_region_info *reg;
+	const struct vfio_region_info_cap_sparse_mmap *sparse;
+	uint32_t bar_idx = self->cxl_cap.hdm_regs_bar_index;
+	uint64_t comp_reg_offset;
+	uint64_t total_gpu_size;
+	uint8_t *buf;
+	uint32_t needed;
+	uint32_t i;
+
+	/* First probe: learn required buffer size and basic flags */
+	probe.index = bar_idx;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, &probe));
+
+	ASSERT_GT(probe.size, 0ULL);
+	ASSERT_TRUE(probe.flags & VFIO_REGION_INFO_FLAG_READ);
+	ASSERT_TRUE(probe.flags & VFIO_REGION_INFO_FLAG_WRITE);
+	ASSERT_TRUE(probe.flags & VFIO_REGION_INFO_FLAG_MMAP);
+
+	/* Kernel must signal caps are present by expanding argsz */
+	ASSERT_GT(probe.argsz, (uint32_t)sizeof(probe));
+	needed = probe.argsz;
+
+	buf = calloc(1, needed);
+	ASSERT_NE(NULL, buf);
+	reg = (struct vfio_region_info *)buf;
+	reg->argsz = needed;
+	reg->index = bar_idx;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, reg));
+
+	/* Must carry a sparse-mmap cap */
+	sparse = (const struct vfio_region_info_cap_sparse_mmap *)
+		find_region_cap(buf, needed, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+	ASSERT_NE(NULL, sparse);
+
+	/* 1 area (topology A or B) or 2 areas (topology C); never more */
+	ASSERT_GE(sparse->nr_areas, 1U);
+	ASSERT_LE(sparse->nr_areas, 2U);
+
+	/*
+	 * comp_reg_offset = hdm_regs_offset - CXL_CM_OFFSET.
+	 * hdm_regs_offset is the BAR-relative address of the CXL.mem area
+	 * start, which sits CXL_CM_OFFSET (0x1000) bytes into the component
+	 * register block.
+	 */
+	ASSERT_GE(self->cxl_cap.hdm_regs_offset, (uint64_t)CXL_CM_OFFSET);
+	comp_reg_offset = self->cxl_cap.hdm_regs_offset - CXL_CM_OFFSET;
+
+	total_gpu_size = 0;
+	for (i = 0; i < sparse->nr_areas; i++) {
+		uint64_t area_start = sparse->areas[i].offset;
+		uint64_t area_end = area_start + sparse->areas[i].size;
+
+		/* Each area must be non-empty and fit within the BAR */
+		ASSERT_GT(sparse->areas[i].size, 0ULL);
+		ASSERT_LE(area_end, reg->size);
+
+		/*
+		 * No sparse area may overlap the CXL component register block.
+		 * Use hdm_regs_offset as a witness point: it is comp_reg_offset
+		 * + CXL_CM_OFFSET, guaranteed inside the block.
+		 */
+		ASSERT_FALSE(area_start <= self->cxl_cap.hdm_regs_offset &&
+			     self->cxl_cap.hdm_regs_offset < area_end);
+
+		total_gpu_size += sparse->areas[i].size;
+
+		printf(" sparse area[%u]: offset=0x%llx size=0x%llx\n", i,
+		       (unsigned long long)area_start,
+		       (unsigned long long)sparse->areas[i].size);
+	}
+
+	/* GPU windows together must be strictly smaller than the full BAR */
+	ASSERT_LT(total_gpu_size, reg->size);
+
+	printf("component BAR %u: bar_size=0x%llx comp_reg_offset=0x%llx "
+	       "nr_areas=%u total_gpu=0x%llx flags=0x%x\n",
+	       bar_idx, (unsigned long long)reg->size,
+	       (unsigned long long)comp_reg_offset,
+	       sparse->nr_areas, (unsigned long long)total_gpu_size,
+	       reg->flags);
+
+	free(buf);
+}
+
+/*
+ * DPA region must be readable, writable, and mmappable.
+ * Its size must be non-zero (verified in fixture setup via self->dpa_size).
+ */
+TEST_F(cxl_type2, dpa_region_info)
+{
+	struct vfio_region_info reg = { .argsz = sizeof(reg) };
+
+	reg.index = self->cxl_cap.dpa_region_index;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg));
+
+	ASSERT_EQ(self->dpa_size, reg.size);
+	ASSERT_TRUE(reg.flags & VFIO_REGION_INFO_FLAG_READ);
+	ASSERT_TRUE(reg.flags & VFIO_REGION_INFO_FLAG_WRITE);
+	ASSERT_TRUE(reg.flags & VFIO_REGION_INFO_FLAG_MMAP);
+
+	printf("DPA region: size=0x%llx offset=0x%llx flags=0x%x\n",
+	       (unsigned long long)reg.size,
+	       (unsigned long long)reg.offset, reg.flags);
+}
+
+/*
+ * COMP_REGS region must be readable and writable but not mmappable.
+ * Its size covers [comp_reg_offset, comp_reg_offset + hdm_regs_size), which
+ * includes both the CXL Capability Array prefix (hdm_decoder_offset bytes)
+ * and the HDM decoder block. Size is available in self->hdm_regs_size
+ * (populated from this same region query at fixture setup time).
+ */
+TEST_F(cxl_type2, comp_regs_region_info)
+{
+	struct vfio_region_info reg = { .argsz = sizeof(reg) };
+
+	reg.index = self->cxl_cap.comp_regs_region_index;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg));
+
+	ASSERT_EQ(self->hdm_regs_size, reg.size);
+	ASSERT_TRUE(reg.flags & VFIO_REGION_INFO_FLAG_READ);
+	ASSERT_TRUE(reg.flags & VFIO_REGION_INFO_FLAG_WRITE);
+	ASSERT_FALSE(reg.flags & VFIO_REGION_INFO_FLAG_MMAP);
+
+	printf("COMP_REGS region: size=0x%llx offset=0x%llx flags=0x%x\n",
+	       (unsigned long long)reg.size,
+	       (unsigned long long)reg.offset, reg.flags);
+}
+
+/* ------------------------------------------------------------------ */
+/* Tests: DPA region mmap                                             */
+/* ------------------------------------------------------------------ */
+
+/*
+ * mmap() the DPA region and verify the first page can be read.
+ * The region uses lazy fault insertion so the first access triggers the
+ * vfio_cxl_region_page_fault path.
+ */
+TEST_F(cxl_type2, dpa_mmap_fault)
+{
+	struct vfio_region_info reg = { .argsz = sizeof(reg) };
+	size_t map_size;
+	void *ptr;
+	uint8_t *p;
+	uint8_t val;
+
+	reg.index = self->cxl_cap.dpa_region_index;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg));
+
+	/* Map just the first 2MB or the full region, whichever is smaller */
+	map_size = (size_t)reg.size < (size_t)(2 * SZ_1M)
+		? (size_t)reg.size : (size_t)(2 * SZ_1M);
+
+	ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, self->dev->fd, (off_t)reg.offset);
+	ASSERT_NE(MAP_FAILED, ptr);
+
+	self->dpa_mmap = ptr;
+	self->dpa_mmap_size = map_size;
+
+	/* First access - triggers vmf_insert_pfn */
+	p = (uint8_t *)ptr;
+	val = *p;
+	(void)val;
+
+	printf("DPA mmap: ptr=%p size=0x%zx first byte=0x%02x\n",
+	       ptr, map_size, (uint8_t)val);
+
+	/* Write a pattern and read it back */
+	*p = 0xab;
+	ASSERT_EQ(0xab, *p);
+}
+
+/*
+ * mmap() of the COMP_REGS region (no MMAP flag) must fail.
+ */
+TEST_F(cxl_type2, comp_regs_no_mmap)
+{
+	struct vfio_region_info reg = { .argsz = sizeof(reg) };
+	void *ptr;
+
+	reg.index = self->cxl_cap.comp_regs_region_index;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg));
+
+	ptr = mmap(NULL, (size_t)reg.size, PROT_READ,
+		   MAP_SHARED, self->dev->fd, (off_t)reg.offset);
+	ASSERT_EQ(MAP_FAILED, ptr);
+
+	printf("mmap of COMP_REGS correctly failed (errno=%d)\n", errno);
+}
+
+/*
+ * mmap() of the CXL component register block within the component BAR must
+ * fail with EINVAL. The kernel blocks any mmap request whose range overlaps
+ * [comp_reg_offset, comp_reg_offset + comp_reg_size) even though the BAR as
+ * a whole carries the MMAP flag (GPU windows are mmappable).
+ *
+ * hdm_regs_offset (= comp_reg_offset + CXL_CM_OFFSET) is a page-aligned
+ * address guaranteed to lie inside the component register block.
+ */
+TEST_F(cxl_type2, comp_reg_mmap_blocked)
+{
+	struct vfio_region_info bar_reg = { .argsz = sizeof(bar_reg) };
+	void *ptr;
+
+	bar_reg.index = self->cxl_cap.hdm_regs_bar_index;
+	ASSERT_EQ(0, ioctl(self->dev->fd, VFIO_DEVICE_GET_REGION_INFO, &bar_reg));
+	ASSERT_TRUE(bar_reg.flags & VFIO_REGION_INFO_FLAG_MMAP);
+
+	/*
+	 * hdm_regs_offset is page-aligned and is comp_reg_offset + CXL_CM_OFFSET
+	 * (0x1000), so it is always within the component register block.
+	 */
+	ASSERT_EQ(0ULL, self->cxl_cap.hdm_regs_offset % SZ_4K);
+
+	ptr = mmap(NULL, (size_t)SZ_4K, PROT_READ,
+		   MAP_SHARED, self->dev->fd,
+		   (off_t)(bar_reg.offset + self->cxl_cap.hdm_regs_offset));
+	ASSERT_EQ(MAP_FAILED, ptr);
+	ASSERT_EQ(EINVAL, errno);
+
+	printf("comp_reg_mmap_blocked: hdm_regs_offset=0x%llx correctly blocked"
+	       "(errno=%d)\n",
+	       (unsigned long long)self->cxl_cap.hdm_regs_offset, errno);
+}
+
+/* ------------------------------------------------------------------ */
+/* Tests: COMP_REGS region (HDM decoder emulation)                    */
+/* ------------------------------------------------------------------ */
+
+/*
+ * Reading HDM Capability (offset 0x00) must return a non-zero value
+ * consistent with at least one decoder being present.
+ * Bits [3:0] encode the HDM decoder count.
+ */
+TEST_F(cxl_type2, hdm_cap_read)
+{
+	uint32_t cap;
+	uint32_t idx = self->cxl_cap.comp_regs_region_index;
+	uint64_t hdm_off = self->hdm_decoder_offset;
+
+	cap = hdm_regs_read32(self->dev, idx, hdm_off, CXL_HDM_DECODER_CAP_OFFSET);
+	ASSERT_NE(~0u, cap);
+
+	/*
+	 * Verify the live HDMC register matches the count we derived in setup.
+	 * Encoding: bits[3:0] = 0 → 1 decoder; N → N*2 decoders.
+	 */
+	{
+		uint32_t field = cap & 0xf;
+		uint8_t expected = field ? (uint8_t)(field * 2) : 1;
+
+		ASSERT_EQ(self->hdm_count, expected);
+	}
+
+	printf("HDM Capability register: 0x%08x decoder_count_field=%u hdm_count=%u\n",
+	       cap, cap & 0xf, self->hdm_count);
+}
+
+/*
+ * HDM decoder COMMIT -> COMMITTED transition.
+ *
+ * On firmware-committed hardware (COMMITTED already set) the COMMIT path
+ * is not exercisable. Instead verify the committed state is self-consistent:
+ * COMMITTED set, BASE/SIZE non-zero and large enough to cover dpa_size, and
+ * reserved bits cleared by the emulation layer.
+ *
+ * On hardware where the decoder is not yet committed, exercise the full
+ * COMMIT=1 -> COMMITTED=1 path followed by COMMIT=0 -> COMMITTED=0.
+ */
+TEST_F(cxl_type2, hdm_ctrl_commit_to_committed)
+{
+	uint32_t idx = self->cxl_cap.comp_regs_region_index;
+	uint64_t hdm_off = self->hdm_decoder_offset;
+	uint64_t base_lo_off = HDM_DECODER_FIRST_OFFSET + HDM_DECODER_BASE_LO;
+	uint64_t base_hi_off = HDM_DECODER_FIRST_OFFSET + HDM_DECODER_BASE_HI;
+	uint64_t size_lo_off = HDM_DECODER_FIRST_OFFSET + HDM_DECODER_SIZE_LO;
+	uint64_t size_hi_off = HDM_DECODER_FIRST_OFFSET + HDM_DECODER_SIZE_HI;
+	uint64_t ctrl_off = HDM_DECODER_FIRST_OFFSET + HDM_DECODER_CTRL;
+	uint32_t ctrl_readback;
+	uint32_t base_lo, base_hi, size_lo, size_hi;
+	uint64_t dec_base, dec_size;
+
+	ctrl_readback = hdm_regs_read32(self->dev, idx, hdm_off, ctrl_off);
+
+	if (ctrl_readback & HDM_CTRL_COMMITTED) {
+		/*
+		 * Firmware-committed decoder: verify the committed state is
+		 * self-consistent.
+		 *
+		 * BASE is expected to be zero: the kernel clears BASE_LO/HI in
+		 * the shadow for firmware-committed decoders so that the host
+		 * HPA does not leak to the guest. The VMM will write the guest
+		 * GPA into BASE before booting the VM.
+		 *
+		 * SIZE must cover at least dpa_size, and reserved bits must be
+		 * clear (the emulation scrubs them on every write).
+		 */
+		base_lo = hdm_regs_read32(self->dev, idx, hdm_off, base_lo_off);
+		base_hi = hdm_regs_read32(self->dev, idx, hdm_off, base_hi_off);
+		size_lo = hdm_regs_read32(self->dev, idx, hdm_off, size_lo_off);
+		size_hi = hdm_regs_read32(self->dev, idx, hdm_off, size_hi_off);
+		dec_base = ((uint64_t)base_hi << 32) | (base_lo & ~GENMASK(27, 0));
+		dec_size = ((uint64_t)size_hi << 32) | (size_lo & ~GENMASK(27, 0));
+
+		ASSERT_EQ(0ULL, dec_base);
+		ASSERT_GE(dec_size, self->dpa_size);
+		ASSERT_EQ(0u, ctrl_readback & HDM_CTRL_RESERVED_MASK);
+
+		printf("Decoder 0 firmware-committed: ctrl=0x%08x "
+		       "base=0x%llx (zeroed by kernel) size=0x%llx dpa_size=0x%llx\n",
+		       ctrl_readback,
+		       (unsigned long long)dec_base,
+		       (unsigned long long)dec_size,
+		       (unsigned long long)self->dpa_size);
+		return;
+	}
+
+	/* Decoder not committed: exercise COMMIT=1 -> COMMITTED=1 path */
+	ASSERT_EQ(4, hdm_regs_write32(self->dev, idx, hdm_off, base_lo_off, 0x10000000));
+	ASSERT_EQ(4, hdm_regs_write32(self->dev, idx, hdm_off, base_hi_off, 0));
+	ASSERT_EQ(4, hdm_regs_write32(self->dev, idx, hdm_off, size_lo_off, 0x10000000));
+	ASSERT_EQ(4, hdm_regs_write32(self->dev, idx, hdm_off, size_hi_off, 0));
+
+	ASSERT_EQ(4, hdm_regs_write32(self->dev, idx, hdm_off, ctrl_off, HDM_CTRL_COMMIT));
+	ctrl_readback = hdm_regs_read32(self->dev, idx, hdm_off, ctrl_off);
+	ASSERT_TRUE(ctrl_readback & HDM_CTRL_COMMITTED);
+
+	printf("HDM decoder 0 CTRL after COMMIT=1: 0x%08x (COMMITTED set)\n",
+	       ctrl_readback);
+
+	ASSERT_EQ(4, hdm_regs_write32(self->dev, idx, hdm_off, ctrl_off, 0));
+	ctrl_readback = hdm_regs_read32(self->dev, idx, hdm_off, ctrl_off);
+	ASSERT_FALSE(ctrl_readback & HDM_CTRL_COMMITTED);
+
+	printf("HDM decoder 0 CTRL after COMMIT=0: 0x%08x (COMMITTED cleared)\n",
+	       ctrl_readback);
+}
+
+/*
+ * CXL Lock (DVSEC offset 0x14):
+ *   - Reserved bits GENMASK(15,1) must be cleared.
+ *   - Once locked, CXL Control writes must be discarded.
+ *
+ * On firmware-committed hardware CONFIG_LOCK is set before OS load by the
+ * BIOS.
+ * In this case verify:
+ *   (a) Lock reserved bits are zero,
+ *   (b) a write to CXL Control is silently discarded by the emulation.
+ * Both are directly testable without needing to transition from unlocked
+ * to locked.
+ *
+ * On hardware where CONFIG_LOCK is not yet set, exercise the full sequence:
+ * write reserved bits (must be cleared), set CONFIG_LOCK, verify Control
+ * writes are then discarded.
+ */
+TEST_F(cxl_type2, dvsec_lock_semantics)
+{
+	uint16_t dvsec = self->dvsec_base;
+	uint16_t lock_val, ctrl_before, ctrl_after;
+
+	if (!dvsec)
+		SKIP(return, "CXL DVSEC not found in config space");
+
+	lock_val = vfio_pci_config_readw(self->dev,
+					 dvsec + CXL_DVSEC_LOCK_OFFSET);
+
+	if (lock_val & CXL_DVSEC_LOCK_CONFIG_LOCK) {
+		/*
+		 * Lock is already set: verify reserved bits are zero in the
+		 * current shadow, then verify a Control write is discarded.
+		 */
+		ASSERT_EQ(0u, lock_val & CXL_LOCK_RESERVED_MASK);
+
+		ctrl_before = vfio_pci_config_readw(self->dev,
+						    dvsec + CXL_DVSEC_CONTROL_OFFSET);
+		/* Attempt to flip CXL_Mem_Enable (bit 2) */
+		vfio_pci_config_writew(self->dev, dvsec + CXL_DVSEC_CONTROL_OFFSET,
+				       ctrl_before ^ BIT(2));
+		ctrl_after = vfio_pci_config_readw(self->dev,
+						   dvsec + CXL_DVSEC_CONTROL_OFFSET);
+		ASSERT_EQ(ctrl_before, ctrl_after);
+
+		printf("CONFIG_LOCK set: lock=0x%04x, "
+		       "Control write discarded (ctrl=0x%04x unchanged)\n",
+		       lock_val, ctrl_after);
+		return;
+	}
+
+	/* Lock is not set: exercise reserved-bit masking and lock-set sequence */
+	vfio_pci_config_writew(self->dev, dvsec + CXL_DVSEC_LOCK_OFFSET,
+			       CXL_LOCK_RESERVED_MASK);
+	lock_val = vfio_pci_config_readw(self->dev,
+					 dvsec + CXL_DVSEC_LOCK_OFFSET);
+	ASSERT_EQ(0u, lock_val & CXL_LOCK_RESERVED_MASK);
+	ASSERT_FALSE(lock_val & CXL_DVSEC_LOCK_CONFIG_LOCK);
+
+	ctrl_before = vfio_pci_config_readw(self->dev,
+					    dvsec + CXL_DVSEC_CONTROL_OFFSET);
+	vfio_pci_config_writew(self->dev, dvsec + CXL_DVSEC_LOCK_OFFSET,
+			       CXL_DVSEC_LOCK_CONFIG_LOCK);
+	lock_val = vfio_pci_config_readw(self->dev,
+					 dvsec + CXL_DVSEC_LOCK_OFFSET);
+	ASSERT_TRUE(lock_val & CXL_DVSEC_LOCK_CONFIG_LOCK);
+
+	vfio_pci_config_writew(self->dev, dvsec + CXL_DVSEC_CONTROL_OFFSET,
+			       ctrl_before ^ BIT(0));
+	ctrl_after = vfio_pci_config_readw(self->dev,
+					   dvsec + CXL_DVSEC_CONTROL_OFFSET);
+	ASSERT_EQ(ctrl_before, ctrl_after);
+
+	printf("Lock set, Control write discarded: "
+	       "lock=0x%04x ctrl_before=0x%04x ctrl_after=0x%04x\n",
+	       lock_val, ctrl_before, ctrl_after);
+}
+
+/* ------------------------------------------------------------------ */
+/* main                                                               */
+/* ------------------------------------------------------------------ */
+
+int main(int argc, char *argv[])
+{
+	device_bdf = vfio_selftests_get_bdf(&argc, argv);
+	return test_harness_run(argc, argv);
+}
-- 
2.25.1
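
For readers without CXL hardware at hand, the capability-array walk that
find_hdm_decoder_info() performs can be tried against a plain in-memory
copy of the component register block. The following is only a minimal
sketch under that assumption: the names find_hdm(), hdm_decoder_count(),
CM_CAP_HDR_CAP_ID and CM_CAP_ID_HDM are hypothetical and not part of the
patch, the bit layout is taken from the comments in the test above, and
the extra bounds and alignment checks are additions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; values follow the comments in the selftest. */
#define CM_CAP_HDR_CAP_ID	1u	/* array header ID, dword 0 bits[15:0] */
#define CM_CAP_ID_HDM		0x5u	/* HDM Decoder capability ID */

/* Decode HDMC bits[3:0]: 0 means one decoder, N means N * 2 decoders. */
static uint8_t hdm_decoder_count(uint32_t hdmc)
{
	uint32_t field = hdmc & 0xf;

	return field ? (uint8_t)(field * 2) : 1;
}

/*
 * Walk the capability array in @regs (an in-memory copy of the component
 * register block, @ndwords dwords long) and report the byte offset of the
 * HDM decoder block and the decoder count. Returns false when the header
 * is invalid, no HDM entry exists, or the entry points outside @regs.
 */
static bool find_hdm(const uint32_t *regs, size_t ndwords,
		     uint64_t *hdm_off, uint8_t *hdm_cnt)
{
	uint32_t hdr, num_caps, i;

	assert(regs && hdm_off && hdm_cnt);
	if (ndwords == 0)
		return false;

	hdr = regs[0];
	if ((hdr & 0xffff) != CM_CAP_HDR_CAP_ID)
		return false;

	/* bits[31:24] hold the number of capability entries */
	num_caps = (hdr >> 24) & 0xff;

	for (i = 1; i <= num_caps && i < ndwords; i++) {
		uint32_t entry = regs[i];
		uint64_t off;

		if ((entry & 0xffff) != CM_CAP_ID_HDM)
			continue;

		/* bits[31:20] hold the byte offset of the capability block */
		off = (entry >> 20) & 0xfff;
		if (off % 4 || off / 4 >= ndwords)
			return false;

		*hdm_off = off;
		*hdm_cnt = hdm_decoder_count(regs[off / 4]);
		return true;
	}
	return false;
}
```

Keeping the walk independent of pread() makes the encoding easy to check
in isolation: a header dword of (2 << 24) | 1, an unrelated entry, an
entry of (0x80 << 20) | 0x5 and an HDMC value of 2 at byte offset 0x80
yield an offset of 0x80 and four decoders.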