From: Gregory Price
To: linux-cxl@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net
Subject: [RFC PATCH 01/17] cxl: update documentation structure in prep for new docs
Date: Tue, 29 Apr 2025 20:12:08 -0400

Restructure the cxl folder to make adding docs per-page cleaner.
Signed-off-by: Gregory Price
---
 .../theory-of-operation.rst}                  |  0
 Documentation/driver-api/cxl/index.rst        | 22 ++++++++++++++++---
 .../cxl/{ => linux}/access-coordinates.rst    |  0
 3 files changed, 19 insertions(+), 3 deletions(-)
 rename Documentation/driver-api/cxl/{memory-devices.rst => devices/theory-of-operation.rst} (100%)
 rename Documentation/driver-api/cxl/{ => linux}/access-coordinates.rst (100%)

diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/devices/theory-of-operation.rst
similarity index 100%
rename from Documentation/driver-api/cxl/memory-devices.rst
rename to Documentation/driver-api/cxl/devices/theory-of-operation.rst
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 965ba90e8fb7..dfc0a4aa9003 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -4,12 +4,28 @@ Compute Express Link
 ====================
 
+CXL device configuration has a complex handoff between platform (Hardware,
+BIOS, EFI), OS (early boot, core kernel, driver), and user policy decisions
+that affect one another. The docs here break up the configuration steps.
+
 .. toctree::
    :maxdepth: 1
+   :caption: Overview
 
-   memory-devices
-   access-coordinates
-
+   self
    maturity-map
 
+.. toctree::
+   :maxdepth: 2
+   :caption: Device Reference
+
+   devices/theory-of-operation
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Linux Kernel Configuration
+
+   linux/access-coordinates
+
+
 .. only:: subproject and html
diff --git a/Documentation/driver-api/cxl/access-coordinates.rst b/Documentation/driver-api/cxl/linux/access-coordinates.rst
similarity index 100%
rename from Documentation/driver-api/cxl/access-coordinates.rst
rename to Documentation/driver-api/cxl/linux/access-coordinates.rst
-- 
2.49.0

From: Gregory Price
Subject: [RFC PATCH 02/17] cxl: docs/devices - device reference and uefi placeholder
Date: Tue, 29 Apr 2025 20:12:09 -0400

Add a simple device primer sufficient to understand the theory of
operation documentation. Add carve-out for CDAT with a TODO.
Signed-off-by: Gregory Price
---
 .../driver-api/cxl/devices/device-types.rst   | 169 ++++++++++++++++++
 Documentation/driver-api/cxl/devices/uefi.rst |   9 +
 Documentation/driver-api/cxl/index.rst        |   2 +
 3 files changed, 180 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/devices/device-types.rst
 create mode 100644 Documentation/driver-api/cxl/devices/uefi.rst

diff --git a/Documentation/driver-api/cxl/devices/device-types.rst b/Documentation/driver-api/cxl/devices/device-types.rst
new file mode 100644
index 000000000000..e8dd051c2c71
--- /dev/null
+++ b/Documentation/driver-api/cxl/devices/device-types.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Devices and Protocols
+#####################
+
+The type of CXL device (Memory, Accelerator, etc.) dictates many configuration steps. This section
+covers some basic background on device types and on-device resources used by the platform and OS
+which impact configuration.
+
+Protocols
+*********
+
+There are three core protocols to CXL. For the purpose of this documentation,
+we will only discuss very high-level definitions, as the specific hardware
+details are largely abstracted away from Linux. See the CXL specification
+for more details.
+
+CXL.io
+======
+The basic interaction protocol, similar to PCIe configuration mechanisms.
+Typically used for initialization, configuration, and I/O access for anything
+other than memory (CXL.mem) or cache (CXL.cache) operations.
+
+The Linux CXL driver exposes access to .io functionality via the various sysfs
+interfaces and /dev/cxl/ devices (which expose direct access to device
+mailboxes).
+
+CXL.cache
+=========
+The mechanism by which a device may coherently access and cache host memory.
+
+Largely transparent to Linux once configured.
+
+CXL.mem
+=======
+The mechanism by which the CPU may coherently access and cache device memory.
+
+Largely transparent to Linux once configured.
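The three protocols combine into the device types defined by the CXL specification: Type-1 speaks CXL.io and CXL.cache, Type-2 speaks all three, and Type-3 speaks CXL.io and CXL.mem. As a minimal sketch of that mapping, the C below derives a type from a set of protocol flags. The `enum cxl_proto` flags and `cxl_classify()` are hypothetical names invented for illustration; they are not part of the Linux CXL driver or its uapi.

```c
/* Hypothetical protocol flags; not taken from any kernel header. */
enum cxl_proto {
	CXL_PROTO_IO    = 1u << 0, /* CXL.io: config, mailboxes, non-memory I/O */
	CXL_PROTO_CACHE = 1u << 1, /* CXL.cache: device caches host memory */
	CXL_PROTO_MEM   = 1u << 2, /* CXL.mem: host accesses device memory (HDM) */
};

enum cxl_dev_type {
	CXL_TYPE_INVALID = 0,
	CXL_TYPE_1 = 1,
	CXL_TYPE_2 = 2,
	CXL_TYPE_3 = 3,
};

/* Every CXL device must support CXL.io; CXL.cache and CXL.mem select the type. */
enum cxl_dev_type cxl_classify(unsigned int protos)
{
	if (!(protos & CXL_PROTO_IO))
		return CXL_TYPE_INVALID;

	switch (protos & (CXL_PROTO_CACHE | CXL_PROTO_MEM)) {
	case CXL_PROTO_CACHE:
		return CXL_TYPE_1;	/* coherent cache, no HDM */
	case CXL_PROTO_CACHE | CXL_PROTO_MEM:
		return CXL_TYPE_2;	/* accelerator with HDM */
	case CXL_PROTO_MEM:
		return CXL_TYPE_3;	/* e.g. a memory expander */
	default:
		return CXL_TYPE_INVALID; /* CXL.io alone is none of the three types */
	}
}
```

For example, a memory expander advertising CXL.io and CXL.mem classifies as `CXL_TYPE_3`, while a device advertising only CXL.mem is rejected, since CXL.io is mandatory for every CXL device.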
+
+
+Device Types
+************
+
+Type-1
+======
+
+A Type-1 CXL device:
+
+* Supports cxl.io and cxl.cache protocols
+* Implements a fully coherent cache
+* Allows Device-to-Host coherence and Host-to-Device snoops.
+* Does NOT have host-managed device memory (HDM)
+
+A typical example of a type-1 device is a Smart NIC - which may want to
+directly operate on host-memory (DMA) to store incoming packets. These
+devices largely rely on CPU-attached memory.
+
+Type-2
+======
+
+A Type-2 CXL Device:
+
+* Supports cxl.io, cxl.cache, and cxl.mem protocols
+* Optionally implements coherent cache and Host-Managed Device Memory
+* Is typically an accelerator device with high bandwidth memory.
+
+The primary difference between a type-1 and type-2 device is the presence
+of host-managed device memory, which allows the device to operate on a
+local memory bank - while the CPU still has coherent DMA to the same memory.
+
+This allows things like GPUs to expose their memory via DAX devices or file
+descriptors, giving drivers and programs direct access to device memory
+rather than using block-transfer semantics.
+
+Type-3
+======
+
+A Type-3 CXL Device:
+
+* Supports cxl.io and cxl.mem
+* Implements Host-Managed Device Memory
+* May provide either Volatile or Persistent memory capacity (or both).
+
+A basic example of a type-3 device is a simple memory expander, whose
+local memory capacity is exposed to the CPU for access directly via
+basic coherent DMA.
+
+Switch
+======
+
+A CXL switch is a device capable of routing any CXL (and by extension, PCIe)
+protocol between upstream, downstream, or peer devices. Many devices, such
+as Multi-Logical Devices, imply the presence of switching in some manner.
+
+Logical Devices and Heads
+=========================
+
+A CXL device may present one or more "Logical Devices" to one or more hosts
+(via physical "Heads").
+
+A Single-Logical Device (SLD) is a device which presents a single device to
+one or more heads.
+
+A Multi-Logical Device (MLD) is a device which may present multiple devices
+to one or more hosts.
+
+A Single-Headed Device exposes only a single physical connection.
+
+A Multi-Headed Device exposes multiple physical connections.
+
+MHSLD
+-----
+A Multi-Headed Single-Logical Device (MHSLD) exposes a single logical
+device to multiple heads which may be connected to one or more discrete
+hosts. An example of this would be a simple memory pool which may be
+statically configured (prior to boot) to expose portions of its memory
+to Linux via the CEDT ACPI table.
+
+MHMLD
+-----
+A Multi-Headed Multi-Logical Device (MHMLD) exposes multiple logical
+devices to multiple heads which may be connected to one or more discrete
+hosts. An example of this would be a Dynamic Capacity Device, which
+may be configured at runtime to expose portions of its memory to Linux.
+
+Example Devices
+***************
+
+Memory Expander
+===============
+The simplest form of Type-3 device is a memory expander. A memory expander
+exposes Host-Managed Device Memory (HDM) to Linux. This memory may be
+Volatile or Non-Volatile (Persistent).
+
+Memory Expanders will typically be considered a form of Single-Headed,
+Single-Logical Device - as their form factor will typically be an add-in-card
+(AIC) or some other similar form-factor.
+
+The Linux CXL driver provides support for static or dynamic configuration of
+basic memory expanders. The platform may program decoders prior to OS init
+(e.g. auto-decoders), or the user may program the fabric if the platform
+defers these operations to the OS.
+
+Multiple Memory Expanders may be added to an external chassis and exposed to
+a host via a head attached to a CXL switch. This is a "memory pool", and
+would be considered an MHSLD or MHMLD depending on the management capabilities
+provided by the switch platform.
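The head and logical-device terms are two independent axes: how many physical connections a device exposes, and whether it can present more than one logical device. A small sketch of that taxonomy, assuming a device is described only by its head count and a multi-logical capability flag (a deliberate simplification; real devices report far more), could look like this. All names here are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical topology classes matching the terms used in this document. */
enum cxl_topo_kind { TOPO_INVALID, TOPO_SLD, TOPO_MLD, TOPO_MHSLD, TOPO_MHMLD };

const char *const cxl_topo_names[] = { "invalid", "SLD", "MLD", "MHSLD", "MHMLD" };

/* Simplified model: physical head count plus a multi-logical capability bit. */
struct cxl_topology {
	unsigned int heads;	/* physical connections exposed by the device */
	bool multi_logical;	/* able to present more than one logical device */
};

/* Single-headed devices are reported simply as SLD or MLD. */
enum cxl_topo_kind cxl_topology_kind(struct cxl_topology t)
{
	if (t.heads == 0)
		return TOPO_INVALID;
	if (t.heads == 1)
		return t.multi_logical ? TOPO_MLD : TOPO_SLD;
	return t.multi_logical ? TOPO_MHMLD : TOPO_MHSLD;
}
```

With this model, a typical add-in-card memory expander (one head, one logical device) maps to `TOPO_SLD`, while a pooled chassis maps to `TOPO_MHSLD` or `TOPO_MHMLD` depending on whether the switch platform can carve it into multiple logical devices.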
+
+As of v6.14, Linux does not provide a formalized interface to manage non-DCD
+MHSLD or MHMLD devices.
+
+Dynamic Capacity Device (DCD)
+=============================
+
+A Dynamic Capacity Device is a Type-3 device which provides dynamic management
+of memory capacity. The basic premise of a DCD is to provide an allocator-like
+interface for physical memory capacity to a "Fabric Manager" (an external host
+with privileges to change configurations for other hosts).
+
+A DCD manages "Memory Extents", which may be volatile or persistent. Extents
+may also be exclusive to a single host or shared across multiple hosts.
+
+As of v6.14, Linux does not provide a formalized interface to manage DCD
+devices; however, there is active work on LKML targeting a future release.
+
+Example T2 Device
+=================
+
+Todo
diff --git a/Documentation/driver-api/cxl/devices/uefi.rst b/Documentation/driver-api/cxl/devices/uefi.rst
new file mode 100644
index 000000000000..a51583e6c44c
--- /dev/null
+++ b/Documentation/driver-api/cxl/devices/uefi.rst
@@ -0,0 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+UEFI Data
+#########
+
+Coherent Device Attribute Table (CDAT)
+**************************************
+
+todo
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index dfc0a4aa9003..4dc99a6b08bd 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -19,6 +19,8 @@ that have impacts on each other. The docs here break up configurations steps.
    :maxdepth: 2
    :caption: Device Reference
 
+   devices/device-types
+   devices/uefi
    devices/theory-of-operation
 
 .. toctree::
-- 
2.49.0

From: Gregory Price
Subject: [RFC PATCH 03/17] cxl: docs/platform/bios-and-efi documentation
Date: Tue, 29 Apr 2025 20:12:10 -0400

Add some docs on CXL configurations done in bios/efi that affect linux
configuration - information vendors may care to consider.
Signed-off-by: Gregory Price
---
 Documentation/driver-api/cxl/index.rst        |   6 +
 .../driver-api/cxl/platform/bios-and-efi.rst  | 261 ++++++++++++++++++
 2 files changed, 267 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/platform/bios-and-efi.rst

diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 4dc99a6b08bd..7f4055503a43 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -23,6 +23,12 @@ that have impacts on each other. The docs here break up configurations steps.
    devices/uefi
    devices/theory-of-operation
 
+.. toctree::
+   :maxdepth: 2
+   :caption: Platform Configuration
+
+   platform/bios-and-efi
+
 .. toctree::
    :maxdepth: 1
    :caption: Linux Kernel Configuration
diff --git a/Documentation/driver-api/cxl/platform/bios-and-efi.rst b/Documentation/driver-api/cxl/platform/bios-and-efi.rst
new file mode 100644
index 000000000000..0d83aa817e9d
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/bios-and-efi.rst
@@ -0,0 +1,261 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+BIOS/EFI Configuration
+######################
+
+BIOS and EFI are largely responsible for configuring static information about
+devices (or potential future devices) such that Linux can build the appropriate
+logical representations of these devices.
+
+At a high level, this is what occurs during this phase of configuration.
+
+* The bootloader starts the BIOS/EFI.
+
+* BIOS/EFI performs early device probing to determine static configuration.
+
+* BIOS/EFI creates ACPI Tables that describe static config for the OS.
+
+* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc.).
+
+* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.
+
+Much of what this section is concerned with is ACPI Table production and
+static memory map configuration. More detail on these tables can be found
+under Platform Configuration -> ACPI Table Reference.
+
+.. note::
+   Platform Vendors should read carefully, as this section has recommendations
+   on physical memory region size and alignment, memory holes, HDM interleave,
+   and what Linux expects of HDM decoders trying to work with these features.
+
+UEFI Settings
+*************
+If your platform supports it, the :code:`uefisettings` command can be used to
+read/write EFI settings. Changes will be reflected on the next reboot. Kexec
+is not a sufficient reboot.
+
+One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
+When this is enabled, this bit tells Linux to defer management of a memory
+region to a driver (in this case, the CXL driver). Otherwise, the memory is
+treated as "normal memory", and is exposed to the page allocator during
+:code:`__init`.
+
+uefisettings examples
+=====================
+
+:code:`uefisettings identify` ::
+
+    uefisettings identify
+
+    bios_vendor: xxx
+    bios_version: xxx
+    bios_release: xxx
+    bios_date: xxx
+    product_name: xxx
+    product_family: xxx
+    product_version: xxx
+
+On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the :code:`CXL
+Memory Attribute` field. This may be called something else on your platform.
+
+:code:`uefisettings get "CXL Memory Attribute"` ::
+
+    selector: xxx
+    ...
+    question: Question {
+        name: "CXL Memory Attribute",
+        answer: "Enabled",
+        ...
+    }
+
+Physical Memory Map
+*******************
+
+Physical Address Region Alignment
+=================================
+
+As of Linux v6.14, the hotplug memory system requires memory regions to be
+uniform in size and alignment. While the CXL specification allows for memory
+regions as small as 256MB, the supported memory block size and alignment for
+hotplugged memory is architecture-defined.
+
+Linux memory blocks may be as small as 128MB and increase in powers of two.
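The alignment requirement reduces to simple arithmetic: with a power-of-two block size, a region can be onlined as whole memory blocks only if both its base and its size are multiples of that block size. The sketch below encodes that check, assuming (as stated above) that a valid block size is a power of two no smaller than 128MB. `region_fits_blocks()` is a hypothetical helper, not a kernel API; on a running system the block size actually in use is reported in `/sys/devices/system/memory/block_size_bytes`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_128M (128ULL << 20)

/* A block size is plausible if it is a power of two and at least 128MB. */
static bool valid_block_size(uint64_t block)
{
	return block >= SZ_128M && (block & (block - 1)) == 0;
}

/* True if [base, base + size) consists entirely of whole, aligned blocks.
 * For a power-of-two block, (x & (block - 1)) == 0 means x % block == 0. */
bool region_fits_blocks(uint64_t base, uint64_t size, uint64_t block)
{
	if (!valid_block_size(block) || size == 0)
		return false;
	return (base & (block - 1)) == 0 && (size & (block - 1)) == 0;
}
```

A 256MB region at 0x100000000 passes with 256MB blocks but fails with 2GB blocks, which is why a 2GB-aligned layout is the portable choice.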
+
+* On ARM, the default block size and alignment is either 128MB or 256MB.
+
+* On x86, the default block size is 256MB, and increases to 2GB as the
+  capacity of the system increases up to 64GB.
+
+For best support across versions, platform vendors should place CXL memory at
+a 2GB aligned base address, and regions should be 2GB aligned. This also helps
+prevent the creation of thousands of memory devices (one per block).
+
+Memory Holes
+============
+
+Holes in the memory map are tricky. Consider a 4GB device located at base
+address 0x100000000, but with the following memory map ::
+
+  ---------------------
+  |    0x100000000    |
+  |        CXL        |
+  |    0x1BFFFFFFF    |
+  ---------------------
+  |    0x1C0000000    |
+  |    MEMORY HOLE    |
+  |    0x1FFFFFFFF    |
+  ---------------------
+  |    0x200000000    |
+  |     CXL CONT.     |
+  |    0x23FFFFFFF    |
+  ---------------------
+
+There are two issues to consider:
+
+* decoder programming, and
+* memory block alignment.
+
+If your architecture requires 2GB uniform size and aligned memory blocks, the
+only capacity Linux is capable of mapping (as of v6.14) would be the capacity
+from `0x100000000-0x180000000`. The remaining capacity will be stranded, as
+it is not of 2GB aligned length.
+
+Assuming your architecture and memory configuration allows 1GB memory blocks,
+this memory map is supported and this should be presented as multiple CFMWS
+in the CEDT that describe each side of the memory hole separately - along with
+matching decoders.
+
+Multiple decoders can (and should) be used to manage such a memory hole (see
+below), but each contiguous chunk of memory around a hole should be aligned to
+a reasonable block size (larger alignment is always better). If you intend to
+have memory holes in the memory map, expect to use one decoder per contiguous
+chunk of host physical memory.
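To see why a map with a hole strands capacity, round each contiguous CXL chunk's start up and its end down to the block size, then add up what remains. The sketch below does exactly that for an arbitrary list of chunks; it assumes a power-of-two block size, half-open `[start, end)` ranges, and hypothetical names (`struct phys_range`, `total_mappable()`).

```c
#include <stdint.h>

/* Hypothetical half-open physical range [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* Bytes of r usable as whole blocks: round start up and end down to the
 * block size (assumed to be a power of two), then measure what is left. */
uint64_t mappable_bytes(struct phys_range r, uint64_t block)
{
	uint64_t first = (r.start + block - 1) & ~(block - 1);
	uint64_t last = r.end & ~(block - 1);

	return last > first ? last - first : 0;
}

/* Sum over every contiguous CXL chunk on either side of the holes. */
uint64_t total_mappable(const struct phys_range *chunks, int n, uint64_t block)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		total += mappable_bytes(chunks[i], block);
	return total;
}
```

Applied to the 4GB example (chunks 0x100000000-0x1C0000000 and 0x200000000-0x240000000), 2GB blocks leave only 2GB usable (0x100000000-0x180000000), while 1GB blocks recover the full 4GB.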
+
+As of v6.14, Linux does provide support for memory hotplug of multiple
+physical memory regions separated by a memory hole described by a single
+HDM decoder.
+
+
+Decoder Programming
+*******************
+If BIOS/EFI intends to program the decoders to be statically configured,
+there are a few things to consider to avoid major pitfalls that will
+prevent Linux compatibility. Some of these recommendations are not
+required "per the specification", but Linux makes no guarantees of support
+otherwise.
+
+
+Translation Point
+=================
+Per the specification, the only decoders which **TRANSLATE** Host Physical
+Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
+All other decoders in the fabric are intended to route accesses without
+translating the addresses.
+
+This is heavily implied by the specification, see: ::
+
+  CXL Specification 3.1
+  8.2.4.20: CXL HDM Decoder Capability Structure
+  - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
+  - Implementation Note: Device Decoder Logic
+
+Given this, Linux makes a strong assumption that decoders between CPU and
+endpoint will all be programmed with address ranges that are subsets of
+their parent decoder.
+
+Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
+"hand off" responsibility between domains, some early adopting platforms
+attempted to do translation at the originating memory controller or host
+bridge. This configuration requires a platform-specific extension to the
+driver and is not officially endorsed - despite being supported.
+
+It is *highly recommended* **NOT** to do this; otherwise, you are on your own
+to implement driver support for your platform.
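The "subset of the parent decoder" assumption can be checked by walking the decoder chain from root to endpoint and confirming that each range lies inside its parent. The sketch below models each decoder as a bare HPA range, which is a heavy simplification: the names are hypothetical, and interleave ways, granularity, targets, and DPA skip are all ignored. The translation helper further assumes a non-interleaved endpoint decoder, for which HPA to DPA translation reduces to an offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified decoder model: an HPA window and nothing else. */
struct hdm_range {
	uint64_t base;	/* host physical address of the window */
	uint64_t size;	/* window length in bytes */
};

/* True if child lies entirely inside parent, written to avoid overflow. */
static bool range_within(struct hdm_range child, struct hdm_range parent)
{
	return child.base >= parent.base &&
	       child.size <= parent.size &&
	       child.base - parent.base <= parent.size - child.size;
}

/* chain[0] is the root decoder and chain[n - 1] the endpoint decoder. Only
 * the endpoint translates, so every decoder must sit inside its parent. */
bool decoder_chain_ok(const struct hdm_range *chain, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (!range_within(chain[i], chain[i - 1]))
			return false;
	return n > 0;
}

/* Endpoint-only translation for a non-interleaved decoder: DPA is the offset
 * into the decoder's HPA window plus the DPA where the decoder starts.
 * The caller must ensure hpa lies within ep. */
uint64_t ep_hpa_to_dpa(struct hdm_range ep, uint64_t dpa_base, uint64_t hpa)
{
	return dpa_base + (hpa - ep.base);
}
```

A host bridge decoder programmed at HPA 0 beneath a root decoder at 0x100000000 (that is, a host bridge attempting translation) fails the check, mirroring the configuration the text advises against.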
+
+Interleave and Configuration Flexibility
+========================================
+If providing cross-host-bridge interleave, a CFMWS entry in the CEDT must be
+presented with target host-bridges for the interleaved device sets (there may
+be multiple devices behind each host bridge).
+
+If providing intra-host-bridge interleaving, only 1 CFMWS entry in the CEDT is
+required for that host bridge - if it covers the entire capacity of the devices
+behind the host bridge.
+
+If intending to provide users flexibility in programming decoders beyond the
+root, you may want to provide multiple CFMWS entries in the CEDT intended for
+different purposes.  For example, you may want to consider adding:
+
+1) A CFMWS entry to cover all interleavable host bridges.
+2) A CFMWS entry to cover all devices on a single host bridge.
+3) A CFMWS entry to cover each device.
+
+A platform may choose to add all of these, or change the mode based on a BIOS
+setting.  For each CFMWS entry, Linux expects descriptions of the described
+memory regions in the SRAT to determine the number of NUMA nodes it should
+reserve during early boot / init.
+
+As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
+a matching SRAT entry does not exist; however, this is not guaranteed in the
+future and such a configuration should be avoided.
+
+Memory Holes
+============
+If your platform includes memory holes interspersed between your CXL memory, it
+is recommended to utilize multiple decoders to cover these regions of memory,
+rather than try to program the decoders to accept the entire range and expect
+Linux to manage the overlap.
+
+For example, consider the Memory Hole described above ::
+
+  ---------------------
+  |    0x100000000    |
+  |        CXL        |
+  |    0x1BFFFFFFF    |
+  ---------------------
+  |    0x1C0000000    |
+  |    MEMORY HOLE    |
+  |    0x1FFFFFFFF    |
+  ---------------------
+  |    0x200000000    |
+  |     CXL CONT.     |
+  |    0x23FFFFFFF    |
+  ---------------------
+
+Assuming this is provided by a single device attached directly to a host bridge,
+Linux would expect the following decoder programming ::
+
+     -----------------------   -----------------------
+     |  root-decoder-0     |   |  root-decoder-1     |
+     |   base: 0x100000000 |   |   base: 0x200000000 |
+     |   size:  0xC0000000 |   |   size:  0x40000000 |
+     -----------------------   -----------------------
+                |                         |
+     -----------------------   -----------------------
+     |  HB-decoder-0       |   |  HB-decoder-1       |
+     |   base: 0x100000000 |   |   base: 0x200000000 |
+     |   size:  0xC0000000 |   |   size:  0x40000000 |
+     -----------------------   -----------------------
+                |                         |
+     -----------------------   -----------------------
+     |  ep-decoder-0       |   |  ep-decoder-1       |
+     |   base: 0x100000000 |   |   base: 0x200000000 |
+     |   size:  0xC0000000 |   |   size:  0x40000000 |
+     -----------------------   -----------------------
+
+This assumes a CEDT with two CFMWS entries describing the above root decoders.
+
+Linux makes no guarantee of support for strange memory hole situations.
+
+Multi-Media Devices
+===================
+The CFMWS structure in the CEDT has special restriction bits which describe whether
+the described memory region allows volatile or persistent memory (or both).  If
+the platform intends to support either:
+
+1) A device with multiple media types, or
+2) Using a persistent memory device as normal memory
+
+A platform may wish to create multiple CEDT CFMWS entries to describe the same
+memory, with the intent of allowing the end user flexibility in how that memory
+is configured. Linux does not presently have strong requirements in this area.
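To illustrate how the restriction bits constrain use of a window, the helper below checks whether a CFMWS could back a Type 3 region of a given media type. Bit positions follow the CFMWS Restrictions field (bit 1: Type 3, bit 2: volatile, bit 3: persistent); the macro and function names are hypothetical, and the driver's real policy may weigh more than these bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CFMWS "Restrictions" bits (hypothetical macro names). */
#define CFMWS_RESTRICT_TYPE2      (1u << 0)  /* CXL Type 2 memory */
#define CFMWS_RESTRICT_TYPE3      (1u << 1)  /* CXL Type 3 memory */
#define CFMWS_RESTRICT_VOLATILE   (1u << 2)  /* volatile memory */
#define CFMWS_RESTRICT_PERSISTENT (1u << 3)  /* persistent memory */
#define CFMWS_RESTRICT_FIXED      (1u << 4)  /* fixed config, HPA not reusable */

/* Could this window back a Type 3 region of the requested media type? */
static bool window_allows(uint16_t restrictions, bool want_persistent)
{
	unsigned int media = want_persistent ? CFMWS_RESTRICT_PERSISTENT
					     : CFMWS_RESTRICT_VOLATILE;

	return (restrictions & CFMWS_RESTRICT_TYPE3) && (restrictions & media);
}
```

With Restrictions = 0x0006 (Type 3, volatile), the window accepts a volatile region but rejects a persistent one; a window also intended for persistent use would need bit 3 set as well (0x000E), or a separate CFMWS describing the same memory.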
-- 
2.49.0

From: Gregory Price
To: linux-cxl@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net
Subject: [RFC PATCH 04/17] cxl: docs/platform/acpi reference documentation
Date: Tue, 29 Apr 2025 20:12:11 -0400
Message-ID: <20250430001224.1028656-5-gourry@gourry.net>
In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net>
References: <20250430001224.1028656-1-gourry@gourry.net>

Add basic ACPI table information needed to understand the CXL driver
probe process.

Signed-off-by: Gregory Price
---
 Documentation/driver-api/cxl/index.rst | 1 +
 .../driver-api/cxl/platform/acpi.rst | 83 +++++++++++++++++++
 .../driver-api/cxl/platform/acpi/cedt.rst | 52 ++++++++++++
 .../driver-api/cxl/platform/acpi/dsdt.rst | 27 ++++++
 .../driver-api/cxl/platform/acpi/hmat.rst | 28 +++++++
 .../driver-api/cxl/platform/acpi/slit.rst | 17 ++++
 .../driver-api/cxl/platform/acpi/srat.rst | 37 +++++++++
 7 files changed, 245 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/platform/acpi.rst
 create mode 100644 Documentation/driver-api/cxl/platform/acpi/cedt.rst
 create mode 100644 Documentation/driver-api/cxl/platform/acpi/dsdt.rst
 create mode 100644 Documentation/driver-api/cxl/platform/acpi/hmat.rst
 create mode 100644 Documentation/driver-api/cxl/platform/acpi/slit.rst
 create mode 100644 Documentation/driver-api/cxl/platform/acpi/srat.rst

diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 7f4055503a43..e47671e268b2 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -28,6 +28,7 @@ that have impacts on each other. The docs here break up configurations steps.
    :caption: Platform Configuration
 
    platform/bios-and-efi
+   platform/acpi
 
 .. toctree::
    :maxdepth: 1
diff --git a/Documentation/driver-api/cxl/platform/acpi.rst b/Documentation/driver-api/cxl/platform/acpi.rst
new file mode 100644
index 000000000000..9d1dfc4f2b8e
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/acpi.rst
@@ -0,0 +1,83 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+ACPI Tables
+###########
+
+ACPI is the "Advanced Configuration and Power Interface", which is a standard
+that defines how platforms and OS manage power and configure computer hardware.
+For the purpose of this theory of operation, when referring to "ACPI" we will
+usually refer to "ACPI Tables" - which are the way a platform (BIOS/EFI)
+communicates static configuration information to the operating system.
+
+The following ACPI tables contain *static* configuration and performance data about CXL devices.
+
+.. toctree::
+   :maxdepth: 1
+
+   acpi/cedt.rst
+   acpi/srat.rst
+   acpi/hmat.rst
+   acpi/slit.rst
+   acpi/dsdt.rst
+
+The SRAT table may also contain generic port/initiator content that is intended to describe the generic port, but not information about the rest of the path to the endpoint.
+
+Linux uses these tables to configure kernel resources for statically configured (by BIOS/EFI) CXL devices, such as:
+
+- NUMA nodes
+- Memory Tiers
+- NUMA Abstract Distances
+- SystemRAM Memory Regions
+- Weighted Interleave Node Weights
+
+ACPI Debugging
+**************
+
+The :code:`acpidump -b` command dumps the ACPI tables into binary format.
+
+The :code:`iasl -d` command disassembles the files into human-readable format.
+
+Example :code:`acpidump -b && iasl -d cedt.dat` ::
+
+   /*
+    * Intel ACPI Component Architecture
+    * AML/ASL+ Disassembler version 20210604 (64-bit version)
+    * Copyright (c) 2000 - 2021 Intel Corporation
+    *
+    * Disassembly of cedt.dat, Fri Apr 11 07:47:31 2025
+    *
+    * ACPI Data Table [CEDT]
+    *
+    * Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
+    */
+   [000h 0000 4] Signature : "CEDT" [CXL Early Discovery Table]
+   ...
+
+Common Issues
+=============
+Most failures described here result in a failure of the driver to surface
+memory as a DAX device and/or kmem.
+
+* CEDT CFMWS targets list UIDs do not match CEDT CHBS UIDs.
+* CEDT CFMWS targets list UIDs do not match DSDT CXL Host Bridge UIDs.
+* CEDT CFMWS Restriction Bits are not correct.
+* CEDT CFMWS Memory regions are poorly aligned.
+* CEDT CFMWS Memory regions span a platform memory hole.
+* CEDT CHBS UIDs do not match DSDT CXL Host Bridge UIDs.
+* CEDT CHBS Specification version is incorrect.
+* SRAT is missing regions described in CEDT CFMWS.
+
+  * Result: failure to create a NUMA node for the region, or
+    region is placed in the wrong node.
+
+* HMAT is missing data for regions described in CEDT CFMWS.
+
+  * Result: NUMA node being placed in the wrong memory tier.
+
+* SLIT has bad data.
+
+  * Result: Lots of performance mechanisms in the kernel will be very unhappy.
+
+All of these issues will appear to users as if the driver is failing to
+support CXL - when in reality they are all the failure of a platform to
+configure the ACPI tables correctly.
diff --git a/Documentation/driver-api/cxl/platform/acpi/cedt.rst b/Documentation/driver-api/cxl/platform/acpi/cedt.rst
new file mode 100644
index 000000000000..1636131e218b
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/acpi/cedt.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+CEDT - CXL Early Discovery Table
+================================
+
+The CXL Early Discovery Table is generated by BIOS to describe the CXL memory regions configured at boot by the BIOS.
+
+CHBS
+----
+The CXL Host Bridge Structure describes CXL host bridges.  Other than describing device register information, it reports the specific host bridge UID for this host bridge.  These host bridge IDs will be referenced in other tables.
+
+Example ::
+
+    Subtable Type : 00 [CXL Host Bridge Structure]
+    Reserved : 00
+    Length : 0020
+    Associated host bridge : 00000007 <- Host bridge _UID
+    Specification version : 00000001
+    Reserved : 00000000
+    Register base : 0000010370400000
+    Register length : 0000000000010000
+
+CFMWS
+-----
+The CXL Fixed Memory Window structure describes a memory region associated with one or more CXL host bridges (as described by the CHBS).  It additionally describes any inter-host-bridge interleave configuration that may have been programmed by BIOS.
+
+Example ::
+
+    Subtable Type : 01 [CXL Fixed Memory Window Structure]
+    Reserved : 00
+    Length : 002C
+    Reserved : 00000000
+    Window base address : 000000C050000000 <- Memory Region
+    Window size : 0000003CA0000000
+    Interleave Members (2^n) : 01 <- Interleave configuration
+    Interleave Arithmetic : 00
+    Reserved : 0000
+    Granularity : 00000000
+    Restrictions : 0006
+    QtgId : 0001
+    First Target : 00000007 <- Host Bridge _UID
+    Next Target : 00000006 <- Host Bridge _UID
+
+The restriction field dictates what this SPA range may be used for (memory type, volatile vs persistent, etc). One or more bits may be set. ::
+
+  Bit[0]: CXL Type 2 Memory
+  Bit[1]: CXL Type 3 Memory
+  Bit[2]: Volatile Memory
+  Bit[3]: Persistent Memory
+  Bit[4]: Fixed Config (HPA cannot be re-used)
+
+INTRA-host-bridge interleave (multiple devices on one host bridge) is NOT reported in this structure, and is solely defined via CXL device decoder programming (host bridge and endpoint decoders).
diff --git a/Documentation/driver-api/cxl/platform/acpi/dsdt.rst b/Documentation/driver-api/cxl/platform/acpi/dsdt.rst
new file mode 100644
index 000000000000..7e10bfab98d6
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/acpi/dsdt.rst
@@ -0,0 +1,27 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+DSDT - Differentiated System Description Table
+==============================================
+
+This table describes what peripherals a machine has.
+
+The UIDs of CXL devices in this table (specifically host bridges) must be
+consistent with the contents of the CEDT; otherwise the CXL driver will
+fail to probe correctly.
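The UID consistency requirement amounts to a cross-check between three lists: CFMWS target UIDs, CHBS "Associated host bridge" values, and the _UID of each ACPI0016 device in the DSDT. A minimal sketch, assuming the UIDs have already been extracted (for example from iasl output); uid_in() and cfmws_targets_consistent() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Is 'uid' present in list[0..n)? */
static bool uid_in(uint32_t uid, const uint32_t *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == uid)
			return true;
	return false;
}

/*
 * Every CFMWS target UID must name a host bridge that is described both by
 * a CEDT CHBS ("Associated host bridge") and by a DSDT ACPI0016 device (_UID).
 */
static bool cfmws_targets_consistent(const uint32_t *targets, size_t ntargets,
				     const uint32_t *chbs_uids, size_t nchbs,
				     const uint32_t *dsdt_uids, size_t ndsdt)
{
	assert(targets != NULL || ntargets == 0);
	for (size_t i = 0; i < ntargets; i++) {
		if (!uid_in(targets[i], chbs_uids, nchbs) ||
		    !uid_in(targets[i], dsdt_uids, ndsdt))
			return false;
	}
	return true;
}
```

For instance, CFMWS targets 7 and 6 are consistent with CHBS and DSDT lists of {7, 6}, but not with a DSDT that only declares _UID 0x05.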
+
+Example Compute Express Link Host Bridge ::
+
+    Scope (_SB)
+    {
+        Device (S0D0)
+        {
+            Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */)  // _HID: Hardware ID
+            Name (_CID, Package (0x02)  // _CID: Compatible ID
+            {
+                EisaId ("PNP0A08") /* PCI Express Bus */,
+                EisaId ("PNP0A03") /* PCI Bus */
+            })
+            ...
+            Name (_UID, 0x05)  // _UID: Unique ID
+            ...
+        }
diff --git a/Documentation/driver-api/cxl/platform/acpi/hmat.rst b/Documentation/driver-api/cxl/platform/acpi/hmat.rst
new file mode 100644
index 000000000000..d604c5123440
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/acpi/hmat.rst
@@ -0,0 +1,28 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+HMAT - Heterogeneous Memory Attribute Table
+===========================================
+
+The Heterogeneous Memory Attribute Table contains information such as cache attributes and bandwidth and latency details for memory proximity domains.  For the purpose of this document, we will only discuss the SLLBI entry.
+
+SLLBI
+-----
+The System Locality Latency and Bandwidth Information structure records latency and bandwidth information for proximity domains.
+
+This table is used by Linux to configure interleave weights and memory tiers.
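As a rough illustration of how relative bandwidth can become interleave weights, the sketch below reduces per-node bandwidth values to their smallest integer ratio. It is a hypothetical helper, not the kernel's weighted-interleave calculation, which may scale and round differently; it assumes all entries are non-zero and use the same Entry Base Unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Reduce per-node bandwidth values to the smallest integer weights with the
 * same ratios (hypothetical helper). Assumes n > 0 and non-zero bw[] entries.
 */
static void bw_to_weights(const uint32_t *bw, uint32_t *weights, size_t n)
{
	assert(n > 0);

	uint32_t g = bw[0];

	for (size_t i = 1; i < n; i++)
		g = gcd_u32(g, bw[i]);
	for (size_t i = 0; i < n; i++)
		weights[i] = bw[i] / g;
}
```

Using the DRAM and CXL bandwidth entries from the example below (0x1200 and 0x0200), it yields weights of 9 and 1.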
+
+Example (Heavily truncated for brevity) ::
+
+    Structure Type : 0001 [SLLBI]
+    Data Type : 00 <- Latency
+    Target Proximity Domain List : 00000000
+    Target Proximity Domain List : 00000001
+    Entry : 0080 <- DRAM LTC
+    Entry : 0100 <- CXL LTC
+
+    Structure Type : 0001 [SLLBI]
+    Data Type : 03 <- Bandwidth
+    Target Proximity Domain List : 00000000
+    Target Proximity Domain List : 00000001
+    Entry : 1200 <- DRAM BW
+    Entry : 0200 <- CXL BW
diff --git a/Documentation/driver-api/cxl/platform/acpi/slit.rst b/Documentation/driver-api/cxl/platform/acpi/slit.rst
new file mode 100644
index 000000000000..56126f7ca250
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/acpi/slit.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+SLIT - System Locality Information Table
+========================================
+
+The System Locality Information Table provides "abstract distances" between accessor and memory nodes.  Nodes without initiators (CPUs) are an infinite distance (FF) away from all other nodes.
+
+The abstract distance described in this table does not describe any real latency or bandwidth information.
+
+Example ::
+
+    Signature : "SLIT" [System Locality Information Table]
+    Localities : 0000000000000004
+    Locality 0 : 10 20 20 30
+    Locality 1 : 20 10 30 20
+    Locality 2 : FF FF 0A FF
+    Locality 3 : FF FF FF 0A
diff --git a/Documentation/driver-api/cxl/platform/acpi/srat.rst b/Documentation/driver-api/cxl/platform/acpi/srat.rst
new file mode 100644
index 000000000000..7dce043346c3
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/acpi/srat.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+SRAT - Static Resource Affinity Table
+=====================================
+
+The System/Static Resource Affinity Table describes resource (CPU, Memory) affinity to "Proximity Domains". This table is technically optional, but for performance information (see "HMAT") to be enumerated by Linux it must be present.
+
+There is a careful dance between the CEDT and SRAT tables and how NUMA nodes are created.  If things don't look quite the way you expect - check the SRAT Memory Affinity entries and CEDT CFMWS to determine what your platform actually supports in terms of flexible topologies.
+
+The SRAT may statically assign portions of a CFMWS SPA range to specific proximity domains.  See "NUMA Node Creation" for more information about how this presents in the NUMA topology.
+
+Proximity Domain
+----------------
+A proximity domain is ROUGHLY equivalent to "NUMA Node" - though a 1-to-1 mapping is not guaranteed.  There are scenarios where "Proximity Domain 4" may map to "NUMA Node 3", for example.  (See "NUMA Node Creation")
+
+Memory Affinity
+---------------
+Generally speaking, if a host does any amount of CXL fabric (decoder) programming in BIOS - an SRAT entry for that memory needs to be present.
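A firmware validation tool could check this requirement mechanically by verifying that every CFMWS window is covered by SRAT Memory Affinity ranges. The sketch below is hypothetical (struct spa_range and srat_covers() are not kernel APIs) and assumes the SRAT ranges are sorted by base address and do not overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical [base, base + len) system physical address range. */
struct spa_range {
	uint64_t base;
	uint64_t len;
};

/*
 * Return true if the CFMWS window 'w' is fully covered by the SRAT Memory
 * Affinity ranges in srat[]. Assumes srat[] is sorted by base address and
 * that its entries do not overlap.
 */
static bool srat_covers(struct spa_range w, const struct spa_range *srat, size_t n)
{
	uint64_t cur = w.base;          /* first byte not yet known to be covered */
	uint64_t end = w.base + w.len;

	assert(srat != NULL || n == 0);
	for (size_t i = 0; i < n && cur < end; i++) {
		uint64_t s = srat[i].base;
		uint64_t e = srat[i].base + srat[i].len;

		if (e <= cur)
			continue;       /* entry lies entirely below the uncovered part */
		if (s > cur)
			return false;   /* gap between cur and this entry */
		cur = e;                /* entry extends coverage */
	}
	return cur >= end;
}
```

With the CFMWS window 0xC050000000 + 0x3CA0000000 used in these examples, a single matching Memory Affinity entry, or two adjacent entries splitting the window, covers it; an entry describing only the first 64GB does not.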
+
+Example ::
+
+    Subtable Type : 01 [Memory Affinity]
+    Length : 28
+    Proximity Domain : 00000001 <- NUMA Node 1
+    Reserved1 : 0000
+    Base Address : 000000C050000000 <- Physical Memory Region
+    Address Length : 0000003CA0000000
+    Reserved2 : 00000000
+    Flags (decoded below) : 0000000B
+        Enabled : 1
+        Hot Pluggable : 1
+        Non-Volatile : 0
+
+Generic Initiator / Port
+------------------------
+
+todo
-- 
2.49.0

From: Gregory Price
To: linux-cxl@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net
Subject: [RFC PATCH 05/17] cxl: docs/platform/example-configs documentation
Date: Tue, 29 Apr 2025 20:12:12 -0400
Message-ID: <20250430001224.1028656-6-gourry@gourry.net>
In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net>
References: <20250430001224.1028656-1-gourry@gourry.net>

Add example ACPI Table configurations for different sample platforms.

Signed-off-by: Gregory Price
---
 Documentation/driver-api/cxl/index.rst | 1 +
 .../cxl/platform/example-configs.rst | 13 +
 .../example-configurations/flexible.rst | 296 ++++++++++++++++++
 .../example-configurations/hb-interleave.rst | 107 +++++++
 .../multi-dev-per-hb.rst | 90 ++++++
 .../example-configurations/one-dev-per-hb.rst | 136 ++++++++
 6 files changed, 643 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/platform/example-configs.rst
 create mode 100644 Documentation/driver-api/cxl/platform/example-configurations/flexible.rst
 create mode 100644 Documentation/driver-api/cxl/platform/example-configurations/hb-interleave.rst
 create mode 100644 Documentation/driver-api/cxl/platform/example-configurations/multi-dev-per-hb.rst
 create mode 100644 Documentation/driver-api/cxl/platform/example-configurations/one-dev-per-hb.rst

diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index e47671e268b2..afc66759eed2 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -29,6 +29,7 @@ that have impacts on each other. The docs here break up configurations steps.
 
    platform/bios-and-efi
    platform/acpi
+   platform/example-configs
 
 .. toctree::
    :maxdepth: 1
diff --git a/Documentation/driver-api/cxl/platform/example-configs.rst b/Documentation/driver-api/cxl/platform/example-configs.rst
new file mode 100644
index 000000000000..90a10d7473c6
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/example-configs.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Example Platform Configurations
+###############################
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents
+
+   example-configurations/one-dev-per-hb.rst
+   example-configurations/multi-dev-per-hb.rst
+   example-configurations/hb-interleave.rst
+   example-configurations/flexible.rst
diff --git a/Documentation/driver-api/cxl/platform/example-configurations/flexible.rst b/Documentation/driver-api/cxl/platform/example-configurations/flexible.rst
new file mode 100644
index 000000000000..13a97c03e25a
--- /dev/null
+++ b/Documentation/driver-api/cxl/platform/example-configurations/flexible.rst
@@ -0,0 +1,296 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Flexible Presentation
+=====================
+This system has a single socket with two CXL host bridges. Each host bridge
+has two CXL memory expanders with 4GB of memory (16GB total).
+
+On this system, the platform designer wanted to provide the user flexibility
+to configure the memory devices in various interleave or NUMA node
+configurations. So they provided every combination.
+
+Things to note:
+
+* Cross-Bridge interleave is described in one CFMWS that covers all capacity.
+* One CFMWS is also described per-host bridge.
+* One CFMWS is also described per-device.
+* The SRAT describes one node for each of the above CFMWS.
+* The HMAT describes performance for each node in the SRAT.
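Although the same four devices are reachable through several windows, each CFMWS still occupies its own HPA range, so the windows must not overlap. Across the seven windows below, the cross-bridge window (16GB), the two host-bridge windows (8GB each) and the four per-device windows (4GB each) consume 48GB of HPA space for four 4GB devices. A quick overlap check might look like the following; struct window and windows_overlap() are hypothetical, and the sketch assumes no base + size computation wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical HPA window taken from one CFMWS entry. */
struct window {
	uint64_t base;
	uint64_t size;
};

/* qsort() comparator: order windows by base address. */
static int cmp_base(const void *a, const void *b)
{
	const struct window *x = a, *y = b;

	return (x->base > y->base) - (x->base < y->base);
}

/* Sort w[] in place by base, then report whether any two windows overlap. */
static bool windows_overlap(struct window *w, size_t n)
{
	if (n < 2)
		return false;
	assert(w != NULL);
	qsort(w, n, sizeof(*w), cmp_base);
	for (size_t i = 1; i < n; i++)
		if (w[i - 1].base + w[i - 1].size > w[i].base)
			return true;
	return false;
}
```

Because adjacent windows such as 0x2000000000 + 0x200000000 and 0x2200000000 only touch, the strict comparison treats them as non-overlapping.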
+ +CEDT :: + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000007 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010370400000 + Register length : 0000000000010000 + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000006 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010380800000 + Register length : 0000000000010000 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000001000000000 + Window size : 0000000400000000 + Interleave Members (2^n) : 01 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + Second Target : 00000006 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000002000000000 + Window size : 0000000200000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000002200000000 + Window size : 0000000200000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000006 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000003000000000 + Window size : 0000000100000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + + Subtable Type : 01 [CXL Fixed Memory Window 
Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000003100000000 + Window size : 0000000100000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000003200000000 + Window size : 0000000100000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000006 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000003300000000 + Window size : 0000000100000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000006 + +SRAT :: + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000001 + Reserved1 : 0000 + Base Address : 0000001000000000 + Address Length : 0000000400000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000002 + Reserved1 : 0000 + Base Address : 0000002000000000 + Address Length : 0000000200000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000003 + Reserved1 : 0000 + Base Address : 0000002200000000 + Address Length : 0000000200000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000004 + Reserved1 : 0000 + Base Address 
: 0000003000000000 + Address Length : 0000000100000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000005 + Reserved1 : 0000 + Base Address : 0000003100000000 + Address Length : 0000000100000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000006 + Reserved1 : 0000 + Base Address : 0000003200000000 + Address Length : 0000000100000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000007 + Reserved1 : 0000 + Base Address : 0000003300000000 + Address Length : 0000000100000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + +HMAT :: + + Structure Type : 0001 [SLLBI] + Data Type : 00 [Latency] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Target Proximity Domain List : 00000002 + Target Proximity Domain List : 00000003 + Target Proximity Domain List : 00000004 + Target Proximity Domain List : 00000005 + Target Proximity Domain List : 00000006 + Target Proximity Domain List : 00000007 + Entry : 0080 + Entry : 0100 + Entry : 0100 + Entry : 0100 + Entry : 0100 + Entry : 0100 + Entry : 0100 + Entry : 0100 + + Structure Type : 0001 [SLLBI] + Data Type : 03 [Bandwidth] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Target Proximity Domain List : 00000002 + Target Proximity Domain List : 00000003 + Target Proximity Domain List : 00000004 + Target Proximity Domain List : 00000005 + Target Proximity Domain List : 00000006 + Target Proximity Domain List : 00000007 + Entry : 1200 + Entry : 0400 + Entry : 0200 + 
Entry : 0200 + Entry : 0100 + Entry : 0100 + Entry : 0100 + Entry : 0100 + +SLIT :: + + Signature : "SLIT" [System Locality Information Table] + Localities : 0000000000000008 + Locality 0 : 10 20 20 20 20 20 20 20 + Locality 1 : FF 0A FF FF FF FF FF FF + Locality 2 : FF FF 0A FF FF FF FF FF + Locality 3 : FF FF FF 0A FF FF FF FF + Locality 4 : FF FF FF FF 0A FF FF FF + Locality 5 : FF FF FF FF FF 0A FF FF + Locality 6 : FF FF FF FF FF FF 0A FF + Locality 7 : FF FF FF FF FF FF FF 0A + +DSDT :: + + Scope (_SB) + { + Device (S0D0) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ... + Name (_UID, 0x07) // _UID: Unique ID + } + ... + Device (S0D5) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ... + Name (_UID, 0x06) // _UID: Unique ID + } + } diff --git a/Documentation/driver-api/cxl/platform/example-configurations/hb-interleave.rst b/Documentation/driver-api/cxl/platform/example-configurations/hb-interleave.rst new file mode 100644 index 000000000000..fa0885d82deb --- /dev/null +++ b/Documentation/driver-api/cxl/platform/example-configurations/hb-interleave.rst @@ -0,0 +1,107 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================ +Cross-Host-Bridge Interleave +============================ +This system has a single socket with two CXL host bridges. Each host bridge +has a single CXL memory expander with 4GB of memory. + +Things to note: + +* Cross-Bridge interleave is described. +* The expanders are described by a single CFMWS. +* This SRAT describes one node for both host bridges. +* The HMAT describes a single node's performance.
+ +CEDT :: + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000007 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010370400000 + Register length : 0000000000010000 + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000006 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010380800000 + Register length : 0000000000010000 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000001000000000 + Window size : 0000000200000000 + Interleave Members (2^n) : 01 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + Second Target : 00000006 + +SRAT :: + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000001 + Reserved1 : 0000 + Base Address : 0000001000000000 + Address Length : 0000000200000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + +HMAT :: + + Structure Type : 0001 [SLLBI] + Data Type : 00 [Latency] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Entry : 0080 + Entry : 0100 + + Structure Type : 0001 [SLLBI] + Data Type : 03 [Bandwidth] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Entry : 1200 + Entry : 0400 + +SLIT :: + + Signature : "SLIT" [System Locality Information Table] + Localities : 0000000000000002 + Locality 0 : 10 20 + Locality 1 : FF 0A + +DSDT :: + + Scope (_SB) + { + Device (S0D0) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ... + Name (_UID, 0x07) // _UID: Unique ID + } + ...
+ Device (S0D5) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ... + Name (_UID, 0x06) // _UID: Unique ID + } + } diff --git a/Documentation/driver-api/cxl/platform/example-configurations/multi-dev-per-hb.rst b/Documentation/driver-api/cxl/platform/example-configurations/multi-dev-per-hb.rst new file mode 100644 index 000000000000..6adf7c639490 --- /dev/null +++ b/Documentation/driver-api/cxl/platform/example-configurations/multi-dev-per-hb.rst @@ -0,0 +1,90 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================ +Multiple Devices per Host Bridge +================================ + +This example system has a single socket and one CXL host bridge. +Two CXL memory expanders, each with 4GB of memory, are attached to the host bridge. + +Things to note: + +* Intra-Bridge interleave is not described here. +* The expanders are described by a single CEDT/CFMWS. +* This CEDT/SRAT describes one node for both devices. +* There is only one proximity domain in the HMAT for both devices.
+ +CEDT :: + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000007 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010370400000 + Register length : 0000000000010000 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000001000000000 + Window size : 0000000200000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + +SRAT :: + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000001 + Reserved1 : 0000 + Base Address : 0000001000000000 + Address Length : 0000000200000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + +HMAT :: + + Structure Type : 0001 [SLLBI] + Data Type : 00 [Latency] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Entry : 0080 + Entry : 0100 + + Structure Type : 0001 [SLLBI] + Data Type : 03 [Bandwidth] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Entry : 1200 + Entry : 0200 + +SLIT :: + + Signature : "SLIT" [System Locality Information Table] + Localities : 0000000000000002 + Locality 0 : 10 20 + Locality 1 : FF 0A + +DSDT :: + + Scope (_SB) + { + Device (S0D0) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ... + Name (_UID, 0x07) // _UID: Unique ID + } + ... + } diff --git a/Documentation/driver-api/cxl/platform/example-configurations/one-dev-per-hb.rst b/Documentation/driver-api/cxl/platform/example-configurations/one-dev-per-hb.rst new file mode 100644 index 000000000000..8b732dc8c5b6 --- /dev/null +++ b/Documentation/driver-api/cxl/platform/example-configurations/one-dev-per-hb.rst @@ -0,0 +1,136 @@ +..
SPDX-License-Identifier: GPL-2.0 + +========================== +One Device per Host Bridge +========================== + +This system has a single socket with two CXL host bridges. Each host bridge +has a single CXL memory expander with 4GB of memory. + +Things to note: + +* Cross-Bridge interleave is not being used. +* The expanders are in two separate but adjacent memory regions. +* This CEDT/SRAT describes one node per device. +* The expanders have the same performance and will be in the same memory tier. + +CEDT :: + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000007 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010370400000 + Register length : 0000000000010000 + + Subtable Type : 00 [CXL Host Bridge Structure] + Reserved : 00 + Length : 0020 + Associated host bridge : 00000006 + Specification version : 00000001 + Reserved : 00000000 + Register base : 0000010380800000 + Register length : 0000000000010000 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000001000000000 + Window size : 0000000100000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000007 + + Subtable Type : 01 [CXL Fixed Memory Window Structure] + Reserved : 00 + Length : 002C + Reserved : 00000000 + Window base address : 0000001100000000 + Window size : 0000000100000000 + Interleave Members (2^n) : 00 + Interleave Arithmetic : 00 + Reserved : 0000 + Granularity : 00000000 + Restrictions : 0006 + QtgId : 0001 + First Target : 00000006 + +SRAT :: + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000001 + Reserved1 : 0000 + Base Address : 0000001000000000 + Address
Length : 0000000100000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + + Subtable Type : 01 [Memory Affinity] + Length : 28 + Proximity Domain : 00000002 + Reserved1 : 0000 + Base Address : 0000001100000000 + Address Length : 0000000100000000 + Reserved2 : 00000000 + Flags (decoded below) : 0000000B + Enabled : 1 + Hot Pluggable : 1 + Non-Volatile : 0 + +HMAT :: + + Structure Type : 0001 [SLLBI] + Data Type : 00 [Latency] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Target Proximity Domain List : 00000002 + Entry : 0080 + Entry : 0100 + Entry : 0100 + + Structure Type : 0001 [SLLBI] + Data Type : 03 [Bandwidth] + Target Proximity Domain List : 00000000 + Target Proximity Domain List : 00000001 + Target Proximity Domain List : 00000002 + Entry : 1200 + Entry : 0200 + Entry : 0200 + +SLIT :: + + Signature : "SLIT" [System Locality Information Table] + Localities : 0000000000000003 + Locality 0 : 10 20 20 + Locality 1 : FF 0A FF + Locality 2 : FF FF 0A + +DSDT :: + + Scope (_SB) + { + Device (S0D0) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ... + Name (_UID, 0x07) // _UID: Unique ID + } + ... + Device (S0D5) + { + Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID + ...
+ Name (_UID, 0x06) // _UID: Unique ID + } + } --=20 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f179.google.com (mail-qt1-f179.google.com [209.85.160.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB17A190676 for ; Wed, 30 Apr 2025 00:12:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971966; cv=none; b=j2mEomGe5zDopTO3jZL9AyW+df4Ntw0OC3XBAG2dFwoHXFVjRECNO/mlL3BlkdHfQ0UdblmlJa4pvAnGiZtzDVaz6VmjXAYcolKZ2qB9vqo3VFgqOI8e6gqqYRD0w7GR+0bqUMzE1SxysolweFmQZa9FlNLA/defTgqlBJvYz/0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971966; c=relaxed/simple; bh=BDXb7lgzBNxpVwjYFyJjd4t2j9sJSet0d7GASa7svQw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nshHo6GW7gdOZMwSLHeNgMnHUFeSOFe5WfT2StpfcsB7PE1t8isSwBOgnrt5SvBUACtxuP9RbjdTnffWVzx8VfeSytJgLldPDHeXnvS8QBHIHmWcMsueEr0RJRqiBRkM+8jhHgAz52MVXIsd/Ww2m/3HIa4mBfB4LsQFOpjk0xg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=D74bUmKd; arc=none smtp.client-ip=209.85.160.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="D74bUmKd" Received: by mail-qt1-f179.google.com with SMTP id d75a77b69052e-476ab588f32so111273071cf.2 for ; Tue, 29 Apr 2025 17:12:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971964; x=1746576764; 
darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rp9atE4oqYHZI7zGuWpnaPq/N7S8aRGjs0Vq4QAx9No=; b=D74bUmKdC8yw+6rYsyb5RBpIe7ez1AfZ614t7zjQOWnl7yC7wOIZGvzGjg7Woa28Tj C6yRWUSwOAybDK+KDlI6TM+1swzxA0HchazMtCLBgQMnCDoyih5y2MgahR9ubq6/ynvg WVoj3/aGWUIgNIhq6FGfdWhqcSwcq4yTLhbiSsO6bGWucrHkeV1JKx30Rir97cXAO7yb Qnmg3ISIXRRrhrbiNNkUTDLC43gzoXoaE2zZX7ZkGx4v1MCuF8s2vgfzPl+yVBq5s37y ExCNCpAwT9Eh7hSt9eOrVPDsuk/RbE01OsFwqWOXGtCkHFNmw2UYZDhVz1bRwNu5hXvX v+RQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971964; x=1746576764; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rp9atE4oqYHZI7zGuWpnaPq/N7S8aRGjs0Vq4QAx9No=; b=V+TcMK+UoFufzBcxuuNPLFj6ekgjl9keroMp/YmOXOi54OE6HkmfbP8Ty/sE1FAqgB Dq8/HXHdW8ygiSHsC02ANsf/2NgzOG//g34hFCfVDJ4fwWR5Zkeon9WrHzjotpFhYGDL SYJyDXZ5xJFZiczj2aAAv3EAddL91fZDOeWDrSoKxAUigtXoG2ocPcHFLQ6sBheEUtez R0u1sBc4Ma+/q+g9s3sKmhaaBYmdnAwsmMvY0IgjBiv3cRQjf2FJCA0vJ4/T0F8eNQ6N 31NcH5lgT3G6utD7iPbECotqwrhd05JTSuGN1d3/E78stlb+/RbZTzITqH1pEar2nBLA RYpQ== X-Forwarded-Encrypted: i=1; AJvYcCW9MnuRcnjkUgQtnZkrQ2x3fgyi+Aa1yS4HXcvVeiKbAYf+GpM5wNrXt243NyqsWHVqday4hsHz1g9YLUs=@vger.kernel.org X-Gm-Message-State: AOJu0YxZoUmrcDUnhq+A6N0R5o0+MCkKSGgCNnlqJ1dL7Jyz7v9a8RsX RsGK4Z0EQFsGrK2f3REHkh4T0aSKBs2w5AWl7tLB6IeOgNQfQ7p/vUkO5Ey1jmQ= X-Gm-Gg: ASbGncsp8sapXgHgj0ZVJ33Vx62C1S+dfboFmts9fg9hISIp7lMxwQP5HtT3wgOMCq/ n/an1C7SKa0S9LbMr4+Ra7+q3waokEVBCQ1bAB8GCta7bqwnEYv9uxaLMWIikWCpMvIWlvbj6pt Y+NJG+Qt2ZpSLu4/eHthuGrgKeoKbxJIQtXasl7JS+yuWcnOIZEQWhOGe9lRnhgqO9oZbXB37dM ZBP/mlE+F0oa3EfZ24qlqPE94Rg2WPc+hg4eTpwStF48Vm9vchBRRAbUmEB6ziP1k9bih4/imRJ MZbELjYfiU28eTWgFN9Ix4YUcsXfsjuXMGFM6ekfG38oW8wUuvuctF/TV5LcArgB79OcrxkyRH0 ZLbBJdSKhSlYoxxfOV0+T3sV+6bIM X-Google-Smtp-Source: 
AGHT+IFWuPCqJzafZiXkTM0FkbazmzhXWCCT/3TEiGPXkXbgX4OBk3+9TL3sm/DfypsQTK99+QJYgA== X-Received: by 2002:a05:622a:1f8d:b0:477:4df:9a58 with SMTP id d75a77b69052e-489e4a8d38fmr11558301cf.18.1745971963895; Tue, 29 Apr 2025 17:12:43 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:43 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 06/17] cxl: docs/linux - overview Date: Tue, 29 Apr 2025 20:12:13 -0400 Message-ID: <20250430001224.1028656-7-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add type-3 device configuration overview that explains the probe process for a type-3 device from early-boot through memory-hotplug. 
Signed-off-by: Gregory Price --- Documentation/driver-api/cxl/index.rst | 3 +- .../driver-api/cxl/linux/overview.rst | 104 ++++++++++++++++++ 2 files changed, 106 insertions(+), 1 deletion(-) create mode 100644 Documentation/driver-api/cxl/linux/overview.rst diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index afc66759eed2..01c0284fc273 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -32,9 +32,10 @@ that have impacts on each other. The docs here break up configurations steps. platform/example-configs =20 .. toctree:: - :maxdepth: 1 + :maxdepth: 2 :caption: Linux Kernel Configuration =20 + linux/overview linux/access-coordinates =20 =20 diff --git a/Documentation/driver-api/cxl/linux/overview.rst b/Documentation/driver-api/cxl/linux/overview.rst new file mode 100644 index 000000000000..33017ccb84f1 --- /dev/null +++ b/Documentation/driver-api/cxl/linux/overview.rst @@ -0,0 +1,104 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Overview +######## + +This section presents the configuration process of a CXL Type-3 memory device, +and how it is ultimately exposed to users as either a :code:`DAX` device or +normal memory pages via the kernel's page allocator. + +Portions marked with a bullet are points at which certain kernel objects +are generated. + +1) Early Boot + + a) BIOS, Build, and Boot Parameters + + i) EFI_MEMORY_SP + ii) CONFIG_EFI_SOFT_RESERVE + iii) CONFIG_MHP_DEFAULT_ONLINE_TYPE + iv) nosoftreserve + + b) Memory Map Creation + + i) EFI Memory Map / E820 consulted for Soft-Reserved + + * CXL Memory is set aside to be handled by the CXL driver + + * IO Resources are created for each CFMWS entry + + c) NUMA Node Creation + + * ACPI CEDT and SRAT tables are used to create NUMA nodes from Proximity Domains (PXM) + + d) Memory Tier Creation + + * A default memory_tier is created with all nodes.
+ + e) Contiguous Memory Allocation + + * Any requested CMA is allocated from online nodes + + f) Init finishes, drivers start probing + +2) ACPI and PCI Drivers + + a) Detect CXL device, marking it for probe by the CXL driver + + b) This portion will not be covered specifically. + +3) CXL Driver Operation + + a) Base device creation + + * root, port, and memdev devices created + * CEDT CFMWS IO Resource creation + + b) Decoder creation + + * root, switch, and endpoint decoders created + + c) Logical device creation + + * memory_region and endpoint devices created + + d) Devices are associated with each other + + * If auto-decoder (BIOS-programmed decoders), the driver validates + configurations, builds associations, and locks configs at probe time. + + * If user-configured, validation and associations are built at + decoder-commit time. + + e) Regions surfaced as DAX regions + + * dax_region created + + * DAX device created via DAX driver + +4) DAX Driver Operation + + a) DAX driver surfaces DAX region as one of two dax device modes + + * kmem - dax device is converted to hotplug memory blocks + + * DAX kmem IO resource creation + + * hmem - dax device is left as daxdev to be accessed as a file. + + * If hmem, journey ends here.
+ + b) DAX kmem surfaces the memory region to Memory Hotplug, which adds it to the page + allocator as "driver managed memory" + +5) Memory Hotplug + + a) mhp component surfaces a dax device memory region as multiple memory + blocks to the page allocator + + * blocks appear in :code:`/sys/bus/memory/devices` and are linked to a NUMA node + + b) blocks are onlined into the requested zone (NORMAL or MOVABLE) + + * Memory is marked "Driver Managed" to prevent kexec from using it as a region + for kernel updates --=20 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A0991AA1E0 for ; Wed, 30 Apr 2025 00:12:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971970; cv=none; b=iT84dtIY2exvL19+iIMp6dOPCFQIssiTy5LeTAVfA2EXGTyUjdbEXBUEuVWqqjCWHReOdPvzGNv9Ngg2NMQDsFQhcs+YdX/c4ahXv3KNryXw4aFiedL4SPdJz9xcU+5sh/JBW6VFKVWzHhCiyf9SfuwCmecQ06iNPa7A7IHJupM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971970; c=relaxed/simple; bh=JE870abXD6mvQOdGZhbrZh7Dlq87uN7FQJrDxkuYRko=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oYDdfK+ZwokeBoDp3JThkpJMhLU/KUFEef5WurjqdOsnefWIRGm2Y2VL3jMmB/7Y12LOExbFcV+vluOhELLissDRhKQCDUxuinpTDURMGoKmbZq+LNFWdOat6r5vMllj0WqiZAgNED4HHYWFW4VUBLqjiztXUBJylA2uZDjqOsA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=lUSBEjAF; arc=none smtp.client-ip=209.85.160.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="lUSBEjAF" Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-477296dce8dso75549981cf.3 for ; Tue, 29 Apr 2025 17:12:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971966; x=1746576766; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=OzbJ+lbcTmilx4tIqHT+9/1H84YazWKPPxqiI2QgM8c=; b=lUSBEjAFbpw40mLUFTYrk5tygPrNk5yzEx0cIsNAqfRVGPXnnhUhDgT8aGhs60DP7L 211tjICVhpddCL+9AduuNtTRt3IYX5akJWr18eJVrDdk1ykl9CS2rK8IjD+ZvR0h3uNy D+d7AhCpCiR0SYtGI41LCgEKALqvc+aT5q/dSLzDhsbc8xrU0NuWKzx442j047wduVug cZDE26wmOY4g9yEdim1LRucf8xfpJ232Ki2+YhXfl3SfB6T6MIazVdIFjlG3xXnIB5i7 5PR7KAxgzeVFHgpQgIJkiAbenHw0S82g2hEKbRqgWo8UYN5sQH6H6l3Mih2YQS19wsO9 gheA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971966; x=1746576766; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OzbJ+lbcTmilx4tIqHT+9/1H84YazWKPPxqiI2QgM8c=; b=dorDeOMBy4T7uo5Y2ckUmVcUbMyOcJq+KkHNcRz6iMiVRjjhVvvZ5qtD9sdJaBjZLA QyDQSJUM1MWDo8W29WO+Sx1Pi0DUMslPv1RLsdPce65QDDtc3NAalzjn+ugFyFY5g7jg bHdW1dfE3Sns7dUq2joTYIpK0fn4HobaJIXVVATmLWmWUT1ITn7HziN20y19I09kgFsK 4RRR6TFUdY1D1GN9mW6ZeKzWHFx/dwlCNsvf9aBErZK6JH6J4wt381KsZeZ5Q1QL9xXJ mBL58e9OzHJRoifbGvGsYFwAnMIp54/v+1Wo3DuF2iUxbwuzAUvmX+paqS7z1zrBsrSg TH5w== X-Forwarded-Encrypted: i=1; AJvYcCWkWB3jMSDnEDmYdFaQIzhTuFhKFFiouIJow4udoh2/Bx8qsWzmeHpl/QhYwC/dzxK+9ikbWt39N1goQFs=@vger.kernel.org X-Gm-Message-State: AOJu0YwIGsNncg8EXOfVmx2TQx5wnLSj4nqpCS53O8r3BNSUUYmZGEvk ESaG1y9cfd2JMUcS/LCjqlU20VB+0c0GN2+0DkIdfQ0Rm6A0CPAjSmjjG7b8DxU= X-Gm-Gg: 
ASbGncvX6cSUnr+JbQ/tyVmaW0/qnOujc38x569DMmJUhMmEcKOF9Nyyyl89qLFgr34 fSe/h+xvqZwvKyALwvcwPbzznfUewXMLdZ+1AX/U/ioopKyPDLsW0o/jxxaryWZ6MxXXIc5duDu P9gmR3PMwgep9s5eHIsl4L8ZAolZw06QG+NURFjwzPwWh5MgQAtKbSP6gglBd/z0ZbU4FfEBzjx 4t3iYenmwb6F5DULVAnE7GxJmVVCUhJl3F0KHE0/j3QHTNvBDR61QXzWpCHvOSzGipAD/Xoa8Ai KlCYtxZTMcsr/R5ftRJTIu40Jlk48toCQk5RWP3fY6uxNOXFHqjn50DdpJd/UOZWCsfsb8JZE59 /DBLaSprmpSyw2AOoi+m00oYjZ2Za X-Google-Smtp-Source: AGHT+IFeXobGzY+UYaAw9ml70E4P3Ut/z3bwXmavgtI4R4ODTbaat+OFdnH3Erfh/wCpOWxEkkV53A== X-Received: by 2002:a05:622a:5c99:b0:474:eff7:a478 with SMTP id d75a77b69052e-489c5412645mr17505051cf.46.1745971966068; Tue, 29 Apr 2025 17:12:46 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:45 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 07/17] cxl: docs/linux - early boot configuration Date: Tue, 29 Apr 2025 20:12:14 -0400 Message-ID: <20250430001224.1028656-8-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document __init time configurations that affect CXL driver probe process and memory region configuration. 
Signed-off-by: Gregory Price --- Documentation/driver-api/cxl/index.rst | 1 + .../driver-api/cxl/linux/early-boot.rst | 129 ++++++++++++++++++ 2 files changed, 130 insertions(+) create mode 100644 Documentation/driver-api/cxl/linux/early-boot.rst diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index 01c0284fc273..da74480207b7 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -36,6 +36,7 @@ that have impacts on each other. The docs here break up configurations steps. :caption: Linux Kernel Configuration =20 linux/overview + linux/early-boot linux/access-coordinates =20 =20 diff --git a/Documentation/driver-api/cxl/linux/early-boot.rst b/Documentation/driver-api/cxl/linux/early-boot.rst new file mode 100644 index 000000000000..ca9fa1b57855 --- /dev/null +++ b/Documentation/driver-api/cxl/linux/early-boot.rst @@ -0,0 +1,129 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Linux Init (Early Boot) +*********************** + +Linux configuration is split into two major steps: Early-Boot and everything else. + +During early boot, Linux sets up immutable resources (such as NUMA nodes), while +later operations include things like driver probe and memory hotplug. Linux may +read EFI and ACPI information throughout this process to configure logical +representations of the devices. + +During the Linux early boot stage (functions in the kernel that have the :code:`__init` +decorator), the system takes the resources created by EFI/BIOS (ACPI tables) +and turns them into resources that the kernel can consume. + + +BIOS, Build and Boot Options +============================ + +There are four pre-boot options, set in firmware, at kernel build time, or on the +kernel command line, which dictate how memory will be managed by Linux during early boot. + +* EFI_MEMORY_SP + + * BIOS/EFI Option that dictates whether memory is SystemRAM or + Specific Purpose.
Specific Purpose memory will be deferred to + drivers to manage, and not immediately exposed as system RAM. + +* CONFIG_EFI_SOFT_RESERVE + + * Linux Build config option that dictates whether the kernel supports + Specific Purpose memory. + +* CONFIG_MHP_DEFAULT_ONLINE_TYPE + + * Linux Build config that dictates whether and how Specific Purpose memory + converted to a dax device should be managed (left as DAX or onlined as + SystemRAM in ZONE_NORMAL or ZONE_MOVABLE). + +* nosoftreserve + + * Linux kernel boot option (passed as :code:`efi=nosoftreserve`) that dictates + whether Soft Reserve should be supported. Similar to CONFIG_EFI_SOFT_RESERVE. + +Memory Map Creation +=================== + +While the kernel parses the EFI memory map, if :code:`Specific Purpose` memory +is supported and detected, it will set this region aside as :code:`SOFT_RESERVED`. + +If :code:`EFI_MEMORY_SP=0`, :code:`CONFIG_EFI_SOFT_RESERVE=n`, or +:code:`efi=nosoftreserve` is set, Linux will default a CXL device memory region to +SystemRAM. This will expose the memory to the kernel page allocator in +:code:`ZONE_NORMAL`, making it available for use for most allocations (including +:code:`struct page` and page tables). + +If :code:`Specific Purpose` is set and supported, :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE_*` +dictates whether the memory is onlined by default (:code:`_OFFLINE` or +:code:`_ONLINE_*`) and, if onlined, which zone it is onlined to by default +(:code:`_NORMAL` or :code:`_MOVABLE`). + +If placed in :code:`ZONE_MOVABLE`, the memory will not be available for most +kernel allocations (such as :code:`struct page` or page tables). This may +significantly impact performance depending on the memory capacity of the system. + + +NUMA Node Reservation +===================== + +Linux refers to the proximity domains (:code:`PXM`) defined in the SRAT to +create NUMA nodes in :code:`acpi_numa_init`. Typically, there is a 1:1 relation
Typically, there is a 1:1 rel= ation +between :code:`PXM` and NUMA node IDs. + +SRAT is the only ACPI defined way of defining Proximity Domains. Linux cho= oses +to, at most, map those 1:1 with NUMA nodes. CEDT adds a description of SPA +ranges which Linux may wish to map to one or more NUMA nodes + +If there are CXL ranges in the CFMWS but not in SRAT, then a fake :code:`P= XM` +is created (as of v6.15). In the future, Linux may reject CFMWS not descri= bed +by SRAT due to the ambiguity of proximity domain association. + +It is important to note that NUMA node creation cannot be done at runtime.= All +possible NUMA nodes are identified at :code:`__init` time, more specifical= ly +during :code:`mm_init`. The CEDT and SRAT must contain sufficient :code:`P= XM` +data for Linux to identify NUMA nodes their associated memory regions. + +The relevant code exists in: :code:`linux/drivers/acpi/numa/srat.c`. + +See the Example Platform Configurations section for more information. + +Memory Tiers Creation +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Memory tiers are a collection of NUMA nodes grouped by performance charact= eristics. +During :code:`__init`, Linux initializes the system with a default memory = tier that +contains all nodes marked :code:`N_MEMORY`. + +:code:`memory_tier_init` is called at boot for all nodes with memory onlin= e by +default. :code:`memory_tier_late_init` is called during late-init for node= s setup +during driver configuration. + +Nodes are only marked :code:`N_MEMORY` if they have *online* memory. + +Tier membership can be inspected in :: + + /sys/devices/virtual/memory_tiering/memory_tierN/nodelist + 0-1 + +If nodes are grouped which have clear difference in performance, check the= HMAT +and CDAT information for the CXL nodes. All nodes default to the DRAM tie= r, +unless HMAT/CDAT information is reported to the memory_tier component via +`access_coordinates`. 
+
+Contiguous Memory Allocation
+============================
+The contiguous memory allocator (CMA) enables reservation of contiguous memory
+regions on NUMA nodes during early boot. However, CMA cannot reserve memory
+on NUMA nodes that are not online during early boot (simplified excerpt): ::
+
+  void __init hugetlb_cma_reserve(int order)
+  {
+          for (int nid = 0; nid < MAX_NUMNODES; nid++) {
+                  if (!node_online(nid))
+                          continue; /* do not allow reservations */
+                  /* ... reserve CMA on this node ... */
+          }
+  }
+
+This means that if users intend to defer management of CXL memory to the driver,
+CMA cannot be used to guarantee huge page allocations. If CXL memory is enabled
+as SystemRAM in `ZONE_NORMAL` during early boot, per-node CMA reservations can
+be made with the :code:`cma_pernuma` or :code:`numa_cma` kernel command line
+parameters.
-- 
2.49.0
From: Gregory Price
To: linux-cxl@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net
Subject: [RFC PATCH 08/17] cxl: docs/linux - add cxl-driver theory of operation
Date: Tue, 29 Apr 2025 20:12:15 -0400
Message-ID: <20250430001224.1028656-9-gourry@gourry.net>
In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net>
References: <20250430001224.1028656-1-gourry@gourry.net>

Add docs for the CXL driver that explain the base devices, decoder types,
region types, mailbox interfaces, and decoder programming.

Signed-off-by: Gregory Price
---
 Documentation/driver-api/cxl/index.rst          |   1 +
 .../driver-api/cxl/linux/cxl-driver.rst         | 521 ++++++++++++++++++
 2 files changed, 522 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/linux/cxl-driver.rst

diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index da74480207b7..b915ce982048 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -37,6 +37,7 @@ that have impacts on each other. The docs here break up configurations steps.
 
    linux/overview
    linux/early-boot
+   linux/cxl-driver
    linux/access-coordinates
 
 
diff --git a/Documentation/driver-api/cxl/linux/cxl-driver.rst b/Documentation/driver-api/cxl/linux/cxl-driver.rst
new file mode 100644
index 000000000000..f403804648b1
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/cxl-driver.rst
@@ -0,0 +1,521 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+CXL Driver Operation
+####################
+
+The devices described in this section are present in ::
+
+  /sys/bus/cxl/devices/
+  /dev/cxl/
+
+The :code:`cxl-cli` tool, maintained as part of the NDCTL project, may
+be used to script interactions with these devices.
+
+Drivers
+*******
+The CXL driver is split into a number of drivers.
+
+* cxl_core - fundamental init interface and core object creation
+* cxl_port - initializes root and provides port enumeration interface.
+* cxl_acpi - initializes root decoders and interacts with ACPI data.
+* cxl_mem / cxl_pmem - initialize memory devices
+* cxl_pci - uses cxl_port to enumerate the actual fabric hierarchy.
+
+Driver Devices
+**************
+Here is an example from a single-socket system with 4 host bridges. Two host
+bridges have a single memory device attached, and the devices are interleaved
+into a single memory region. The memory region has been converted to DAX. ::
+
+  # ls /sys/bus/cxl/devices/
+  dax_region0  decoder3.0  decoder6.0  mem0   port3
+  decoder0.0   decoder4.0  decoder6.1  mem1   port4
+  decoder1.0   decoder5.0  endpoint5   port1  region0
+  decoder2.0   decoder5.1  endpoint6   port2  root0
+
+This section explores the devices present in this configuration; more
+configurations are explored in depth in the example configurations below.
+
+Base Devices
+============
+Most devices in a CXL fabric are a `port` of some kind (because each
+device mostly routes requests from one device to the next, rather than
+providing a manageable service).
+
+Root
+----
+The `CXL Root` is a logical object created by the `cxl_acpi` driver during
+:code:`cxl_acpi_probe` - if the :code:`ACPI0017` `Compute Express Link
+Root Object` Device Class is found.
+
+The Root contains links to:
+
+* `Host Bridge Ports` defined by ACPI CEDT CHBS.
+
+* `Root Decoders` defined by ACPI CEDT CFMWS.
+
+::
+
+  # ls /sys/bus/cxl/devices/root0
+  decoder0.0          dport0  dport5    port2  subsystem
+  decoders_committed  dport1  modalias  port3  uevent
+  devtype             dport4  port1     port4  uport
+
+  # cat /sys/bus/cxl/devices/root0/devtype
+  cxl_port
+
+  # cat port1/devtype
+  cxl_port
+
+  # cat decoder0.0/devtype
+  cxl_decoder_root
+
+The root is the first `logical port` in the CXL fabric, as presented by the
+Linux CXL driver. The `CXL root` is a special type of `switch port`, in that it
+only has downstream port connections.
+
+Port
+----
+A `port` object is better described as a `switch port`. It may represent a
+host bridge to the root or an actual switch port on a switch. A `switch port`
+contains one or more decoders used to route memory requests to downstream ports,
+which may be connected to another `switch port` or an `endpoint port`.
+
+::
+
+  # ls /sys/bus/cxl/devices/port1
+  decoder1.0          dport0    driver     parent_dport  uport
+  decoders_committed  dport113  endpoint5  subsystem
+  devtype             dport2    modalias   uevent
+
+  # cat devtype
+  cxl_port
+
+  # cat decoder1.0/devtype
+  cxl_decoder_switch
+
+  # cat endpoint5/devtype
+  cxl_port
+
+CXL `Host Bridges` in the fabric are probed during :code:`cxl_acpi_probe` at
+the time the `CXL Root` is probed. This allows for the immediate logical
+connection between the root and host bridge.
+
+* The root has a downstream port connection to a host bridge.
+
+* The host bridge has an upstream port connection to the root.
+
+* The host bridge has one or more downstream port connections to switch
+  or endpoint ports.
+
+A `Host Bridge` is a special type of CXL `switch port`.
+It is explicitly defined in the ACPI specification via the `ACPI0016` ID.
+`Host Bridge` ports will be probed at `acpi_probe` time, while similar ports
+on an actual switch will be probed later. Otherwise, switch and host bridge
+ports look very similar - they both contain switch decoders which route
+accesses between upstream and downstream ports.
+
+Endpoint
+--------
+An `endpoint` is a terminal port in the fabric. This is a `logical device`,
+and may be one of many `logical devices` presented by a memory device. It
+is still considered a type of `port` in the fabric.
+
+An `endpoint` contains `endpoint decoders` available for use and the
+*Coherent Device Attribute Table* (CDAT) used to describe the capabilities
+of the device. ::
+
+  # ls /sys/bus/cxl/devices/endpoint5
+  CDAT        decoders_committed  modalias      uevent
+  decoder5.0  devtype             parent_dport  uport
+  decoder5.1  driver              subsystem
+
+  # cat /sys/bus/cxl/devices/endpoint5/devtype
+  cxl_port
+
+  # cat /sys/bus/cxl/devices/endpoint5/decoder5.0/devtype
+  cxl_decoder_endpoint
+
+
+Memory Device (memdev)
+----------------------
+A `memdev` is probed and added by the `cxl_pci` driver in :code:`cxl_pci_probe`
+and is managed by the `cxl_mem` driver. It primarily provides the `IOCTL`
+interface to a memory device, via :code:`/dev/cxl/memN`, and exposes various
+device configuration data. ::
+
+  # ls /sys/bus/cxl/devices/mem0
+  dev       firmware_version    payload_max  security   uevent
+  driver    label_storage_size  pmem         serial
+  firmware  numa_node           ram          subsystem
+
+
+Decoders
+========
+A `Decoder` is short for a CXL Host-Managed Device Memory (HDM) Decoder. It is
+a device that routes accesses through the CXL fabric to an endpoint, and at
+the endpoint translates a `Host Physical Address` (HPA) to a
+`Device Physical Address` (DPA).
+
+The CXL 3.1 specification heavily implies that only endpoint decoders should
+engage in translation of `Host Physical Address` to `Device Physical Address`.
+::
+
+  8.2.4.20 CXL HDM Decoder Capability Structure
+
+  IMPLEMENTATION NOTE
+  CXL Host Bridge and Upstream Switch Port Decode Flow
+
+  IMPLEMENTATION NOTE
+  Device Decode Logic
+
+These notes imply that there are two logical groups of decoders.
+
+* Routing Decoder - a decoder which routes accesses but does not translate
+  addresses from HPA to DPA.
+
+* Translating Decoder - a decoder which translates accesses from HPA to DPA
+  for an endpoint to service.
+
+The CXL drivers distinguish 3 decoder types: root, switch, and endpoint. Only
+endpoint decoders are Translating Decoders; all others are Routing Decoders.
+
+.. note:: PLATFORM VENDORS BE AWARE
+
+   Linux makes a strong assumption that endpoint decoders are the only decoders
+   in the fabric that actively translate HPA to DPA. Linux assumes routing
+   decoders pass the HPA unchanged to the next decoder in the fabric.
+
+   It is therefore assumed that any given decoder in the fabric will have an
+   address range that is a subset of its upstream port decoder. Any deviation
+   from this scheme is undefined per the specification. Linux prioritizes
+   spec-defined / architectural behavior.
+
+Decoders may have one or more `Downstream Targets` if configured to interleave
+memory accesses. These are presented in sysfs via the :code:`target_list`
+parameter.
+
+Root Decoder
+------------
+A `Root Decoder` is a logical construct of the physical address and interleave
+configurations present in the ACPI CEDT CFMWS. Linux presents this information
+as a decoder present in the `CXL Root`. We consider this a `Root Decoder`,
+though technically it exists on the boundary of the CXL specification and
+platform-specific CXL root implementations.
+
+Linux considers these logical decoders a type of `Routing Decoder`; the root
+decoder is the first decoder in the CXL fabric to receive a memory access from
+the platform's memory controllers.
+
+`Root Decoders` are created during :code:`cxl_acpi_probe`.
+One root decoder is created per CFMWS entry in the ACPI CEDT.
+
+The :code:`target_list` parameter is filled by the CFMWS target fields. Targets
+of a root decoder are `Host Bridges`, which means interleave done at the root
+decoder level is an `Inter-Host-Bridge Interleave`.
+
+Only root decoders are capable of `Inter-Host-Bridge Interleave`.
+
+Such interleaves must be configured by the platform and described in the ACPI
+CEDT CFMWS, as the target CXL host bridge UIDs in the CFMWS must match the CXL
+host bridge UIDs in the ACPI CEDT CHBS and ACPI DSDT.
+
+Interleave settings in a root decoder describe how to interleave accesses among
+the *immediate downstream targets*, not the entire interleave set.
+
+The memory range described in the root decoder is used to
+
+1) Create a memory region (:code:`region0` in this example), and
+
+2) Associate the region with an IO Memory Resource (:code:`kernel/resource.c`)
+
+::
+
+  # ls /sys/bus/cxl/devices/decoder0.0/
+  cap_pmem           devtype                 region0
+  cap_ram            interleave_granularity  size
+  cap_type2          interleave_ways         start
+  cap_type3          locked                  subsystem
+  create_ram_region  modalias                target_list
+  delete_region      qos_class               uevent
+
+  # cat /sys/bus/cxl/devices/decoder0.0/region0/resource
+  0xc050000000
+
+The IO Memory Resource is created during early boot when the CFMWS region is
+identified in the EFI Memory Map or E820 table (on x86).
+
+Root decoders are defined as a separate devtype, but are also a type
+of `Switch Decoder` due to having downstream targets. ::
+
+  # cat /sys/bus/cxl/devices/decoder0.0/devtype
+  cxl_decoder_root
+
+Switch Decoder
+--------------
+Any non-root routing decoder is considered a `Switch Decoder`, and will
+present with the type :code:`cxl_decoder_switch`. Both `Host Bridge` and
+`CXL Switch` (device) decoders are of type :code:`cxl_decoder_switch`.
+::
+
+  # ls /sys/bus/cxl/devices/decoder1.0/
+  devtype                 locked    size       target_list
+  interleave_granularity  modalias  start      target_type
+  interleave_ways         region    subsystem  uevent
+
+  # cat /sys/bus/cxl/devices/decoder1.0/devtype
+  cxl_decoder_switch
+
+  # cat /sys/bus/cxl/devices/decoder1.0/region
+  region0
+
+A `Switch Decoder` has associations between a region defined by a root
+decoder and downstream target ports. Interleaving done within a switch decoder
+is a multi-downstream-port interleave (or `Intra-Host-Bridge Interleave` for
+host bridges).
+
+Interleave settings in a switch decoder describe how to interleave accesses
+among the *immediate downstream targets*, not the entire interleave set.
+
+Switch decoders are created during :code:`cxl_switch_port_probe` in the
+:code:`cxl_port` driver, and are created based on a PCI device's DVSEC
+registers.
+
+Switch decoder programming is validated during probe if the platform programs
+them during boot (see `Auto Decoders` below), or on commit if programmed at
+runtime (see `Runtime Programming` below).
+
+
+Endpoint Decoder
+----------------
+Any decoder attached to a *terminal* point in the CXL fabric (`An Endpoint`) is
+considered an `Endpoint Decoder`. Endpoint decoders are of type
+:code:`cxl_decoder_endpoint`. ::
+
+  # ls /sys/bus/cxl/devices/decoder5.0
+  devtype                 locked    start
+  dpa_resource            modalias  subsystem
+  dpa_size                mode      target_type
+  interleave_granularity  region    uevent
+  interleave_ways         size
+
+  # cat /sys/bus/cxl/devices/decoder5.0/devtype
+  cxl_decoder_endpoint
+
+  # cat /sys/bus/cxl/devices/decoder5.0/region
+  region0
+
+An `Endpoint Decoder` has an association with a region defined by a root
+decoder and describes the device-local resource associated with this region.
+
+Unlike root and switch decoders, endpoint decoders translate `Host Physical` to
+`Device Physical` address ranges. The interleave settings on an endpoint
+therefore describe the entire *interleave set*.
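To make the HPA-to-DPA translation concrete, here is a minimal sketch of what an endpoint decoder computes, assuming modulo (power-of-two) interleave arithmetic, an HPA range aligned to granularity * ways, and no DPA skip. The struct and function names are hypothetical and only mirror the sysfs attributes listed above (start, size, interleave_ways, interleave_granularity, dpa_resource); this is not the cxl_core implementation, and non-power-of-two ways and XOR arithmetic are out of scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an endpoint (translating) decoder. */
struct ep_decoder {
	uint64_t hpa_start;  /* start of the region's HPA range */
	uint64_t hpa_size;   /* size of the whole interleave set's HPA range */
	uint64_t dpa_base;   /* device-local base of this decoder's capacity */
	unsigned int ways;   /* endpoints in the interleave set (power of 2) */
	unsigned int gran;   /* interleave granularity in bytes (power of 2) */
	unsigned int pos;    /* this endpoint's position in the set */
};

/*
 * Translate an HPA to a DPA. Returns false if the HPA is outside the
 * decoder's range or belongs to another endpoint in the interleave set.
 */
static bool ep_hpa_to_dpa(const struct ep_decoder *d, uint64_t hpa,
			  uint64_t *dpa)
{
	assert(d->ways && !(d->ways & (d->ways - 1)));
	assert(d->gran && !(d->gran & (d->gran - 1)));

	if (hpa < d->hpa_start || hpa - d->hpa_start >= d->hpa_size)
		return false;

	uint64_t off = hpa - d->hpa_start;
	uint64_t chunk = off / d->gran;   /* granule index across the whole set */

	if (chunk % d->ways != d->pos)
		return false;             /* serviced by a different endpoint */

	/* Drop the way-selection part of the offset, keep the in-granule offset. */
	*dpa = d->dpa_base + (chunk / d->ways) * d->gran + off % d->gran;
	return true;
}
```

With 2 ways at 256-byte granularity, the endpoint at position 1 owns every other 256-byte granule, so HPA offset 773 (granule 3, byte 5) lands at DPA offset 261 (its own granule 1, byte 5).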
+
+`Device Physical Address` regions must be committed in order. For example, the
+DPA region starting at 0x80000000 cannot be committed before the DPA region
+starting at 0x0.
+
+As of Linux v6.15, Linux does not support *imbalanced* interleave setups; all
+endpoints in an interleave set are expected to have the same interleave
+settings (granularity and ways must be the same).
+
+Endpoint decoders are created during :code:`cxl_endpoint_port_probe` in the
+:code:`cxl_port` driver, and are created based on a PCI device's DVSEC registers.
+
+Regions
+=======
+
+Memory Region
+-------------
+A `Memory Region` is a logical construct that connects a set of CXL ports in
+the fabric to an IO Memory Resource. It is ultimately used to expose the memory
+on these devices to the DAX subsystem via a `DAX Region`.
+
+An example RAM region: ::
+
+  # ls /sys/bus/cxl/devices/region0/
+  access0      devtype                 modalias  subsystem  uuid
+  access1      driver                  mode      target0
+  commit       interleave_granularity  resource  target1
+  dax_region0  interleave_ways         size      uevent
+
+A memory region can be constructed during endpoint probe, if decoders were
+programmed by BIOS/EFI (see `Auto Decoders`), or by creating a region manually
+via a `Root Decoder`'s :code:`create_ram_region` or :code:`create_pmem_region`
+interfaces.
+
+The interleave settings in a `Memory Region` describe the configuration of the
+`Interleave Set` - and are the settings one should expect to see in the
+endpoint decoders.
+
+
+DAX Region
+----------
+A `DAX Region` is used to convert a CXL `Memory Region` to a DAX device. A
+DAX device may then be accessed directly via a file descriptor interface, or
+converted to System RAM via the DAX kmem driver. See the DAX driver section
+for more details.
+::
+
+  # ls /sys/bus/cxl/devices/dax_region0/
+  dax0.0      devtype  modalias   uevent
+  dax_region  driver   subsystem
+
+
+Mailbox Interfaces
+==================
+A mailbox command interface for each device is exposed in ::
+
+  /dev/cxl/mem0
+  /dev/cxl/mem1
+
+These mailboxes may receive any specification-defined command. Raw commands
+(custom commands) can only be sent to these interfaces if the build config
+:code:`CONFIG_CXL_MEM_RAW_COMMANDS` is set. This is considered a debug and/or
+development interface, not an officially supported mechanism for creation
+of vendor-specific commands (see the `fwctl` subsystem for that).
+
+Decoder Programming
+*******************
+
+Runtime Programming
+===================
+During probe, the only decoders *required* to be programmed are `Root Decoders`.
+In reality, `Root Decoders` are a logical construct to describe the memory
+region and interleave configuration at the host bridge level - as described
+in the ACPI CEDT CFMWS.
+
+All other `Switch` and `Endpoint` decoders may be programmed by the user
+at runtime - if the platform supports such configurations.
+
+This interaction is what creates a `Software Defined Memory` environment.
+
+See the :code:`cxl-cli` documentation for more information about how to
+configure CXL decoders at runtime.
+
+Auto Decoders
+=============
+Auto Decoders are decoders programmed by BIOS/EFI at boot time, and are
+almost always locked (cannot be changed). Platforms do this when they have a
+static configuration, or quirks that prevent dynamic runtime changes to the
+decoders (such as requiring additional controller programming within the CPU
+complex, outside the scope of CXL).
+
+Auto Decoders are probed automatically as long as the devices and memory
+regions they are associated with probe without issue.
+When probing Auto Decoders, the driver's primary responsibility is to ensure
+the fabric is sane - as if it were validating runtime-programmed regions and
+decoders.
+
+If Linux cannot validate the auto-decoder configuration, the memory will not
+be surfaced as a DAX device - and therefore not be exposed to the page
+allocator - effectively stranding it.
+
+Interleave
+==========
+
+The Linux CXL driver supports `Cross-Link First` interleave. This dictates
+how interleave is programmed at each decoder step, as the driver validates
+the relationships between a decoder and its parent.
+
+For example, in a `Cross-Link First` interleave setup with 16 endpoints
+attached to 4 host bridges, Linux expects the following ways/granularity
+across the root, host bridge, and endpoints respectively. ::
+
+                ways   granularity
+  root           4        256
+  host bridge    4       1024
+  endpoint      16        256
+
+At the root, a given access will be routed to the
+:code:`((HPA / 256) % 4)th` target host bridge. Within a host bridge, it will
+be routed to the :code:`((HPA / 1024) % 4)th` target endpoint. Each endpoint
+will translate the access based on the entire 16-device interleave set.
+
+Unbalanced interleave sets are not supported - decoders at a similar point
+in the hierarchy (e.g. all host bridge decoders) must have the same ways and
+granularity configuration.
+
+At Root
+-------
+Root decoder interleave is defined by the ACPI CEDT CFMWS. The CEDT
+may actually define multiple CFMWS configurations to describe the same
+physical capacity - with the intent to allow users to decide at runtime
+whether to online memory as interleaved or non-interleaved.
+::
+
+  Subtable Type            : 01 [CXL Fixed Memory Window Structure]
+  Window base address      : 0000000100000000
+  Window size              : 0000000100000000
+  Interleave Members (2^n) : 00
+  Interleave Arithmetic    : 00
+  First Target             : 00000007
+
+  Subtable Type            : 01 [CXL Fixed Memory Window Structure]
+  Window base address      : 0000000200000000
+  Window size              : 0000000100000000
+  Interleave Members (2^n) : 00
+  Interleave Arithmetic    : 00
+  First Target             : 00000006
+
+  Subtable Type            : 01 [CXL Fixed Memory Window Structure]
+  Window base address      : 0000000300000000
+  Window size              : 0000000200000000
+  Interleave Members (2^n) : 01
+  Interleave Arithmetic    : 00
+  First Target             : 00000007
+  Next Target              : 00000006
+
+In this example, the CFMWS defines two discrete non-interleaved 4GB regions,
+one for each host bridge, and one interleaved 8GB region that targets both.
+This would result in 3 root decoders being presented in the root. ::
+
+  # ls /sys/bus/cxl/devices/root0
+  decoder0.0  decoder0.1  decoder0.2
+
+  # cat /sys/bus/cxl/devices/decoder0.0/{target_list,start,size}
+  7
+  0x100000000
+  0x100000000
+
+  # cat /sys/bus/cxl/devices/decoder0.1/{target_list,start,size}
+  6
+  0x200000000
+  0x100000000
+
+  # cat /sys/bus/cxl/devices/decoder0.2/{target_list,start,size}
+  7,6
+  0x300000000
+  0x200000000
+
+These decoders are not runtime programmable. They are used to generate a
+`Memory Region` to bring this memory online with runtime-programmed settings
+at the `Switch` and `Endpoint` decoders.
+
+At Host Bridge or Switch
+------------------------
+`Host Bridge` and `Switch` decoders are programmable via the following fields:
+
+- :code:`start` - the HPA region associated with the memory region
+- :code:`size` - the size of the region
+- :code:`target_list` - the list of downstream ports
+- :code:`interleave_ways` - the number of downstream ports to interleave across
+- :code:`interleave_granularity` - the granularity to interleave at.
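A routing decoder programmed with the fields above does not translate the address: it only checks that the HPA falls in [start, start + size) and selects one entry of target_list. The sketch below assumes modulo interleave arithmetic, uses a hypothetical struct that merely mirrors the sysfs attributes, and encodes the "subset of its upstream port decoder" assumption from the platform-vendor note. Neither function is a kernel or cxl-cli API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TARGETS 8  /* arbitrary bound chosen for this sketch */

/* Hypothetical mirror of a switch decoder's programmable sysfs fields. */
struct switch_decoder {
	uint64_t start;                      /* start */
	uint64_t size;                       /* size */
	int target_list[MAX_TARGETS];        /* target_list (dport ids) */
	unsigned int interleave_ways;        /* interleave_ways */
	unsigned int interleave_granularity; /* interleave_granularity */
};

/* Linux assumes a routing decoder's range is a subset of its parent's. */
static bool within_parent(const struct switch_decoder *d,
			  uint64_t parent_start, uint64_t parent_size)
{
	return d->start >= parent_start && d->size <= parent_size &&
	       d->start - parent_start <= parent_size - d->size;
}

/*
 * Return the downstream port id that services @hpa, or -1 if this decoder
 * does not decode it. The HPA itself is forwarded unchanged.
 */
static int switch_route(const struct switch_decoder *d, uint64_t hpa)
{
	assert(d->interleave_ways >= 1 && d->interleave_ways <= MAX_TARGETS);
	assert(d->interleave_granularity > 0);

	if (hpa < d->start || hpa - d->start >= d->size)
		return -1;

	unsigned int idx = (hpa / d->interleave_granularity) % d->interleave_ways;
	return d->target_list[idx];
}
```

The overflow-safe form of the subset check (comparing offsets rather than computing start + size) is a deliberate choice for ranges near the top of the 64-bit address space.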
+
+Linux expects the :code:`interleave_granularity` of switch decoders to be
+derived from their upstream port connections. In `Cross-Link First` interleave
+configurations, the :code:`interleave_granularity` of a decoder is equal to
+:code:`parent_interleave_granularity * parent_interleave_ways`.
+
+At Endpoint
+-----------
+`Endpoint Decoders` are programmed similarly to Host Bridge and Switch decoders,
+with the exception that the ways and granularity are defined by the interleave
+set (i.e. the interleave settings defined by the associated `Memory Region`).
+
+- :code:`start` - the HPA region associated with the memory region
+- :code:`size` - the size of the region
+- :code:`interleave_ways` - the number of endpoints in the interleave set
+- :code:`interleave_granularity` - the granularity to interleave at.
+
+These settings are used by endpoint decoders to *Translate* memory requests
+from HPA to DPA. This is why they must be aware of the entire interleave set.
+
+Linux does not support unbalanced interleave configurations. As a result, all
+endpoints in an interleave set must have the same ways and granularity.
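The granularity rule above can be checked against the 16-endpoint example from the Interleave section. The sketch below (hypothetical helper names, not kernel APIs) derives each switch level's granularity from its parent and the endpoint settings from the whole hierarchy, then confirms that routing first by host bridge and then by endpoint within the host bridge selects the same endpoint as a flat 16-way, 256-byte interleave. The position numbering hb + 4 * ep is one consistent numbering assumed for the sketch.

```c
#include <assert.h>
#include <stdint.h>

struct ilv {
	unsigned int ways;
	unsigned int gran;   /* bytes */
};

/*
 * levels[0] holds the root decoder settings (from CFMWS); levels[1..n-1]
 * only need their ways filled in. Each switch level's granularity becomes
 * parent gran * parent ways (Cross-Link First). Returns the endpoint
 * settings, which describe the entire interleave set.
 */
static struct ilv cross_link_first(struct ilv *levels, unsigned int n)
{
	struct ilv ep = levels[0];

	for (unsigned int i = 1; i < n; i++) {
		levels[i].gran = levels[i - 1].gran * levels[i - 1].ways;
		ep.ways *= levels[i].ways;
	}
	return ep;   /* endpoint granularity stays at the root granularity */
}

/* Route through the example hierarchy: root (4 @ 256), then HB (4 @ 1024). */
static unsigned int example_position(uint64_t hpa)
{
	unsigned int hb = (hpa / 256) % 4;    /* target host bridge */
	unsigned int ep = (hpa / 1024) % 4;   /* target endpoint within that HB */

	return hb + 4 * ep;
}
```

For the example, :code:`cross_link_first` yields 1024 for the host bridge level and 16 ways at 256 bytes for the endpoints, matching the table in the Interleave section.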
-- 
2.49.0
AGHT+IGVPT55HBTmyvi+Hih/wuL6I+RAeHfsa1acj3FLMJPwEb+oEnFnvVEOjWXn5Lq5n5Altz/ZnQ== X-Received: by 2002:a05:622a:544e:b0:47a:e81b:ccb9 with SMTP id d75a77b69052e-489e44b5c5dmr10097551cf.2.1745971970614; Tue, 29 Apr 2025 17:12:50 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:50 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 09/17] cxl: docs/linux/cxl-driver - add example configurations Date: Tue, 29 Apr 2025 20:12:16 -0400 Message-ID: <20250430001224.1028656-10-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add 4 example configurations: - single device - cross-host-bridge interleave - intra-host-bridge-interleave - multi-level interleave Signed-off-by: Gregory Price --- .../driver-api/cxl/linux/cxl-driver.rst | 10 + .../example-configurations/hb-interleave.rst | 314 ++++++++++++++ .../intra-hb-interleave.rst | 291 +++++++++++++ .../multi-interleave.rst | 401 ++++++++++++++++++ .../example-configurations/single-device.rst | 246 +++++++++++ 5 files changed, 1262 insertions(+) create mode 100644 Documentation/driver-api/cxl/linux/example-configuratio= ns/hb-interleave.rst create mode 100644 
Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst create mode 100644 Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst create mode 100644 Documentation/driver-api/cxl/linux/example-configurations/single-device.rst diff --git a/Documentation/driver-api/cxl/linux/cxl-driver.rst b/Documentation/driver-api/cxl/linux/cxl-driver.rst index f403804648b1..8d586b85d346 100644 --- a/Documentation/driver-api/cxl/linux/cxl-driver.rst +++ b/Documentation/driver-api/cxl/linux/cxl-driver.rst @@ -519,3 +519,13 @@ from HPA to DPA. This is why they must be aware of the entire interleave set. Linux does not support unbalanced interleave configurations. As a result, all endpoints in an interleave set must have the same ways and granularity. + +Example Configurations +********************** +.. toctree:: + :maxdepth: 1 + + example-configurations/single-device.rst + example-configurations/hb-interleave.rst + example-configurations/intra-hb-interleave.rst + example-configurations/multi-interleave.rst diff --git a/Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst b/Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst new file mode 100644 index 000000000000..f071490763a2 --- /dev/null +++ b/Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst @@ -0,0 +1,314 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================ +Inter-Host-Bridge Interleave +============================ +This cxl-cli configuration dump shows the following host configuration: + +* A single-socket system with one CXL root +* The CXL Root has four (4) CXL Host Bridges +* Two CXL Host Bridges each have a single CXL Memory Expander attached +* The CXL root is configured to interleave across the two host bridges.
+ +This output is generated by :code:`cxl list -v` and describes the relationships +between objects exposed in :code:`/sys/bus/cxl/devices/`. + +:: + + [ + { + "bus":"root0", + "provider":"ACPI.CXL", + "nr_dports":4, + "dports":[ + { + "dport":"pci0000:00", + "alias":"ACPI0016:01", + "id":0 + }, + { + "dport":"pci0000:a8", + "alias":"ACPI0016:02", + "id":4 + }, + { + "dport":"pci0000:2a", + "alias":"ACPI0016:03", + "id":1 + }, + { + "dport":"pci0000:d2", + "alias":"ACPI0016:00", + "id":5 + } + ], + +This chunk shows that the CXL "bus" (root0) has 4 downstream ports attached to CXL +Host Bridges. The `Root` can be considered the single upstream port attached +to the platform's memory controller, which routes memory requests to it. + +The `ports:root0` section lays out how each of these downstream ports is +configured. If a port is not configured (IDs 0 and 1), its endpoints +and decoders are omitted. + +:: + + "ports:root0":[ + { + "port":"port1", + "host":"pci0000:d2", + "depth":1, + "nr_dports":3, + "dports":[ + { + "dport":"0000:d2:01.1", + "alias":"device:02", + "id":0 + }, + { + "dport":"0000:d2:01.3", + "alias":"device:05", + "id":2 + }, + { + "dport":"0000:d2:07.1", + "alias":"device:0d", + "id":113 + } + ], + +This chunk shows the available downstream ports associated with the CXL Host +Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream +ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`.
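The parent/child relationships in this dump can also be walked programmatically. The following is a minimal Python sketch, assuming only the JSON shape shown in these examples: a list of bus objects whose `ports:<bus>` list holds ports, each with an `endpoints:<port>` list. The function name `memdev_topology` is hypothetical and not part of cxl-cli; on a live system its input would be the text printed by `cxl list -v`.

```python
import json


def memdev_topology(cxl_list_json: str) -> dict:
    """Map memdev name -> (host bridge port, parent dport).

    Hypothetical helper: assumes the `cxl list -v` layout shown in these
    examples, where keys such as "ports:root0" and "endpoints:port1"
    embed the parent object's name after the colon.
    """
    topology = {}
    for bus in json.loads(cxl_list_json):
        for key, ports in bus.items():
            if not key.startswith("ports:"):
                continue
            for port in ports:
                for pkey, endpoints in port.items():
                    if not pkey.startswith("endpoints:"):
                        continue
                    for ep in endpoints:
                        memdev = ep["memdev"]["memdev"]
                        topology[memdev] = (port["port"], ep["parent_dport"])
    return topology


# Trimmed copy of the inter-host-bridge dump (unrelated fields removed).
sample = """
[{"bus": "root0",
  "ports:root0": [
    {"port": "port1", "host": "pci0000:d2",
     "endpoints:port1": [
       {"endpoint": "endpoint5", "parent_dport": "0000:d2:01.1",
        "memdev": {"memdev": "mem0"}}]},
    {"port": "port3", "host": "pci0000:a8",
     "endpoints:port3": [
       {"endpoint": "endpoint6", "parent_dport": "0000:a8:01.1",
        "memdev": {"memdev": "mem1"}}]}]}]
"""
print(memdev_topology(sample))
# {'mem0': ('port1', '0000:d2:01.1'), 'mem1': ('port3', '0000:a8:01.1')}
```

Matching on the key prefix rather than a fixed key keeps the walk independent of how the root and ports happen to be numbered on a given system.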
+ +:: + + "endpoints:port1":[ + { + "endpoint":"endpoint5", + "host":"mem0", + "parent_dport":"0000:d2:01.1", + "depth":2, + "memdev":{ + "memdev":"mem0", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:d3:00.0" + }, + "decoders:endpoint5":[ + { + "decoder":"decoder5.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":2, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + } + ], + +This chunk shows the endpoints attached to the host bridge :code:`port1`. + +:code:`endpoint5` contains a single configured decoder :code:`decoder5.0` +which has the same interleave configuration as :code:`region0` (shown later). + +Next we have the decoders belonging to the host bridge: + +:: + + "decoders:port1":[ + { + "decoder":"decoder1.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":1, + "region":"region0", + "nr_targets":1, + "targets":[ + { + "target":"0000:d2:01.1", + "alias":"device:02", + "position":0, + "id":0 + } + ] + } + ] + }, + +Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose only +target is :code:`dport0`, which is attached to :code:`endpoint5`. + +The following chunk shows a similar configuration for Host Bridge :code:`port3`, +the second host bridge with a memory device attached.
+ +:: + + { + "port":"port3", + "host":"pci0000:a8", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:a8:01.1", + "alias":"device:c3", + "id":0 + } + ], + "endpoints:port3":[ + { + "endpoint":"endpoint6", + "host":"mem1", + "parent_dport":"0000:a8:01.1", + "depth":2, + "memdev":{ + "memdev":"mem1", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:a9:00.0" + }, + "decoders:endpoint6":[ + { + "decoder":"decoder6.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":2, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + } + ], + "decoders:port3":[ + { + "decoder":"decoder3.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":1, + "region":"region0", + "nr_targets":1, + "targets":[ + { + "target":"0000:a8:01.1", + "alias":"device:c3", + "position":0, + "id":0 + } + ] + } + ] + }, + +The next chunk shows the two CXL host bridges without attached endpoints. + +:: + + { + "port":"port2", + "host":"pci0000:00", + "depth":1, + "nr_dports":2, + "dports":[ + { + "dport":"0000:00:01.3", + "alias":"device:55", + "id":2 + }, + { + "dport":"0000:00:07.1", + "alias":"device:5d", + "id":113 + } + ] + }, + { + "port":"port4", + "host":"pci0000:2a", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:2a:01.1", + "alias":"device:d0", + "id":0 + } + ] + } + ], + +Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder +applies the interleave across the downstream ports :code:`port1` and +:code:`port3` with a granularity of 256 bytes. + +This information is generated by the CXL driver reading the ACPI CEDT CFMWS.
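The root decoder's routing can be modelled with simple modulo arithmetic. This Python sketch is an illustration, not the driver's implementation: it assumes a power-of-two number of ways and a window base aligned well above the granularity (true for the base of 825975898112 used here), and takes the target order from the `position` fields of `decoder0.0`. The names are hypothetical.

```python
# Values copied from decoder0.0 in this example.
WINDOW_BASE = 825975898112   # "resource"
WINDOW_SIZE = 274877906944   # "size" (256 GiB)
WAYS = 2                     # "interleave_ways"
GRANULARITY = 256            # "interleave_granularity" (bytes)
# Root decoder targets ordered by their "position" field.
TARGETS = ["pci0000:d2", "pci0000:a8"]


def root_target(hpa: int) -> str:
    """Return the host bridge a host physical address is routed to.

    Simplified modulo-interleave model: consecutive GRANULARITY-byte
    chunks of the window rotate across the WAYS targets.
    """
    offset = hpa - WINDOW_BASE
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("address outside this CXL window")
    return TARGETS[(offset // GRANULARITY) % WAYS]


for delta in (0, 255, 256, 511, 512):
    print(delta, root_target(WINDOW_BASE + delta))
# 0 pci0000:d2
# 255 pci0000:d2
# 256 pci0000:a8
# 511 pci0000:a8
# 512 pci0000:d2
```

With this layout, even 256-byte chunks of the window go to `pci0000:d2` (`port1`, backed by `mem0`) and odd chunks go to `pci0000:a8` (`port3`, backed by `mem1`).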
+ +:: + + "decoders:root0":[ + { + "decoder":"decoder0.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":2, + "interleave_granularity":256, + "max_available_extent":0, + "volatile_capable":true, + "nr_targets":2, + "targets":[ + { + "target":"pci0000:a8", + "alias":"ACPI0016:02", + "position":1, + "id":4 + }, + { + "target":"pci0000:d2", + "alias":"ACPI0016:00", + "position":0, + "id":5 + } + ], + +Finally we have the `Memory Region` associated with the `Root Decoder` +:code:`decoder0.0`. This region describes the overall interleave configuration +of the interleave set. + +:: + + "regions:decoder0.0":[ + { + "region":"region0", + "resource":825975898112, + "size":274877906944, + "type":"ram", + "interleave_ways":2, + "interleave_granularity":256, + "decode_state":"commit", + "mappings":[ + { + "position":1, + "memdev":"mem1", + "decoder":"decoder6.0" + }, + { + "position":0, + "memdev":"mem0", + "decoder":"decoder5.0" + } + ] + } + ] + } + ] + } + ] diff --git a/Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst b/Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst new file mode 100644 index 000000000000..077dfaf8458d --- /dev/null +++ b/Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst @@ -0,0 +1,291 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================ +Intra-Host-Bridge Interleave +============================ +This cxl-cli configuration dump shows the following host configuration: + +* A single-socket system with one CXL root +* The CXL Root has four (4) CXL Host Bridges +* One (1) CXL Host Bridge has two CXL Memory Expanders attached +* The Host Bridge decoder is programmed to interleave across the expanders.
+ +This output is generated by :code:`cxl list -v` and describes the relationships +between objects exposed in :code:`/sys/bus/cxl/devices/`. + +:: + + [ + { + "bus":"root0", + "provider":"ACPI.CXL", + "nr_dports":4, + "dports":[ + { + "dport":"pci0000:00", + "alias":"ACPI0016:01", + "id":0 + }, + { + "dport":"pci0000:a8", + "alias":"ACPI0016:02", + "id":4 + }, + { + "dport":"pci0000:2a", + "alias":"ACPI0016:03", + "id":1 + }, + { + "dport":"pci0000:d2", + "alias":"ACPI0016:00", + "id":5 + } + ], + +This chunk shows that the CXL "bus" (root0) has 4 downstream ports attached to CXL +Host Bridges. The `Root` can be considered the single upstream port attached +to the platform's memory controller, which routes memory requests to it. + +The `ports:root0` section lays out how each of these downstream ports is +configured. If a port is not configured (IDs 0 and 1), its endpoints +and decoders are omitted. + +:: + + "ports:root0":[ + { + "port":"port1", + "host":"pci0000:d2", + "depth":1, + "nr_dports":3, + "dports":[ + { + "dport":"0000:d2:01.1", + "alias":"device:02", + "id":0 + }, + { + "dport":"0000:d2:01.3", + "alias":"device:05", + "id":2 + }, + { + "dport":"0000:d2:07.1", + "alias":"device:0d", + "id":113 + } + ], + +This chunk shows the available downstream ports associated with the CXL Host +Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream +ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`.
+ +:: + + "endpoints:port1":[ + { + "endpoint":"endpoint5", + "host":"mem0", + "parent_dport":"0000:d2:01.1", + "depth":2, + "memdev":{ + "memdev":"mem0", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:d3:00.0" + }, + "decoders:endpoint5":[ + { + "decoder":"decoder5.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":2, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + }, + { + "endpoint":"endpoint6", + "host":"mem1", + "parent_dport":"0000:d2:01.3", + "depth":2, + "memdev":{ + "memdev":"mem1", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:a9:00.0" + }, + "decoders:endpoint6":[ + { + "decoder":"decoder6.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":2, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + } + ], + +This chunk shows the endpoints attached to the host bridge :code:`port1`. + +:code:`endpoint5` and :code:`endpoint6` each contain a single configured decoder +(:code:`decoder5.0` and :code:`decoder6.0`) whose interleave configuration matches +the memory region they belong to (shown later). + +Next we have the decoders belonging to the host bridge: + +:: + + "decoders:port1":[ + { + "decoder":"decoder1.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":2, + "interleave_granularity":256, + "region":"region0", + "nr_targets":2, + "targets":[ + { + "target":"0000:d2:01.1", + "alias":"device:02", + "position":0, + "id":0 + }, + { + "target":"0000:d2:01.3", + "alias":"device:05", + "position":1, + "id":0 + } + ] + } + ] + }, + +Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`) with two +targets: :code:`dport0` and :code:`dport2`, which are attached to +:code:`endpoint5` and :code:`endpoint6` respectively. + +The host bridge decoder interleaves these devices at a 256-byte granularity.
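Since each endpoint receives only every other 256-byte chunk, the endpoint decoder must also translate the host physical address (HPA) into a device physical address (DPA) by squeezing out the chunks that belong to its peer. A minimal Python sketch of that translation, assuming power-of-two ways, the values from this example, and a `dpa_resource` of 0 on both endpoints; the function name is hypothetical:

```python
# Values copied from region0 and decoder1.0 in this example.
REGION_BASE = 825975898112    # "resource"
REGION_SIZE = 274877906944    # "size" (256 GiB)
WAYS = 2                      # "interleave_ways"
GRANULARITY = 256             # "interleave_granularity" (bytes)
ENDPOINTS = ["mem0", "mem1"]  # host bridge targets ordered by "position"


def hpa_to_dpa(hpa: int) -> tuple:
    """Return (memdev, DPA) for an HPA inside region0.

    Assumes dpa_resource is 0 on both endpoints, as in the dump above.
    """
    offset = hpa - REGION_BASE
    if not 0 <= offset < REGION_SIZE:
        raise ValueError("address outside region0")
    chunk, within = divmod(offset, GRANULARITY)
    memdev = ENDPOINTS[chunk % WAYS]
    # Each device stores every WAYS-th chunk back to back.
    dpa = (chunk // WAYS) * GRANULARITY + within
    return memdev, dpa


print(hpa_to_dpa(REGION_BASE + 300))  # ('mem1', 44)
```

The last byte of the region maps to the last byte of `mem1` (DPA 137438953471), which is consistent with each device contributing a 128 GiB `dpa_size`.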
+ +The next chunk shows the three CXL host bridges without attached endpoints. + +:: + + { + "port":"port2", + "host":"pci0000:00", + "depth":1, + "nr_dports":2, + "dports":[ + { + "dport":"0000:00:01.3", + "alias":"device:55", + "id":2 + }, + { + "dport":"0000:00:07.1", + "alias":"device:5d", + "id":113 + } + ] + }, + { + "port":"port3", + "host":"pci0000:a8", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:a8:01.1", + "alias":"device:c3", + "id":0 + } + ] + }, + { + "port":"port4", + "host":"pci0000:2a", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:2a:01.1", + "alias":"device:d0", + "id":0 + } + ] + } + ], + +Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder +has :code:`interleave_ways` set to :code:`1`, so it does not interleave across +host bridges; it passes requests through to :code:`port1`, whose decoder +performs the interleave. + +This information is generated by the CXL driver reading the ACPI CEDT CFMWS. + +:: + + "decoders:root0":[ + { + "decoder":"decoder0.0", + "resource":825975898112, + "size":274877906944, + "interleave_ways":1, + "max_available_extent":0, + "volatile_capable":true, + "nr_targets":1, + "targets":[ + { + "target":"pci0000:d2", + "alias":"ACPI0016:00", + "position":0, + "id":5 + } + ], + +Finally we have the `Memory Region` associated with the `Root Decoder` +:code:`decoder0.0`. This region describes the overall interleave configuration +of the interleave set.
+ +:: + + "regions:decoder0.0":[ + { + "region":"region0", + "resource":825975898112, + "size":274877906944, + "type":"ram", + "interleave_ways":2, + "interleave_granularity":256, + "decode_state":"commit", + "mappings":[ + { + "position":1, + "memdev":"mem1", + "decoder":"decoder6.0" + }, + { + "position":0, + "memdev":"mem0", + "decoder":"decoder5.0" + } + ] + } + ] + } + ] + } + ] diff --git a/Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst b/Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst new file mode 100644 index 000000000000..008f9053c630 --- /dev/null +++ b/Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst @@ -0,0 +1,401 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================== +Multi-Level Interleave +====================== +This cxl-cli configuration dump shows the following host configuration: + +* A single-socket system with one CXL root +* The CXL Root has four (4) CXL Host Bridges +* Two CXL Host Bridges have two CXL Memory Expanders attached each. +* The CXL root is configured to interleave across the two host bridges. +* Each host bridge with expanders interleaves across two endpoints. + +This output is generated by :code:`cxl list -v` and describes the relationships +between objects exposed in :code:`/sys/bus/cxl/devices/`. + +:: + + [ + { + "bus":"root0", + "provider":"ACPI.CXL", + "nr_dports":4, + "dports":[ + { + "dport":"pci0000:00", + "alias":"ACPI0016:01", + "id":0 + }, + { + "dport":"pci0000:a8", + "alias":"ACPI0016:02", + "id":4 + }, + { + "dport":"pci0000:2a", + "alias":"ACPI0016:03", + "id":1 + }, + { + "dport":"pci0000:d2", + "alias":"ACPI0016:00", + "id":5 + } + ], + +This chunk shows that the CXL "bus" (root0) has 4 downstream ports attached to CXL +Host Bridges.
The `Root` can be considered the single upstream port attached +to the platform's memory controller, which routes memory requests to it. + +The `ports:root0` section lays out how each of these downstream ports is +configured. If a port is not configured (IDs 0 and 1), its endpoints +and decoders are omitted. + +:: + + "ports:root0":[ + { + "port":"port1", + "host":"pci0000:d2", + "depth":1, + "nr_dports":3, + "dports":[ + { + "dport":"0000:d2:01.1", + "alias":"device:02", + "id":0 + }, + { + "dport":"0000:d2:01.3", + "alias":"device:05", + "id":2 + }, + { + "dport":"0000:d2:07.1", + "alias":"device:0d", + "id":113 + } + ], + +This chunk shows the available downstream ports associated with the CXL Host +Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream +ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`. + +:: + + "endpoints:port1":[ + { + "endpoint":"endpoint5", + "host":"mem0", + "parent_dport":"0000:d2:01.1", + "depth":2, + "memdev":{ + "memdev":"mem0", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:d3:00.0" + }, + "decoders:endpoint5":[ + { + "decoder":"decoder5.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":4, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + }, + { + "endpoint":"endpoint6", + "host":"mem1", + "parent_dport":"0000:d2:01.3", + "depth":2, + "memdev":{ + "memdev":"mem1", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:d3:00.0" + }, + "decoders:endpoint6":[ + { + "decoder":"decoder6.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":4, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + } + ], + +This chunk shows the endpoints attached to the host bridge :code:`port1`.
+ +:code:`endpoint5` contains a single configured decoder :code:`decoder5.0` +which has the same interleave configuration as :code:`region0` (shown later). + +:code:`endpoint6` contains a single configured decoder :code:`decoder6.0` +which has the same interleave configuration as :code:`region0` (shown later). + +Next we have the decoders belonging to the host bridge: + +:: + + "decoders:port1":[ + { + "decoder":"decoder1.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":2, + "interleave_granularity":512, + "region":"region0", + "nr_targets":2, + "targets":[ + { + "target":"0000:d2:01.1", + "alias":"device:02", + "position":0, + "id":0 + }, + { + "target":"0000:d2:01.3", + "alias":"device:05", + "position":2, + "id":0 + } + ] + } + ] + }, + +Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose +targets are :code:`dport0` and :code:`dport2`, which are attached to +:code:`endpoint5` and :code:`endpoint6` respectively. + +The following chunk shows a similar configuration for Host Bridge :code:`port3`, +the second host bridge with memory devices attached.
+ +:: + + { + "port":"port3", + "host":"pci0000:a8", + "depth":1, + "nr_dports":2, + "dports":[ + { + "dport":"0000:a8:01.1", + "alias":"device:c3", + "id":0 + }, + { + "dport":"0000:a8:01.3", + "alias":"device:c5", + "id":0 + } + ], + "endpoints:port3":[ + { + "endpoint":"endpoint7", + "host":"mem2", + "parent_dport":"0000:a8:01.1", + "depth":2, + "memdev":{ + "memdev":"mem2", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:a9:00.0" + }, + "decoders:endpoint7":[ + { + "decoder":"decoder7.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":4, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + }, + { + "endpoint":"endpoint8", + "host":"mem3", + "parent_dport":"0000:a8:01.3", + "depth":2, + "memdev":{ + "memdev":"mem3", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:a9:00.0" + }, + "decoders:endpoint8":[ + { + "decoder":"decoder8.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":4, + "interleave_granularity":256, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + } + ], + "decoders:port3":[ + { + "decoder":"decoder3.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":2, + "interleave_granularity":512, + "region":"region0", + "nr_targets":2, + "targets":[ + { + "target":"0000:a8:01.1", + "alias":"device:c3", + "position":1, + "id":0 + }, + { + "target":"0000:a8:01.3", + "alias":"device:c5", + "position":3, + "id":0 + } + ] + } + ] + }, + +The next chunk shows the two CXL host bridges without attached endpoints.
+ +:: + + { + "port":"port2", + "host":"pci0000:00", + "depth":1, + "nr_dports":2, + "dports":[ + { + "dport":"0000:00:01.3", + "alias":"device:55", + "id":2 + }, + { + "dport":"0000:00:07.1", + "alias":"device:5d", + "id":113 + } + ] + }, + { + "port":"port4", + "host":"pci0000:2a", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:2a:01.1", + "alias":"device:d0", + "id":0 + } + ] + } + ], + +Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder +applies the interleave across the downstream ports :code:`port1` and +:code:`port3` with a granularity of 256 bytes. + +This information is generated by the CXL driver reading the ACPI CEDT CFMWS. + +:: + + "decoders:root0":[ + { + "decoder":"decoder0.0", + "resource":825975898112, + "size":549755813888, + "interleave_ways":2, + "interleave_granularity":256, + "max_available_extent":0, + "volatile_capable":true, + "nr_targets":2, + "targets":[ + { + "target":"pci0000:a8", + "alias":"ACPI0016:02", + "position":1, + "id":4 + }, + { + "target":"pci0000:d2", + "alias":"ACPI0016:00", + "position":0, + "id":5 + } + ], + +Finally we have the `Memory Region` associated with the `Root Decoder` +:code:`decoder0.0`. This region describes the overall interleave configuration +of the interleave set: a total of :code:`4` interleave targets across +4 endpoint decoders.
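The two decode levels compose into the region's 4-way interleave. In this Python sketch (a model of the arithmetic under a power-of-two assumption, not driver code; names are hypothetical), the root decoder picks a host bridge using 256-byte granularity, then that host bridge's decoder picks an endpoint using 512-byte granularity. Each host bridge target is recorded together with the region `position` listed for it in the dumps above.

```python
REGION_BASE = 825975898112

# decoder0.0: root decoder, 2 ways at 256 bytes; targets by position.
ROOT_WAYS, ROOT_GRAN = 2, 256
ROOT_TARGETS = ["port1", "port3"]  # pci0000:d2, pci0000:a8

# decoder1.0 / decoder3.0: host bridge decoders, 2 ways at 512 bytes.
# Each slot holds (region position, memdev) as listed in the dumps.
HB_WAYS, HB_GRAN = 2, 512
HB_TARGETS = {
    "port1": [(0, "mem0"), (2, "mem1")],
    "port3": [(1, "mem2"), (3, "mem3")],
}


def route(hpa: int) -> tuple:
    """Return (host bridge, region position, memdev) for an HPA."""
    offset = hpa - REGION_BASE
    bridge = ROOT_TARGETS[(offset // ROOT_GRAN) % ROOT_WAYS]
    position, memdev = HB_TARGETS[bridge][(offset // HB_GRAN) % HB_WAYS]
    return bridge, position, memdev


for chunk in range(5):
    print(chunk, route(REGION_BASE + chunk * 256))
# 0 ('port1', 0, 'mem0')
# 1 ('port3', 1, 'mem2')
# 2 ('port1', 2, 'mem1')
# 3 ('port3', 3, 'mem3')
# 4 ('port1', 0, 'mem0')
```

Here the host bridge granularity (512) equals the root granularity times the root ways (256 x 2), so the composed position works out to `(offset // 256) % 4`, matching the region's 4-way, 256-byte interleave and its `mappings` list.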
+ +:: + + "regions:decoder0.0":[ + { + "region":"region0", + "resource":825975898112, + "size":549755813888, + "type":"ram", + "interleave_ways":4, + "interleave_granularity":256, + "decode_state":"commit", + "mappings":[ + { + "position":3, + "memdev":"mem3", + "decoder":"decoder8.0" + }, + { + "position":2, + "memdev":"mem1", + "decoder":"decoder6.0" + }, + { + "position":1, + "memdev":"mem2", + "decoder":"decoder7.0" + }, + { + "position":0, + "memdev":"mem0", + "decoder":"decoder5.0" + } + ] + } + ] + } + ] + } + ] diff --git a/Documentation/driver-api/cxl/linux/example-configurations/single-device.rst b/Documentation/driver-api/cxl/linux/example-configurations/single-device.rst new file mode 100644 index 000000000000..5fd38eb0aaf4 --- /dev/null +++ b/Documentation/driver-api/cxl/linux/example-configurations/single-device.rst @@ -0,0 +1,246 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +Single Device +============= +This cxl-cli configuration dump shows the following host configuration: + +* A single-socket system with one CXL root +* The CXL Root has four (4) CXL Host Bridges +* One CXL Host Bridge has a single CXL Memory Expander attached +* No interleave is present. + +This output is generated by :code:`cxl list -v` and describes the relationships +between objects exposed in :code:`/sys/bus/cxl/devices/`. + +:: + + [ + { + "bus":"root0", + "provider":"ACPI.CXL", + "nr_dports":4, + "dports":[ + { + "dport":"pci0000:00", + "alias":"ACPI0016:01", + "id":0 + }, + { + "dport":"pci0000:a8", + "alias":"ACPI0016:02", + "id":4 + }, + { + "dport":"pci0000:2a", + "alias":"ACPI0016:03", + "id":1 + }, + { + "dport":"pci0000:d2", + "alias":"ACPI0016:00", + "id":5 + } + ], + +This chunk shows that the CXL "bus" (root0) has 4 downstream ports attached to CXL +Host Bridges.
The `Root` can be considered the single upstream port attached +to the platform's memory controller, which routes memory requests to it. + +The `ports:root0` section lays out how each of these downstream ports is +configured. If a port is not configured (IDs 0, 1, and 4), its endpoints +and decoders are omitted. + +:: + + "ports:root0":[ + { + "port":"port1", + "host":"pci0000:d2", + "depth":1, + "nr_dports":3, + "dports":[ + { + "dport":"0000:d2:01.1", + "alias":"device:02", + "id":0 + }, + { + "dport":"0000:d2:01.3", + "alias":"device:05", + "id":2 + }, + { + "dport":"0000:d2:07.1", + "alias":"device:0d", + "id":113 + } + ], + +This chunk shows the available downstream ports associated with the CXL Host +Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream +ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`. + +:: + + "endpoints:port1":[ + { + "endpoint":"endpoint5", + "host":"mem0", + "parent_dport":"0000:d2:01.1", + "depth":2, + "memdev":{ + "memdev":"mem0", + "ram_size":137438953472, + "serial":0, + "numa_node":0, + "host":"0000:d3:00.0" + }, + "decoders:endpoint5":[ + { + "decoder":"decoder5.0", + "resource":825975898112, + "size":137438953472, + "interleave_ways":1, + "region":"region0", + "dpa_resource":0, + "dpa_size":137438953472, + "mode":"ram" + } + ] + } + ], + +This chunk shows the endpoints attached to the host bridge :code:`port1`. + +:code:`endpoint5` contains a single configured decoder :code:`decoder5.0` +which has the same interleave configuration as :code:`region0` (shown later).
+ +Next we have the decoders belonging to the host bridge: + +:: + + "decoders:port1":[ + { + "decoder":"decoder1.0", + "resource":825975898112, + "size":137438953472, + "interleave_ways":1, + "region":"region0", + "nr_targets":1, + "targets":[ + { + "target":"0000:d2:01.1", + "alias":"device:02", + "position":0, + "id":0 + } + ] + } + ] + }, + +Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose only +target is :code:`dport0`, which is attached to :code:`endpoint5`. + +The next chunk shows the three CXL host bridges without attached endpoints. + +:: + + { + "port":"port2", + "host":"pci0000:00", + "depth":1, + "nr_dports":2, + "dports":[ + { + "dport":"0000:00:01.3", + "alias":"device:55", + "id":2 + }, + { + "dport":"0000:00:07.1", + "alias":"device:5d", + "id":113 + } + ] + }, + { + "port":"port3", + "host":"pci0000:a8", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:a8:01.1", + "alias":"device:c3", + "id":0 + } + ] + }, + { + "port":"port4", + "host":"pci0000:2a", + "depth":1, + "nr_dports":1, + "dports":[ + { + "dport":"0000:2a:01.1", + "alias":"device:d0", + "id":0 + } + ] + } + ], + +Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder +is a pass-through decoder because :code:`interleave_ways` is set to :code:`1`. + +This information is generated by the CXL driver reading the ACPI CEDT CFMWS. + +:: + + "decoders:root0":[ + { + "decoder":"decoder0.0", + "resource":825975898112, + "size":137438953472, + "interleave_ways":1, + "max_available_extent":0, + "volatile_capable":true, + "nr_targets":1, + "targets":[ + { + "target":"pci0000:d2", + "alias":"ACPI0016:00", + "position":0, + "id":5 + } + ], + +Finally we have the `Memory Region` associated with the `Root Decoder` +:code:`decoder0.0`. This region describes the discrete region associated +with the lone device.
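A quick sanity check that applies to every example configuration in this series: a region's `size` should equal its `interleave_ways` times the `dpa_size` each endpoint decoder contributes. The Python sketch below applies that check to values copied from the four example dumps; the helper name is hypothetical.

```python
def region_size_consistent(region_size: int, ways: int, dpa_size: int) -> bool:
    """True if `ways` endpoints of dpa_size bytes exactly fill the region."""
    return region_size == ways * dpa_size


GiB = 1 << 30
examples = {
    # name: (region "size", region "interleave_ways", endpoint "dpa_size")
    "single-device": (137438953472, 1, 137438953472),
    "hb-interleave": (274877906944, 2, 137438953472),
    "intra-hb-interleave": (274877906944, 2, 137438953472),
    "multi-interleave": (549755813888, 4, 137438953472),
}
for name, (size, ways, dpa_size) in examples.items():
    print(f"{name}: {size // GiB} GiB = {ways} x {dpa_size // GiB} GiB ->",
          region_size_consistent(size, ways, dpa_size))
```

For the single device the check degenerates to `size == dpa_size`, since with one way the DPA is simply the offset into the window.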
+ +:: + + "regions:decoder0.0":[ + { + "region":"region0", + "resource":825975898112, + "size":137438953472, + "type":"ram", + "interleave_ways":1, + "decode_state":"commit", + "mappings":[ + { + "position":0, + "memdev":"mem0", + "decoder":"decoder5.0" + } + ] + } + ] + } + ] + } + ] --=20 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD5201CDFCE for ; Wed, 30 Apr 2025 00:12:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971975; cv=none; b=XyIS6us9bIOATWNVeTOgPsiC5xYTQULVmN9qk1c/35D8SXn1T4EKOmVzVlMN9VvawJ4SeCK7gq075qk0yNC+LfZLq7pIlv8SPA9gtQ1LAqUhAJhRj4Ygc/E46LjUvBhuxIlfK/+VcajqCdWymO8gdvKAB+lr6WhARSlySo+Yk98= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971975; c=relaxed/simple; bh=TjuLcwK+M6LLMZ5Wff/rzBDXZJiMEDSOuoY3CLkuI+E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rv/reaxm9/eR+IjO8l4pzVQDoxjnKlYBIJ8WHgYqhn5oiFtqb8THjI9Cd5QIeR8VNqdizcyWGsH3vsMUE3Kmj/O1E6Fmt+2d8pISsx1oIfQxgRJ80LrTKQlfFPITr+1dLvMzfTjciwUgfG/AGZsmzu8ylvHDFv6/Pz7CwJXH73o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=HRKo3CI/; arc=none smtp.client-ip=209.85.160.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="HRKo3CI/" 
Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-47677b77725so85865771cf.3 for ; Tue, 29 Apr 2025 17:12:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971973; x=1746576773; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=48IBIQBS5H0eCikVfx3jlLOYBAjND5JFsqhmRr8kgFs=; b=HRKo3CI/VTUikbT0AAmZ8xINJDyGeKvzQbecVnhyh0c6mGsC7MVd0JKmGKVhcfTlzW zHOzBn1EV/g/oP+CNUl0NgwR1FrZGPYeFOdT07OqObUZINn/IaLrCVZQ9eDoguhst2Lb 8rTT/OaXY/Uw0XyF0G/4TcwFC8HGQcZnezOj5F+9rhWuMmYOuZU2vUDFGfawdWad7SFg sGCtOBeZLhZgHHqF1Pe1LqiJvwj9vgRtTUexbe4UOC0Ew1rokg9USGwLZPI7JJf4DvV7 jnEz3O548XsEMWKcZnUvIqUJHGIYM7SbIRcWAj8NJLbvickevwL3/3/dtDhBVwM11/Z+ JetQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971973; x=1746576773; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=48IBIQBS5H0eCikVfx3jlLOYBAjND5JFsqhmRr8kgFs=; b=R+/X+pYLU5nLnnG4CgR8Bfhw/t8GjtcrNaFoFTZUOxdON3FtNzWuVhEJZUA2EVjOo0 zuCUYqaEdir79F8NhwXDsz4J0LLVUtVAsE/jlqUP067QaYrngfjiDGVjSPNJPd67nt4G x+8hEhdsATCbN5fW+pO9wWHny7aAYM6kqpM/XQVjIH9DS56xNdlh1B1SS7FMvcqBDFNf 0SCofqQ2Djvr9ybLqw4M9iAF+1jBum8vDYHLXlTkultmoagQ7hG+HoOWQYaeGns9fQE2 uO2jjm8Sovq40Tr67wich/3epc/iGKWljCNNZrQuWDXuEx9kj9s/M3NfVK4eBFDaUsv+ zPog== X-Forwarded-Encrypted: i=1; AJvYcCXPFcnGpY36yKXmZK2nh+Zv7fVRJe+0mXrNHntac+pSG3z+6syN0X2suS19hOIsEyl/bkmeEYIw4aURFSU=@vger.kernel.org X-Gm-Message-State: AOJu0YwAHcgwRtvsrL0whBqj4Y8EHq1qz2x5QZkAvqzVOPVH9qqE92xA ZF2CJdT8y/i9yWJ1t+FQ3cOm7jJFPRndbJQwE50XbjTYvnIOsQCDd0Yk/d1Idrs= X-Gm-Gg: ASbGnctbqBjT+N6pG3upHSLCsHmPv+dJgoxi/atWH5NXuhriUrbg4RJ6TLa4PVM4h3/ FgXWKt3NUKLjIurzkfE8fIXzYzyKGm++5HDu64AD8ZKFng3/4sXFsJ/wPu3RATkIlwvGpK+ypFx 0RH8r1ay3cCFgb0Ndaxt1hi+F8C+3WXIcNd3TqdZ0mUic3IkwH9ocBZHBKBZ+Aeqw3LrfaAKkd0 
/MU0mPXLiNxmzGKoJ9Kv32a/FezHmwQR02kC5J1uWWvgq4C7pjlwp1sKFyYC1KVg85VeH1pNirC zuTA5DgtPsqQGBqaopBWH4MrnenknxcRl/0zNf2iG8PmN/qr5RKcZ2qQBCWhMP+jtJMfOao9OY5 NALsr4rq5dSJMPY5bWy0Rz2A7E/1edYx6diaZK6c= X-Google-Smtp-Source: AGHT+IFzgkOauRvlDmL11rMMrtVN8v6NuHdseJxsSrQzs6ks2JKTqOLcs6++Jr/SBAfN9hT/VjYF6w== X-Received: by 2002:a05:622a:244e:b0:476:b7e2:385c with SMTP id d75a77b69052e-489c38aa1fbmr17532591cf.2.1745971972868; Tue, 29 Apr 2025 17:12:52 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:52 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 10/17] cxl: docs/linux/dax-driver documentation Date: Tue, 29 Apr 2025 20:12:17 -0400 Message-ID: <20250430001224.1028656-11-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add documentation on how the CXL driver interacts with the DAX driver. 
Signed-off-by: Gregory Price --- Documentation/driver-api/cxl/index.rst | 1 + .../driver-api/cxl/linux/dax-driver.rst | 42 +++++++++++++++++++ 2 files changed, 43 insertions(+) create mode 100644 Documentation/driver-api/cxl/linux/dax-driver.rst diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index b915ce982048..bfaf0e2ebfc0 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -38,6 +38,7 @@ that have impacts on each other. The docs here break up configurations steps. linux/overview linux/early-boot linux/cxl-driver + linux/dax-driver linux/access-coordinates diff --git a/Documentation/driver-api/cxl/linux/dax-driver.rst b/Documentation/driver-api/cxl/linux/dax-driver.rst new file mode 100644 index 000000000000..56addd5fa71e --- /dev/null +++ b/Documentation/driver-api/cxl/linux/dax-driver.rst @@ -0,0 +1,42 @@ +.. SPDX-License-Identifier: GPL-2.0 + +DAX Driver Operation +==================== +The `Direct Access Device` driver was originally designed to provide a +memory-like access mechanism to memory-like block devices. It was later +extended to support CXL Memory Devices, which expose user-configured +memory capacity. + +The CXL subsystem depends on the DAX subsystem to provide either: + +- A file-like interface to userland via :code:`/dev/daxN.Y`, or +- Memory-hotplug integration that adds CXL memory to the page allocator. + +The DAX subsystem exposes this ability through the `cxl_dax_region` driver. +A `dax_region` provides the translation between a CXL `memory_region` and +a `DAX Device`. + +DAX Device +---------- +A `DAX Device` is a file-like interface exposed in :code:`/dev/daxN.Y`. +Userland software can access a memory region exposed through a DAX device +via the :code:`mmap()` system call. The result is a direct mapping of the +CXL capacity into the task's page tables.
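A minimal sketch of the user-side bookkeeping that can precede such an `mmap()` call. It assumes the device reports its size and alignment as decimal values in the sysfs attributes `/sys/bus/dax/devices/daxN.Y/size` and `/sys/bus/dax/devices/daxN.Y/align` (check the sysfs-bus-dax ABI documentation for your kernel). The helper names are hypothetical and not part of any kernel or library API; the sketch only shows how a request can be checked against, and trimmed to, the device alignment before mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a decimal value from a sysfs attribute, e.g. the
 * assumed paths /sys/bus/dax/devices/dax0.0/size or .../dax0.0/align.
 * Returns 0 on success or a negative errno value on failure.
 */
int dax_read_sysfs_u64(const char *path, uint64_t *out)
{
	unsigned long long val;
	FILE *f = fopen(path, "r");
	int matched;

	if (!f)
		return -errno;
	matched = fscanf(f, "%llu", &val);
	fclose(f);
	if (matched != 1)
		return -EINVAL;
	*out = (uint64_t)val;
	return 0;
}

/* True if length is non-zero and both offset and length are multiples of align. */
int dax_range_is_aligned(uint64_t offset, uint64_t length, uint64_t align)
{
	assert(align != 0);
	return length != 0 && offset % align == 0 && length % align == 0;
}

/* Largest length <= dev_size that is a multiple of align (0 if none fits). */
uint64_t dax_mappable_length(uint64_t dev_size, uint64_t align)
{
	assert(align != 0);
	return dev_size - dev_size % align;
}
```

A caller might read `size` and `align` for `dax0.0`, then pass `dax_mappable_length(size, align)` as the `mmap()` length with an offset of 0, so the request never violates the device alignment.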
+ +Users wishing to manually handle allocation of CXL memory should use this +interface. + +kmem conversion +--------------- +The :code:`dax_kmem` driver converts a `DAX Device` into a series of `hotplug +memory blocks` managed by :code:`mm/memory_hotplug.c`. This capacity +will be exposed to the kernel page allocator in the user-selected memory +zone. + +The :code:`memmap_on_memory` setting (both the global and the per-DAX-device +setting) dictates where the kernel allocates the :code:`struct folio` +descriptors for this memory. If :code:`memmap_on_memory` is set, memory +hotplug will set aside a portion of the memory block capacity to allocate +folios. If unset, the descriptors are allocated via a normal +:code:`GFP_KERNEL` allocation and, as a result, will most likely land on the +local NUMA node of the CPU executing the hotplug operation. -- 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f170.google.com (mail-qt1-f170.google.com [209.85.160.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA9E41D7E2F for ; Wed, 30 Apr 2025 00:12:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971978; cv=none; b=fFOATncb5UYQ5UEFpeQ4UtiCJsvvT7Af+EiMpc9asGzpboxX8dqRjO+4NFZUtpXL1frGUdGkEZlft1CpxMA/Xu3rJDeyxQqwONdtw0jwHT+qkULaFmozj7dHcEl/LSW4nqUJjcPPmRCvjWTWLcNps62nZ2Mhmu6hZRSQ3QHfl98= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971978; c=relaxed/simple; bh=5V599XTZxAA6qbUCrLydp2qtu/cxWmE54QcHni3HVGk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LGPCFz6z1bhIdkFvET1xG8UURIONvkHwpbHs6l2lJSsJf82SUarLEhy1stuKLdB5oBJB6adSLJkEj4hv9Kk/In276jDyYPWUSr+PNrOANuGg1uEztw3FUuhP7g7LSX6sryDDRwzcj7mZtChNgx88s0Ja3OxR+qN7Nmlh1OiOqFk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=UKqpuuP2; arc=none smtp.client-ip=209.85.160.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="UKqpuuP2" Received: by mail-qt1-f170.google.com with SMTP id d75a77b69052e-476805acddaso83487221cf.1 for ; Tue, 29 Apr 2025 17:12:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971974; x=1746576774; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KE/unIJlMP8iLZOfjQZ/PwXn8D0sCs62FzmzF2efJyM=; b=UKqpuuP2l/w+KSXFw1UK+T9NjMZpGb3Cj1aFsj3bna9VFggzj5NDJh/zduHm6EOmQA 3KVXbiXYmTSolWHizi+4yg+uRJhNSbQJIClAt/Ur631OGp0KJErMRba4W49eyazttZG8 xvRu8yIFpNLbT2/ooqYiRB9hUZ7lUMT5T34QPQNpnI51zhLDRUz67C9PGY4CWQ4X6Xjq 61sOaHUdbuxTyT9vfykUL9rSlV0srNh3UNp4ZO1NoAkzo8d56FV4FsxpVLuhrNod60Ih fcn+R74gFt9EuAi45MNzK96AUyJhUBQysIE9eeztx4KBcdyAwHsOoaujdmZZPsrp7s8z TB9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971974; x=1746576774; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KE/unIJlMP8iLZOfjQZ/PwXn8D0sCs62FzmzF2efJyM=; b=P/kOBCuUJdakKh6Svt4m+WLO8acTSInmYtNQAISsiSVRFQzpz3eXl14FcDUud5W+IT KJC8vCDHa/QFVx0sD44WdnLU5YfMzRft8iYM2Qmpb4rH98EfcW1mjXu7CEmSPogMXhGQ g0VUzdnGZCTbGmh6ssuJzMxZY+KeU3yJaeCPIoxb3OXH5OK/RiAJIr/VUkmQYthBn1KG 
Wd2DS4oMdJLd4ItMsU49toLc6Sd8NnT/RzsRWFtpcodayDrUyP1LbOPm4crqLFFascvX Vq5WDhHzIXbokljT8iZZ+1WVFbBhWomCQ527a1h2qrW/FWCQUuqpwXqn16L4/MW7WTU+ X+yw== X-Forwarded-Encrypted: i=1; AJvYcCUpEt2jTHKWEhTdm16EgDRjF/syJKjA5V2v4PIGPcyCeMU1HyyKR5Tvh59nDzDyVJGS7CiJUKVfMfEcVog=@vger.kernel.org X-Gm-Message-State: AOJu0YzqUSB8M7HRnO9CqVsWeJl0mi0xsNzYIM+W1dAb7pdGtAODkUhQ Gly/Y2wmmc2UiCeXMdMvChdBP5gRetmnT5Okd7feb3BA1j7nsVugsV6EVKHej4lqR4sc78I025M C X-Gm-Gg: ASbGncsg6fd9yYIa/PWeJCvHN4hvhc9Z07TWUoe/3OIKOypsifMaAj9AW0cPnZ6Q/na JUmFPg7B7UAgpL+OVORGsTUNvtcrjtC1reMBGxCtTy3/Ny+6CQf3F9Qh5rH68omFbLYCVn2QCYC y+itHkqOQeQbed56pzl07Ax3yPqiauPw0k828XrsFQMCyOp2AbapblBDPwiBSiVS/l2j1K0y7xr lHoDV1JBiPpBJTYwPY61ngUbk9a16nUzI2b6vDJhfrNf3iNHlPj8jCJ4WsLatNB1V25SkoAzNZ8 /AK/VKrmB+bbrUUv5hA9kjF6IRcmnRE9kbaHx9CaeSlDQ+/ZqDvDnKE74OEQr7Z1bNs76Fx44PS jka8dOMviMp2nZpIbW3OMpHxZjtLc X-Google-Smtp-Source: AGHT+IEHDVkgWjoqnw57UMPr/IbEA4p7aS3feNOIrEU0q2Dur7jRTKnWJHv8ubLZEyWz+0kRa75TEQ== X-Received: by 2002:a05:622a:580d:b0:478:f8bb:b5e with SMTP id d75a77b69052e-489c3d8aa99mr20146361cf.13.1745971974662; Tue, 29 Apr 2025 17:12:54 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. 
[173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:54 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 11/17] cxl: docs/linux/memory-hotplug Date: Tue, 29 Apr 2025 20:12:18 -0400 Message-ID: <20250430001224.1028656-12-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add documentation on how the CXL driver surfaces memory through the DAX driver and memory-hotplug. Signed-off-by: Gregory Price --- Documentation/driver-api/cxl/index.rst | 1 + .../driver-api/cxl/linux/memory-hotplug.rst | 77 +++++++++++++++++++ 2 files changed, 78 insertions(+) create mode 100644 Documentation/driver-api/cxl/linux/memory-hotplug.rst diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index bfaf0e2ebfc0..d5186fc609a9 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -39,6 +39,7 @@ that have impacts on each other. The docs here break up configurations steps.
linux/early-boot linux/cxl-driver linux/dax-driver + linux/memory-hotplug linux/access-coordinates diff --git a/Documentation/driver-api/cxl/linux/memory-hotplug.rst b/Documentation/driver-api/cxl/linux/memory-hotplug.rst new file mode 100644 index 000000000000..a26516a6483e --- /dev/null +++ b/Documentation/driver-api/cxl/linux/memory-hotplug.rst @@ -0,0 +1,77 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Memory Hotplug +############## +The final phase of surfacing CXL memory to the kernel page allocator is for +the `DAX` driver to surface a `Driver Managed` memory region via the +memory-hotplug component. + +There are four major configurations to consider: + +1) Default Online Behavior (on/off and zone) +2) Hotplug Memory Block Size +3) Memory Map Resource Location +4) Driver-Managed Memory Designation + +Default Online Behavior +*********************** +The default-online behavior of hotplug memory is dictated by the following +settings, listed from lowest to highest precedence: + +- :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE` Build Configuration +- :code:`memhp_default_state` Boot parameter +- :code:`/sys/devices/system/memory/auto_online_blocks` value + +These determine which of three states hotplugged memory blocks arrive in: + +1) Offline +2) Online in :code:`ZONE_NORMAL` +3) Online in :code:`ZONE_MOVABLE` + +:code:`ZONE_NORMAL` implies this capacity may be used for almost any allocation, +while :code:`ZONE_MOVABLE` implies this capacity should only be used for +migratable allocations. + +:code:`ZONE_MOVABLE` attempts to retain the hotplug-ability of a memory block +so that the entire region may be hot-unplugged at a later time. Any capacity +onlined into :code:`ZONE_NORMAL` should be considered permanently attached to +the page allocator. + +Hotplug Memory Block Size +************************* +By default, on most architectures, the Hotplug Memory Block Size is either +128MB or 256MB. On x86, the block size increases up to 2GB as total memory +capacity exceeds 64GB.
As of v6.15, Linux does not take into account the +size and alignment of the ACPI CEDT CFMWS regions (see Early Boot docs) when +deciding the Hotplug Memory Block Size. + +Memory Map +********** +The location of the :code:`struct folio` allocations that represent the +hotplugged memory capacity is dictated by the following system settings: + +- :code:`/sys/module/memory_hotplug/parameters/memmap_on_memory` +- :code:`/sys/bus/dax/devices/daxN.Y/memmap_on_memory` + +If both of these parameters are set to true, :code:`struct folio` for this +capacity will be carved out of the memory block being onlined. This has +performance implications if the memory is particularly high-latency and +its :code:`struct folio` becomes hotly contended. + +If either parameter is set to false, :code:`struct folio` for this capacity +will be allocated from the local node of the processor running the hotplug +procedure. This memory map is allocated from :code:`ZONE_NORMAL` on +that node, as it is a :code:`GFP_KERNEL` allocation. + +Systems with extremely large amounts of :code:`ZONE_MOVABLE` memory (e.g. +CXL memory pools) must ensure that there is sufficient local +:code:`ZONE_NORMAL` capacity to host the memory map for the hotplugged capacity. + +Driver Managed Memory +********************* +The DAX driver surfaces this memory to memory-hotplug as "Driver Managed". This +is not a configurable setting, but it is important to note that driver-managed +memory is explicitly excluded from use during kexec. This ensures that any +reset or out-of-band operation the CXL device may undergo during a +functional system reboot (such as a reset-on-probe) will not overwrite +portions of the kexec kernel.
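The sizing concern in the Memory Map section above can be estimated with simple arithmetic. The sketch below uses hypothetical helper names and assumes a 4 KiB base page and a 64-byte per-page descriptor, which is typical of x86-64 builds but not guaranteed for every configuration. It also assumes `/sys/devices/system/memory/block_size_bytes` reports the block size in hexadecimal without a `0x` prefix.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the contents of /sys/devices/system/memory/block_size_bytes,
 * assumed here to be hexadecimal without a "0x" prefix (e.g. "8000000\n").
 */
uint64_t parse_block_size_bytes(const char *buf)
{
	return (uint64_t)strtoull(buf, NULL, 16);
}

/* Number of hotplug memory blocks needed to cover `capacity` bytes. */
uint64_t hotplug_block_count(uint64_t capacity, uint64_t block_size)
{
	assert(block_size != 0);
	return (capacity + block_size - 1) / block_size;
}

/*
 * Rough memory map cost: one descriptor of `desc_size` bytes per base page
 * of `page_size` bytes. 64 bytes per 4 KiB page is an assumption for common
 * x86-64 configurations, not a guarantee.
 */
uint64_t memmap_bytes(uint64_t capacity, uint64_t page_size, uint64_t desc_size)
{
	assert(page_size != 0);
	return capacity / page_size * desc_size;
}
```

Under those assumptions, onlining 1 TiB of CXL capacity with `memmap_on_memory` disabled would consume roughly 16 GiB of `ZONE_NORMAL` on the node performing the hotplug, and with a 2 GiB block size the capacity spans 512 memory blocks.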
-- 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f180.google.com (mail-qt1-f180.google.com [209.85.160.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AFC551DB15F for ; Wed, 30 Apr 2025 00:12:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971979; cv=none; b=NXm1JcTpef1qgKce0JI/ui2ULHlcfR3QNl667LHNEj3yuDU0x1bsCyDHhroCXrO2TeSMxTKqL01I+leaQKXM0+V10uWjj29Uk82RDKRL2uh8oCDXdkdoCVawizlF/leMH4JndoTSrnYX6u5s3HZgtPyBcforrIUXrJMA0hKUS80= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971979; c=relaxed/simple; bh=TbudILUJN0FP5VLxGVnsYnNFJJO9PoQVJHqli21jG0U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fcgdsbZdZ9X1tcX5mOoerTLcYjzWkU19WFkImaO/RbZUNkp2JVTDh3vadGTvWlLLWcj+cwwgM+Y1Git/pJwbahX845oaZooBShoe//HZ21mg6Zj4NvsUXKqBlV9PsTSZrvV/bBWlrArA2r/797rjQbUTZPnFW8iR0S+MxHYLjNY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=lQjvfLoz; arc=none smtp.client-ip=209.85.160.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="lQjvfLoz" Received: by mail-qt1-f180.google.com with SMTP id d75a77b69052e-4772f48f516so4525911cf.1 for ; Tue, 29 Apr 2025 17:12:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971977; x=1746576777; darn=vger.kernel.org;
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=K/jf5P1SJ1TTjW37Z8pO9A3yJArrRLf6vL3nuDSaJuU=; b=lQjvfLozNq83YMcBm92iURmGiSsutiPsGwqCMVfVyYGmkRtQ7Zi+iKfrdQRP/7n0a3 S3PVpdHCSEe2nKjX3l4KUZ7Jwq5I1ICveaidDjq1RRLrTgNHhfDQFL2Zavis30JED+3D cOP1RdN86FqtBxAa+nYm6Di/sN+7Doc+GSJGa3PkDrusoSj7oGrzyQtak6zxxlL0pvPg /PGCU6BKaAzVwx9RLtwXXiDG1kvXkm6cAvNLZBr/qHeIx0viotKfSWKopcLh3EQzP8KK xAWzpCITUIZ+qFp+7BIqkHycCtKIqWqzbUVaprGEnxWXVI20id9L2V1rmktyOFF1PvSJ 2YhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971977; x=1746576777; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=K/jf5P1SJ1TTjW37Z8pO9A3yJArrRLf6vL3nuDSaJuU=; b=wWjBpD1NRc9XRtU2WfAfCcgI/9gqzvY8J5DiKDjeD1ug93NTL+Q1r1a33K31bo4iIh QWyI66qn8ndVhva7mchp0liS72p2GQDZK+BBEyvEzVAWFpvCB7Qkq3pfb1KaTEe/oxmw A1lIGGF0lHT9BiSetsa43Muxd9W7GHfnQ8uyUNenxYjsfipFLY/sBb+WNlIftFaUrOzE ibKTmrf3R46Xo3OaW+/tO443W/dFn9phdO0qRwqgzg0VEceeX4HMo0FX0/eM/9fo5Mox /aody9JELaKDGAnMsJumKXL3jPH4QhgyRtMFte2bfjsSeMLSwYrariirZtdeGmCdfyE/ d81A== X-Forwarded-Encrypted: i=1; AJvYcCVbpHmKgUQovJI9QxgfOjcLHvcOqJOhRC2qxr4sfbKww6p13gXsICLafPqcxaSy7m3+Xh5pIkzuLcUbq1M=@vger.kernel.org X-Gm-Message-State: AOJu0YylPwkB7vyuq9ttWPOemf8QSYwD21kXOo6WLxDRfuouFrB2oSl+ QxhQz9Y5wTpMWuyf/GYt15oraa9un87gbwlWb/QUSRc0H5vBKRPrBPOirN1JcLo= X-Gm-Gg: ASbGncuZVlaMxIuB6C2hlDKAIZ1gQgX8xKmFHvGd1HYvHp2WB+ovOf5XKYOSwj18mOw 0FKjJYig19MNXRPBWLo4kRA2XvgvyXnMzaYFGYIAgpZYHqtL30GlRim+kMhk17x3BwTnnzEnfVz rd+22eSbu8IoOPpLMpKa/ziFFNdrW65t0F4P9W5jSr+uUz5Q/QIO1LuO1hAvefmMWgHUcRJbJxR QG2ScM0nK8d7io3yx//YXs4yHzjr+9DU0wbznER6zUcjaK6oZBYway6hJRH8VGdjGaPgGkWfNNA MfhK/zH4p8WtW8+Tv8yFNe5TDP8+IPSc621Aqp+0uFfCckQ3/hc2PVlPa48mmpeEjggxdaOQ0vB 2R96V7wWMzU8NdVh42VDEqqfR/ktG X-Google-Smtp-Source: 
AGHT+IEF1VJpcgqXRaIJTt5/r0suWK89wo3LFIdy+JCCJintOUr+gPIbueI1DmAc/mN/DB3Iu06J9w== X-Received: by 2002:a05:622a:a0d:b0:471:80ef:35e7 with SMTP id d75a77b69052e-489cb7bfd7dmr13735131cf.4.1745971976847; Tue, 29 Apr 2025 17:12:56 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:56 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 12/17] cxl: docs/allocation/dax Date: Tue, 29 Apr 2025 20:12:19 -0400 Message-ID: <20250430001224.1028656-13-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Small example of accessing CXL memory capacity via a DAX device. Signed-off-by: Gregory Price --- .../driver-api/cxl/allocation/dax.rst | 59 +++++++++++++++++++ Documentation/driver-api/cxl/index.rst | 5 ++ 2 files changed, 64 insertions(+) create mode 100644 Documentation/driver-api/cxl/allocation/dax.rst diff --git a/Documentation/driver-api/cxl/allocation/dax.rst b/Documentation/driver-api/cxl/allocation/dax.rst new file mode 100644 index 000000000000..8e0c9f6a6843 --- /dev/null +++ b/Documentation/driver-api/cxl/allocation/dax.rst @@ -0,0 +1,59 @@ +..
SPDX-License-Identifier: GPL-2.0 + +DAX Devices +########### +CXL capacity exposed as a DAX device can be accessed directly via mmap. +Users may wish to use this interface mechanism to write their own userland +CXL allocator, or to manage shared or persistent memory regions across multiple +hosts. + +If the capacity is shared across hosts or persistent, appropriate flushing +mechanisms must be employed unless the region supports Snoop Back-Invalidate. + +Note that mappings must be aligned (size and base) to the DAX device's base +alignment, which is typically 2MB but may be configured larger. + +:: + + #include <fcntl.h> + #include <stdint.h> + #include <stdio.h> + #include <stdlib.h> + #include <sys/mman.h> + #include <unistd.h> + + #define DEVICE_PATH "/dev/dax0.0" // Replace with your DAX device path + #define DEVICE_SIZE (4ULL * 1024 * 1024 * 1024) // 4GB; must fit the device and its alignment + + int main(void) { + int fd; + void *mapped_addr; + + /* Open the DAX device */ + fd = open(DEVICE_PATH, O_RDWR); + if (fd < 0) { + perror("open"); + return EXIT_FAILURE; + } + + /* Map the device into memory */ + mapped_addr = mmap(NULL, DEVICE_SIZE, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); + if (mapped_addr == MAP_FAILED) { + perror("mmap"); + close(fd); + return EXIT_FAILURE; + } + + printf("Mapped address: %p\n", mapped_addr); + + /* You can now access the device through the mapped address */ + uint64_t *ptr = (uint64_t *)mapped_addr; + *ptr = 0x1234567890abcdef; // Write a value to the device + printf("Value at address %p: 0x%016llx\n", (void *)ptr, (unsigned long long)*ptr); + + /* Clean up */ + munmap(mapped_addr, DEVICE_SIZE); + close(fd); + return 0; + } diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index d5186fc609a9..d19148be3087 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -42,5 +42,10 @@ that have impacts on each other. The docs here break up configurations steps. linux/memory-hotplug linux/access-coordinates +..
toctree:: + :maxdepth: 2 + :caption: Memory Allocation + + allocation/dax .. only:: subproject and html -- 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C850A1DE891 for ; Wed, 30 Apr 2025 00:12:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971981; cv=none; b=WLyjtcvotUnHxGrt2LpKH4Q+SRLFYC4Gg50zvktWlV/mfYD5eCvU3i/1B3gDpK0HaKXxXcpbkef07gIrOBrOTuGkLzEjbGj4ERUYfIDH0aUuwuQDzIFvDbyEkx8pixUu0Oi6W3JiZZtbNLoDAZFMt40ndheTNXeUyL5yMcb7Bec= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971981; c=relaxed/simple; bh=a+GV4Qvg58wFX6J8gIoD1XrpMv0xTANuNhUoKbpPvI4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cHl3kKYi9CtFt96ZB1F1blhVrlkXeUq8q/GMHN9NEQbcv045stVN0WN0bj3lykc63DjP9ImlJI3lC8LNvPECs5c+eUcuk/hgnVKQHoCRZFDFD/mDOdSv3Ok3if/X3SHjwc2vjupB6dovJGI9YFOJq0uDbX20RLePEx4aK04A/NY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=WzyfWRHE; arc=none smtp.client-ip=209.85.160.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="WzyfWRHE" Received: by mail-qt1-f176.google.com with SMTP id d75a77b69052e-4774193fdffso122586241cf.1 for ; Tue, 29 Apr 2025 17:12:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256;
c=relaxed/relaxed; d=gourry.net; s=google; t=1745971978; x=1746576778; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=c0kd+zs7VCCg/bGrnqeB4WDWKHo0gBjI+E1y3uIR3P8=; b=WzyfWRHEeEpF5mqagsUAMfclfmFDbXYtwrPjsvCwQ3fjGjv+Ao5RCMR+2MUUDStxe1 lTU6Z61eHs5cvE34TkImNC4WEkneDztPg2IJZ8x/+5q4Hf5JBpCCe3sqBtj7euoQYtD7 H3zFwULmEG7LMeeHp+rnWC6IRAqh+aiZEAZQExPgtnJ46KP9pGiVpIEAPpG7ZHbvJ5EG /45DjR+2qR6kqbDOnW+TdjppoCCFjYZTAbn+86HgvsYiZjDNHIK9JrDYD+INJyebVygQ jRBKGvrjUsmhQ9YU/Os7GE4UVpM18denrTebK08IxWDoUyUg1azmVLAVLioj12IjPhZP TjLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971978; x=1746576778; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=c0kd+zs7VCCg/bGrnqeB4WDWKHo0gBjI+E1y3uIR3P8=; b=pHJoGe/0HLbeJ9xYNyOrbQDVikhdPj8SUioy3RoLaWvhzGMAFMBsptGR7micnhKOEq JBezJtAKkfqKpccTFO9p31Qgc/DYQT6FP9Hr6bXISXjziwtwNbfRNbVMTd8NdbJGD63p WFaptFxvC1W1FdzLLwuoyeBHoG5nSF22oETEyoVpvJYkTe9Y8Tfapl9R0AEzKFqNdmYj K6M1qg+XR6YCfl5nDCMUAFWifFnexXsOOlHhUVqvh4R/avzPcXhpnR598oGNc7uOWPTi meF4k3R8oNTUNiZwxZIZ0Vms2sghQcVoLdXGtnh5IEW7dCyqLsfGvHB1GloH6z0aHNjm mrIA== X-Forwarded-Encrypted: i=1; AJvYcCXg61zzf4t7TZDeOxPCadjCOgN3Mk+7d+YJD1NLNjC1fy3/8h9puirdEmEPk8lkxiOKyJmJXQesXRzN2ik=@vger.kernel.org X-Gm-Message-State: AOJu0YydgCbKx6XywBxgMBr3bc6TfOSnH2PvkQX0d3KHtdW4DeZ5ixwc rj9NeLrncxy9alWrgja/UktpowUH5U32NNg8ljgHzgu8FBR3l+HU2FRpZasfVAQ= X-Gm-Gg: ASbGnctCRWNOj8Co1yp1iD8ejtST7+AdmhajNrJ3Kbfcjtri07s2hvXw9mMhb+RhZUV CA8ZiYExqyT8Lz5iJ7fOFQPzDay/XHGg6Ubj+yX/IEjsFq715oYp7ow3GgHmED0V3h+GE3EA66D j5vi3n5Cnn85C6xxb0UGbPUNkyfpGzeHC514h0BFPs1LWOeHyrOvdWesC/ilvG4WTGmi59rdZDo HysnYdMIAQkXhye1x4wUi+7gefdvDGvOnE1o70oEpo8kvzRPqXSuSVHyVZA3NqJaXUmG5S20dSN 1z8l8VluveF9P0gEJxT52BEI3zsdU+n+Y2ngqHq7POw3xn/UL/6PRigAA1I5axlUcdb6faCujQ+ 
hsKZrwbY6Mbd4BCy7Xge/1nwcNczR X-Google-Smtp-Source: AGHT+IFPXNoBiM4Jy+i0CEayph3Il4BP1VbXTa+vimVibIrLsan89vSi39bk9GCS3u8Af1GG6zcFdw== X-Received: by 2002:a05:622a:4d85:b0:476:6b20:2cef with SMTP id d75a77b69052e-489e60f0c66mr9962821cf.41.1745971978489; Tue, 29 Apr 2025 17:12:58 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:12:58 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 13/17] cxl: docs/allocation/page-allocator Date: Tue, 29 Apr 2025 20:12:20 -0400 Message-ID: <20250430001224.1028656-14-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document some interesting interactions that occur when exposing CXL memory capacity to the page allocator.
Signed-off-by: Gregory Price --- .../cxl/allocation/page-allocator.rst | 85 +++++++++++++++++++ Documentation/driver-api/cxl/index.rst | 1 + 2 files changed, 86 insertions(+) create mode 100644 Documentation/driver-api/cxl/allocation/page-allocator.rst diff --git a/Documentation/driver-api/cxl/allocation/page-allocator.rst b/Documentation/driver-api/cxl/allocation/page-allocator.rst new file mode 100644 index 000000000000..e913dfa5ff3f --- /dev/null +++ b/Documentation/driver-api/cxl/allocation/page-allocator.rst @@ -0,0 +1,85 @@ +.. SPDX-License-Identifier: GPL-2.0 + +The Page Allocator +################## + +The kernel page allocator services all general page allocation requests, such +as :code:`kmalloc`. CXL configuration steps affect the behavior of the page +allocator based on the selected `Memory Zone` and `NUMA node` the capacity is +placed in. + +This section mostly focuses on how these configurations affect the page +allocator (as of Linux v6.15) rather than on overall page allocator behavior. + +NUMA nodes and mempolicy +************************ +Unless a task explicitly registers a mempolicy, the default memory policy +of the Linux kernel is to allocate memory from the `local NUMA node` first, +and fall back to other nodes only if the local node is pressured. + +Generally, we expect to see local DRAM and CXL memory on separate NUMA nodes, +with the CXL memory being non-local. Technically, however, it is possible +for a compute node to have no local DRAM, and for CXL memory to be the +`local` capacity for that compute node. + + +Memory Zones +************ +CXL capacity may be onlined in :code:`ZONE_NORMAL` or :code:`ZONE_MOVABLE`. + +As of v6.15, the page allocator attempts to allocate from the highest +available and compatible zone for an allocation, on the local node first. + +An example of a `zone incompatibility` is attempting to service an allocation +marked :code:`GFP_KERNEL` from :code:`ZONE_MOVABLE`. Kernel allocations are
Kernel allocations a= re +typically not migratable, and as a result can only be serviced from +:code:`ZONE_NORMAL` or lower. + +To simplify this, the page allocator will prefer :code:`ZONE_MOVABLE` over +:code:`ZONE_NORMAL` by default, but if :code:`ZONE_MOVABLE` is depleted, it +will fallback to allocate from :code:`ZONE_NORMAL`. + + +Zone and Node Quirks +******************** +Lets consider a configuration where the local DRAM capacity is largely onl= ined +into :code:`ZONE_NORMAL`, with no :code:`ZONE_MOVABLE` capacity present. T= he +CXL capacity has the opposite configuration - all onlined in +:code:`ZONE_MOVABLE`. + +Under the default allocation policy, the page allocator will completely sk= ip +:code:`ZONE_MOVABLE` has a valid allocation target. This is because, as of +Linux v6.15, the page allocator does approximately the following: :: + + for (each zone in local_node): + + for (each node in fallback_order): + + attempt_allocation(gfp_flags); + +Because the local node does not have :code:`ZONE_MOVABLE`, the CXL node is +functionally unreachable for direct allocation. As a result, the only way +for CXL capacity to be used is via `demotion` in the reclaim path. + +This configuration also means that if the DRAM ndoe has :code:`ZONE_MOVABL= E` +capacity - when that capacity is depleted, the page allocator will actually +prefer CXL :code:`ZONE_MOVABLE` pages over DRAM :code:`ZONE_NORMAL` pages. + +We may wish to invert these configurations in future Linux versions. + +If `demotion` and `swap` are disabled, Linux will begin to cause OOM crash= es +when the DRAM nodes are depleted. This will be covered amore in depth in t= he +reclaim section. + + +CGroups and CPUSets +******************* +Finally, assuming CXL memory is reachable via the page allocation (i.e. on= lined +in :code:`ZONE_NORMAL`), the :code:`cpusets.mems_allowed` may be used by +containers to limit the accessibility of certain NUMA nodes for tasks in t= hat +container. 
Users may wish to use this in multi-tenant systems where some +tasks prefer not to use slower memory. + +In the reclaim section, we discuss some limitations of this interface in +preventing demotion of shared data to CXL memory (if demotion is enabled). + diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index d19148be3087..52bc444506bc 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -47,5 +47,6 @@ that have impacts on each other. The docs here break up configurations steps. :caption: Memory Allocation allocation/dax + allocation/page-allocator .. only:: subproject and html -- 2.49.0 From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f182.google.com (mail-qt1-f182.google.com [209.85.160.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8755B1E0B86 for ; Wed, 30 Apr 2025 00:13:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971983; cv=none; b=uB6mztyWlTWNLvmVtGK/DUXyr1mU77TvhT/zFF8z8Hd5ZR6Jr0fa1LHM65NqKwTVxE2wIeKNu7KbNtMSSTzrSZRS3cetsLGM+X7XJk3f4JtJeNaq0XQM/st74P5UtSP0oWKZGp6OFnbBCyv0nSdhPGF1TXG2UC0Jrumc20q+sMI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971983; c=relaxed/simple; bh=od5Us0DmJZn5ET4lRp8ewWoLEj/tIGTNdfUdU5eEZAY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=skWUpcfsz3VzSp3VT7ritQkVK9eJoBYUPeu0GLdaLqWkZtIYQjRvCaO+NOqYM5bcOKAXl0ZP7iEHc1ohN487g0RgaPeT6A5PaUXUbGQI4YdoLP1E8vLM0eUTqb9oBsaZ57PacHw7P2vJ+6Bx/EnhHtLmikLgsjr1+axhlCrNi60= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key)
header.d=gourry.net header.i=@gourry.net header.b=rBW4aMck; arc=none smtp.client-ip=209.85.160.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="rBW4aMck" Received: by mail-qt1-f182.google.com with SMTP id d75a77b69052e-4774193fdffso122586561cf.1 for ; Tue, 29 Apr 2025 17:13:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971980; x=1746576780; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ZP8IwuTcI0ddxoNDYzSPYNwSxCLdaHiFssczpGgKsDE=; b=rBW4aMckVVP6jFgUTXnmgnDDMkKxImvzDCr7MiFyAWCdKqdD2NgI4SHjsiEG4I6b1a 5a27QwSF8GiavY/JB1n0RO5WsKnRsKkw9umD1H1qlcxrQjJ0hHdb7yVGUEtrJ4gpWBDZ h1fT+7jyHtNmA+bhytqAGd3rLzpZdROeI3L7mAK8X0zHrTkjzg1MF3j7mXRTCUSlDDmf vgHghdGr/6JH27yTAy28FtrIXmp3Z9cU3egqXnrn0TXvMlJ6kSZbW6uSThAbq6uJllFy DgPGacVs6mef218DfILSgh64cYSAhntPgvkHYgpk4tnYRZo/7fEgt2SeWso3ECY7U3tE iw8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971980; x=1746576780; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZP8IwuTcI0ddxoNDYzSPYNwSxCLdaHiFssczpGgKsDE=; b=WJyDFIXm7mMKFTDLz/pfuFGTwwSvKtZ1Nr/s7O+O/x2LSqYvArcNSY/PrygbnRtRup tEl2etzS7N7hqR5CwzsFSnRfuioX1Jr+Def10LXijiz1HXM+aocEAbUes1+6XAFvLNtK EvPUCufwj0vmanGTaRM+jlp8kzKmzj2FrPVwsdKqwaxBOPl0tk8OytRiVKumILxxnCIb DepEhYwIsPFEYSeAf2TI2rKWeJXlKqnxpvPu6AR/2+n4DrnXA4tLHnDho7/0q2igOxSw 6VIkPxel+2nGdhokSF2k+e69Ucho7uRLosR/+JXCXv/cxzlU6za4oxM/wBCtotQdU9dN KVxQ== X-Forwarded-Encrypted: i=1; 
AJvYcCXRQpDu9lfYF4PsMvgPRUPB/XSLT5u2dsSDr5lH8+lTz/x5IVbRES/7MikX013ksbBmdttuyzLkHaUQmIo=@vger.kernel.org X-Gm-Message-State: AOJu0YwLFKRFaISS1CGByOFdHMhrWIaOVJOileOs4Awvp50YxtwVe7HC fUo9ST9wL2UC4njR+pAajEas6j/9z7gLIVXP7UKFb0NsRlc0gAZxA0JeXfN+rfE= X-Gm-Gg: ASbGncu1rnivJA/Eo8Il6BCMjF4tViGoL5ziBbfeMfuc9tnRic0zT60ryCbqW0yZhLP vkXg1pxySCLGuKtjftKs4cA7MIzrqdAcJeIreeFjQ0UELKxro6AXD7E8fASDndbVX85LdcPWr+R cIhWvkFswBsTKuEWuFxP6+0+MhoG6T8u4ilsIEOGTomCuaQPnb5fwEUpldcpM9bKH+0fiFGiQou mXfI+bo/rogFmgQRzMuSGwFTXHjv0c47z3pD9svbWD0s3HTg+sNjCbR/UMDI7ovyUOHeCShEOZw w5agrMtjaN6LcByUukC8p7yZ6Vb3allLnpesaLd8Hj4aqi2O7MUVI2VblyxaYnHOKfZ+bkLND4C GyvRkkS6BQjO17iTllzeo75QCz342 X-Google-Smtp-Source: AGHT+IEulUJ1ZdOBlX0WQlJws5c+EW19ARfMKYcw/63vGp1jBFQt+HKkN0nBlJDN/wRBATMWao2v2w== X-Received: by 2002:a05:622a:5808:b0:476:86bc:8b41 with SMTP id d75a77b69052e-489e67b4669mr9808011cf.52.1745971980333; Tue, 29 Apr 2025 17:13:00 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. 
[173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.12.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:13:00 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 14/17] cxl: docs/allocation/reclaim Date: Tue, 29 Apr 2025 20:12:21 -0400 Message-ID: <20250430001224.1028656-15-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Document a bit about how reclaim interacts with various CXL configurations.

Signed-off-by: Gregory Price
---
 .../driver-api/cxl/allocation/reclaim.rst | 50 +++++++++++++++++++
 Documentation/driver-api/cxl/index.rst    |  1 +
 2 files changed, 51 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/allocation/reclaim.rst

diff --git a/Documentation/driver-api/cxl/allocation/reclaim.rst b/Documentation/driver-api/cxl/allocation/reclaim.rst
new file mode 100644
index 000000000000..deb59422492c
--- /dev/null
+++ b/Documentation/driver-api/cxl/allocation/reclaim.rst
@@ -0,0 +1,50 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Reclaim
+#######
+Another way CXL memory can be used *indirectly* is via the reclaim system in
+:code:`mm/vmscan.c`. Reclaim is engaged when memory on the system comes under
+pressure, as determined by the global and cgroup-local `watermark` settings.
+
+This section does not discuss the `watermark` configurations; it covers only
+how CXL memory can be consumed by various parts of the reclaim system.
+
+Demotion
+********
+By default, the reclaim system will prefer swap (or zswap) when reclaiming
+memory. Enabling :code:`/sys/kernel/mm/numa/demotion_enabled` will cause
+vmscan to opportunistically prefer distant NUMA nodes to swap or zswap, if
+capacity is available.
+
+Demotion engages the :code:`mm/memory_tier.c` component to determine the
+next demotion node. The next demotion node is chosen based on :code:`HMAT`
+or :code:`CDAT` performance data.
+
+cpusets.mems_allowed quirk
+==========================
+In Linux v6.15 and below, demotion does not respect :code:`cpusets.mems_allowed`
+when migrating pages. As a result, if demotion is enabled, vmscan cannot
+guarantee isolation of a container's memory from nodes not set in mems_allowed.
+
+In Linux v6.XX and up, demotion does attempt to respect
+:code:`cpusets.mems_allowed`; however, certain classes of shared memory
+originally instantiated by another cgroup (such as common libraries, e.g.
+libc) may still be demoted. As a result, the mems_allowed interface still
+cannot provide perfect isolation from the remote nodes.
+
+ZSwap and Node Preference
+*************************
+In Linux v6.15 and below, ZSwap allocates memory from the local node of the
+processor for the new pages being compressed. Since pages being compressed
+are typically cold, the result is that a cold page is effectively promoted,
+only to be demoted again later as it ages off the LRU.
+
+In Linux v6.XX, ZSwap tries to prefer the node of the page being compressed
+as the allocation target for the compressed page. This helps prevent
+thrashing.
+
+Demotion with ZSwap
+*******************
+When both demotion and ZSwap are enabled, ZSwap will by default prefer the
+slowest form of CXL memory, and will continue to do so until that tier of
+memory is exhausted.
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 52bc444506bc..e20defe9c20e 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -48,5 +48,6 @@ that have impacts on each other. The docs here break up configurations steps.
 
    allocation/dax
    allocation/page-allocator
+   allocation/reclaim
 
 .. only:: subproject and html
-- 
2.49.0
From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f181.google.com (mail-qt1-f181.google.com [209.85.160.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5ABFA1E32C3 for ; Wed, 30 Apr 2025 00:13:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971986; cv=none; b=RBnlP1GHe8VHaYK16+FLawNmlgFekIDF2v5NGogktSO6fEJ+MM4CrLMh37R9eYpdXj90a6z/kish0F1G3OmssRxH/IvnKE3yVXwnqHKY93VdOn4R2/JyEYZQHufNCcnIP53cyICpbOz8SWEAESZ1By/6jxeANsLX5Vp8gWZNmNA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971986; c=relaxed/simple; bh=lNJCo6vZDGuxpwKTO4RG3PWfxMqBn8sro5yfGkfy9yM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Trt2SoaHGg+ermxDNi3XJh3uneSrCZgCf2y+EMdcTKXiJCrZdU7WDmGbTUkB5hNRf1lyTCiKc1cma0IWYK9AXSqF/NAusZSx62PLQ7lO9NcZCFbXZhs6ly+VsFEu6BdNqYroC54duXFjzxvEXSjR8YIQWkai4yCgMp1eYq8V+do= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=sW4f88bK; arc=none smtp.client-ip=209.85.160.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="sW4f88bK" Received: by mail-qt1-f181.google.com with SMTP id d75a77b69052e-4774193fdffso122586891cf.1 for ; Tue, 29 Apr 2025 17:13:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971982; x=1746576782; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uawVL9waclaUXznPQCp2ioRqecofoABS91U/9dY2U1M=; b=sW4f88bKjbS5qYVqSKwCfepo4FuyiyZ/49rqhAUZxKdF0y0XDgQgFvqVcEcPEPyWQ+ WBL8WejFJB2+RqaJcXnrpdKHA6XXIqQK6TXM/8KnguLeLHfshHoqpBQArw7VW7d89g2e 1P6/6MtgITfh+U/AQ/zrORlBl7Xdpyw+dO8BUNM7e0FJZxwQKmvDXx6KVqX6fZIbOpsa yL7vVQcxmb/JNlNK4UcFGz1TXx4fSDJX8PTFu1x9CZRIXQRrUs+Nk1egjwhVyYluV3u9 ldiX8wkkf5pEGc5qjeV3OVHwtEP5S4OInPGibdOdYpK2E4KLtCKw4rM5dBddASd4KcG4 1QhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971982; x=1746576782; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uawVL9waclaUXznPQCp2ioRqecofoABS91U/9dY2U1M=; b=Nb7EOpZ2Vc0DFMiowuOfmvYnF2uIvFHOmB5H1lhKF8YPg264S+YhNlWUmyA9aB5yZ/ pQrr4peZNbtQKZ/EfCK7rc98y50EPQfLCEUrgXqMcEgwyaj9jonizpXCK9VCIxtyEQqn q8awaeSIKfSsvdhCBKT6gHB/HF7EQkqOjSBSNHAicMufix30jIzC/FTg97W6LaydalEn P2kz0acEba6y3XKVdXu5thO5gkw1Vc4zaOCIUFes0J5ps482YSQJqVYqw1z5znOjQ+Cx MVQC3bjn7JcxPuYhJYOpJEiQF4MGsOX6qxcaZ/459whpUxUayAAdr+fKclBth76Jm8ZQ fZJA== X-Forwarded-Encrypted: i=1; AJvYcCWoBLEhgmHDCG+4wMNgf+SY2QOqbIb5Hj5IATSfSXv+x7mFoDYFSvGMf/EcZkqdweccm+y2C9PIHs7qZ2s=@vger.kernel.org X-Gm-Message-State: AOJu0YxnG260yHW4Kznxs9TyePsJEFf3y2HnktqIuAdBA3zvFqw1YEvZ X0jvYyNkrLm0Gc074yt/9VwlB0U6rQ8SdE6b9Bm1v/EgYxkkhegAwaDcAvC+Owgu2WZB6SlVeNH O X-Gm-Gg: ASbGnctpUCIezWWdmjo9lJgNdVbD77hQgMIPv4CC3c5jHguLtqXBo011AaFTD0Xeo01 
veMul3WTcRRLQE20oEYtNG4fLZ1vT6Wo+bBbL7fTMBGoBBWYiY0RUciWr0J2xi3xentbYgJsEJc N5QlaaBwG63Iaomm1+9qLm1r+qdBxC8TJ5CnZps7y2R0LFkiMo29OTOfxRH+qSS5Eir18rX/CKv pftVyncUHqQFGFf7deUW+fkjMOJPqTFPviz6J7uvf47H0MdLHulNtCNDjlhskH0rgwsXm4oo02j RzHyJnGMQYfcsU9MfroOUjg2IWQQDIzlz8/VpDhNKRY9Qr/ZPZJciLRqPfNc3ABeulXIlkdrN/I 0X1NqcLmDs8GEjoAyyXJ+Tqu0C9cKECkCxVRjUPk= X-Google-Smtp-Source: AGHT+IG2vOhCtzXNYKwtzwLvK9oR0zRMiZj20a0FOQ1aA2EP6QhhElNLMZmA7y1EAW5PIcBKuXgyGg== X-Received: by 2002:a05:622a:1f8d:b0:477:4df:9a58 with SMTP id d75a77b69052e-489e4a8d38fmr11575931cf.18.1745971982177; Tue, 29 Apr 2025 17:13:02 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.13.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:13:01 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 15/17] cxl: docs/allocation/hugepages Date: Tue, 29 Apr 2025 20:12:22 -0400 Message-ID: <20250430001224.1028656-16-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add docs on how CXL capacity interacts with CMA and HugeTLB allocation interfaces. 
Signed-off-by: Gregory Price
---
 .../driver-api/cxl/allocation/hugepages.rst | 30 +++++++++++++++++++
 Documentation/driver-api/cxl/index.rst      |  1 +
 2 files changed, 31 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/allocation/hugepages.rst

diff --git a/Documentation/driver-api/cxl/allocation/hugepages.rst b/Documentation/driver-api/cxl/allocation/hugepages.rst
new file mode 100644
index 000000000000..195cdb0d64ee
--- /dev/null
+++ b/Documentation/driver-api/cxl/allocation/hugepages.rst
@@ -0,0 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Huge Pages
+##########
+
+Contiguous Memory Allocator
+***************************
+CXL memory onlined as SystemRAM during early boot is eligible for use by CMA,
+as the NUMA node hosting that capacity will be `Online` at the time CMA
+carves out contiguous capacity.
+
+CXL memory deferred to the CXL driver for configuration cannot have its
+capacity allocated by CMA, as the NUMA node hosting the capacity is `Offline`
+at :code:`__init` time, when CMA carves out contiguous capacity.
+
+HugeTLB
+*******
+
+2MB Huge Pages
+==============
+All CXL capacity, regardless of configuration time or memory zone, is eligible
+for use as 2MB huge pages.
+
+1GB Huge Pages
+==============
+CXL capacity onlined in :code:`ZONE_NORMAL` is eligible for 1GB gigantic page
+allocation.
+
+CXL capacity onlined in :code:`ZONE_MOVABLE` is not eligible for 1GB gigantic
+page allocation.
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index e20defe9c20e..51dd0392883b 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -49,5 +49,6 @@ that have impacts on each other. The docs here break up configurations steps.
    allocation/dax
    allocation/page-allocator
    allocation/reclaim
+   allocation/hugepages.rst
 
 .. only:: subproject and html
-- 
2.49.0
From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f178.google.com (mail-qt1-f178.google.com [209.85.160.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7E9A1D7E2F for ; Wed, 30 Apr 2025 00:13:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971986; cv=none; b=BOl8YG0ZmbmUd2i02qBoT5ggvgB5/ypjC3vKe46zW2n1e6gYez8CCuDEHVrI1C7FUBwIJD/DAl88bOETzGOjwRws6ouNOYGv6bBarddlRDRnwTLK1JRalQ5MWYhm8OrhJ7F4g83fWt1pxXn1EOtlkF9E6VsQyWfAjTEHMkRA+0g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971986; c=relaxed/simple; bh=0K2qYudSIKSpntsCjjospSQ3Faqk32cfHXFTc352FVA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aXV8IxsW5VSrL/vIipc/kVgduOj0P97S7ioDjGO8TAg8TBAUT+jFJukjpERdz7MshDGu+0pVj8LOPbFAms5LSCfLNdQ5RGmHfnzds1fX0+g4FUJc0NXSuTEePnqwy6LVrZSkRrWwYgK97BDH+syYkaJ4zN6tK7Nkr4LBJW0cq0U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=qw9MN/Y7; arc=none smtp.client-ip=209.85.160.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="qw9MN/Y7" Received: by mail-qt1-f178.google.com with SMTP id d75a77b69052e-4772f48f516so4526751cf.1 for ; Tue, 29 Apr 2025 17:13:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971984; x=1746576784; darn=vger.kernel.org;
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0nlYJCidbo7D6EAdzULAlI9wXCDZH50uIfMmptbRecU=; b=qw9MN/Y75ouhn6KECXclLkEN/G+rWANX8W0YPDtZyW0zAXvrQDMmz64vBsJf7KYGMi mtQT6SaUP6lz8wXAnufPQ1/mk7FG6Qp3J9weSITJRRV7lueUHf0GfqzzGtdnY8NdUSW3 CWJvguwo9teQFX3dr3fQCAJDAGrgA1lTzcrroMMDUjHnwcsO7s/nC7qXWTyDEgmtb8uD R7Q1fpIFA/pUE40A1IhGn6nFZ0QAYfocm1M6L0QjGJEQPfzHqdEiIe9j1mVpQtEPuZfR 3C+Jc4wcrYQ/lTQLc+E2abFe1nH088Khay5dRikGVEoD9Ovgq64goy1LMFu76xK4DRKY y64g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971984; x=1746576784; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0nlYJCidbo7D6EAdzULAlI9wXCDZH50uIfMmptbRecU=; b=ac3VyOq/ExEbdfYasmdSXXmxVhpFKEuzdg2i2N+Nw3WJuDIdtSSvLisTupp5blayLH +gHIywzYW53o6fE9ORkp7xK3zybnrYg7jRrTLm6qSJD+y/0kY8SC3GqBu/83f8+Kvs3a vW8kTCWMSAEPPTRVA44/v0CQkB+81xoMrYKio9bMhroda68+reKSP6wNzZwwVZ1B8xkE l/eoj6G5or32kwk8lscrp210Lk4d/Kf3wrSXQkYwR/QI3787fBXmEkUBOA6EBFwPQAAl t1QkfQrQnyPtzJdfwE0ybj48TNMFF/+Fakh0919x61pxEahYtAuqshAiKrP6kjId6D/n ZLCA== X-Forwarded-Encrypted: i=1; AJvYcCXezGGXcFwoss6rL83befMGFqW1J84OsXMyfVzV9lgLxQBhXGawsIKkrNdcema3WGclWlvX05KPUl44bb8=@vger.kernel.org X-Gm-Message-State: AOJu0YxkE7j0m+3C4YmzvGEubzYCU3pj2M/IJVmOXkkGIDBD2i2UqOi/ SXcdso5FeuvhPdiuZnVMInsqj5n+6u7K0H2GxcJnZyOV6uk4hMCPUPUsQ8KgjEo= X-Gm-Gg: ASbGncv7TbOPtjwxRaG7d3ftBNX6hp9Cdwd/cqRgWAaYhvaVb4PFzfzCjY2g3ki5H4F mmqTKybeMZS5gyYg4/9j6jL6XbVOYFNID12hfwNq8Cm8HRhhPycmxNwMupzok0fCBYqPbdnxK/P rq3H7+0LvOZai6Y49U3nAyjUO/IuoVns58f2O6ShiPm9k+BWn6nJ+U8+aFRH/wZ0cf5eNngzdPl LDLe/1YzLZb/Isvge/f0kGbqAMHxmfDa4wc+3mLcXjMF1AYeP6vGAOi5SJ01FcIbmes+th7VKAL iqmllxSWUqUZcnNJi8r5SbncOZsL2p5wMB2Bb0VAHmU9dAU+vm+o1tLU3hW983yXw0+1/xxZDkB MFDpnBWAP/45GgqSI7mC2RlrefJ3M X-Google-Smtp-Source: 
AGHT+IHHlI+GitsjsH8YXECXSbCgG0VCNVvdhBX/jA/ObO5ilet3C+3xZmGZflAts3b1Fw2a8vwXHA== X-Received: by 2002:a05:622a:2483:b0:477:1dd0:6d15 with SMTP id d75a77b69052e-489cb7b9eb5mr13982081cf.5.1745971984051; Tue, 29 Apr 2025 17:13:04 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.13.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:13:03 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 16/17] cxl: docs/allocation/tiering Date: Tue, 29 Apr 2025 20:12:23 -0400 Message-ID: <20250430001224.1028656-17-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Carve out a space for folks to explain existing tiering mechanisms and
how CXL capacity interacts with them.

Signed-off-by: Gregory Price
---
 .../driver-api/cxl/allocation/tiering.rst | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/allocation/tiering.rst

diff --git a/Documentation/driver-api/cxl/allocation/tiering.rst b/Documentation/driver-api/cxl/allocation/tiering.rst
new file mode 100644
index 000000000000..dde7010fff12
--- /dev/null
+++ b/Documentation/driver-api/cxl/allocation/tiering.rst
@@ -0,0 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Memory Tiering
+##############
+
+todo
+
+Memory Tiers
+************
+
+todo
+
+
+Transparent Page Placement
+**************************
+
+todo
+
+Data Access MONitor
+*******************
+
+to be updated
+
+References
+==========
+
+- `Self-tuned Memory Tiering RFC prototype and its evaluation `_
+- `SK Hynix HMSDK Capacity Expansion `_
+- `kernel documentation `_
+- `project website `_
-- 
2.49.0
From nobody Sun Feb 8 22:35:32 2026 Received: from mail-qt1-f181.google.com (mail-qt1-f181.google.com [209.85.160.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68AF01E7660 for ; Wed, 30 Apr 2025 00:13:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971989; cv=none; b=YxCZ7w12psos2e+5HG6vcFuiHL6zVbn08XKhhUQCINLxMvykYg9ZuMbpIaXnC+86pB+rc3oGZG4vqVBtGCC41Z9g3W/3fOxXAOE2LrJmGktvj1i6wnrnLBFFsUKSTICKeoJzWNTy+H2XnBfd/btubEtKFQFF+s4hCsxuYmnuK7k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745971989; c=relaxed/simple; bh=BzfFAM5BhwnMp75QAL22bKF9ElQQxGopTB6tLuSsWJI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lgVROkcor3GTDHKwM/8yMjmKAkD56npFP+IjHm4gBICf1FD9GHRrEvm/kKId8omUMspHrdjdqnnvrJqxSm27ThZXFsrnlnIrHeXy0OBtUkVjHk0y+gc601QuOZSnTFdwY6FiUEon5xC4liV3fgXKVkrX/W8SRR4bGy2OKC5Cib8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net; spf=pass smtp.mailfrom=gourry.net; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b=EQAEqXJe; arc=none smtp.client-ip=209.85.160.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=gourry.net Authentication-Results: smtp.subspace.kernel.org; spf=pass
smtp.mailfrom=gourry.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gourry.net header.i=@gourry.net header.b="EQAEqXJe" Received: by mail-qt1-f181.google.com with SMTP id d75a77b69052e-4767e969b94so149603971cf.2 for ; Tue, 29 Apr 2025 17:13:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gourry.net; s=google; t=1745971986; x=1746576786; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=U3Yyi9y+v5P6bmpZgdp45DsIU9acCM/HjEvTDZbsG00=; b=EQAEqXJe6sjJzQYrxPBEe5xRaSbx+ib/+GvuPxwq9/h8jM4I/R8CJANbkbgTnXMO99 9khpcaeCCb6B8eqL+/FHyYCdnX6N0wvKeXz2KXZQpQnLuyiVfIjPl7tEWtHemHq8uEID 7wKYwjCmV8WSr7YRNcAcCZXbVhHu9Qi0mhg8YOyUayGEvTtKOJJA5C3wniEH+aIQg4q7 mVtPixyORl006jKHhnAbv+1QojHQ7Swjdgoa+ql0IAr9UPdVPykxjDafZ8AJGaAhzC+u zKNCcW8P/cinfLvZuOrO6L24T6JV/sEZ/QgVYJHMm0kLWNzS+Z0+MslQumaV37KC1S/G i0Pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745971986; x=1746576786; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=U3Yyi9y+v5P6bmpZgdp45DsIU9acCM/HjEvTDZbsG00=; b=fIy3X/gyQ75sK2cTeoTt0z30o8JoTD0+U7KBe0nD8BVpFm0obvkG+D85UqDqM2y1zJ zOoBbrSFtN5mnNJJ4DlXtanZrSwYuB5TvXLI70HSnn4+aNhtdyN15jPG1+nyFZ6iP4Aq A+XTvhtToDdlv7vOTrd23JMkGOovEj5nUVxy31BCNi5H+g4h0twZk9ZsRTRCCs4a6df5 4apOuXCfnE6ieIIxxmTeR682EnB91IHPMk10ZyVhuZOTuMkVm8W6DmNh7ZD2TnPXARl4 +cYzgF3CaVL1I3hsUdoq/p4o+p32sjK645zwpg0XcFceai1DWXDsFnVQzfBoR9qA9TXE VrSA== X-Forwarded-Encrypted: i=1; AJvYcCUVT8cekkwnNfE2TBu+rmbkLEwKNU9aBzGxv0S2K0qn+VeLULr+ei5ajMqWZS8JWwYpGScm65mkKWxY7n8=@vger.kernel.org X-Gm-Message-State: AOJu0YziN6VCDX6BmBhmlymZtLxYaO45NKpRkRYpiNmLay3/r/x+KSN2 FwNzDbaZGL6fJWOarda+53h5Bsmk7KfFP1ls0eVXXrDQnsynGOgolil9T8eNli0= X-Gm-Gg: 
ASbGnctH2iZyEC81VpZJwPxAg0i30mhXH9wZFeW1rVEdRdfakeEzOwxIHZcf1zGWYJ5 3+Xa3O1bDRDT0n2q9MIVSmZhLlliH0fFOw57HDuP7HOdZkj0AgnP80N5J8w2+NAUjOqVQLrQgJ2 q7n75mJqHHKHic58AQCdZzixo+XPMgeCF4SECz9QUGlO9vOyYpPJjdUe5k3FAPSLRj4o1Rjum0D lwa56zwhrG+X85QgutCbSL6dglEXu4KuR7K92bJdTGRXFy79W2WknoUzARE6YPgs4XRHnst71Gj m8kvGlbRqL0DLgiebdQSXJA9s8rN3Zfr1HsQwIaHoGJZEIqHue9CienmjQTWZPemzoPQQ3t/IjD dJ/K/yYofoY7APA0sVBHL3S+OohWg9eV5z7+TtHs= X-Google-Smtp-Source: AGHT+IHDh8J8wln5z2S1SJ1zWQwlLw93RU9T3YfEZpUHWnDL1YaDIdRHES8teoNW5qqZZtojkjVFlQ== X-Received: by 2002:a05:622a:17cc:b0:476:bb72:f429 with SMTP id d75a77b69052e-489e63d8bbamr9206451cf.42.1745971986365; Tue, 29 Apr 2025 17:13:06 -0700 (PDT) Received: from gourry-fedora-PF4VCD3F.lan (pool-173-79-56-208.washdc.fios.verizon.net. [173.79.56.208]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-47e9f7a820esm87634411cf.41.2025.04.29.17.13.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Apr 2025 17:13:06 -0700 (PDT) From: Gregory Price To: linux-cxl@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, dave@stgolabs.net, jonathan.cameron@huawei.com, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, corbet@lwn.net Subject: [RFC PATCH 17/17] cxl: docs/use-cases Date: Tue, 29 Apr 2025 20:12:24 -0400 Message-ID: <20250430001224.1028656-18-gourry@gourry.net> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250430001224.1028656-1-gourry@gourry.net> References: <20250430001224.1028656-1-gourry@gourry.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Add carve-outs for use-case documentation. Completely open as to what
(if anything) we should explain here, and/or what the structure should be.
Signed-off-by: Gregory Price
---
 Documentation/driver-api/cxl/index.rst        | 10 ++++++++++
 .../cxl/use-case/dynamic-capacity.rst         | 19 +++++++++++++++++++
 .../cxl/use-case/memory-expansion.rst         | 14 ++++++++++++++
 .../driver-api/cxl/use-case/shared-memory.rst | 14 ++++++++++++++
 .../cxl/use-case/virtual-machines.rst         | 18 ++++++++++++++++++
 5 files changed, 75 insertions(+)
 create mode 100644 Documentation/driver-api/cxl/use-case/dynamic-capacity.rst
 create mode 100644 Documentation/driver-api/cxl/use-case/memory-expansion.rst
 create mode 100644 Documentation/driver-api/cxl/use-case/shared-memory.rst
 create mode 100644 Documentation/driver-api/cxl/use-case/virtual-machines.rst

diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 51dd0392883b..e0a86f68b6f8 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -50,5 +50,15 @@ that have impacts on each other. The docs here break up configurations steps.
    allocation/page-allocator
    allocation/reclaim
    allocation/hugepages.rst
+   allocation/tiering.rst
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Use Cases
+
+   use-case/memory-expansion.rst
+   use-case/dynamic-capacity.rst
+   use-case/virtual-machines.rst
+   use-case/shared-memory.rst
 
 .. only:: subproject and html
diff --git a/Documentation/driver-api/cxl/use-case/dynamic-capacity.rst b/Documentation/driver-api/cxl/use-case/dynamic-capacity.rst
new file mode 100644
index 000000000000..93a24aa1edc5
--- /dev/null
+++ b/Documentation/driver-api/cxl/use-case/dynamic-capacity.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Dynamic Capacity
+################
+todo
+
+For Virtual Machines
+********************
+todo
+
+For Workload Orchestration
+**************************
+todo
+
+For Shared Memory
+*****************
+todo
+
+
diff --git a/Documentation/driver-api/cxl/use-case/memory-expansion.rst b/Documentation/driver-api/cxl/use-case/memory-expansion.rst
new file mode 100644
index 000000000000..d1d25e0e4498
--- /dev/null
+++ b/Documentation/driver-api/cxl/use-case/memory-expansion.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Memory Expansion
+################
+
+todo
+
+As Page Cache
+*************
+todo
+
+As DAX Device
+*************
+todo
diff --git a/Documentation/driver-api/cxl/use-case/shared-memory.rst b/Documentation/driver-api/cxl/use-case/shared-memory.rst
new file mode 100644
index 000000000000..dfdc2c419ea2
--- /dev/null
+++ b/Documentation/driver-api/cxl/use-case/shared-memory.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Shared Memory
+#############
+todo
+
+Coherence
+*********
+todo
+
+Fabric Attached Memory FileSystem (FAMFS)
+*****************************************
+
+todo
diff --git a/Documentation/driver-api/cxl/use-case/virtual-machines.rst b/Documentation/driver-api/cxl/use-case/virtual-machines.rst
new file mode 100644
index 000000000000..0411d37092ce
--- /dev/null
+++ b/Documentation/driver-api/cxl/use-case/virtual-machines.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Virtual Machines
+################
+
+todo
+
+NUMA Passthrough
+****************
+todo
+
+Flexible Shapes
+***************
+todo
+
+Datacenter Efficiency
+*********************
+todo
-- 
2.49.0