From nobody Sat Nov 23 22:36:01 2024
From: Lizhi Hou
To: , ,
CC: Lizhi Hou , , , , ,
Subject: [PATCH V8 01/10] accel/amdxdna: Add documentation for AMD NPU accelerator driver
Date: Mon, 11 Nov 2024 09:32:21 -0800
Message-ID: <20241111173230.655325-2-lizhi.hou@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241111173230.655325-1-lizhi.hou@amd.com>
References: <20241111173230.655325-1-lizhi.hou@amd.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
integrated into AMD client APUs. NPU enables efficient execution of Machine
Learning applications like CNN, LLM, etc. NPU is based on AMD XDNA
Architecture. NPU is managed by the amdxdna driver.

Co-developed-by: Sonal Santan
Signed-off-by: Sonal Santan
Reviewed-by: Jeffrey Hugo
Signed-off-by: Lizhi Hou
---
 Documentation/accel/amdxdna/amdnpu.rst | 281 +++++++++++++++++++++++++
 Documentation/accel/amdxdna/index.rst  |  11 +
 Documentation/accel/index.rst          |   1 +
 3 files changed, 293 insertions(+)
 create mode 100644 Documentation/accel/amdxdna/amdnpu.rst
 create mode 100644 Documentation/accel/amdxdna/index.rst

diff --git a/Documentation/accel/amdxdna/amdnpu.rst b/Documentation/accel/amdxdna/amdnpu.rst
new file mode 100644
index 000000000000..fbe0a7585345
--- /dev/null
+++ b/Documentation/accel/amdxdna/amdnpu.rst
@@ -0,0 +1,281 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+.. include::
+
+=========
+ AMD NPU
+=========
+
+:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
+:Author: Sonal Santan
+
+Overview
+========
+
+AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
+integrated into AMD client APUs. NPU enables efficient execution of Machine
+Learning applications like CNN, LLM, etc. NPU is based on
+`AMD XDNA Architecture`_. NPU is managed by the **amdxdna** driver.
+
+
+Hardware Description
+====================
+
+AMD NPU consists of the following hardware components:
+
+AMD XDNA Array
+--------------
+
+AMD XDNA Array comprises a 2D array of compute and memory tiles built with
+`AMD AI Engine Technology`_. Each column has 4 rows of compute tiles and 1
+row of memory tile. Each compute tile contains a VLIW processor with its own
+dedicated program and data memory. The memory tile acts as L2 memory. The 2D
+array can be partitioned at a column boundary, creating a spatially isolated
+partition which can be bound to a workload context.
+
+Each column also has dedicated DMA engines to move data between host DDR and
+memory tile.
+
+AMD Phoenix and AMD Hawk Point client NPUs have a 4x5 topology, i.e., 4 rows of
+compute tiles arranged into 5 columns. AMD Strix Point client APU has a 4x8
+topology, i.e., 4 rows of compute tiles arranged into 8 columns.
+
+Shared L2 Memory
+----------------
+
+The single row of memory tiles creates a pool of software managed on-chip L2
+memory. DMA engines are used to move data between host DDR and memory tiles.
+AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB of L2 memory.
+AMD Strix Point NPU has a total of 4096 KB of L2 memory.
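+
+With one memory tile per column, both generations work out to 512 KB of L2
+memory per memory tile (2560 KB across 5 columns on Phoenix/Hawk Point,
+4096 KB across 8 columns on Strix Point), assuming the pool is split evenly
+across the columns.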
+
+Microcontroller
+---------------
+
+A microcontroller runs NPU Firmware which is responsible for command processing,
+XDNA Array partition setup, XDNA Array configuration, workload context
+management and workload orchestration.
+
+NPU Firmware uses a dedicated instance of an isolated non-privileged context
+called ERT to service each workload context. ERT is also used to execute user
+provided ``ctrlcode`` associated with the workload context.
+
+NPU Firmware uses a single isolated privileged context called MERT to service
+management commands from the amdxdna driver.
+
+Mailboxes
+---------
+
+The microcontroller and amdxdna driver use a privileged channel for management
+tasks like setting up contexts, telemetry, query, error handling, setting up
+user channels, etc. As mentioned before, privileged channel requests are
+serviced by MERT. The privileged channel is bound to a single mailbox.
+
+The microcontroller and amdxdna driver use a dedicated user channel per
+workload context. The user channel is primarily used for submitting work to
+the NPU. As mentioned before, user channel requests are serviced by an
+instance of ERT. Each user channel is bound to its own dedicated mailbox.
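+
+Each mailbox is a head/tail ring whose buffer lives in the SRAM BAR and whose
+control registers live in the Mailbox BAR (see the PCIe EP section below). The
+sketch below shows how a driver typically posts a message on such a ring; the
+structure, register offsets and helper name are illustrative placeholders, not
+the actual amdxdna register map.
+
+.. code-block:: c
+
+   #include <linux/errno.h>
+   #include <linux/io.h>
+   #include <linux/types.h>
+
+   struct mbox_chan_sketch {
+           void __iomem *ring;   /* ring buffer, mapped from the SRAM BAR      */
+           void __iomem *regs;   /* head/tail/ISR, mapped from the Mailbox BAR */
+           u32 ring_size;        /* ring size in bytes, power of two           */
+   };
+
+   #define MBOX_REG_HEAD 0x0     /* placeholder register offsets */
+   #define MBOX_REG_TAIL 0x4
+
+   /* Post one message; the firmware side advances HEAD as it consumes. */
+   static int mbox_sketch_send(struct mbox_chan_sketch *c,
+                               const u32 *msg, u32 words)
+   {
+           u32 head = readl(c->regs + MBOX_REG_HEAD);
+           u32 tail = readl(c->regs + MBOX_REG_TAIL);
+           u32 used = (tail - head) & (c->ring_size - 1);
+           u32 i;
+
+           if (used + words * 4 > c->ring_size - 4)
+                   return -ENOSPC;   /* ring full, caller retries later */
+
+           for (i = 0; i < words; i++)
+                   writel(msg[i], c->ring + ((tail + i * 4) & (c->ring_size - 1)));
+
+           /* Publishing the new tail hands the message to ERT/MERT. */
+           writel((tail + words * 4) & (c->ring_size - 1), c->regs + MBOX_REG_TAIL);
+           return 0;
+   }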
+
+PCIe EP
+-------
+
+NPU is visible to the x86 host CPU as a PCIe device with multiple BARs and some
+MSI-X interrupt vectors. NPU uses a dedicated high bandwidth SoC level fabric
+for reading or writing into host memory. Each instance of ERT gets its own
+dedicated MSI-X interrupt. MERT gets a single instance of MSI-X interrupt.
+
+The number of PCIe BARs varies depending on the specific device. Based on their
+functions, PCIe BARs can generally be categorized into the following types:
+
+* PSP BAR: Expose the AMD PSP (Platform Security Processor) function
+* SMU BAR: Expose the AMD SMU (System Management Unit) function
+* SRAM BAR: Expose ring buffers for the mailbox
+* Mailbox BAR: Expose the mailbox control registers (head, tail and ISR
+  registers etc.)
+* Public Register BAR: Expose public registers
+
+On specific devices, the above-mentioned BAR types might be combined into a
+single physical PCIe BAR, or a module might require two physical PCIe BARs to
+be fully functional. For example:
+
+* On AMD Phoenix device, PSP, SMU, Public Register BARs are on PCIe BAR index 0.
+* On AMD Strix Point device, Mailbox and Public Register BARs are on PCIe BAR
+  index 0. The PSP has some registers in PCIe BAR index 0 (Public Register BAR)
+  and PCIe BAR index 4 (PSP BAR).
+
+Process Isolation Hardware
+--------------------------
+
+As explained before, XDNA Array can be dynamically divided into isolated
+spatial partitions, each of which may have one or more columns. The spatial
+partition is set up by the microcontroller programming the column isolation
+registers. Each spatial partition is associated with a PASID, which is also
+programmed by the microcontroller. Hence multiple spatial partitions in the
+NPU can make concurrent host accesses protected by PASID.
+
+The NPU FW itself uses microcontroller MMU enforced isolated contexts for
+servicing user and privileged channel requests.
+
+
+Mixed Spatial and Temporal Scheduling
+=====================================
+
+AMD XDNA architecture supports mixed spatial and temporal (time sharing)
+scheduling of the 2D array. This means that spatial partitions may be set up
+and torn down dynamically to accommodate various workloads. A *spatial*
+partition may be *exclusively* bound to one workload context while another
+partition may be *temporarily* bound to more than one workload context. The
+microcontroller updates the PASID for a temporarily shared partition to match
+the context that has been bound to the partition at any moment.
+
+Resource Solver
+---------------
+
+The Resource Solver component of the amdxdna driver manages the allocation
+of the 2D array among various workloads. Every workload describes the number
+of columns required to run the NPU binary in its metadata. The Resource Solver
+component uses hints passed by the workload and its own heuristics to decide
+the 2D array (re)partition strategy and mapping of workloads for spatial and
+temporal sharing of columns. The FW enforces the context-to-column(s) resource
+binding decisions made by the Resource Solver.
+
+AMD Phoenix and AMD Hawk Point client NPUs can support 6 concurrent workload
+contexts. AMD Strix Point can support 16 concurrent workload contexts.
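+
+The exact policy also weighs workload hints and temporal sharing, but the
+simplest spatial decision it makes can be pictured as a first-fit search for
+adjacent free columns. The fragment below is an illustrative sketch with
+hypothetical names, not the driver's actual implementation.
+
+.. code-block:: c
+
+   #include <linux/bitmap.h>
+   #include <linux/errno.h>
+
+   struct rs_sketch {
+           unsigned long busy_cols;   /* one bit per column of the array     */
+           unsigned long total_cols;  /* 5 on Phoenix/Hawk Point, 8 on Strix */
+   };
+
+   /*
+    * Reserve ncols adjacent free columns (partitions are created at column
+    * boundaries). Returns the first column index, or -ENOSPC when the request
+    * could only be satisfied by temporally sharing already busy columns.
+    */
+   static int rs_sketch_alloc_cols(struct rs_sketch *rs, unsigned int ncols)
+   {
+           unsigned long start;
+
+           start = bitmap_find_next_zero_area(&rs->busy_cols, rs->total_cols,
+                                              0, ncols, 0);
+           if (start >= rs->total_cols)
+                   return -ENOSPC;
+
+           bitmap_set(&rs->busy_cols, start, ncols);
+           return start;
+   }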
+
+
+Application Binaries
+====================
+
+An NPU application workload comprises two separate binaries which are
+generated by the NPU compiler.
+
+1. AMD XDNA Array overlay, which is used to configure an NPU spatial partition.
+   The overlay contains instructions for setting up the stream switch
+   configuration and ELF for the compute tiles. The overlay is loaded on the
+   spatial partition bound to the workload by the associated ERT instance.
+   Refer to the
+   `Versal Adaptive SoC AIE-ML Architecture Manual (AM020)`_ for more details.
+
+2. ``ctrlcode``, used for orchestrating the overlay loaded on the spatial
+   partition. ``ctrlcode`` is executed by the ERT running in protected mode on
+   the microcontroller in the context of the workload. ``ctrlcode`` is made up
+   of a sequence of opcodes named ``XAie_TxnOpcode``. Refer to the
+   `AI Engine Run Time`_ for more details.
+
+
+Special Host Buffers
+====================
+
+Per-context Instruction Buffer
+------------------------------
+
+Every workload context uses a host resident 64 MB buffer which is memory
+mapped into the ERT instance created to service the workload. The ``ctrlcode``
+used by the workload is copied into this special memory. This buffer is
+protected by PASID like all other input/output buffers used by that workload.
+The instruction buffer is also mapped into the user space of the workload.
+
+Global Privileged Buffer
+------------------------
+
+In addition, the driver also allocates a single buffer for maintenance tasks
+like recording errors from MERT. This global buffer uses the global IOMMU
+domain and is only accessible by MERT.
+
+
+High-level Use Flow
+===================
+
+Here are the steps to run a workload on AMD NPU; a user-space sketch of the
+same flow follows the list:
+
+1.  Compile the workload into an overlay and a ``ctrlcode`` binary.
+2.  Userspace opens a context in the driver and provides the overlay.
+3.  The driver checks with the Resource Solver for provisioning a set of
+    columns for the workload.
+4.  The driver then asks MERT to create a context on the device with the
+    desired columns.
+5.  MERT then creates an instance of ERT. MERT also maps the Instruction
+    Buffer into ERT memory.
+6.  The userspace then copies the ``ctrlcode`` to the Instruction Buffer.
+7.  Userspace then creates a command buffer with pointers to input, output,
+    and instruction buffer; it then submits the command buffer to the driver
+    and goes to sleep waiting for completion.
+8.  The driver sends the command over the Mailbox to ERT.
+9.  ERT *executes* the ``ctrlcode`` in the instruction buffer.
+10. Execution of the ``ctrlcode`` kicks off DMAs to and from the host DDR
+    while AMD XDNA Array is running.
+11. When ERT reaches the end of ``ctrlcode``, it raises an MSI-X to send a
+    completion signal to the driver which then wakes up the waiting workload.
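+
+The sketch below gives a rough user-space view of the same flow. The device
+node path follows the accel subsystem convention, but the request names and
+structures are hypothetical placeholders, not the driver's actual UAPI, which
+userspace normally reaches through the XRT stack described below.
+
+.. code-block:: c
+
+   #include <fcntl.h>
+   #include <string.h>
+   #include <sys/ioctl.h>
+   #include <sys/mman.h>
+
+   /* Hypothetical request names and layouts -- not the actual amdxdna UAPI. */
+   struct hyp_create_ctx { unsigned long overlay_va; unsigned int num_cols; };
+   struct hyp_exec_cmd   { unsigned long cmd_buf_va; unsigned int ctx_id;   };
+   #define HYP_IOCTL_CREATE_CTX _IOWR('x', 1, struct hyp_create_ctx)
+   #define HYP_IOCTL_EXEC_CMD   _IOWR('x', 2, struct hyp_exec_cmd)
+
+   static int run_workload(const void *ctrlcode, size_t len,
+                           struct hyp_create_ctx *ctx, struct hyp_exec_cmd *cmd)
+   {
+           int fd = open("/dev/accel/accel0", O_RDWR);  /* accel char device */
+           void *ibuf;
+
+           if (fd < 0)
+                   return -1;
+
+           /* Steps 2-5: create a context; driver and MERT provision columns. */
+           if (ioctl(fd, HYP_IOCTL_CREATE_CTX, ctx) < 0)
+                   return -1;
+
+           /* Step 6: map the per-context instruction buffer, copy ctrlcode in. */
+           ibuf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+           if (ibuf == MAP_FAILED)
+                   return -1;
+           memcpy(ibuf, ctrlcode, len);
+
+           /* Steps 7-11: submit the command buffer and sleep until completion. */
+           return ioctl(fd, HYP_IOCTL_EXEC_CMD, cmd);
+   }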
+
+
+Boot Flow
+=========
+
+The amdxdna driver uses the PSP to securely load the signed NPU FW and kick off
+the boot of the NPU microcontroller. The driver then waits for the alive signal
+at a special location on BAR 0. The NPU is switched off during SoC suspend and
+turned on after resume, where the NPU FW is reloaded and the handshake is
+performed again.
+
+
+Userspace components
+====================
+
+Compiler
+--------
+
+Peano is an LLVM based open-source compiler for AMD XDNA Array compute tiles,
+available at:
+https://github.com/Xilinx/llvm-aie
+
+The open-source IREE compiler supports graph compilation of ML models for AMD
+NPU and uses Peano underneath. It is available at:
+https://github.com/nod-ai/iree-amd-aie
+
+Usermode Driver (UMD)
+---------------------
+
+The open-source XRT runtime stack interfaces with the amdxdna kernel driver.
+XRT can be found at:
+https://github.com/Xilinx/XRT
+
+The open-source XRT shim for NPU can be found at:
+https://github.com/amd/xdna-driver
+
+
+DMA Operation
+=============
+
+DMA operation instructions are encoded in the ``ctrlcode`` as the
+``XAIE_IO_BLOCKWRITE`` opcode. When ERT executes ``XAIE_IO_BLOCKWRITE``, DMA
+operations between host DDR and L2 memory are carried out.
+
+
+Error Handling
+==============
+
+When MERT detects an error in AMD XDNA Array, it pauses execution for that
+workload context and sends an asynchronous message to the driver over the
+privileged channel. The driver then sends a buffer pointer to MERT to capture
+the register states for the partition bound to the faulting workload context.
+The driver then decodes the error by reading the contents of that buffer.
+
+
+Telemetry
+=========
+
+MERT can report various kinds of telemetry information, for example:
+
+* L1 interrupt counter
+* DMA counter
+* Deep Sleep counter
+* etc.
+
+
+References
+==========
+
+- `AMD XDNA Architecture `_
+- `AMD AI Engine Technology `_
+- `Peano `_
+- `Versal Adaptive SoC AIE-ML Architecture Manual (AM020) `_
+- `AI Engine Run Time `_
diff --git a/Documentation/accel/amdxdna/index.rst b/Documentation/accel/amdxdna/index.rst
new file mode 100644
index 000000000000..38c16939f1fc
--- /dev/null
+++ b/Documentation/accel/amdxdna/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+====================================
+ accel/amdxdna NPU driver
+====================================
+
+The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit).
+
+.. toctree::
+
+   amdnpu
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index e94a0160b6a0..bc85f26533d8 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   amdxdna/index
    qaic/index
 
 .. only:: subproject and html
-- 
2.34.1