From nobody Wed Nov 27 06:28:33 2024
From: Lizhi Hou
CC: Lizhi Hou
Subject: [PATCH V4 01/10] accel/amdxdna: Add documentation for AMD NPU accelerator driver
Date: Fri, 11 Oct 2024 16:12:35 -0700
Message-ID: <20241011231244.3182625-2-lizhi.hou@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com>
References: <20241011231244.3182625-1-lizhi.hou@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
integrated into AMD client APU. NPU enables efficient execution of Machine
Learning applications like CNN, LLM, etc. NPU is based on AMD XDNA
Architecture. NPU is managed by amdxdna driver.

Co-developed-by: Sonal Santan
Signed-off-by: Sonal Santan
Signed-off-by: Lizhi Hou
Reviewed-by: Jeffrey Hugo
---
 Documentation/accel/amdxdna/amdnpu.rst | 281 +++++++++++++++++++++++++
 Documentation/accel/amdxdna/index.rst  |  11 +
 Documentation/accel/index.rst          |   1 +
 3 files changed, 293 insertions(+)
 create mode 100644 Documentation/accel/amdxdna/amdnpu.rst
 create mode 100644 Documentation/accel/amdxdna/index.rst

diff --git a/Documentation/accel/amdxdna/amdnpu.rst b/Documentation/accel/amdxdna/amdnpu.rst
new file mode 100644
index 000000000000..fbe0a7585345
--- /dev/null
+++ b/Documentation/accel/amdxdna/amdnpu.rst
@@ -0,0 +1,281 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+.. include::
+
+=========
+ AMD NPU
+=========
+
+:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
+:Author: Sonal Santan
+
+Overview
+========
+
+AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
+integrated into AMD client APUs. NPU enables efficient execution of Machine
+Learning applications like CNN, LLM, etc. NPU is based on
+`AMD XDNA Architecture`_. NPU is managed by the **amdxdna** driver.
+
+
+Hardware Description
+====================
+
+AMD NPU consists of the following hardware components:
+
+AMD XDNA Array
+--------------
+
+AMD XDNA Array comprises a 2D array of compute and memory tiles built with
+`AMD AI Engine Technology`_. Each column has 4 rows of compute tiles and 1
+row of memory tile. Each compute tile contains a VLIW processor with its own
+dedicated program and data memory. The memory tile acts as L2 memory. The 2D
+array can be partitioned at a column boundary creating a spatially isolated
+partition which can be bound to a workload context.
+
+Each column also has dedicated DMA engines to move data between host DDR and
+memory tile.
+
+AMD Phoenix and AMD Hawk Point client NPUs have a 4x5 topology, i.e., 4 rows of
+compute tiles arranged into 5 columns. AMD Strix Point client APU has a 4x8
+topology, i.e., 4 rows of compute tiles arranged into 8 columns.
+
+Shared L2 Memory
+----------------
+
+The single row of memory tiles creates a pool of software-managed on-chip L2
+memory. DMA engines are used to move data between host DDR and memory tiles.
+AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB of L2 memory.
+AMD Strix Point NPU has a total of 4096 KB of L2 memory.
+
+Microcontroller
+---------------
+
+A microcontroller runs NPU Firmware which is responsible for command processing,
+XDNA Array partition setup, XDNA Array configuration, workload context
+management and workload orchestration.
+
+NPU Firmware uses a dedicated instance of an isolated non-privileged context
+called ERT to service each workload context. ERT is also used to execute user
+provided ``ctrlcode`` associated with the workload context.
+
+NPU Firmware uses a single isolated privileged context called MERT to service
+management commands from the amdxdna driver.
+
+Mailboxes
+---------
+
+The microcontroller and amdxdna driver use a privileged channel for management
+tasks like setting up contexts, telemetry, query, error handling, setting up
+user channels, etc. As mentioned before, privileged channel requests are
+serviced by MERT. The privileged channel is bound to a single mailbox.
+
+The microcontroller and amdxdna driver use a dedicated user channel per
+workload context. The user channel is primarily used for submitting work to
+the NPU. As mentioned before, user channel requests are serviced by an
+instance of ERT. Each user channel is bound to its own dedicated mailbox.
+
+PCIe EP
+-------
+
+NPU is visible to the x86 host CPU as a PCIe device with multiple BARs and some
+MSI-X interrupt vectors. NPU uses a dedicated high bandwidth SoC level fabric
+for reading or writing into host memory. Each instance of ERT gets its own
+dedicated MSI-X interrupt. MERT gets a single instance of MSI-X interrupt.
+
+The number of PCIe BARs varies depending on the specific device. Based on their
+functions, PCIe BARs can generally be categorized into the following types.
+
+* PSP BAR: Expose the AMD PSP (Platform Security Processor) function
+* SMU BAR: Expose the AMD SMU (System Management Unit) function
+* SRAM BAR: Expose ring buffers for the mailbox
+* Mailbox BAR: Expose the mailbox control registers (head, tail and ISR
+  registers etc.)
+* Public Register BAR: Expose public registers
+
+On specific devices, the above-mentioned BAR types might be combined into a
+single physical PCIe BAR. Or a module might require two physical PCIe BARs to
+be fully functional. For example,
+
+* On AMD Phoenix device, PSP, SMU, Public Register BARs are on PCIe BAR index 0.
+* On AMD Strix Point device, Mailbox and Public Register BARs are on PCIe BAR
+  index 0. The PSP has some registers in PCIe BAR index 0 (Public Register BAR)
+  and PCIe BAR index 4 (PSP BAR).
+
+Process Isolation Hardware
+--------------------------
+
+As explained before, XDNA Array can be dynamically divided into isolated
+spatial partitions, each of which may have one or more columns. The spatial
+partition is set up by programming the column isolation registers by the
+microcontroller. Each spatial partition is associated with a PASID which is
+also programmed by the microcontroller. Hence multiple spatial partitions in
+the NPU can make concurrent host access protected by PASID.
+
+The NPU FW itself uses microcontroller MMU enforced isolated contexts for
+servicing user and privileged channel requests.
+
+
+Mixed Spatial and Temporal Scheduling
+=====================================
+
+AMD XDNA architecture supports mixed spatial and temporal (time sharing)
+scheduling of 2D array. This means that spatial partitions may be set up and
+torn down dynamically to accommodate various workloads. A *spatial* partition
+may be *exclusively* bound to one workload context while another partition may
+be *temporarily* bound to more than one workload context. The microcontroller
+updates the PASID for a temporarily shared partition to match the context that
+has been bound to the partition at any moment.
+
+Resource Solver
+---------------
+
+The Resource Solver component of the amdxdna driver manages the allocation
+of 2D array among various workloads. Every workload describes the number
+of columns required to run the NPU binary in its metadata. The Resource Solver
+component uses hints passed by the workload and its own heuristics to
+decide 2D array (re)partition strategy and mapping of workloads for spatial and
+temporal sharing of columns. The FW enforces the context-to-column(s) resource
+binding decisions made by the Resource Solver.
+
+AMD Phoenix and AMD Hawk Point client NPU can support 6 concurrent workload
+contexts. AMD Strix Point can support 16 concurrent workload contexts.
+
+
+Application Binaries
+====================
+
+An NPU application workload comprises two separate binaries which are
+generated by the NPU compiler.
+
+1. AMD XDNA Array overlay, which is used to configure an NPU spatial partition.
+   The overlay contains instructions for setting up the stream switch
+   configuration and ELF for the compute tiles. The overlay is loaded on the
+   spatial partition bound to the workload by the associated ERT instance.
+   Refer to the
+   `Versal Adaptive SoC AIE-ML Architecture Manual (AM020)`_ for more details.
+
+2. ``ctrlcode``, used for orchestrating the overlay loaded on the spatial
+   partition. ``ctrlcode`` is executed by the ERT running in protected mode on
+   the microcontroller in the context of the workload. ``ctrlcode`` is made up
+   of a sequence of opcodes named ``XAie_TxnOpcode``. Refer to the
+   `AI Engine Run Time`_ for more details.
+
+
+Special Host Buffers
+====================
+
+Per-context Instruction Buffer
+------------------------------
+
+Every workload context uses a host resident 64 MB buffer which is memory
+mapped into the ERT instance created to service the workload. The ``ctrlcode``
+used by the workload is copied into this special memory. This buffer is
+protected by PASID like all other input/output buffers used by that workload.
+Instruction buffer is also mapped into the user space of the workload.
+
+Global Privileged Buffer
+------------------------
+
+In addition, the driver also allocates a single buffer for maintenance tasks
+like recording errors from MERT. This global buffer uses the global IOMMU
+domain and is only accessible by MERT.
+
+
+High-level Use Flow
+===================
+
+Here are the steps to run a workload on AMD NPU:
+
+1.  Compile the workload into an overlay and a ``ctrlcode`` binary.
+2.  Userspace opens a context in the driver and provides the overlay.
+3.  The driver checks with the Resource Solver for provisioning a set of columns
+    for the workload.
+4.  The driver then asks MERT to create a context on the device with the desired
+    columns.
+5.  MERT then creates an instance of ERT. MERT also maps the Instruction Buffer
+    into ERT memory.
+6.  The userspace then copies the ``ctrlcode`` to the Instruction Buffer.
+7.  Userspace then creates a command buffer with pointers to input, output, and
+    instruction buffer; it then submits the command buffer to the driver and goes
+    to sleep waiting for completion.
+8.  The driver sends the command over the Mailbox to ERT.
+9.  ERT *executes* the ``ctrlcode`` in the instruction buffer.
+10. Execution of the ``ctrlcode`` kicks off DMAs to and from the host DDR while
+    AMD XDNA Array is running.
+11. When ERT reaches end of ``ctrlcode``, it raises an MSI-X to send completion
+    signal to the driver which then wakes up the waiting workload.
+
+
+Boot Flow
+=========
+
+amdxdna driver uses PSP to securely load signed NPU FW and kick off the boot
+of the NPU microcontroller. amdxdna driver then waits for the alive signal in
+a special location on BAR 0. The NPU is switched off during SoC suspend and
+turned on after resume where the NPU FW is reloaded, and the handshake is
+performed again.
+
+
+Userspace components
+====================
+
+Compiler
+--------
+
+Peano is an LLVM based open-source compiler for AMD XDNA Array compute tile
+available at:
+https://github.com/Xilinx/llvm-aie
+
+The open-source IREE compiler supports graph compilation of ML models for AMD
+NPU and uses Peano underneath. It is available at:
+https://github.com/nod-ai/iree-amd-aie
+
+Usermode Driver (UMD)
+---------------------
+
+The open-source XRT runtime stack interfaces with amdxdna kernel driver. XRT
+can be found at:
+https://github.com/Xilinx/XRT
+
+The open-source XRT shim for NPU can be found at:
+https://github.com/amd/xdna-driver
+
+
+DMA Operation
+=============
+
+DMA operation instructions are encoded in the ``ctrlcode`` as
+``XAIE_IO_BLOCKWRITE`` opcode. When ERT executes ``XAIE_IO_BLOCKWRITE``, DMA
+operations between host DDR and L2 memory are effected.
+
+
+Error Handling
+==============
+
+When MERT detects an error in AMD XDNA Array, it pauses execution for that
+workload context and sends an asynchronous message to the driver over the
+privileged channel. The driver then sends a buffer pointer to MERT to capture
+the register states for the partition bound to faulting workload context. The
+driver then decodes the error by reading the contents of the buffer pointer.
+
+
+Telemetry
+=========
+
+MERT can report various kinds of telemetry information like the following:
+
+* L1 interrupt counter
+* DMA counter
+* Deep Sleep counter
+* etc.
+
+
+References
+==========
+
+- `AMD XDNA Architecture `_
+- `AMD AI Engine Technology `_
+- `Peano `_
+- `Versal Adaptive SoC AIE-ML Architecture Manual (AM020) `_
+- `AI Engine Run Time `_
diff --git a/Documentation/accel/amdxdna/index.rst b/Documentation/accel/amdxdna/index.rst
new file mode 100644
index 000000000000..38c16939f1fc
--- /dev/null
+++ b/Documentation/accel/amdxdna/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/amdxdna NPU driver
+=====================================
+
+The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit).
+
+.. toctree::
+
+   amdnpu
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index e94a0160b6a0..bc85f26533d8 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   amdxdna/index
    qaic/index
 
 ..
only:: subproject and html
-- 
2.34.1

From nobody Wed Nov 27 06:28:33 2024
From: Lizhi Hou
CC: Lizhi Hou, Narendra Gutta, George Yang
Subject: [PATCH V4 02/10] accel/amdxdna: Add a new driver for AMD AI Engine
Date: Fri, 11 Oct 2024 16:12:36 -0700
Message-ID: <20241011231244.3182625-3-lizhi.hou@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com>
References: <20241011231244.3182625-1-lizhi.hou@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

AMD AI Engine forms the core of AMD NPU and can be used for accelerating
machine learning applications.

Add the driver to support AI Engine integrated to AMD CPU.
Only very basic functionalities are added.
- module and PCI device initialization
- firmware load
- power up
- low level hardware initialization

Co-developed-by: Narendra Gutta
Signed-off-by: Narendra Gutta
Co-developed-by: George Yang
Signed-off-by: George Yang
Co-developed-by: Min Ma
Signed-off-by: Min Ma
Signed-off-by: Lizhi Hou
Reviewed-by: Jeffrey Hugo
---
 MAINTAINERS                             |   9 ++
 drivers/accel/Kconfig                   |   1 +
 drivers/accel/Makefile                  |   1 +
 drivers/accel/amdxdna/Kconfig           |  15 ++
 drivers/accel/amdxdna/Makefile          |  13 ++
 drivers/accel/amdxdna/TODO              |   5 +
 drivers/accel/amdxdna/aie2_pci.c        | 184 ++++++++++++++++++++++++
 drivers/accel/amdxdna/aie2_pci.h        | 130 +++++++++++++++++
 drivers/accel/amdxdna/aie2_psp.c        | 141 ++++++++++++++++++
 drivers/accel/amdxdna/aie2_smu.c        | 117 +++++++++++++++
 drivers/accel/amdxdna/amdxdna_pci_drv.c | 128 +++++++++++++++++
 drivers/accel/amdxdna/amdxdna_pci_drv.h |  76 ++++++++++
 drivers/accel/amdxdna/amdxdna_sysfs.c   |  51 +++++++
 drivers/accel/amdxdna/npu1_regs.c       |  99 +++++++++++++
 drivers/accel/amdxdna/npu2_regs.c       | 116 +++++++++++++++
 drivers/accel/amdxdna/npu4_regs.c       | 116 +++++++++++++++
 drivers/accel/amdxdna/npu5_regs.c       | 116 +++++++++++++++
 include/uapi/drm/amdxdna_accel.h        |  24 ++++
 18 files changed, 1342 insertions(+)
 create mode 100644 drivers/accel/amdxdna/Kconfig
 create mode 100644 drivers/accel/amdxdna/Makefile
 create mode 100644 drivers/accel/amdxdna/TODO
 create mode 100644 drivers/accel/amdxdna/aie2_pci.c
 create mode 100644 drivers/accel/amdxdna/aie2_pci.h
 create mode 100644 drivers/accel/amdxdna/aie2_psp.c
 create mode 100644 drivers/accel/amdxdna/aie2_smu.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_sysfs.c
 create mode 100644 drivers/accel/amdxdna/npu1_regs.c
 create mode 100644 drivers/accel/amdxdna/npu2_regs.c
 create mode 100644 drivers/accel/amdxdna/npu4_regs.c
 create mode 100644 drivers/accel/amdxdna/npu5_regs.c
 create mode 100644 include/uapi/drm/amdxdna_accel.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ad507f49324..997cbcad8e3e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1185,6 +1185,15 @@ M:	Sanjay R Mehta
 S:	Maintained
 F:	drivers/spi/spi-amd.c
 
+AMD XDNA DRIVER
+M:	Min Ma
+M:	Lizhi Hou
+L:	dri-devel@lists.freedesktop.org
+S:	Supported
+T:	git https://gitlab.freedesktop.org/drm/misc/kernel.git
+F:	drivers/accel/amdxdna/
+F:	include/uapi/drm/amdxdna_accel.h
+
 AMD XGBE DRIVER
 M:	"Shyam Sundar S K"
 L:	netdev@vger.kernel.org
diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 64065fb8922b..5b9490367a39 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -24,6 +24,7 @@ menuconfig DRM_ACCEL
 	  different device files, called accel/accel* (in /dev, sysfs
 	  and debugfs).
 
+source "drivers/accel/amdxdna/Kconfig"
 source "drivers/accel/habanalabs/Kconfig"
 source "drivers/accel/ivpu/Kconfig"
 source "drivers/accel/qaic/Kconfig"
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index ab3df932937f..a301fb6089d4 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
+obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
 obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
 obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
 obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
diff --git a/drivers/accel/amdxdna/Kconfig b/drivers/accel/amdxdna/Kconfig
new file mode 100644
index 000000000000..b6f4c364dd6b
--- /dev/null
+++ b/drivers/accel/amdxdna/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config DRM_ACCEL_AMDXDNA
+	tristate "AMD AI Engine"
+	depends on AMD_IOMMU
+	depends on DRM_ACCEL
+	depends on PCI && HAS_IOMEM
+	depends on X86_64
+	select FW_LOADER
+	help
+	  Choose this option to enable support for NPU integrated into AMD
+	  client CPUs like AMD Ryzen AI 300 Series. AMD NPU can be used to
+	  accelerate machine learning applications.
+
+	  If "M" is selected, the driver module will be amdxdna.
diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile
new file mode 100644
index 000000000000..1dee0cba8390
--- /dev/null
+++ b/drivers/accel/amdxdna/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+amdxdna-y := \
+	aie2_pci.o \
+	aie2_psp.o \
+	aie2_smu.o \
+	amdxdna_pci_drv.o \
+	amdxdna_sysfs.o \
+	npu1_regs.o \
+	npu2_regs.o \
+	npu4_regs.o \
+	npu5_regs.o
+obj-$(CONFIG_DRM_ACCEL_AMDXDNA) = amdxdna.o
diff --git a/drivers/accel/amdxdna/TODO b/drivers/accel/amdxdna/TODO
new file mode 100644
index 000000000000..a130259f5f70
--- /dev/null
+++ b/drivers/accel/amdxdna/TODO
@@ -0,0 +1,5 @@
+- Replace idr with xa
+- Add import and export BO support
+- Add debugfs support
+- Add debug BO support
+- Improve power management
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
new file mode 100644
index 000000000000..f36549293053
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +static void aie2_hw_stop(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev =3D xdna->dev_handle; + + aie2_psp_stop(ndev->psp_hdl); + aie2_smu_fini(ndev); + pci_disable_device(pdev); +} + +static int aie2_hw_start(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev =3D xdna->dev_handle; + int ret; + + ret =3D pci_enable_device(pdev); + if (ret) { + XDNA_ERR(xdna, "failed to enable device, ret %d", ret); + return ret; + } + pci_set_master(pdev); + + ret =3D aie2_smu_init(ndev); + if (ret) { + XDNA_ERR(xdna, "failed to init smu, ret %d", ret); + goto disable_dev; + } + + ret =3D aie2_psp_start(ndev->psp_hdl); + if (ret) { + XDNA_ERR(xdna, "failed to start psp, ret %d", ret); + goto fini_smu; + } + + return 0; + +fini_smu: + aie2_smu_fini(ndev); +disable_dev: + pci_disable_device(pdev); + + return ret; +} + +static int aie2_init(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev; + struct psp_config psp_conf; + const struct firmware *fw; + void __iomem * const *tbl; + int i, bars, nvec, ret; + + ndev =3D drmm_kzalloc(&xdna->ddev, sizeof(*ndev), GFP_KERNEL); + if (!ndev) + return -ENOMEM; + + ndev->priv =3D xdna->dev_info->dev_priv; + ndev->xdna =3D xdna; + + ret =3D request_firmware(&fw, ndev->priv->fw_path, &pdev->dev); + if (ret) { + XDNA_ERR(xdna, "failed to request_firmware %s, ret %d", + ndev->priv->fw_path, ret); + return ret; + } + + ret =3D pcim_enable_device(pdev); + if (ret) { + XDNA_ERR(xdna, "pcim enable device failed, ret %d", ret); + goto release_fw; + } + + bars =3D pci_select_bars(pdev, IORESOURCE_MEM); + for (i =3D 0; i < PSP_MAX_REGS; i++) { + if (!(BIT(PSP_REG_BAR(ndev, i)) & bars)) { + XDNA_ERR(xdna, "does not get pci bar%d", 
PSP_REG_BAR(ndev, i)); + ret =3D -EINVAL; + goto release_fw; + } + } + + ret =3D pcim_iomap_regions(pdev, bars, "amdxdna-npu"); + if (ret) { + XDNA_ERR(xdna, "map regions failed, ret %d", ret); + goto release_fw; + } + + tbl =3D pcim_iomap_table(pdev); + if (!tbl) { + XDNA_ERR(xdna, "Cannot get iomap table"); + ret =3D -ENOMEM; + goto release_fw; + } + ndev->sram_base =3D tbl[xdna->dev_info->sram_bar]; + ndev->smu_base =3D tbl[xdna->dev_info->smu_bar]; + + ret =3D dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); + if (ret) { + XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret); + goto release_fw; + } + + nvec =3D pci_msix_vec_count(pdev); + if (nvec <=3D 0) { + XDNA_ERR(xdna, "does not get number of interrupt vector"); + ret =3D -EINVAL; + goto release_fw; + } + + ret =3D pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_MSIX); + if (ret < 0) { + XDNA_ERR(xdna, "failed to alloc irq vectors, ret %d", ret); + goto release_fw; + } + + ret =3D iommu_dev_enable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); + if (ret) { + XDNA_ERR(xdna, "Enable PASID failed, ret %d", ret); + goto free_irq; + } + + psp_conf.fw_size =3D fw->size; + psp_conf.fw_buf =3D fw->data; + for (i =3D 0; i < PSP_MAX_REGS; i++) + psp_conf.psp_regs[i] =3D tbl[PSP_REG_BAR(ndev, i)] + PSP_REG_OFF(ndev, i= ); + ndev->psp_hdl =3D aie2m_psp_create(&xdna->ddev, &psp_conf); + if (!ndev->psp_hdl) { + XDNA_ERR(xdna, "failed to create psp"); + ret =3D -ENOMEM; + goto disable_sva; + } + xdna->dev_handle =3D ndev; + + ret =3D aie2_hw_start(xdna); + if (ret) { + XDNA_ERR(xdna, "start npu failed, ret %d", ret); + goto disable_sva; + } + + release_firmware(fw); + return 0; + +disable_sva: + iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); +free_irq: + pci_free_irq_vectors(pdev); +release_fw: + release_firmware(fw); + + return ret; +} + +static void aie2_fini(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); + + aie2_hw_stop(xdna); + iommu_dev_disable_feature(&pdev->dev, 
IOMMU_DEV_FEAT_SVA); + pci_free_irq_vectors(pdev); +} + +const struct amdxdna_dev_ops aie2_ops =3D { + .init =3D aie2_init, + .fini =3D aie2_fini, +}; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_= pci.h new file mode 100644 index 000000000000..34f344b4b662 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -0,0 +1,130 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_PCI_H_ +#define _AIE2_PCI_H_ + +#define AIE2_INTERVAL 20000 /* us */ +#define AIE2_TIMEOUT 1000000 /* us */ + +/* Firmware determines device memory base address and size */ +#define AIE2_DEVM_BASE 0x4000000 +#define AIE2_DEVM_SIZE SZ_64M + +#define NDEV2PDEV(ndev) (to_pci_dev((ndev)->xdna->ddev.dev)) + +#define AIE2_SRAM_OFF(ndev, addr) ((addr) - (ndev)->priv->sram_dev_addr) +#define AIE2_MBOX_OFF(ndev, addr) ((addr) - (ndev)->priv->mbox_dev_addr) + +#define PSP_REG_BAR(ndev, idx) ((ndev)->priv->psp_regs_off[(idx)].bar_idx) +#define PSP_REG_OFF(ndev, idx) ((ndev)->priv->psp_regs_off[(idx)].offset) +#define SRAM_REG_OFF(ndev, idx) ((ndev)->priv->sram_offs[(idx)].offset) + +#define SMU_REG(ndev, idx) \ +({ \ + typeof(ndev) _ndev =3D ndev; \ + ((_ndev)->smu_base + (_ndev)->priv->smu_regs_off[(idx)].offset); \ +}) +#define SRAM_GET_ADDR(ndev, idx) \ +({ \ + typeof(ndev) _ndev =3D ndev; \ + ((_ndev)->sram_base + SRAM_REG_OFF((_ndev), (idx))); \ +}) + +#define SMU_MPNPUCLK_FREQ_MAX(ndev) ((ndev)->priv->smu_mpnpuclk_freq_max) +#define SMU_HCLK_FREQ_MAX(ndev) ((ndev)->priv->smu_hclk_freq_max) + +enum aie2_smu_reg_idx { + SMU_CMD_REG =3D 0, + SMU_ARG_REG, + SMU_INTR_REG, + SMU_RESP_REG, + SMU_OUT_REG, + SMU_MAX_REGS /* Keep this at the end */ +}; + +enum aie2_sram_reg_idx { + MBOX_CHANN_OFF =3D 0, + FW_ALIVE_OFF, + SRAM_MAX_INDEX /* Keep this at the end */ +}; + +enum psp_reg_idx { + PSP_CMD_REG =3D 0, + PSP_ARG0_REG, + PSP_ARG1_REG, + PSP_ARG2_REG, + PSP_NUM_IN_REGS, /* number of input registers */ 
+ PSP_INTR_REG =3D PSP_NUM_IN_REGS, + PSP_STATUS_REG, + PSP_RESP_REG, + PSP_MAX_REGS /* Keep this at the end */ +}; + +struct psp_config { + const void *fw_buf; + u32 fw_size; + void __iomem *psp_regs[PSP_MAX_REGS]; +}; + +struct clock_entry { + char name[16]; + u32 freq_mhz; +}; + +struct rt_config { + u32 type; + u32 value; +}; + +struct amdxdna_dev_hdl { + struct amdxdna_dev *xdna; + const struct amdxdna_dev_priv *priv; + void __iomem *sram_base; + void __iomem *smu_base; + struct psp_device *psp_hdl; + struct clock_entry mp_npu_clock; + struct clock_entry h_clock; +}; + +#define DEFINE_BAR_OFFSET(reg_name, bar, reg_addr) \ + [reg_name] =3D {bar##_BAR_INDEX, (reg_addr) - bar##_BAR_BASE} + +struct aie2_bar_off_pair { + int bar_idx; + u32 offset; +}; + +struct amdxdna_dev_priv { + const char *fw_path; + u64 protocol_major; + u64 protocol_minor; + struct rt_config rt_config; +#define COL_ALIGN_NONE 0 +#define COL_ALIGN_NATURE 1 + u32 col_align; + u32 mbox_dev_addr; + /* If mbox_size is 0, use BAR size. 
See MBOX_SIZE macro */ + u32 mbox_size; + u32 sram_dev_addr; + struct aie2_bar_off_pair sram_offs[SRAM_MAX_INDEX]; + struct aie2_bar_off_pair psp_regs_off[PSP_MAX_REGS]; + struct aie2_bar_off_pair smu_regs_off[SMU_MAX_REGS]; + u32 smu_mpnpuclk_freq_max; + u32 smu_hclk_freq_max; +}; + +extern const struct amdxdna_dev_ops aie2_ops; + +/* aie2_smu.c */ +int aie2_smu_init(struct amdxdna_dev_hdl *ndev); +void aie2_smu_fini(struct amdxdna_dev_hdl *ndev); + +/* aie2_psp.c */ +struct psp_device *aie2m_psp_create(struct drm_device *ddev, struct psp_co= nfig *conf); +int aie2_psp_start(struct psp_device *psp); +void aie2_psp_stop(struct psp_device *psp); + +#endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/aie2_psp.c b/drivers/accel/amdxdna/aie2_= psp.c new file mode 100644 index 000000000000..c87ca322e206 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_psp.c @@ -0,0 +1,141 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" + +#define PSP_STATUS_READY BIT(31) + +/* PSP commands */ +#define PSP_VALIDATE 1 +#define PSP_START 2 +#define PSP_RELEASE_TMR 3 + +/* PSP special arguments */ +#define PSP_START_COPY_FW 1 + +/* PSP response error code */ +#define PSP_ERROR_CANCEL 0xFFFF0002 +#define PSP_ERROR_BAD_STATE 0xFFFF0007 + +#define PSP_FW_ALIGN 0x10000 +#define PSP_POLL_INTERVAL 20000 /* us */ +#define PSP_POLL_TIMEOUT 1000000 /* us */ + +#define PSP_REG(p, reg) ((p)->psp_regs[reg]) + +struct psp_device { + struct drm_device *ddev; + struct psp_config conf; + u32 fw_buf_sz; + u64 fw_paddr; + void *fw_buffer; + void __iomem *psp_regs[PSP_MAX_REGS]; +}; + +static int psp_exec(struct psp_device *psp, u32 *reg_vals) +{ + u32 resp_code; + int ret, i; + u32 ready; + + /* Write command and argument registers */ + for (i =3D 0; i < PSP_NUM_IN_REGS; i++) + writel(reg_vals[i], PSP_REG(psp, i)); + + /* clear and set PSP INTR register to kick off */ + 
writel(0, PSP_REG(psp, PSP_INTR_REG)); + writel(1, PSP_REG(psp, PSP_INTR_REG)); + + /* PSP should be busy. Wait for ready, so we know task is done. */ + ret =3D readx_poll_timeout(readl, PSP_REG(psp, PSP_STATUS_REG), ready, + FIELD_GET(PSP_STATUS_READY, ready), + PSP_POLL_INTERVAL, PSP_POLL_TIMEOUT); + if (ret) { + drm_err(psp->ddev, "PSP is not ready, ret 0x%x", ret); + return ret; + } + + resp_code =3D readl(PSP_REG(psp, PSP_RESP_REG)); + if (resp_code) { + drm_err(psp->ddev, "fw return error 0x%x", resp_code); + return -EIO; + } + + return 0; +} + +void aie2_psp_stop(struct psp_device *psp) +{ + u32 reg_vals[PSP_NUM_IN_REGS] =3D { PSP_RELEASE_TMR, }; + int ret; + + ret =3D psp_exec(psp, reg_vals); + if (ret) + drm_err(psp->ddev, "release tmr failed, ret %d", ret); +} + +int aie2_psp_start(struct psp_device *psp) +{ + u32 reg_vals[PSP_NUM_IN_REGS]; + int ret; + + reg_vals[0] =3D PSP_VALIDATE; + reg_vals[1] =3D lower_32_bits(psp->fw_paddr); + reg_vals[2] =3D upper_32_bits(psp->fw_paddr); + reg_vals[3] =3D psp->fw_buf_sz; + + ret =3D psp_exec(psp, reg_vals); + if (ret) { + drm_err(psp->ddev, "failed to validate fw, ret %d", ret); + return ret; + } + + memset(reg_vals, 0, sizeof(reg_vals)); + reg_vals[0] =3D PSP_START; + reg_vals[1] =3D PSP_START_COPY_FW; + ret =3D psp_exec(psp, reg_vals); + if (ret) { + drm_err(psp->ddev, "failed to start fw, ret %d", ret); + return ret; + } + + return 0; +} + +struct psp_device *aie2m_psp_create(struct drm_device *ddev, struct psp_co= nfig *conf) +{ + struct psp_device *psp; + u64 offset; + + psp =3D drmm_kzalloc(ddev, sizeof(*psp), GFP_KERNEL); + if (!psp) + return NULL; + + psp->ddev =3D ddev; + memcpy(psp->psp_regs, conf->psp_regs, sizeof(psp->psp_regs)); + + psp->fw_buf_sz =3D ALIGN(conf->fw_size, PSP_FW_ALIGN) + PSP_FW_ALIGN; + psp->fw_buffer =3D drmm_kmalloc(ddev, psp->fw_buf_sz, GFP_KERNEL); + if (!psp->fw_buffer) { + drm_err(ddev, "no memory for fw buffer"); + return NULL; + } + + /* + * AMD Platform Security 
Processor(PSP) requires host physical + * address to load NPU firmware. + */ + psp->fw_paddr =3D virt_to_phys(psp->fw_buffer); + offset =3D ALIGN(psp->fw_paddr, PSP_FW_ALIGN) - psp->fw_paddr; + psp->fw_paddr +=3D offset; + memcpy(psp->fw_buffer + offset, conf->fw_buf, conf->fw_size); + + return psp; +} diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_= smu.c new file mode 100644 index 000000000000..3fa7064649aa --- /dev/null +++ b/drivers/accel/amdxdna/aie2_smu.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +#define SMU_RESULT_OK 1 + +/* SMU commands */ +#define AIE2_SMU_POWER_ON 0x3 +#define AIE2_SMU_POWER_OFF 0x4 +#define AIE2_SMU_SET_MPNPUCLK_FREQ 0x5 +#define AIE2_SMU_SET_HCLK_FREQ 0x6 + +static int aie2_smu_exec(struct amdxdna_dev_hdl *ndev, u32 reg_cmd, u32 re= g_arg) +{ + u32 resp; + int ret; + + writel(0, SMU_REG(ndev, SMU_RESP_REG)); + writel(reg_arg, SMU_REG(ndev, SMU_ARG_REG)); + writel(reg_cmd, SMU_REG(ndev, SMU_CMD_REG)); + + /* Clear and set SMU_INTR_REG to kick off */ + writel(0, SMU_REG(ndev, SMU_INTR_REG)); + writel(1, SMU_REG(ndev, SMU_INTR_REG)); + + ret =3D readx_poll_timeout(readl, SMU_REG(ndev, SMU_RESP_REG), resp, + resp, AIE2_INTERVAL, AIE2_TIMEOUT); + if (ret) { + XDNA_ERR(ndev->xdna, "smu cmd %d timed out", reg_cmd); + return ret; + } + + if (resp !=3D SMU_RESULT_OK) { + XDNA_ERR(ndev->xdna, "smu cmd %d failed, 0x%x", reg_cmd, resp); + return -EINVAL; + } + + return 0; +} + +static int aie2_smu_set_mpnpu_clock_freq(struct amdxdna_dev_hdl *ndev, u32= freq_mhz) +{ + int ret; + + if (!freq_mhz || freq_mhz > SMU_MPNPUCLK_FREQ_MAX(ndev)) { + XDNA_ERR(ndev->xdna, "invalid mpnpu clock freq %d", freq_mhz); + return -EINVAL; + } + + ndev->mp_npu_clock.freq_mhz =3D freq_mhz; + ret =3D aie2_smu_exec(ndev, AIE2_SMU_SET_MPNPUCLK_FREQ, freq_mhz); + if (!ret) + 
XDNA_INFO_ONCE(ndev->xdna, "set mpnpu_clock =3D %d mhz", freq_mhz); + + return ret; +} + +static int aie2_smu_set_hclock_freq(struct amdxdna_dev_hdl *ndev, u32 freq= _mhz) +{ + int ret; + + if (!freq_mhz || freq_mhz > SMU_HCLK_FREQ_MAX(ndev)) { + XDNA_ERR(ndev->xdna, "invalid hclock freq %d", freq_mhz); + return -EINVAL; + } + + ndev->h_clock.freq_mhz =3D freq_mhz; + ret =3D aie2_smu_exec(ndev, AIE2_SMU_SET_HCLK_FREQ, freq_mhz); + if (!ret) + XDNA_INFO_ONCE(ndev->xdna, "set npu_hclock =3D %d mhz", freq_mhz); + + return ret; +} + +int aie2_smu_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret =3D aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0); + if (ret) { + XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret); + return ret; + } + + ret =3D aie2_smu_set_mpnpu_clock_freq(ndev, SMU_MPNPUCLK_FREQ_MAX(ndev)); + if (ret) { + XDNA_ERR(ndev->xdna, "Set mpnpu clk freq failed, ret %d", ret); + return ret; + } + snprintf(ndev->mp_npu_clock.name, sizeof(ndev->mp_npu_clock.name), "MP-NP= U Clock"); + + ret =3D aie2_smu_set_hclock_freq(ndev, SMU_HCLK_FREQ_MAX(ndev)); + if (ret) { + XDNA_ERR(ndev->xdna, "Set hclk freq failed, ret %d", ret); + return ret; + } + snprintf(ndev->h_clock.name, sizeof(ndev->h_clock.name), "H Clock"); + + return 0; +} + +void aie2_smu_fini(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret =3D aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0); + if (ret) + XDNA_ERR(ndev->xdna, "Power off failed, ret %d", ret); +} diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdn= a/amdxdna_pci_drv.c new file mode 100644 index 000000000000..7a5945854e26 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -0,0 +1,128 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include "amdxdna_pci_drv.h" + +/* + * Bind the driver based on the (vendor_id, device_id) pair, then use the + * (device_id, rev_id) pair as a key to select the device info. Devices + * with the same device_id present a very similar interface to the host + * driver. + */ +static const struct pci_device_id pci_ids[] =3D { + { PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x1502) }, + { PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x17f0) }, + {0} +}; + +MODULE_DEVICE_TABLE(pci, pci_ids); + +static const struct amdxdna_device_id amdxdna_ids[] =3D { + { 0x1502, 0x0, &dev_npu1_info }, + { 0x17f0, 0x0, &dev_npu2_info }, + { 0x17f0, 0x10, &dev_npu4_info }, + { 0x17f0, 0x11, &dev_npu5_info }, + {0} +}; + +DEFINE_DRM_ACCEL_FOPS(amdxdna_fops); + +const struct drm_driver amdxdna_drm_drv =3D { + .driver_features =3D DRIVER_GEM | DRIVER_COMPUTE_ACCEL, + .fops =3D &amdxdna_fops, + .name =3D "amdxdna_accel_driver", + .desc =3D "AMD XDNA DRM implementation", +}; + +static const struct amdxdna_dev_info * +amdxdna_get_dev_info(struct pci_dev *pdev) +{ + int i; + + for (i =3D 0; i < ARRAY_SIZE(amdxdna_ids); i++) { + if (pdev->device =3D=3D amdxdna_ids[i].device && + pdev->revision =3D=3D amdxdna_ids[i].revision) + return amdxdna_ids[i].dev_info; + } + return NULL; +} + +static int amdxdna_probe(struct pci_dev *pdev, const struct pci_device_id = *id) +{ + struct amdxdna_dev *xdna; + int ret; + + xdna =3D devm_drm_dev_alloc(&pdev->dev, &amdxdna_drm_drv, typeof(*xdna), = ddev); + if (IS_ERR(xdna)) + return PTR_ERR(xdna); + + xdna->dev_info =3D amdxdna_get_dev_info(pdev); + if (!xdna->dev_info) + return -ENODEV; + + drmm_mutex_init(&xdna->ddev, &xdna->dev_lock); + pci_set_drvdata(pdev, xdna); + + mutex_lock(&xdna->dev_lock); + ret =3D xdna->dev_info->ops->init(xdna); + mutex_unlock(&xdna->dev_lock); + if (ret) { + XDNA_ERR(xdna, "Hardware init failed, ret %d", ret); + return ret; + } + + ret =3D amdxdna_sysfs_init(xdna); + if (ret) { + XDNA_ERR(xdna, "Create 
amdxdna attrs failed: %d", ret); + goto failed_dev_fini; + } + + ret =3D drm_dev_register(&xdna->ddev, 0); + if (ret) { + XDNA_ERR(xdna, "DRM register failed, ret %d", ret); + goto failed_sysfs_fini; + } + + return 0; + +failed_sysfs_fini: + amdxdna_sysfs_fini(xdna); +failed_dev_fini: + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->fini(xdna); + mutex_unlock(&xdna->dev_lock); + return ret; +} + +static void amdxdna_remove(struct pci_dev *pdev) +{ + struct amdxdna_dev *xdna =3D pci_get_drvdata(pdev); + + drm_dev_unplug(&xdna->ddev); + amdxdna_sysfs_fini(xdna); + + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->fini(xdna); + mutex_unlock(&xdna->dev_lock); +} + +static struct pci_driver amdxdna_pci_driver =3D { + .name =3D KBUILD_MODNAME, + .id_table =3D pci_ids, + .probe =3D amdxdna_probe, + .remove =3D amdxdna_remove, +}; + +module_pci_driver(amdxdna_pci_driver); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("XRT Team "); +MODULE_DESCRIPTION("amdxdna driver"); diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdn= a/amdxdna_pci_drv.h new file mode 100644 index 000000000000..2f1a1c2441f9 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_PCI_DRV_H_ +#define _AMDXDNA_PCI_DRV_H_ + +#define XDNA_INFO(xdna, fmt, args...) drm_info(&(xdna)->ddev, fmt, ##args) +#define XDNA_WARN(xdna, fmt, args...) drm_warn(&(xdna)->ddev, "%s: "fmt, _= _func__, ##args) +#define XDNA_ERR(xdna, fmt, args...) drm_err(&(xdna)->ddev, "%s: "fmt, __f= unc__, ##args) +#define XDNA_DBG(xdna, fmt, args...) drm_dbg(&(xdna)->ddev, fmt, ##args) +#define XDNA_INFO_ONCE(xdna, fmt, args...) 
drm_info_once(&(xdna)->ddev, fm= t, ##args) + +#define to_xdna_dev(drm_dev) \ + ((struct amdxdna_dev *)container_of(drm_dev, struct amdxdna_dev, ddev)) + +extern const struct drm_driver amdxdna_drm_drv; + +struct amdxdna_dev; + +/* + * struct amdxdna_dev_ops - Device hardware operation callbacks + */ +struct amdxdna_dev_ops { + int (*init)(struct amdxdna_dev *xdna); + void (*fini)(struct amdxdna_dev *xdna); +}; + +/* + * struct amdxdna_dev_info - Device hardware information + * Record device static information, like reg, mbox, PSP, SMU bar index + */ +struct amdxdna_dev_info { + int reg_bar; + int mbox_bar; + int sram_bar; + int psp_bar; + int smu_bar; + int device_type; + int first_col; + u32 dev_mem_buf_shift; + u64 dev_mem_base; + size_t dev_mem_size; + char *vbnv; + const struct amdxdna_dev_priv *dev_priv; + const struct amdxdna_dev_ops *ops; +}; + +struct amdxdna_dev { + struct drm_device ddev; + struct amdxdna_dev_hdl *dev_handle; + const struct amdxdna_dev_info *dev_info; + + struct mutex dev_lock; /* per device lock */ +}; + +/* + * struct amdxdna_device_id - PCI device info + */ +struct amdxdna_device_id { + unsigned short device; + u8 revision; + const struct amdxdna_dev_info *dev_info; +}; + +/* Add device info below */ +extern const struct amdxdna_dev_info dev_npu1_info; +extern const struct amdxdna_dev_info dev_npu2_info; +extern const struct amdxdna_dev_info dev_npu4_info; +extern const struct amdxdna_dev_info dev_npu5_info; + +int amdxdna_sysfs_init(struct amdxdna_dev *xdna); +void amdxdna_sysfs_fini(struct amdxdna_dev *xdna); + +#endif /* _AMDXDNA_PCI_DRV_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_sysfs.c b/drivers/accel/amdxdna/= amdxdna_sysfs.c new file mode 100644 index 000000000000..5dd652fcf9d4 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_sysfs.c @@ -0,0 +1,51 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. 
+ */ + +#include +#include + +#include "amdxdna_pci_drv.h" + +static ssize_t vbnv_show(struct device *dev, struct device_attribute *attr= , char *buf) +{ + struct amdxdna_dev *xdna =3D dev_get_drvdata(dev); + + return sprintf(buf, "%s\n", xdna->dev_info->vbnv); +} +static DEVICE_ATTR_RO(vbnv); + +static ssize_t device_type_show(struct device *dev, struct device_attribut= e *attr, char *buf) +{ + struct amdxdna_dev *xdna =3D dev_get_drvdata(dev); + + return sprintf(buf, "%d\n", xdna->dev_info->device_type); +} +static DEVICE_ATTR_RO(device_type); + +static struct attribute *amdxdna_attrs[] =3D { + &dev_attr_device_type.attr, + &dev_attr_vbnv.attr, + NULL, +}; + +static struct attribute_group amdxdna_attr_group =3D { + .attrs =3D amdxdna_attrs, +}; + +int amdxdna_sysfs_init(struct amdxdna_dev *xdna) +{ + int ret; + + ret =3D sysfs_create_group(&xdna->ddev.dev->kobj, &amdxdna_attr_group); + if (ret) + XDNA_ERR(xdna, "Create attr group failed"); + + return ret; +} + +void amdxdna_sysfs_fini(struct amdxdna_dev *xdna) +{ + sysfs_remove_group(&xdna->ddev.dev->kobj, &amdxdna_attr_group); +} diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1= _regs.c new file mode 100644 index 000000000000..858b31a82888 --- /dev/null +++ b/drivers/accel/amdxdna/npu1_regs.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. 
+ */ + +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +/* Address definition from NPU1 docs */ +#define MPNPU_PUB_SEC_INTR 0x3010090 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010094 +#define MPNPU_PUB_SCRATCH2 0x30100A0 +#define MPNPU_PUB_SCRATCH3 0x30100A4 +#define MPNPU_PUB_SCRATCH4 0x30100A8 +#define MPNPU_PUB_SCRATCH5 0x30100AC +#define MPNPU_PUB_SCRATCH6 0x30100B0 +#define MPNPU_PUB_SCRATCH7 0x30100B4 +#define MPNPU_PUB_SCRATCH9 0x30100BC + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x30A0000 +#define MPNPU_SRAM_X2I_MAILBOX_1 0x30A2000 +#define MPNPU_SRAM_I2X_MAILBOX_15 0x30BF000 + +#define MPNPU_APERTURE0_BASE 0x3000000 +#define MPNPU_APERTURE1_BASE 0x3080000 +#define MPNPU_APERTURE2_BASE 0x30C0000 + +/* PCIe BAR Index for NPU1 */ +#define NPU1_REG_BAR_INDEX 0 +#define NPU1_MBOX_BAR_INDEX 4 +#define NPU1_PSP_BAR_INDEX 0 +#define NPU1_SMU_BAR_INDEX 0 +#define NPU1_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU1_REG_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_MBOX_BAR_BASE MPNPU_APERTURE2_BASE +#define NPU1_PSP_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_SMU_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_SRAM_BAR_BASE MPNPU_APERTURE1_BASE + +#define NPU1_RT_CFG_TYPE_PDI_LOAD 2 +#define NPU1_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU1_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU1_MPNPUCLK_FREQ_MAX 600 +#define NPU1_HCLK_FREQ_MAX 1024 + +const struct amdxdna_dev_priv npu1_dev_priv =3D { + .fw_path =3D "amdnpu/1502_00/npu.sbin", + .protocol_major =3D 0x5, + .protocol_minor =3D 0x1, + .rt_config =3D {NPU1_RT_CFG_TYPE_PDI_LOAD, NPU1_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align =3D COL_ALIGN_NONE, + .mbox_dev_addr =3D NPU1_MBOX_BAR_BASE, + .mbox_size =3D 0, /* Use BAR size */ + .sram_dev_addr =3D NPU1_SRAM_BAR_BASE, + .sram_offs =3D { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU1_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU1_SRAM, MPNPU_SRAM_I2X_MAILBOX_15), + }, + .psp_regs_off =3D { + DEFINE_BAR_OFFSET(PSP_CMD_REG, 
NPU1_PSP, MPNPU_PUB_SCRATCH2), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU1_PSP, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU1_PSP, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU1_PSP, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU1_PSP, MPNPU_PUB_SEC_INTR), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU1_PSP, MPNPU_PUB_SCRATCH2), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU1_PSP, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off =3D { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU1_SMU, MPNPU_PUB_SCRATCH5), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU1_SMU, MPNPU_PUB_SCRATCH7), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU1_SMU, MPNPU_PUB_PWRMGMT_INTR), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU1_SMU, MPNPU_PUB_SCRATCH6), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU1_SMU, MPNPU_PUB_SCRATCH7), + }, + .smu_mpnpuclk_freq_max =3D NPU1_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max =3D NPU1_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu1_info =3D { + .reg_bar =3D NPU1_REG_BAR_INDEX, + .mbox_bar =3D NPU1_MBOX_BAR_INDEX, + .sram_bar =3D NPU1_SRAM_BAR_INDEX, + .psp_bar =3D NPU1_PSP_BAR_INDEX, + .smu_bar =3D NPU1_SMU_BAR_INDEX, + .first_col =3D 1, + .dev_mem_buf_shift =3D 15, /* 32 KiB aligned */ + .dev_mem_base =3D AIE2_DEVM_BASE, + .dev_mem_size =3D AIE2_DEVM_SIZE, + .vbnv =3D "RyzenAI-npu1", + .device_type =3D AMDXDNA_DEV_TYPE_KMQ, + .dev_priv =3D &npu1_dev_priv, + .ops =3D &aie2_ops, +}; diff --git a/drivers/accel/amdxdna/npu2_regs.c b/drivers/accel/amdxdna/npu2= _regs.c new file mode 100644 index 000000000000..02b0f22c9f14 --- /dev/null +++ b/drivers/accel/amdxdna/npu2_regs.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. 
+ */ + +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU2 */ +#define NPU2_REG_BAR_INDEX 0 +#define NPU2_MBOX_BAR_INDEX 0 +#define NPU2_PSP_BAR_INDEX 4 +#define NPU2_SMU_BAR_INDEX 5 +#define NPU2_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU2_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU2_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU2_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU2_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU2_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +#define NPU2_RT_CFG_TYPE_PDI_LOAD 5 +#define NPU2_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU2_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU2_MPNPUCLK_FREQ_MAX 1267 
+#define NPU2_HCLK_FREQ_MAX 1800 + +const struct amdxdna_dev_priv npu2_dev_priv =3D { + .fw_path =3D "amdnpu/17f0_00/npu.sbin", + .protocol_major =3D 0x6, + .protocol_minor =3D 0x1, + .rt_config =3D {NPU2_RT_CFG_TYPE_PDI_LOAD, NPU2_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align =3D COL_ALIGN_NATURE, + .mbox_dev_addr =3D NPU2_MBOX_BAR_BASE, + .mbox_size =3D 0, /* Use BAR size */ + .sram_dev_addr =3D NPU2_SRAM_BAR_BASE, + .sram_offs =3D { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU2_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU2_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off =3D { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU2_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU2_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU2_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU2_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU2_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU2_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU2_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off =3D { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU2_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU2_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU2_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU2_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU2_SMU, MP1_C2PMSG_60), + }, + .smu_mpnpuclk_freq_max =3D NPU2_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max =3D NPU2_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu2_info =3D { + .reg_bar =3D NPU2_REG_BAR_INDEX, + .mbox_bar =3D NPU2_MBOX_BAR_INDEX, + .sram_bar =3D NPU2_SRAM_BAR_INDEX, + .psp_bar =3D NPU2_PSP_BAR_INDEX, + .smu_bar =3D NPU2_SMU_BAR_INDEX, + .first_col =3D 0, + .dev_mem_buf_shift =3D 15, /* 32 KiB aligned */ + .dev_mem_base =3D AIE2_DEVM_BASE, + .dev_mem_size =3D AIE2_DEVM_SIZE, + .vbnv =3D "RyzenAI-npu2", + .device_type =3D AMDXDNA_DEV_TYPE_KMQ, + .dev_priv =3D &npu2_dev_priv, + .ops =3D &aie2_ops, /* NPU2 can 
share NPU1's callback */ +}; diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4= _regs.c new file mode 100644 index 000000000000..ca5ca5a6c751 --- /dev/null +++ b/drivers/accel/amdxdna/npu4_regs.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU4 */ +#define NPU4_REG_BAR_INDEX 0 +#define NPU4_MBOX_BAR_INDEX 0 +#define NPU4_PSP_BAR_INDEX 4 +#define NPU4_SMU_BAR_INDEX 5 +#define NPU4_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU4_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define 
NPU4_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU4_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU4_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU4_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +#define NPU4_RT_CFG_TYPE_PDI_LOAD 5 +#define NPU4_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU4_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU4_MPNPUCLK_FREQ_MAX 1267 +#define NPU4_HCLK_FREQ_MAX 1800 + +const struct amdxdna_dev_priv npu4_dev_priv =3D { + .fw_path =3D "amdnpu/17f0_10/npu.sbin", + .protocol_major =3D 0x6, + .protocol_minor =3D 0x1, + .rt_config =3D {NPU4_RT_CFG_TYPE_PDI_LOAD, NPU4_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align =3D COL_ALIGN_NATURE, + .mbox_dev_addr =3D NPU4_MBOX_BAR_BASE, + .mbox_size =3D 0, /* Use BAR size */ + .sram_dev_addr =3D NPU4_SRAM_BAR_BASE, + .sram_offs =3D { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU4_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU4_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off =3D { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU4_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU4_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU4_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU4_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU4_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU4_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU4_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off =3D { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU4_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU4_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU4_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU4_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU4_SMU, MP1_C2PMSG_60), + }, + .smu_mpnpuclk_freq_max =3D NPU4_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max =3D NPU4_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu4_info =3D { + .reg_bar =3D NPU4_REG_BAR_INDEX, + .mbox_bar =3D NPU4_MBOX_BAR_INDEX, + .sram_bar =3D NPU4_SRAM_BAR_INDEX, + 
.psp_bar =3D NPU4_PSP_BAR_INDEX, + .smu_bar =3D NPU4_SMU_BAR_INDEX, + .first_col =3D 0, + .dev_mem_buf_shift =3D 15, /* 32 KiB aligned */ + .dev_mem_base =3D AIE2_DEVM_BASE, + .dev_mem_size =3D AIE2_DEVM_SIZE, + .vbnv =3D "RyzenAI-npu4", + .device_type =3D AMDXDNA_DEV_TYPE_KMQ, + .dev_priv =3D &npu4_dev_priv, + .ops =3D &aie2_ops, /* NPU4 can share NPU1's callback */ +}; diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5= _regs.c new file mode 100644 index 000000000000..07fddcbb86ec --- /dev/null +++ b/drivers/accel/amdxdna/npu5_regs.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define 
MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU5 */ +#define NPU5_REG_BAR_INDEX 0 +#define NPU5_MBOX_BAR_INDEX 0 +#define NPU5_PSP_BAR_INDEX 4 +#define NPU5_SMU_BAR_INDEX 5 +#define NPU5_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU5_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU5_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU5_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU5_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU5_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +#define NPU5_RT_CFG_TYPE_PDI_LOAD 5 +#define NPU5_RT_CFG_VAL_PDI_LOAD_MGMT 0 +#define NPU5_RT_CFG_VAL_PDI_LOAD_APP 1 + +#define NPU5_MPNPUCLK_FREQ_MAX 1267 +#define NPU5_HCLK_FREQ_MAX 1800 + +const struct amdxdna_dev_priv npu5_dev_priv =3D { + .fw_path =3D "amdnpu/17f0_11/npu.sbin", + .protocol_major =3D 0x6, + .protocol_minor =3D 0x1, + .rt_config =3D {NPU5_RT_CFG_TYPE_PDI_LOAD, NPU5_RT_CFG_VAL_PDI_LOAD_APP}, + .col_align =3D COL_ALIGN_NATURE, + .mbox_dev_addr =3D NPU5_MBOX_BAR_BASE, + .mbox_size =3D 0, /* Use BAR size */ + .sram_dev_addr =3D NPU5_SRAM_BAR_BASE, + .sram_offs =3D { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU5_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU5_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off =3D { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU5_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU5_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU5_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU5_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU5_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU5_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU5_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off =3D { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU5_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU5_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU5_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU5_SMU, 
MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU5_SMU, MP1_C2PMSG_60), + }, + .smu_mpnpuclk_freq_max =3D NPU5_MPNPUCLK_FREQ_MAX, + .smu_hclk_freq_max =3D NPU5_HCLK_FREQ_MAX, +}; + +const struct amdxdna_dev_info dev_npu5_info =3D { + .reg_bar =3D NPU5_REG_BAR_INDEX, + .mbox_bar =3D NPU5_MBOX_BAR_INDEX, + .sram_bar =3D NPU5_SRAM_BAR_INDEX, + .psp_bar =3D NPU5_PSP_BAR_INDEX, + .smu_bar =3D NPU5_SMU_BAR_INDEX, + .first_col =3D 0, + .dev_mem_buf_shift =3D 15, /* 32 KiB aligned */ + .dev_mem_base =3D AIE2_DEVM_BASE, + .dev_mem_size =3D AIE2_DEVM_SIZE, + .vbnv =3D "RyzenAI-npu5", + .device_type =3D AMDXDNA_DEV_TYPE_KMQ, + .dev_priv =3D &npu5_dev_priv, + .ops =3D &aie2_ops, +}; diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_ac= cel.h new file mode 100644 index 000000000000..6d97e8e90cf6 --- /dev/null +++ b/include/uapi/drm/amdxdna_accel.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. 
+ */
+
+#ifndef _UAPI_AMDXDNA_ACCEL_H_
+#define _UAPI_AMDXDNA_ACCEL_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+enum amdxdna_device_type {
+	AMDXDNA_DEV_TYPE_UNKNOWN =3D -1,
+	AMDXDNA_DEV_TYPE_KMQ,
+};
+
+#if defined(__cplusplus)
+} /* extern c end */
+#endif
+
+#endif /* _UAPI_AMDXDNA_ACCEL_H_ */
--=20
2.34.1

From nobody Wed Nov 27 06:28:33 2024
From: Lizhi Hou
To: ,
CC: Lizhi Hou , , , , , , George Yang
Subject: [PATCH V4 03/10] accel/amdxdna: Support hardware mailbox
Date: Fri, 11 Oct 2024 16:12:37 -0700
Message-ID: <20241011231244.3182625-4-lizhi.hou@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com>
References: <20241011231244.3182625-1-lizhi.hou@amd.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The hardware mailboxes are used by the driver to submit requests to
firmware and receive the completion notices from hardware. Initially, a
management mailbox channel is up and running. The driver may request
firmware to create/destroy more channels dynamically through management
channel.

Add driver internal mailbox interfaces.
- create/destroy a mailbox channel instance
- send a message to the firmware through a specific channel
- wait for a notification from the specific channel

Co-developed-by: George Yang
Signed-off-by: George Yang
Co-developed-by: Min Ma
Signed-off-by: Min Ma
Signed-off-by: Lizhi Hou
Reviewed-by: Jeffrey Hugo
---
 MAINTAINERS                                   |   1 +
 drivers/accel/amdxdna/Makefile                |   3 +
 drivers/accel/amdxdna/aie2_message.c          | 194 ++++++
 drivers/accel/amdxdna/aie2_msg_priv.h         | 370 +++++++++++
 drivers/accel/amdxdna/aie2_pci.c              | 256 +++++++-
 drivers/accel/amdxdna/aie2_pci.h              |  62 ++
 drivers/accel/amdxdna/aie2_psp.c              |   2 +
 drivers/accel/amdxdna/amdxdna_mailbox.c       | 575 ++++++++++++++++++
 drivers/accel/amdxdna/amdxdna_mailbox.h       | 124 ++++
 .../accel/amdxdna/amdxdna_mailbox_helper.c    |  56 ++
 .../accel/amdxdna/amdxdna_mailbox_helper.h    |  42 ++
 drivers/accel/amdxdna/amdxdna_pci_drv.h       |   8 +
 drivers/accel/amdxdna/amdxdna_sysfs.c         |  11 +
 drivers/accel/amdxdna/npu1_regs.c             |   1 +
 drivers/accel/amdxdna/npu2_regs.c             |   1 +
 drivers/accel/amdxdna/npu4_regs.c             |   1 +
 drivers/accel/amdxdna/npu5_regs.c             |   1 +
 include/trace/events/amdxdna.h                |  60 ++
 18 files changed, 1767 insertions(+), 1 deletion(-)
 create mode 100644 drivers/accel/amdxdna/aie2_message.c
 create mode 100644 drivers/accel/amdxdna/aie2_msg_priv.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.h
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.h
 create mode 100644 include/trace/events/amdxdna.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 997cbcad8e3e..ae43451649f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1192,6 +1192,7 @@ L:	dri-devel@lists.freedesktop.org
 S:	Supported
 T:	git https://gitlab.freedesktop.org/drm/misc/kernel.git
 F:	drivers/accel/amdxdna/
+F:	include/trace/events/amdxdna.h
 F:	include/uapi/drm/amdxdna_accel.h
=20
 AMD XGBE DRIVER
diff --git a/drivers/accel/amdxdna/Makefile
b/drivers/accel/amdxdna/Makefile index 1dee0cba8390..1b4e78b43b44 100644 --- a/drivers/accel/amdxdna/Makefile +++ b/drivers/accel/amdxdna/Makefile @@ -1,9 +1,12 @@ # SPDX-License-Identifier: GPL-2.0-only =20 amdxdna-y :=3D \ + aie2_message.o \ aie2_pci.o \ aie2_psp.o \ aie2_smu.o \ + amdxdna_mailbox.o \ + amdxdna_mailbox_helper.o \ amdxdna_pci_drv.o \ amdxdna_sysfs.o \ npu1_regs.o \ diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/a= ie2_message.c new file mode 100644 index 000000000000..cbf8ee54c6c2 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_message.c @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_mailbox_helper.h" +#include "amdxdna_pci_drv.h" + +#define DECLARE_AIE2_MSG(name, op) \ + DECLARE_XDNA_MSG_COMMON(name, op, MAX_AIE2_STATUS_CODE) + +static int aie2_send_mgmt_msg_wait(struct amdxdna_dev_hdl *ndev, + struct xdna_mailbox_msg *msg) +{ + struct amdxdna_dev *xdna =3D ndev->xdna; + struct xdna_notify *hdl =3D msg->handle; + int ret; + + if (!ndev->mgmt_chann) + return -ENODEV; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + ret =3D xdna_send_msg_wait(xdna, ndev->mgmt_chann, msg); + if (ret =3D=3D -ETIME) { + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); + ndev->mgmt_chann =3D NULL; + } + + if (!ret && *hdl->data !=3D AIE2_STATUS_SUCCESS) { + XDNA_ERR(xdna, "command opcode 0x%x failed, status 0x%x", + msg->opcode, *hdl->data); + ret =3D -EINVAL; + } + + return ret; +} + +int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(suspend, MSG_OP_SUSPEND); + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_resume_fw(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(suspend, MSG_OP_RESUME); + + return 
aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value) +{ + DECLARE_AIE2_MSG(set_runtime_cfg, MSG_OP_SET_RUNTIME_CONFIG); + + req.type =3D type; + req.value =3D value; + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_get_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 *valu= e) +{ + DECLARE_AIE2_MSG(get_runtime_cfg, MSG_OP_GET_RUNTIME_CONFIG); + int ret; + + req.type =3D type; + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(ndev->xdna, "Failed to get runtime config, ret %d", ret); + return ret; + } + + *value =3D resp.value; + return 0; +} + +int aie2_check_protocol_version(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(protocol_version, MSG_OP_GET_PROTOCOL_VERSION); + struct amdxdna_dev *xdna =3D ndev->xdna; + int ret; + + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(xdna, "Failed to get protocol version, ret %d", ret); + return ret; + } + + if (resp.major !=3D ndev->priv->protocol_major) { + XDNA_ERR(xdna, "Incompatible firmware protocol version major %d minor %d= ", + resp.major, resp.minor); + return -EINVAL; + } + + if (resp.minor < ndev->priv->protocol_minor) { + XDNA_ERR(xdna, "Firmware minor version smaller than supported"); + return -EINVAL; + } + + return 0; +} + +int aie2_assign_mgmt_pasid(struct amdxdna_dev_hdl *ndev, u16 pasid) +{ + DECLARE_AIE2_MSG(assign_mgmt_pasid, MSG_OP_ASSIGN_MGMT_PASID); + + req.pasid =3D pasid; + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_query_aie_version(struct amdxdna_dev_hdl *ndev, struct aie_versio= n *version) +{ + DECLARE_AIE2_MSG(aie_version_info, MSG_OP_QUERY_AIE_VERSION); + struct amdxdna_dev *xdna =3D ndev->xdna; + int ret; + + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + XDNA_DBG(xdna, "Query AIE version - major: %u minor: %u completed", + resp.major, resp.minor); + + version->major =3D resp.major; + version->minor =3D resp.minor; + + 
return 0; +} + +int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metad= ata *metadata) +{ + DECLARE_AIE2_MSG(aie_tile_info, MSG_OP_QUERY_AIE_TILE_INFO); + int ret; + + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + metadata->size =3D resp.info.size; + metadata->cols =3D resp.info.cols; + metadata->rows =3D resp.info.rows; + + metadata->version.major =3D resp.info.major; + metadata->version.minor =3D resp.info.minor; + + metadata->core.row_count =3D resp.info.core_rows; + metadata->core.row_start =3D resp.info.core_row_start; + metadata->core.dma_channel_count =3D resp.info.core_dma_channels; + metadata->core.lock_count =3D resp.info.core_locks; + metadata->core.event_reg_count =3D resp.info.core_events; + + metadata->mem.row_count =3D resp.info.mem_rows; + metadata->mem.row_start =3D resp.info.mem_row_start; + metadata->mem.dma_channel_count =3D resp.info.mem_dma_channels; + metadata->mem.lock_count =3D resp.info.mem_locks; + metadata->mem.event_reg_count =3D resp.info.mem_events; + + metadata->shim.row_count =3D resp.info.shim_rows; + metadata->shim.row_start =3D resp.info.shim_row_start; + metadata->shim.dma_channel_count =3D resp.info.shim_dma_channels; + metadata->shim.lock_count =3D resp.info.shim_locks; + metadata->shim.event_reg_count =3D resp.info.shim_events; + + return 0; +} + +int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, + struct amdxdna_fw_ver *fw_ver) +{ + DECLARE_AIE2_MSG(firmware_version, MSG_OP_GET_FIRMWARE_VERSION); + int ret; + + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + fw_ver->major =3D resp.major; + fw_ver->minor =3D resp.minor; + fw_ver->sub =3D resp.sub; + fw_ver->build =3D resp.build; + + return 0; +} diff --git a/drivers/accel/amdxdna/aie2_msg_priv.h b/drivers/accel/amdxdna/= aie2_msg_priv.h new file mode 100644 index 000000000000..4e02e744b470 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_msg_priv.h @@ -0,0 +1,370 @@ +/* 
SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_MSG_PRIV_H_ +#define _AIE2_MSG_PRIV_H_ + +enum aie2_msg_opcode { + MSG_OP_CREATE_CONTEXT =3D 0x2, + MSG_OP_DESTROY_CONTEXT =3D 0x3, + MSG_OP_SYNC_BO =3D 0x7, + MSG_OP_EXECUTE_BUFFER_CF =3D 0xC, + MSG_OP_QUERY_COL_STATUS =3D 0xD, + MSG_OP_QUERY_AIE_TILE_INFO =3D 0xE, + MSG_OP_QUERY_AIE_VERSION =3D 0xF, + MSG_OP_EXEC_DPU =3D 0x10, + MSG_OP_CONFIG_CU =3D 0x11, + MSG_OP_CHAIN_EXEC_BUFFER_CF =3D 0x12, + MSG_OP_CHAIN_EXEC_DPU =3D 0x13, + MSG_OP_MAX_XRT_OPCODE, + MSG_OP_SUSPEND =3D 0x101, + MSG_OP_RESUME =3D 0x102, + MSG_OP_ASSIGN_MGMT_PASID =3D 0x103, + MSG_OP_INVOKE_SELF_TEST =3D 0x104, + MSG_OP_MAP_HOST_BUFFER =3D 0x106, + MSG_OP_GET_FIRMWARE_VERSION =3D 0x108, + MSG_OP_SET_RUNTIME_CONFIG =3D 0x10A, + MSG_OP_GET_RUNTIME_CONFIG =3D 0x10B, + MSG_OP_REGISTER_ASYNC_EVENT_MSG =3D 0x10C, + MSG_OP_MAX_DRV_OPCODE, + MSG_OP_GET_PROTOCOL_VERSION =3D 0x301, + MSG_OP_MAX_OPCODE +}; + +enum aie2_msg_status { + AIE2_STATUS_SUCCESS =3D 0x0, + /* AIE Error codes */ + AIE2_STATUS_AIE_SATURATION_ERROR =3D 0x1000001, + AIE2_STATUS_AIE_FP_ERROR =3D 0x1000002, + AIE2_STATUS_AIE_STREAM_ERROR =3D 0x1000003, + AIE2_STATUS_AIE_ACCESS_ERROR =3D 0x1000004, + AIE2_STATUS_AIE_BUS_ERROR =3D 0x1000005, + AIE2_STATUS_AIE_INSTRUCTION_ERROR =3D 0x1000006, + AIE2_STATUS_AIE_ECC_ERROR =3D 0x1000007, + AIE2_STATUS_AIE_LOCK_ERROR =3D 0x1000008, + AIE2_STATUS_AIE_DMA_ERROR =3D 0x1000009, + AIE2_STATUS_AIE_MEM_PARITY_ERROR =3D 0x100000a, + AIE2_STATUS_AIE_PWR_CFG_ERROR =3D 0x100000b, + AIE2_STATUS_AIE_BACKTRACK_ERROR =3D 0x100000c, + AIE2_STATUS_MAX_AIE_STATUS_CODE, + /* MGMT ERT Error codes */ + AIE2_STATUS_MGMT_ERT_SELF_TEST_FAILURE =3D 0x2000001, + AIE2_STATUS_MGMT_ERT_HASH_MISMATCH, + AIE2_STATUS_MGMT_ERT_NOAVAIL, + AIE2_STATUS_MGMT_ERT_INVALID_PARAM, + AIE2_STATUS_MGMT_ERT_ENTER_SUSPEND_FAILURE, + AIE2_STATUS_MGMT_ERT_BUSY, + AIE2_STATUS_MGMT_ERT_APPLICATION_ACTIVE, + 
MAX_MGMT_ERT_STATUS_CODE, + /* APP ERT Error codes */ + AIE2_STATUS_APP_ERT_FIRST_ERROR =3D 0x3000001, + AIE2_STATUS_APP_INVALID_INSTR, + AIE2_STATUS_APP_LOAD_PDI_FAIL, + MAX_APP_ERT_STATUS_CODE, + /* NPU RTOS Error Codes */ + AIE2_STATUS_INVALID_INPUT_BUFFER =3D 0x4000001, + AIE2_STATUS_INVALID_COMMAND, + AIE2_STATUS_INVALID_PARAM, + AIE2_STATUS_INVALID_OPERATION =3D 0x4000006, + AIE2_STATUS_ASYNC_EVENT_MSGS_FULL, + AIE2_STATUS_MAX_RTOS_STATUS_CODE, + MAX_AIE2_STATUS_CODE +}; + +struct assign_mgmt_pasid_req { + __u16 pasid; + __u16 reserved; +} __packed; + +struct assign_mgmt_pasid_resp { + enum aie2_msg_status status; +} __packed; + +struct map_host_buffer_req { + __u32 context_id; + __u64 buf_addr; + __u64 buf_size; +} __packed; + +struct map_host_buffer_resp { + enum aie2_msg_status status; +} __packed; + +#define MAX_CQ_PAIRS 2 +struct cq_info { + __u32 head_addr; + __u32 tail_addr; + __u32 buf_addr; + __u32 buf_size; +}; + +struct cq_pair { + struct cq_info x2i_q; + struct cq_info i2x_q; +}; + +struct create_ctx_req { + __u32 aie_type; + __u8 start_col; + __u8 num_col; + __u16 reserved; + __u8 num_cq_pairs_requested; + __u8 reserved1; + __u16 pasid; + __u32 pad[2]; + __u32 sec_comm_target_type; + __u32 context_priority; +} __packed; + +struct create_ctx_resp { + enum aie2_msg_status status; + __u32 context_id; + __u16 msix_id; + __u8 num_cq_pairs_allocated; + __u8 reserved; + struct cq_pair cq_pair[MAX_CQ_PAIRS]; +} __packed; + +struct destroy_ctx_req { + __u32 context_id; +} __packed; + +struct destroy_ctx_resp { + enum aie2_msg_status status; +} __packed; + +struct execute_buffer_req { + __u32 cu_idx; + __u32 payload[19]; +} __packed; + +struct exec_dpu_req { + __u64 inst_buf_addr; + __u32 inst_size; + __u32 inst_prop_cnt; + __u32 cu_idx; + __u32 payload[35]; +} __packed; + +struct execute_buffer_resp { + enum aie2_msg_status status; +} __packed; + +struct aie_tile_info { + __u32 size; + __u16 major; + __u16 minor; + __u16 cols; + __u16 rows; + __u16 
core_rows; + __u16 mem_rows; + __u16 shim_rows; + __u16 core_row_start; + __u16 mem_row_start; + __u16 shim_row_start; + __u16 core_dma_channels; + __u16 mem_dma_channels; + __u16 shim_dma_channels; + __u16 core_locks; + __u16 mem_locks; + __u16 shim_locks; + __u16 core_events; + __u16 mem_events; + __u16 shim_events; + __u16 reserved; +}; + +struct aie_tile_info_req { + __u32 reserved; +} __packed; + +struct aie_tile_info_resp { + enum aie2_msg_status status; + struct aie_tile_info info; +} __packed; + +struct aie_version_info_req { + __u32 reserved; +} __packed; + +struct aie_version_info_resp { + enum aie2_msg_status status; + __u16 major; + __u16 minor; +} __packed; + +struct aie_column_info_req { + __u64 dump_buff_addr; + __u32 dump_buff_size; + __u32 num_cols; + __u32 aie_bitmap; +} __packed; + +struct aie_column_info_resp { + enum aie2_msg_status status; + __u32 size; +} __packed; + +struct suspend_req { + __u32 place_holder; +} __packed; + +struct suspend_resp { + enum aie2_msg_status status; +} __packed; + +struct resume_req { + __u32 place_holder; +} __packed; + +struct resume_resp { + enum aie2_msg_status status; +} __packed; + +struct check_header_hash_req { + __u64 hash_high; + __u64 hash_low; +} __packed; + +struct check_header_hash_resp { + enum aie2_msg_status status; +} __packed; + +struct query_error_req { + __u64 buf_addr; + __u32 buf_size; + __u32 next_row; + __u32 next_column; + __u32 next_module; +} __packed; + +struct query_error_resp { + enum aie2_msg_status status; + __u32 num_err; + __u32 has_next_err; + __u32 next_row; + __u32 next_column; + __u32 next_module; +} __packed; + +struct protocol_version_req { + __u32 reserved; +} __packed; + +struct protocol_version_resp { + enum aie2_msg_status status; + __u32 major; + __u32 minor; +} __packed; + +struct firmware_version_req { + __u32 reserved; +} __packed; + +struct firmware_version_resp { + enum aie2_msg_status status; + __u32 major; + __u32 minor; + __u32 sub; + __u32 build; +} __packed; 
+ +#define MAX_NUM_CUS 32 +#define AIE2_MSG_CFG_CU_PDI_ADDR GENMASK(16, 0) +#define AIE2_MSG_CFG_CU_FUNC GENMASK(24, 17) +struct config_cu_req { + __u32 num_cus; + __u32 cfgs[MAX_NUM_CUS]; +} __packed; + +struct config_cu_resp { + enum aie2_msg_status status; +} __packed; + +struct set_runtime_cfg_req { + __u32 type; + __u64 value; +} __packed; + +struct set_runtime_cfg_resp { + enum aie2_msg_status status; +} __packed; + +struct get_runtime_cfg_req { + __u32 type; +} __packed; + +struct get_runtime_cfg_resp { + enum aie2_msg_status status; + __u64 value; +} __packed; + +enum async_event_type { + ASYNC_EVENT_TYPE_AIE_ERROR, + ASYNC_EVENT_TYPE_EXCEPTION, + MAX_ASYNC_EVENT_TYPE +}; + +#define ASYNC_BUF_SIZE SZ_8K +struct async_event_msg_req { + __u64 buf_addr; + __u32 buf_size; +} __packed; + +struct async_event_msg_resp { + enum aie2_msg_status status; + enum async_event_type type; +} __packed; + +#define MAX_CHAIN_CMDBUF_SIZE SZ_4K +#define slot_cf_has_space(offset, payload_size) \ + (MAX_CHAIN_CMDBUF_SIZE - ((offset) + (payload_size)) > \ + offsetof(struct cmd_chain_slot_execbuf_cf, args[0])) +struct cmd_chain_slot_execbuf_cf { + __u32 cu_idx; + __u32 arg_cnt; + __u32 args[] __counted_by(arg_cnt); +}; + +#define slot_dpu_has_space(offset, payload_size) \ + (MAX_CHAIN_CMDBUF_SIZE - ((offset) + (payload_size)) > \ + offsetof(struct cmd_chain_slot_dpu, args[0])) +struct cmd_chain_slot_dpu { + __u64 inst_buf_addr; + __u32 inst_size; + __u32 inst_prop_cnt; + __u32 cu_idx; + __u32 arg_cnt; +#define MAX_DPU_ARGS_SIZE (34 * sizeof(__u32)) + __u32 args[] __counted_by(arg_cnt); +}; + +struct cmd_chain_req { + __u64 buf_addr; + __u32 buf_size; + __u32 count; +} __packed; + +struct cmd_chain_resp { + enum aie2_msg_status status; + __u32 fail_cmd_idx; + enum aie2_msg_status fail_cmd_status; +} __packed; + +#define AIE2_MSG_SYNC_BO_SRC_TYPE GENMASK(3, 0) +#define AIE2_MSG_SYNC_BO_DST_TYPE GENMASK(7, 4) +struct sync_bo_req { + __u64 src_addr; + __u64 dst_addr; + __u32 size; 
+#define SYNC_BO_DEV_MEM 0 +#define SYNC_BO_HOST_MEM 2 + __u32 type; +} __packed; + +struct sync_bo_resp { + enum aie2_msg_status status; +} __packed; +#endif /* _AIE2_MSG_PRIV_H_ */ diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_= pci.c index f36549293053..c987c731172d 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -9,16 +9,210 @@ #include #include #include +#include #include =20 +#include "aie2_msg_priv.h" #include "aie2_pci.h" +#include "amdxdna_mailbox.h" #include "amdxdna_pci_drv.h" =20 +/* + * The management mailbox channel is allocated by firmware. + * The related register and ring buffer information is on SRAM BAR. + * This struct is the register layout. + */ +struct mgmt_mbox_chann_info { + u32 x2i_tail; + u32 x2i_head; + u32 x2i_buf; + u32 x2i_buf_sz; + u32 i2x_tail; + u32 i2x_head; + u32 i2x_buf; + u32 i2x_buf_sz; +}; + +static void aie2_dump_chann_info_debug(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna =3D ndev->xdna; + + XDNA_DBG(xdna, "i2x tail 0x%x", ndev->mgmt_i2x.mb_tail_ptr_reg); + XDNA_DBG(xdna, "i2x head 0x%x", ndev->mgmt_i2x.mb_head_ptr_reg); + XDNA_DBG(xdna, "i2x ringbuf 0x%x", ndev->mgmt_i2x.rb_start_addr); + XDNA_DBG(xdna, "i2x rsize 0x%x", ndev->mgmt_i2x.rb_size); + XDNA_DBG(xdna, "x2i tail 0x%x", ndev->mgmt_x2i.mb_tail_ptr_reg); + XDNA_DBG(xdna, "x2i head 0x%x", ndev->mgmt_x2i.mb_head_ptr_reg); + XDNA_DBG(xdna, "x2i ringbuf 0x%x", ndev->mgmt_x2i.rb_start_addr); + XDNA_DBG(xdna, "x2i rsize 0x%x", ndev->mgmt_x2i.rb_size); + XDNA_DBG(xdna, "x2i chann index 0x%x", ndev->mgmt_chan_idx); +} + +static int aie2_get_mgmt_chann_info(struct amdxdna_dev_hdl *ndev) +{ + struct mgmt_mbox_chann_info info_regs; + struct xdna_mailbox_chann_res *i2x; + struct xdna_mailbox_chann_res *x2i; + u32 addr, off; + u32 *reg; + int ret; + int i; + + /* + * Once firmware is alive, it will write management channel + * information in SRAM BAR and write the address of that information + * 
at FW_ALIVE_OFF offset in SRAM BAR. + * + * Reading a non-zero value from FW_ALIVE_OFF implies that firmware + * is alive. + */ + ret =3D readx_poll_timeout(readl, SRAM_GET_ADDR(ndev, FW_ALIVE_OFF), + addr, addr, AIE2_INTERVAL, AIE2_TIMEOUT); + if (ret || !addr) + return -ETIME; + + off =3D AIE2_SRAM_OFF(ndev, addr); + reg =3D (u32 *)&info_regs; + for (i =3D 0; i < sizeof(info_regs) / sizeof(u32); i++) + reg[i] =3D readl(ndev->sram_base + off + i * sizeof(u32)); + + i2x =3D &ndev->mgmt_i2x; + x2i =3D &ndev->mgmt_x2i; + + i2x->mb_head_ptr_reg =3D AIE2_MBOX_OFF(ndev, info_regs.i2x_head); + i2x->mb_tail_ptr_reg =3D AIE2_MBOX_OFF(ndev, info_regs.i2x_tail); + i2x->rb_start_addr =3D AIE2_SRAM_OFF(ndev, info_regs.i2x_buf); + i2x->rb_size =3D info_regs.i2x_buf_sz; + + x2i->mb_head_ptr_reg =3D AIE2_MBOX_OFF(ndev, info_regs.x2i_head); + x2i->mb_tail_ptr_reg =3D AIE2_MBOX_OFF(ndev, info_regs.x2i_tail); + x2i->rb_start_addr =3D AIE2_SRAM_OFF(ndev, info_regs.x2i_buf); + x2i->rb_size =3D info_regs.x2i_buf_sz; + ndev->mgmt_chan_idx =3D CHANN_INDEX(ndev, x2i->rb_start_addr); + + aie2_dump_chann_info_debug(ndev); + + /* Must clear address at FW_ALIVE_OFF */ + writel(0, SRAM_GET_ADDR(ndev, FW_ALIVE_OFF)); + + return 0; +} + +static int aie2_runtime_cfg(struct amdxdna_dev_hdl *ndev) +{ + const struct rt_config *cfg =3D &ndev->priv->rt_config; + u64 value; + int ret; + + ret =3D aie2_set_runtime_cfg(ndev, cfg->type, cfg->value); + if (ret) { + XDNA_ERR(ndev->xdna, "Set runtime type %d value %d failed", + cfg->type, cfg->value); + return ret; + } + + ret =3D aie2_get_runtime_cfg(ndev, cfg->type, &value); + if (ret) { + XDNA_ERR(ndev->xdna, "Get runtime cfg failed"); + return ret; + } + + if (value !=3D cfg->value) + return -EINVAL; + + return 0; +} + +static int aie2_xdna_reset(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret =3D aie2_suspend_fw(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Suspend firmware failed"); + return ret; + } + + ret =3D aie2_resume_fw(ndev); + if (ret) { + 
XDNA_ERR(ndev->xdna, "Resume firmware failed"); + return ret; + } + + return 0; +} + +static int aie2_mgmt_fw_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret =3D aie2_check_protocol_version(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Check protocol version failed"); + return ret; + } + + ret =3D aie2_runtime_cfg(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Runtime config failed"); + return ret; + } + + ret =3D aie2_assign_mgmt_pasid(ndev, 0); + if (ret) { + XDNA_ERR(ndev->xdna, "Can not assign PASID"); + return ret; + } + + ret =3D aie2_xdna_reset(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Reset firmware failed"); + return ret; + } + + return 0; +} + +static int aie2_mgmt_fw_query(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret =3D aie2_query_firmware_version(ndev, &ndev->xdna->fw_ver); + if (ret) { + XDNA_ERR(ndev->xdna, "Query firmware version failed"); + return ret; + } + + ret =3D aie2_query_aie_version(ndev, &ndev->version); + if (ret) { + XDNA_ERR(ndev->xdna, "Query AIE version failed"); + return ret; + } + + ret =3D aie2_query_aie_metadata(ndev, &ndev->metadata); + if (ret) { + XDNA_ERR(ndev->xdna, "Query AIE metadata failed"); + return ret; + } + + return 0; +} + +static void aie2_mgmt_fw_fini(struct amdxdna_dev_hdl *ndev) +{ + if (aie2_suspend_fw(ndev)) + XDNA_ERR(ndev->xdna, "Suspend_fw failed"); + XDNA_DBG(ndev->xdna, "Firmware suspended"); +} + static void aie2_hw_stop(struct amdxdna_dev *xdna) { struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); struct amdxdna_dev_hdl *ndev =3D xdna->dev_handle; =20 + aie2_mgmt_fw_fini(ndev); + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); aie2_psp_stop(ndev->psp_hdl); aie2_smu_fini(ndev); pci_disable_device(pdev); @@ -28,7 +222,9 @@ static int aie2_hw_start(struct amdxdna_dev *xdna) { struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); struct amdxdna_dev_hdl *ndev =3D xdna->dev_handle; - int ret; + struct xdna_mailbox_res mbox_res; + u32 
xdna_mailbox_intr_reg; + int mgmt_mb_irq, ret; =20 ret =3D pci_enable_device(pdev); if (ret) { @@ -49,8 +245,56 @@ static int aie2_hw_start(struct amdxdna_dev *xdna) goto fini_smu; } =20 + ret =3D aie2_get_mgmt_chann_info(ndev); + if (ret) { + XDNA_ERR(xdna, "firmware is not alive"); + goto stop_psp; + } + + mbox_res.ringbuf_base =3D (u64)ndev->sram_base; + mbox_res.ringbuf_size =3D pci_resource_len(pdev, xdna->dev_info->sram_bar= ); + mbox_res.mbox_base =3D (u64)ndev->mbox_base; + mbox_res.mbox_size =3D MBOX_SIZE(ndev); + mbox_res.name =3D "xdna_mailbox"; + ndev->mbox =3D xdnam_mailbox_create(&xdna->ddev, &mbox_res); + if (!ndev->mbox) { + XDNA_ERR(xdna, "failed to create mailbox device"); + ret =3D -ENODEV; + goto stop_psp; + } + + mgmt_mb_irq =3D pci_irq_vector(pdev, ndev->mgmt_chan_idx); + if (mgmt_mb_irq < 0) { + ret =3D mgmt_mb_irq; + XDNA_ERR(xdna, "failed to alloc irq vector, ret %d", ret); + goto stop_psp; + } + + xdna_mailbox_intr_reg =3D ndev->mgmt_i2x.mb_head_ptr_reg + 4; + ndev->mgmt_chann =3D xdna_mailbox_create_channel(ndev->mbox, + &ndev->mgmt_x2i, + &ndev->mgmt_i2x, + xdna_mailbox_intr_reg, + mgmt_mb_irq); + if (!ndev->mgmt_chann) { + XDNA_ERR(xdna, "failed to create management mailbox channel"); + ret =3D -EINVAL; + goto stop_psp; + } + + ret =3D aie2_mgmt_fw_init(ndev); + if (ret) { + XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); + goto destroy_mgmt_chann; + } + return 0; =20 +destroy_mgmt_chann: + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); +stop_psp: + aie2_psp_stop(ndev->psp_hdl); fini_smu: aie2_smu_fini(ndev); disable_dev: @@ -112,6 +356,7 @@ static int aie2_init(struct amdxdna_dev *xdna) } ndev->sram_base =3D tbl[xdna->dev_info->sram_bar]; ndev->smu_base =3D tbl[xdna->dev_info->smu_bar]; + ndev->mbox_base =3D tbl[xdna->dev_info->mbox_bar]; =20 ret =3D dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); if (ret) { @@ -156,9 +401,18 @@ static int aie2_init(struct 
amdxdna_dev *xdna) goto disable_sva; } =20 + ret =3D aie2_mgmt_fw_query(ndev); + if (ret) { + XDNA_ERR(xdna, "Query firmware failed, ret %d", ret); + goto stop_hw; + } + ndev->total_col =3D ndev->metadata.cols; + release_firmware(fw); return 0; =20 +stop_hw: + aie2_hw_stop(xdna); disable_sva: iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); free_irq: diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_= pci.h index 34f344b4b662..4c81d10a0998 100644 --- a/drivers/accel/amdxdna/aie2_pci.h +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -6,6 +6,8 @@ #ifndef _AIE2_PCI_H_ #define _AIE2_PCI_H_ =20 +#include "amdxdna_mailbox.h" + #define AIE2_INTERVAL 20000 /* us */ #define AIE2_TIMEOUT 1000000 /* us */ =20 @@ -33,6 +35,17 @@ ((_ndev)->sram_base + SRAM_REG_OFF((_ndev), (idx))); \ }) =20 +#define CHAN_SLOT_SZ SZ_8K +#define CHANN_INDEX(ndev, rbuf_off) \ + (((rbuf_off) - SRAM_REG_OFF((ndev), MBOX_CHANN_OFF)) / CHAN_SLOT_SZ) + +#define MBOX_SIZE(ndev) \ +({ \ + typeof(ndev) _ndev =3D (ndev); \ + ((_ndev)->priv->mbox_size) ? 
(_ndev)->priv->mbox_size : \ + pci_resource_len(NDEV2PDEV(_ndev), (_ndev)->xdna->dev_info->mbox_bar); \ +}) + #define SMU_MPNPUCLK_FREQ_MAX(ndev) ((ndev)->priv->smu_mpnpuclk_freq_max) #define SMU_HCLK_FREQ_MAX(ndev) ((ndev)->priv->smu_hclk_freq_max) =20 @@ -63,12 +76,37 @@ enum psp_reg_idx { PSP_MAX_REGS /* Keep this at the end */ }; =20 +struct amdxdna_fw_ver; + struct psp_config { const void *fw_buf; u32 fw_size; void __iomem *psp_regs[PSP_MAX_REGS]; }; =20 +struct aie_version { + u16 major; + u16 minor; +}; + +struct aie_tile_metadata { + u16 row_count; + u16 row_start; + u16 dma_channel_count; + u16 lock_count; + u16 event_reg_count; +}; + +struct aie_metadata { + u32 size; + u16 cols; + u16 rows; + struct aie_version version; + struct aie_tile_metadata core; + struct aie_tile_metadata mem; + struct aie_tile_metadata shim; +}; + struct clock_entry { char name[16]; u32 freq_mhz; @@ -84,9 +122,22 @@ struct amdxdna_dev_hdl { const struct amdxdna_dev_priv *priv; void __iomem *sram_base; void __iomem *smu_base; + void __iomem *mbox_base; struct psp_device *psp_hdl; + + struct xdna_mailbox_chann_res mgmt_x2i; + struct xdna_mailbox_chann_res mgmt_i2x; + u32 mgmt_chan_idx; + + u32 total_col; + struct aie_version version; + struct aie_metadata metadata; struct clock_entry mp_npu_clock; struct clock_entry h_clock; + + /* Mailbox and the management channel */ + struct mailbox *mbox; + struct mailbox_channel *mgmt_chann; }; =20 #define DEFINE_BAR_OFFSET(reg_name, bar, reg_addr) \ @@ -127,4 +178,15 @@ struct psp_device *aie2m_psp_create(struct drm_device = *ddev, struct psp_config * int aie2_psp_start(struct psp_device *psp); void aie2_psp_stop(struct psp_device *psp); =20 +/* aie2_message.c */ +int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev); +int aie2_resume_fw(struct amdxdna_dev_hdl *ndev); +int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value= ); +int aie2_get_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 *valu= e); +int 
aie2_check_protocol_version(struct amdxdna_dev_hdl *ndev); +int aie2_assign_mgmt_pasid(struct amdxdna_dev_hdl *ndev, u16 pasid); +int aie2_query_aie_version(struct amdxdna_dev_hdl *ndev, struct aie_versio= n *version); +int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metad= ata *metadata); +int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, + struct amdxdna_fw_ver *fw_ver); #endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/aie2_psp.c b/drivers/accel/amdxdna/aie2_= psp.c index c87ca322e206..2efcfd1941bf 100644 --- a/drivers/accel/amdxdna/aie2_psp.c +++ b/drivers/accel/amdxdna/aie2_psp.c @@ -9,6 +9,8 @@ #include =20 #include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" =20 #define PSP_STATUS_READY BIT(31) =20 diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.c b/drivers/accel/amdxdn= a/amdxdna_mailbox.c new file mode 100644 index 000000000000..60b3bf518013 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox.c @@ -0,0 +1,575 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include + +#define CREATE_TRACE_POINTS +#include + +#include "amdxdna_mailbox.h" + +#define MB_ERR(chann, fmt, args...) \ +({ \ + typeof(chann) _chann =3D chann; \ + dev_err((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) +#define MB_DBG(chann, fmt, args...) \ +({ \ + typeof(chann) _chann =3D chann; \ + dev_dbg((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) +#define MB_WARN_ONCE(chann, fmt, args...) 
\ +({ \ + typeof(chann) _chann =3D chann; \ + dev_warn_once((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) + +#define MAGIC_VAL 0x1D000000U +#define MAGIC_VAL_MASK 0xFF000000 +#define MAX_MSG_ID_ENTRIES 256 +#define MSG_RX_TIMER 200 /* milliseconds */ +#define MAILBOX_NAME "xdna_mailbox" + +enum channel_res_type { + CHAN_RES_X2I, + CHAN_RES_I2X, + CHAN_RES_NUM +}; + +struct mailbox { + struct device *dev; + struct xdna_mailbox_res res; +}; + +struct mailbox_channel { + struct mailbox *mb; + struct xdna_mailbox_chann_res res[CHAN_RES_NUM]; + int msix_irq; + u32 iohub_int_addr; + struct idr chan_idr; + spinlock_t chan_idr_lock; /* protect chan_idr */ + u32 x2i_tail; + + /* Received msg related fields */ + struct workqueue_struct *work_q; + struct work_struct rx_work; + u32 i2x_head; + bool bad_state; +}; + +#define MSG_BODY_SZ GENMASK(10, 0) +#define MSG_PROTO_VER GENMASK(23, 16) +struct xdna_msg_header { + __u32 total_size; + __u32 sz_ver; + __u32 id; + __u32 opcode; +} __packed; + +static_assert(sizeof(struct xdna_msg_header) =3D=3D 16); + +struct mailbox_pkg { + struct xdna_msg_header header; + __u32 payload[]; +}; + +/* The protocol version. */ +#define MSG_PROTOCOL_VERSION 0x1 +/* The tombstone value. 
*/ +#define TOMBSTONE 0xDEADFACE + +struct mailbox_msg { + void *handle; + int (*notify_cb)(void *handle, const u32 *data, size_t size); + size_t pkg_size; /* package size in bytes */ + struct mailbox_pkg pkg; +}; + +static void mailbox_reg_write(struct mailbox_channel *mb_chann, u32 mbox_r= eg, u32 data) +{ + struct xdna_mailbox_res *mb_res =3D &mb_chann->mb->res; + u64 ringbuf_addr =3D mb_res->mbox_base + mbox_reg; + + writel(data, (void *)ringbuf_addr); +} + +static u32 mailbox_reg_read(struct mailbox_channel *mb_chann, u32 mbox_reg) +{ + struct xdna_mailbox_res *mb_res =3D &mb_chann->mb->res; + u64 ringbuf_addr =3D mb_res->mbox_base + mbox_reg; + + return readl((void *)ringbuf_addr); +} + +static int mailbox_reg_read_non_zero(struct mailbox_channel *mb_chann, u32= mbox_reg, u32 *val) +{ + struct xdna_mailbox_res *mb_res =3D &mb_chann->mb->res; + u64 ringbuf_addr =3D mb_res->mbox_base + mbox_reg; + int ret, value; + + /* Poll till value is not zero */ + ret =3D readx_poll_timeout(readl, (void *)ringbuf_addr, value, + value, 1 /* us */, 100); + if (ret < 0) + return ret; + + *val =3D value; + return 0; +} + +static inline void +mailbox_set_headptr(struct mailbox_channel *mb_chann, u32 headptr_val) +{ + mailbox_reg_write(mb_chann, mb_chann->res[CHAN_RES_I2X].mb_head_ptr_reg, = headptr_val); + mb_chann->i2x_head =3D headptr_val; +} + +static inline void +mailbox_set_tailptr(struct mailbox_channel *mb_chann, u32 tailptr_val) +{ + mailbox_reg_write(mb_chann, mb_chann->res[CHAN_RES_X2I].mb_tail_ptr_reg, = tailptr_val); + mb_chann->x2i_tail =3D tailptr_val; +} + +static inline u32 +mailbox_get_headptr(struct mailbox_channel *mb_chann, enum channel_res_typ= e type) +{ + return mailbox_reg_read(mb_chann, mb_chann->res[type].mb_head_ptr_reg); +} + +static inline u32 +mailbox_get_tailptr(struct mailbox_channel *mb_chann, enum channel_res_typ= e type) +{ + return mailbox_reg_read(mb_chann, mb_chann->res[type].mb_tail_ptr_reg); +} + +static inline u32 
+mailbox_get_ringbuf_size(struct mailbox_channel *mb_chann, enum channel_re= s_type type) +{ + return mb_chann->res[type].rb_size; +} + +static inline int mailbox_validate_msgid(int msg_id) +{ + return (msg_id & MAGIC_VAL_MASK) =3D=3D MAGIC_VAL; +} + +static int mailbox_acquire_msgid(struct mailbox_channel *mb_chann, struct = mailbox_msg *mb_msg) +{ + unsigned long flags; + int msg_id; + + spin_lock_irqsave(&mb_chann->chan_idr_lock, flags); + msg_id =3D idr_alloc_cyclic(&mb_chann->chan_idr, mb_msg, 0, + MAX_MSG_ID_ENTRIES, GFP_NOWAIT); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); + if (msg_id < 0) + return msg_id; + + /* + * The IDR becomes less efficient when dealing with larger IDs. + * Thus, add MAGIC_VAL to the higher bits. + */ + msg_id |=3D MAGIC_VAL; + return msg_id; +} + +static void mailbox_release_msgid(struct mailbox_channel *mb_chann, int ms= g_id) +{ + unsigned long flags; + + msg_id &=3D ~MAGIC_VAL_MASK; + spin_lock_irqsave(&mb_chann->chan_idr_lock, flags); + idr_remove(&mb_chann->chan_idr, msg_id); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); +} + +static int mailbox_release_msg(int id, void *p, void *data) +{ + struct mailbox_channel *mb_chann =3D data; + struct mailbox_msg *mb_msg =3D p; + + MB_DBG(mb_chann, "msg_id 0x%x msg opcode 0x%x", + mb_msg->pkg.header.id, mb_msg->pkg.header.opcode); + mb_msg->notify_cb(mb_msg->handle, NULL, 0); + kfree(mb_msg); + + return 0; +} + +static int +mailbox_send_msg(struct mailbox_channel *mb_chann, struct mailbox_msg *mb_= msg) +{ + u32 ringbuf_size; + u32 head, tail; + u32 start_addr; + u64 write_addr; + u32 tmp_tail; + + head =3D mailbox_get_headptr(mb_chann, CHAN_RES_X2I); + tail =3D mb_chann->x2i_tail; + ringbuf_size =3D mailbox_get_ringbuf_size(mb_chann, CHAN_RES_X2I); + start_addr =3D mb_chann->res[CHAN_RES_X2I].rb_start_addr; + tmp_tail =3D tail + mb_msg->pkg_size; + + if (tail < head && tmp_tail >=3D head) + goto no_space; + + if (tail >=3D head && (tmp_tail > ringbuf_size - 
sizeof(u32) && + mb_msg->pkg_size >=3D head)) + goto no_space; + + if (tail >=3D head && tmp_tail > ringbuf_size - sizeof(u32)) { + write_addr =3D mb_chann->mb->res.ringbuf_base + start_addr + tail; + writel(TOMBSTONE, (void *)write_addr); + + /* tombstone is set. Write from the start of the ringbuf */ + tail =3D 0; + } + + write_addr =3D mb_chann->mb->res.ringbuf_base + start_addr + tail; + memcpy_toio((void *)write_addr, &mb_msg->pkg, mb_msg->pkg_size); + mailbox_set_tailptr(mb_chann, tail + mb_msg->pkg_size); + + trace_mbox_set_tail(MAILBOX_NAME, mb_chann->msix_irq, + mb_msg->pkg.header.opcode, + mb_msg->pkg.header.id); + + return 0; + +no_space: + return -ENOSPC; +} + +static int +mailbox_get_resp(struct mailbox_channel *mb_chann, struct xdna_msg_header = *header, + void *data) +{ + struct mailbox_msg *mb_msg; + unsigned long flags; + int msg_id; + int ret; + + msg_id =3D header->id; + if (!mailbox_validate_msgid(msg_id)) { + MB_ERR(mb_chann, "Bad message ID 0x%x", msg_id); + return -EINVAL; + } + + msg_id &=3D ~MAGIC_VAL_MASK; + spin_lock_irqsave(&mb_chann->chan_idr_lock, flags); + mb_msg =3D idr_find(&mb_chann->chan_idr, msg_id); + if (!mb_msg) { + MB_ERR(mb_chann, "Cannot find msg 0x%x", msg_id); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); + return -EINVAL; + } + idr_remove(&mb_chann->chan_idr, msg_id); + spin_unlock_irqrestore(&mb_chann->chan_idr_lock, flags); + + MB_DBG(mb_chann, "opcode 0x%x size %d id 0x%x", + header->opcode, header->total_size, header->id); + ret =3D mb_msg->notify_cb(mb_msg->handle, data, header->total_size); + if (unlikely(ret)) + MB_ERR(mb_chann, "Message callback ret %d", ret); + + kfree(mb_msg); + return ret; +} + +static int mailbox_get_msg(struct mailbox_channel *mb_chann) +{ + struct xdna_msg_header header; + u32 msg_size, rest; + u32 ringbuf_size; + u32 head, tail; + u32 start_addr; + u64 read_addr; + int ret; + + if (mailbox_reg_read_non_zero(mb_chann, mb_chann->res[CHAN_RES_I2X].mb_ta= il_ptr_reg, &tail)) + 
return -EINVAL; + head =3D mb_chann->i2x_head; + ringbuf_size =3D mailbox_get_ringbuf_size(mb_chann, CHAN_RES_I2X); + start_addr =3D mb_chann->res[CHAN_RES_I2X].rb_start_addr; + + if (unlikely(tail > ringbuf_size || !IS_ALIGNED(tail, 4))) { + MB_WARN_ONCE(mb_chann, "Invalid tail 0x%x", tail); + return -EINVAL; + } + + /* ringbuf empty */ + if (head =3D=3D tail) + return -ENOENT; + + if (head =3D=3D ringbuf_size) + head =3D 0; + + /* Peek size of the message or TOMBSTONE */ + read_addr =3D mb_chann->mb->res.ringbuf_base + start_addr + head; + header.total_size =3D readl((void *)read_addr); + /* size is TOMBSTONE, set next read from 0 */ + if (header.total_size =3D=3D TOMBSTONE) { + if (head < tail) { + MB_WARN_ONCE(mb_chann, "Tombstone, head 0x%x tail 0x%x", + head, tail); + return -EINVAL; + } + mailbox_set_headptr(mb_chann, 0); + return 0; + } + + if (unlikely(!header.total_size || !IS_ALIGNED(header.total_size, 4))) { + MB_WARN_ONCE(mb_chann, "Invalid total size 0x%x", header.total_size); + return -EINVAL; + } + msg_size =3D sizeof(header) + header.total_size; + + if (msg_size > ringbuf_size - head || msg_size > tail - head) { + MB_WARN_ONCE(mb_chann, "Invalid message size %d, tail %d, head %d", + msg_size, tail, head); + return -EINVAL; + } + + rest =3D sizeof(header) - sizeof(u32); + read_addr +=3D sizeof(u32); + memcpy_fromio((u32 *)&header + 1, (void *)read_addr, rest); + read_addr +=3D rest; + + ret =3D mailbox_get_resp(mb_chann, &header, (u32 *)read_addr); + + mailbox_set_headptr(mb_chann, head + msg_size); + /* After update head, it can equal to ringbuf_size. This is expected. 
*/ + trace_mbox_set_head(MAILBOX_NAME, mb_chann->msix_irq, + header.opcode, header.id); + + return ret; +} + +static irqreturn_t mailbox_irq_handler(int irq, void *p) +{ + struct mailbox_channel *mb_chann =3D p; + + trace_mbox_irq_handle(MAILBOX_NAME, irq); + /* Schedule a rx_work to call the callback functions */ + queue_work(mb_chann->work_q, &mb_chann->rx_work); + /* Clear IOHUB register */ + mailbox_reg_write(mb_chann, mb_chann->iohub_int_addr, 0); + + return IRQ_HANDLED; +} + +static void mailbox_rx_worker(struct work_struct *rx_work) +{ + struct mailbox_channel *mb_chann; + int ret; + + mb_chann =3D container_of(rx_work, struct mailbox_channel, rx_work); + + if (READ_ONCE(mb_chann->bad_state)) { + MB_ERR(mb_chann, "Channel in bad state, work aborted"); + return; + } + + while (1) { + /* + * If the return value is 0, keep consuming messages until there + * are no more messages or an error happens. + */ + ret =3D mailbox_get_msg(mb_chann); + if (ret =3D=3D -ENOENT) + break; + + /* Any other error means the device is in a bad state, disable irq. 
*/ + if (unlikely(ret)) { + MB_ERR(mb_chann, "Unexpected ret %d, disable irq", ret); + WRITE_ONCE(mb_chann->bad_state, true); + disable_irq(mb_chann->msix_irq); + break; + } + } +} + +int xdna_mailbox_send_msg(struct mailbox_channel *mb_chann, + const struct xdna_mailbox_msg *msg, u64 tx_timeout) +{ + struct xdna_msg_header *header; + struct mailbox_msg *mb_msg; + size_t pkg_size; + int ret; + + pkg_size =3D sizeof(*header) + msg->send_size; + if (pkg_size > mailbox_get_ringbuf_size(mb_chann, CHAN_RES_X2I)) { + MB_ERR(mb_chann, "Message size larger than ringbuf size"); + return -EINVAL; + } + + if (unlikely(!IS_ALIGNED(msg->send_size, 4))) { + MB_ERR(mb_chann, "Message must be 4-byte aligned"); + return -EINVAL; + } + + /* The first word in payload can NOT be TOMBSTONE */ + if (unlikely(((u32 *)msg->send_data)[0] =3D=3D TOMBSTONE)) { + MB_ERR(mb_chann, "Tombstone in data"); + return -EINVAL; + } + + if (READ_ONCE(mb_chann->bad_state)) { + MB_ERR(mb_chann, "Channel in bad state"); + return -EPIPE; + } + + mb_msg =3D kzalloc(sizeof(*mb_msg) + pkg_size, GFP_KERNEL); + if (!mb_msg) + return -ENOMEM; + + mb_msg->handle =3D msg->handle; + mb_msg->notify_cb =3D msg->notify_cb; + mb_msg->pkg_size =3D pkg_size; + + header =3D &mb_msg->pkg.header; + /* + * Hardware uses total_size and size to split huge messages. + * We do not support it here. Thus the values are the same. 
+ */ + header->total_size =3D msg->send_size; + header->sz_ver =3D FIELD_PREP(MSG_BODY_SZ, msg->send_size) | + FIELD_PREP(MSG_PROTO_VER, MSG_PROTOCOL_VERSION); + header->opcode =3D msg->opcode; + memcpy(mb_msg->pkg.payload, msg->send_data, msg->send_size); + + ret =3D mailbox_acquire_msgid(mb_chann, mb_msg); + if (unlikely(ret < 0)) { + MB_ERR(mb_chann, "mailbox_acquire_msgid failed"); + goto msg_id_failed; + } + header->id =3D ret; + + MB_DBG(mb_chann, "opcode 0x%x size %d id 0x%x", + header->opcode, header->total_size, header->id); + + ret =3D mailbox_send_msg(mb_chann, mb_msg); + if (ret) { + MB_DBG(mb_chann, "Error in mailbox send msg, ret %d", ret); + goto release_id; + } + + return 0; + +release_id: + mailbox_release_msgid(mb_chann, header->id); +msg_id_failed: + kfree(mb_msg); + return ret; +} + +struct mailbox_channel * +xdna_mailbox_create_channel(struct mailbox *mb, + const struct xdna_mailbox_chann_res *x2i, + const struct xdna_mailbox_chann_res *i2x, + u32 iohub_int_addr, + int mb_irq) +{ + struct mailbox_channel *mb_chann; + int ret; + + if (!is_power_of_2(x2i->rb_size) || !is_power_of_2(i2x->rb_size)) { + pr_err("Ring buf size must be power of 2"); + return NULL; + } + + mb_chann =3D kzalloc(sizeof(*mb_chann), GFP_KERNEL); + if (!mb_chann) + return NULL; + + mb_chann->mb =3D mb; + mb_chann->msix_irq =3D mb_irq; + mb_chann->iohub_int_addr =3D iohub_int_addr; + memcpy(&mb_chann->res[CHAN_RES_X2I], x2i, sizeof(*x2i)); + memcpy(&mb_chann->res[CHAN_RES_I2X], i2x, sizeof(*i2x)); + + spin_lock_init(&mb_chann->chan_idr_lock); + idr_init(&mb_chann->chan_idr); + mb_chann->x2i_tail =3D mailbox_get_tailptr(mb_chann, CHAN_RES_X2I); + mb_chann->i2x_head =3D mailbox_get_headptr(mb_chann, CHAN_RES_I2X); + + INIT_WORK(&mb_chann->rx_work, mailbox_rx_worker); + mb_chann->work_q =3D create_singlethread_workqueue(MAILBOX_NAME); + if (!mb_chann->work_q) { + MB_ERR(mb_chann, "Create workqueue failed"); + goto free_and_out; + } + + /* Everything look good. 
Time to enable irq handler */ + ret =3D request_irq(mb_irq, mailbox_irq_handler, 0, MAILBOX_NAME, mb_chan= n); + if (ret) { + MB_ERR(mb_chann, "Failed to request irq %d ret %d", mb_irq, ret); + goto destroy_wq; + } + + mb_chann->bad_state =3D false; + + MB_DBG(mb_chann, "Mailbox channel created (irq: %d)", mb_chann->msix_irq); + return mb_chann; + +destroy_wq: + destroy_workqueue(mb_chann->work_q); +free_and_out: + kfree(mb_chann); + return NULL; +} + +int xdna_mailbox_destroy_channel(struct mailbox_channel *mb_chann) +{ + if (!mb_chann) + return 0; + + MB_DBG(mb_chann, "IRQ disabled and RX work cancelled"); + free_irq(mb_chann->msix_irq, mb_chann); + destroy_workqueue(mb_chann->work_q); + /* We can clean up and release resources */ + + idr_for_each(&mb_chann->chan_idr, mailbox_release_msg, mb_chann); + idr_destroy(&mb_chann->chan_idr); + + MB_DBG(mb_chann, "Mailbox channel destroyed, irq: %d", mb_chann->msix_irq= ); + kfree(mb_chann); + return 0; +} + +void xdna_mailbox_stop_channel(struct mailbox_channel *mb_chann) +{ + if (!mb_chann) + return; + + /* Disable the irq and wait. This might sleep. */ + disable_irq(mb_chann->msix_irq); + + /* Cancel RX work and wait for it to finish */ + cancel_work_sync(&mb_chann->rx_work); + MB_DBG(mb_chann, "IRQ disabled and RX work cancelled"); +} + +struct mailbox *xdnam_mailbox_create(struct drm_device *ddev, + const struct xdna_mailbox_res *res) +{ + struct mailbox *mb; + + mb =3D drmm_kzalloc(ddev, sizeof(*mb), GFP_KERNEL); + if (!mb) + return NULL; + mb->dev =3D ddev->dev; + + /* mailbox and ring buf base and size information */ + memcpy(&mb->res, res, sizeof(*res)); + + return mb; +} diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.h b/drivers/accel/amdxdn= a/amdxdna_mailbox.h new file mode 100644 index 000000000000..65dc0aba0f35 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox.h @@ -0,0 +1,124 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. 
+ */ + +#ifndef _AIE2_MAILBOX_H_ +#define _AIE2_MAILBOX_H_ + +struct mailbox; +struct mailbox_channel; + +/* + * xdna_mailbox_msg - message struct + * + * @opcode: opcode for firmware + * @handle: handle used for the notify callback + * @notify_cb: callback function to notify the sender when there is a respo= nse + * @send_data: pointer to the data to send + * @send_size: size of the data to send + * + * The mailbox will split the sending data into multiple firmware messag= es if + * the size of the data is too big. This is transparent to the sender. The + * sender will receive one notification. + */ +struct xdna_mailbox_msg { + u32 opcode; + void *handle; + int (*notify_cb)(void *handle, const u32 *data, size_t size); + u8 *send_data; + size_t send_size; +}; + +/* + * xdna_mailbox_res - mailbox hardware resource + * + * @ringbuf_base: ring buffer base address + * @ringbuf_size: ring buffer size + * @mbox_base: mailbox base address + * @mbox_size: mailbox size + */ +struct xdna_mailbox_res { + u64 ringbuf_base; + size_t ringbuf_size; + u64 mbox_base; + size_t mbox_size; + const char *name; +}; + +/* + * xdna_mailbox_chann_res - resources + * + * @rb_start_addr: ring buffer start address + * @rb_size: ring buffer size + * @mb_head_ptr_reg: mailbox head pointer register + * @mb_tail_ptr_reg: mailbox tail pointer register + */ +struct xdna_mailbox_chann_res { + u32 rb_start_addr; + u32 rb_size; + u32 mb_head_ptr_reg; + u32 mb_tail_ptr_reg; +}; + +/* + * xdnam_mailbox_create() -- create mailbox subsystem and initialize + * + * @ddev: device pointer + * @res: SRAM and mailbox resources + * + * Return: If success, return a handle of mailbox subsystem. + * Otherwise, return NULL pointer. 
+ */ +struct mailbox *xdnam_mailbox_create(struct drm_device *ddev, + const struct xdna_mailbox_res *res); + +/* + * xdna_mailbox_create_channel() -- Create a mailbox channel instance + * + * @mailbox: the handle returned from xdnam_mailbox_create() + * @x2i: host to firmware mailbox resources + * @i2x: firmware to host mailbox resources + * @xdna_mailbox_intr_reg: register addr of MSI-X interrupt + * @mb_irq: Linux IRQ number associated with mailbox MSI-X interrupt vecto= r index + * + * Return: If success, return a handle of mailbox channel. Otherwise, retu= rn NULL. + */ +struct mailbox_channel * +xdna_mailbox_create_channel(struct mailbox *mailbox, + const struct xdna_mailbox_chann_res *x2i, + const struct xdna_mailbox_chann_res *i2x, + u32 xdna_mailbox_intr_reg, + int mb_irq); + +/* + * xdna_mailbox_destroy_channel() -- destroy mailbox channel + * + * @mailbox_chann: the handle returned from xdna_mailbox_create_channel() + * + * Return: 0 on success, otherwise an error code + */ +int xdna_mailbox_destroy_channel(struct mailbox_channel *mailbox_chann); + +/* + * xdna_mailbox_stop_channel() -- stop mailbox channel + * + * @mailbox_chann: the handle returned from xdna_mailbox_create_channel() + * + * This function does not return a value. + */ +void xdna_mailbox_stop_channel(struct mailbox_channel *mailbox_chann); + +/* + * xdna_mailbox_send_msg() -- Send a message + * + * @mailbox_chann: Mailbox channel handle + * @msg: message struct for message information + * @tx_timeout: the timeout value for sending the message in ms. 
+ *
+ * Return: If success return 0, otherwise, return error code
+ */
+int xdna_mailbox_send_msg(struct mailbox_channel *mailbox_chann,
+			  const struct xdna_mailbox_msg *msg, u64 tx_timeout);
+
+#endif /* _AIE2_MAILBOX_H_ */
diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c
new file mode 100644
index 000000000000..42b615394605
--- /dev/null
+++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024, Advanced Micro Devices, Inc.
+ */
+
+#include
+#include
+#include
+
+#include "amdxdna_mailbox.h"
+#include "amdxdna_mailbox_helper.h"
+#include "amdxdna_pci_drv.h"
+
+int xdna_msg_cb(void *handle, const u32 *data, size_t size)
+{
+	struct xdna_notify *cb_arg = handle;
+	int ret;
+
+	if (unlikely(!data))
+		goto out;
+
+	if (unlikely(cb_arg->size != size)) {
+		cb_arg->error = -EINVAL;
+		goto out;
+	}
+
+	print_hex_dump_debug("resp data: ", DUMP_PREFIX_OFFSET,
+			     16, 4, data, cb_arg->size, true);
+	memcpy(cb_arg->data, data, cb_arg->size);
+out:
+	ret = cb_arg->error;
+	complete(&cb_arg->comp);
+	return ret;
+}
+
+int xdna_send_msg_wait(struct amdxdna_dev *xdna, struct mailbox_channel *chann,
+		       struct xdna_mailbox_msg *msg)
+{
+	struct xdna_notify *hdl = msg->handle;
+	int ret;
+
+	ret = xdna_mailbox_send_msg(chann, msg, TX_TIMEOUT);
+	if (ret) {
+		XDNA_ERR(xdna, "Send message failed, ret %d", ret);
+		return ret;
+	}
+
+	ret = wait_for_completion_timeout(&hdl->comp,
+					  msecs_to_jiffies(RX_TIMEOUT));
+	if (!ret) {
+		XDNA_ERR(xdna, "Wait for completion timeout");
+		return -ETIME;
+	}
+
+	return hdl->error;
+}
diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.h b/drivers/accel/amdxdna/amdxdna_mailbox_helper.h
new file mode 100644
index 000000000000..9d45a3bcde5e
--- /dev/null
+++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#ifndef _AMDXDNA_MAILBOX_HELPER_H
+#define _AMDXDNA_MAILBOX_HELPER_H
+
+#define TX_TIMEOUT 2000 /* milliseconds */
+#define RX_TIMEOUT 5000 /* milliseconds */
+
+struct amdxdna_dev;
+
+struct xdna_notify {
+	struct completion comp;
+	u32 *data;
+	size_t size;
+	int error;
+};
+
+#define DECLARE_XDNA_MSG_COMMON(name, op, status) \
+	struct name##_req req = { 0 }; \
+	struct name##_resp resp = { status }; \
+	struct xdna_notify hdl = { \
+		.error = 0, \
+		.data = (u32 *)&resp, \
+		.size = sizeof(resp), \
+		.comp = COMPLETION_INITIALIZER(hdl.comp), \
+	}; \
+	struct xdna_mailbox_msg msg = { \
+		.send_data = (u8 *)&req, \
+		.send_size = sizeof(req), \
+		.handle = &hdl, \
+		.opcode = op, \
+		.notify_cb = xdna_msg_cb, \
+	}
+
+int xdna_msg_cb(void *handle, const u32 *data, size_t size);
+int xdna_send_msg_wait(struct amdxdna_dev *xdna, struct mailbox_channel *chann,
+		       struct xdna_mailbox_msg *msg);
+
+#endif /* _AMDXDNA_MAILBOX_HELPER_H */
diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdna/amdxdna_pci_drv.h
index 2f1a1c2441f9..64bce970514b 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.h
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h
@@ -47,12 +47,20 @@ struct amdxdna_dev_info {
 	const struct amdxdna_dev_ops *ops;
 };
 
+struct amdxdna_fw_ver {
+	u32 major;
+	u32 minor;
+	u32 sub;
+	u32 build;
+};
+
 struct amdxdna_dev {
 	struct drm_device ddev;
 	struct amdxdna_dev_hdl *dev_handle;
 	const struct amdxdna_dev_info *dev_info;
 
 	struct mutex dev_lock; /* per device lock */
+	struct amdxdna_fw_ver fw_ver;
 };
 
 /*
diff --git a/drivers/accel/amdxdna/amdxdna_sysfs.c b/drivers/accel/amdxdna/amdxdna_sysfs.c
index 5dd652fcf9d4..668b94b92714 100644
--- a/drivers/accel/amdxdna/amdxdna_sysfs.c
+++ b/drivers/accel/amdxdna/amdxdna_sysfs.c
@@ -24,9 +24,20 @@ static ssize_t device_type_show(struct device *dev, struct device_attribute *att
 }
 static DEVICE_ATTR_RO(device_type);
 
+static ssize_t fw_version_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct amdxdna_dev *xdna = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%d.%d.%d.%d\n", xdna->fw_ver.major,
+		       xdna->fw_ver.minor, xdna->fw_ver.sub,
+		       xdna->fw_ver.build);
+}
+static DEVICE_ATTR_RO(fw_version);
+
 static struct attribute *amdxdna_attrs[] = {
 	&dev_attr_device_type.attr,
 	&dev_attr_vbnv.attr,
+	&dev_attr_fw_version.attr,
 	NULL,
 };
 
diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1_regs.c
index 858b31a82888..720aab0ed7c4 100644
--- a/drivers/accel/amdxdna/npu1_regs.c
+++ b/drivers/accel/amdxdna/npu1_regs.c
@@ -8,6 +8,7 @@
 #include
 
 #include "aie2_pci.h"
+#include "amdxdna_mailbox.h"
 #include "amdxdna_pci_drv.h"
 
 /* Address definition from NPU1 docs */
diff --git a/drivers/accel/amdxdna/npu2_regs.c b/drivers/accel/amdxdna/npu2_regs.c
index 02b0f22c9f14..f3ea18bcf294 100644
--- a/drivers/accel/amdxdna/npu2_regs.c
+++ b/drivers/accel/amdxdna/npu2_regs.c
@@ -8,6 +8,7 @@
 #include
 
 #include "aie2_pci.h"
+#include "amdxdna_mailbox.h"
 #include "amdxdna_pci_drv.h"
 
 /* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */
diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4_regs.c
index ca5ca5a6c751..db61142f0d4e 100644
--- a/drivers/accel/amdxdna/npu4_regs.c
+++ b/drivers/accel/amdxdna/npu4_regs.c
@@ -8,6 +8,7 @@
 #include
 
 #include "aie2_pci.h"
+#include "amdxdna_mailbox.h"
 #include "amdxdna_pci_drv.h"
 
 /* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */
diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5_regs.c
index 07fddcbb86ec..debf4e95b9bb 100644
--- a/drivers/accel/amdxdna/npu5_regs.c
+++ b/drivers/accel/amdxdna/npu5_regs.c
@@ -8,6 +8,7 @@
 #include
 
 #include "aie2_pci.h"
+#include "amdxdna_mailbox.h"
 #include "amdxdna_pci_drv.h"
 
 /* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */
diff --git a/include/trace/events/amdxdna.h b/include/trace/events/amdxdna.h
new file mode 100644
index 000000000000..33343d8f0622
--- /dev/null
+++ b/include/trace/events/amdxdna.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM amdxdna
+
+#if !defined(_TRACE_AMDXDNA_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_AMDXDNA_H
+
+#include
+
+DECLARE_EVENT_CLASS(xdna_mbox_msg,
+		    TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 msg_id),
+
+		    TP_ARGS(name, chann_id, opcode, msg_id),
+
+		    TP_STRUCT__entry(__string(name, name)
+				     __field(u32, chann_id)
+				     __field(u32, opcode)
+				     __field(u32, msg_id)),
+
+		    TP_fast_assign(__assign_str(name);
+				   __entry->chann_id = chann_id;
+				   __entry->opcode = opcode;
+				   __entry->msg_id = msg_id;),
+
+		    TP_printk("%s.%d id 0x%x opcode 0x%x", __get_str(name),
+			      __entry->chann_id, __entry->msg_id, __entry->opcode)
+);
+
+DEFINE_EVENT(xdna_mbox_msg, mbox_set_tail,
+	     TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 id),
+	     TP_ARGS(name, chann_id, opcode, id)
+);
+
+DEFINE_EVENT(xdna_mbox_msg, mbox_set_head,
+	     TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 id),
+	     TP_ARGS(name, chann_id, opcode, id)
+);
+
+TRACE_EVENT(mbox_irq_handle,
+	    TP_PROTO(char *name, int irq),
+
+	    TP_ARGS(name, irq),
+
+	    TP_STRUCT__entry(__string(name, name)
+			     __field(int, irq)),
+
+	    TP_fast_assign(__assign_str(name);
+			   __entry->irq = irq;),
+
+	    TP_printk("%s.%d", __get_str(name), __entry->irq)
+);
+
+#endif /* !defined(_TRACE_AMDXDNA_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include
-- 
2.34.1

From nobody Wed Nov 27 06:28:33 2024
From: Lizhi Hou
To: ,
CC: Lizhi Hou , , , , , , Jeffrey Hugo
Subject: [PATCH V4 04/10] accel/amdxdna: Add hardware resource solver
Date: Fri, 11 Oct 2024 16:12:38 -0700
Message-ID: <20241011231244.3182625-5-lizhi.hou@amd.com>
In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com>
References: <20241011231244.3182625-1-lizhi.hou@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The AI Engine consists of a 2D array of tiles arranged as columns. Provide
the basic column allocation and release functions for the tile columns.

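
As a rough illustration of the column allocation and release described
above, the userspace sketch below claims the first free range among a list
of candidate start columns. It is not code from this patch: struct col_map,
cols_free(), col_alloc() and col_release() are hypothetical names, and a
plain bool array stands in for the kernel bitmap the solver uses.

```c
#include <assert.h>	/* assert() is only needed by the usage example below */
#include <stdbool.h>
#include <stdint.h>

#define MAX_COL 128	/* assumption: mirrors XRS_MAX_COL from the patch */

/* Hypothetical stand-in for the solver's column bitmap: one flag per column. */
struct col_map {
	bool busy[MAX_COL];
};

/* Return true if columns [start, start + ncols) are in range and all free. */
static bool cols_free(const struct col_map *m, uint32_t start, uint32_t ncols,
		      uint32_t total_col)
{
	if (total_col > MAX_COL)
		total_col = MAX_COL;
	if (ncols == 0 || start >= total_col || ncols > total_col - start)
		return false;
	for (uint32_t c = start; c < start + ncols; c++)
		if (m->busy[c])
			return false;
	return true;
}

/*
 * Walk the candidate start columns in order and claim the first range
 * that is entirely free. Returns the chosen start column, or -1 if no
 * candidate fits.
 */
static int col_alloc(struct col_map *m, const uint32_t *start_cols,
		     uint32_t cols_len, uint32_t ncols, uint32_t total_col)
{
	for (uint32_t i = 0; i < cols_len; i++) {
		uint32_t start = start_cols[i];

		if (!cols_free(m, start, ncols, total_col))
			continue;
		for (uint32_t c = start; c < start + ncols; c++)
			m->busy[c] = true;
		return (int)start;
	}
	return -1;
}

/* Mark a previously claimed range as free again. */
static void col_release(struct col_map *m, uint32_t start, uint32_t ncols)
{
	for (uint32_t c = start; c < start + ncols && c < MAX_COL; c++)
		m->busy[c] = false;
}
```

With a six-column array and candidate start columns {0, 2, 4}, two
successive two-column requests would land on columns 0 and 2; releasing
the first range makes column 0 available to the next request. Sharing of
partitions between requests and the QoS check are deliberately left out
of the sketch.
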
Co-developed-by: Min Ma
Signed-off-by: Min Ma
Reviewed-by: Jeffrey Hugo
Signed-off-by: Lizhi Hou
---
 drivers/accel/amdxdna/Makefile          |   1 +
 drivers/accel/amdxdna/aie2_pci.c        |  23 +-
 drivers/accel/amdxdna/aie2_solver.c     | 330 ++++++++++++++++++++++++
 drivers/accel/amdxdna/aie2_solver.h     | 154 +++++++++++
 drivers/accel/amdxdna/amdxdna_pci_drv.h |   1 +
 5 files changed, 508 insertions(+), 1 deletion(-)
 create mode 100644 drivers/accel/amdxdna/aie2_solver.c
 create mode 100644 drivers/accel/amdxdna/aie2_solver.h

diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile
index 1b4e78b43b44..39d3404fbc8f 100644
--- a/drivers/accel/amdxdna/Makefile
+++ b/drivers/accel/amdxdna/Makefile
@@ -5,6 +5,7 @@ amdxdna-y := \
 	aie2_pci.o \
 	aie2_psp.o \
 	aie2_smu.o \
+	aie2_solver.o \
 	amdxdna_mailbox.o \
 	amdxdna_mailbox_helper.o \
 	amdxdna_pci_drv.o \
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index c987c731172d..aa66f401235a 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -14,9 +14,14 @@
 
 #include "aie2_msg_priv.h"
 #include "aie2_pci.h"
+#include "aie2_solver.h"
 #include "amdxdna_mailbox.h"
 #include "amdxdna_pci_drv.h"
 
+int aie2_max_col = XRS_MAX_COL;
+module_param(aie2_max_col, uint, 0600);
+MODULE_PARM_DESC(aie2_max_col, "Maximum column could be used");
+
 /*
  * The management mailbox channel is allocated by firmware.
  * The related register and ring buffer information is on SRAM BAR.
@@ -306,6 +311,7 @@ static int aie2_hw_start(struct amdxdna_dev *xdna)
 static int aie2_init(struct amdxdna_dev *xdna)
 {
 	struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev);
+	struct init_config xrs_cfg = { 0 };
 	struct amdxdna_dev_hdl *ndev;
 	struct psp_config psp_conf;
 	const struct firmware *fw;
@@ -406,7 +412,22 @@ static int aie2_init(struct amdxdna_dev *xdna)
 		XDNA_ERR(xdna, "Query firmware failed, ret %d", ret);
 		goto stop_hw;
 	}
-	ndev->total_col = ndev->metadata.cols;
+	ndev->total_col = min(aie2_max_col, ndev->metadata.cols);
+
+	xrs_cfg.clk_list.num_levels = 3;
+	xrs_cfg.clk_list.cu_clk_list[0] = 0;
+	xrs_cfg.clk_list.cu_clk_list[1] = 800;
+	xrs_cfg.clk_list.cu_clk_list[2] = 1000;
+	xrs_cfg.sys_eff_factor = 1;
+	xrs_cfg.ddev = &xdna->ddev;
+	xrs_cfg.total_col = ndev->total_col;
+
+	xdna->xrs_hdl = xrsm_init(&xrs_cfg);
+	if (!xdna->xrs_hdl) {
+		XDNA_ERR(xdna, "Initialize resolver failed");
+		ret = -EINVAL;
+		goto stop_hw;
+	}
 
 	release_firmware(fw);
 	return 0;
diff --git a/drivers/accel/amdxdna/aie2_solver.c b/drivers/accel/amdxdna/aie2_solver.c
new file mode 100644
index 000000000000..a537c66589a4
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_solver.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022-2024, Advanced Micro Devices, Inc.
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include "aie2_solver.h"
+
+struct partition_node {
+	struct list_head list;
+	u32 nshared; /* # shared requests */
+	u32 start_col; /* start column */
+	u32 ncols; /* # columns */
+	bool exclusive; /* can not be shared if set */
+};
+
+struct solver_node {
+	struct list_head list;
+	u64 rid; /* Request ID from consumer */
+
+	struct partition_node *pt_node;
+	void *cb_arg;
+	u32 cols_len;
+	u32 start_cols[] __counted_by(cols_len);
+};
+
+struct solver_rgroup {
+	u32 rgid;
+	u32 nnode;
+	u32 npartition_node;
+
+	DECLARE_BITMAP(resbit, XRS_MAX_COL);
+	struct list_head node_list;
+	struct list_head pt_node_list;
+};
+
+struct solver_state {
+	struct solver_rgroup rgp;
+	struct init_config cfg;
+	struct xrs_action_ops *actions;
+};
+
+static u32 calculate_gops(struct aie_qos *rqos)
+{
+	u32 service_rate = 0;
+
+	if (rqos->latency)
+		service_rate = (1000 / rqos->latency);
+
+	if (rqos->fps > service_rate)
+		return rqos->fps * rqos->gops;
+
+	return service_rate * rqos->gops;
+}
+
+/*
+ * qos_meet() - Check whether the QoS request can be met.
+ */
+static int qos_meet(struct solver_state *xrs, struct aie_qos *rqos, u32 cgops)
+{
+	u32 request_gops = calculate_gops(rqos) * xrs->cfg.sys_eff_factor;
+
+	if (request_gops <= cgops)
+		return 0;
+
+	return -EINVAL;
+}
+
+/*
+ * sanity_check() - Do a basic sanity check on allocation request.
+ */
+static int sanity_check(struct solver_state *xrs, struct alloc_requests *req)
+{
+	struct cdo_parts *cdop = &req->cdo;
+	struct aie_qos *rqos = &req->rqos;
+	u32 cu_clk_freq;
+
+	if (cdop->ncols > xrs->cfg.total_col)
+		return -EINVAL;
+
+	/*
+	 * Check that at least one CDO group can meet the
+	 * GOPs requirement.
+	 */
+	cu_clk_freq = xrs->cfg.clk_list.cu_clk_list[xrs->cfg.clk_list.num_levels - 1];
+
+	if (qos_meet(xrs, rqos, cdop->qos_cap.opc * cu_clk_freq / 1000))
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct solver_node *rg_search_node(struct solver_rgroup *rgp, u64 rid)
+{
+	struct solver_node *node;
+
+	list_for_each_entry(node, &rgp->node_list, list) {
+		if (node->rid == rid)
+			return node;
+	}
+
+	return NULL;
+}
+
+static void remove_partition_node(struct solver_rgroup *rgp,
+				  struct partition_node *pt_node)
+{
+	pt_node->nshared--;
+	if (pt_node->nshared > 0)
+		return;
+
+	list_del(&pt_node->list);
+	rgp->npartition_node--;
+
+	bitmap_clear(rgp->resbit, pt_node->start_col, pt_node->ncols);
+	kfree(pt_node);
+}
+
+static void remove_solver_node(struct solver_rgroup *rgp,
+			       struct solver_node *node)
+{
+	list_del(&node->list);
+	rgp->nnode--;
+
+	if (node->pt_node)
+		remove_partition_node(rgp, node->pt_node);
+
+	kfree(node);
+}
+
+static int get_free_partition(struct solver_state *xrs,
+			      struct solver_node *snode,
+			      struct alloc_requests *req)
+{
+	struct partition_node *pt_node;
+	u32 ncols = req->cdo.ncols;
+	u32 col, i;
+
+	for (i = 0; i < snode->cols_len; i++) {
+		col = snode->start_cols[i];
+		if (find_next_bit(xrs->rgp.resbit, XRS_MAX_COL, col) >= col + ncols)
+			break;
+	}
+
+	if (i == snode->cols_len)
+		return -ENODEV;
+
+	pt_node = kzalloc(sizeof(*pt_node), GFP_KERNEL);
+	if (!pt_node)
+		return -ENOMEM;
+
+	pt_node->nshared = 1;
+	pt_node->start_col = col;
+	pt_node->ncols = ncols;
+
+	/*
+	 * Until latency is fully supported in QoS, a request that
+	 * specifies a non-zero latency value will not share
+	 * the partition with other requests.
+	 */
+	if (req->rqos.latency)
+		pt_node->exclusive = true;
+
+	list_add_tail(&pt_node->list, &xrs->rgp.pt_node_list);
+	xrs->rgp.npartition_node++;
+	bitmap_set(xrs->rgp.resbit, pt_node->start_col, pt_node->ncols);
+
+	snode->pt_node = pt_node;
+
+	return 0;
+}
+
+static int allocate_partition(struct solver_state *xrs,
+			      struct solver_node *snode,
+			      struct alloc_requests *req)
+{
+	struct partition_node *pt_node, *rpt_node = NULL;
+	int idx, ret;
+
+	ret = get_free_partition(xrs, snode, req);
+	if (!ret)
+		return ret;
+
+	/* try to get a share-able partition */
+	list_for_each_entry(pt_node, &xrs->rgp.pt_node_list, list) {
+		if (pt_node->exclusive)
+			continue;
+
+		if (rpt_node && pt_node->nshared >= rpt_node->nshared)
+			continue;
+
+		for (idx = 0; idx < snode->cols_len; idx++) {
+			if (snode->start_cols[idx] != pt_node->start_col)
+				continue;
+
+			if (req->cdo.ncols != pt_node->ncols)
+				continue;
+
+			rpt_node = pt_node;
+			break;
+		}
+	}
+
+	if (!rpt_node)
+		return -ENODEV;
+
+	rpt_node->nshared++;
+	snode->pt_node = rpt_node;
+
+	return 0;
+}
+
+static struct solver_node *create_solver_node(struct solver_state *xrs,
+					      struct alloc_requests *req)
+{
+	struct cdo_parts *cdop = &req->cdo;
+	struct solver_node *node;
+	int ret;
+
+	node = kzalloc(struct_size(node, start_cols, cdop->cols_len), GFP_KERNEL);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	node->rid = req->rid;
+	node->cols_len = cdop->cols_len;
+	memcpy(node->start_cols, cdop->start_cols, cdop->cols_len * sizeof(u32));
+
+	ret = allocate_partition(xrs, node, req);
+	if (ret)
+		goto free_node;
+
+	list_add_tail(&node->list, &xrs->rgp.node_list);
+	xrs->rgp.nnode++;
+	return node;
+
+free_node:
+	kfree(node);
+	return ERR_PTR(ret);
+}
+
+static void fill_load_action(struct solver_state *xrs,
+			     struct solver_node *snode,
+			     struct xrs_action_load *action)
+{
+	action->rid = snode->rid;
+	action->part.start_col = snode->pt_node->start_col;
+	action->part.ncols = snode->pt_node->ncols;
+}
+
+int xrs_allocate_resource(void *hdl, struct alloc_requests *req, void *cb_arg)
+{
+	struct xrs_action_load load_act;
+	struct solver_node *snode;
+	struct solver_state *xrs;
+	int ret;
+
+	xrs = (struct solver_state *)hdl;
+
+	ret = sanity_check(xrs, req);
+	if (ret) {
+		drm_err(xrs->cfg.ddev, "invalid request");
+		return ret;
+	}
+
+	if (rg_search_node(&xrs->rgp, req->rid)) {
+		drm_err(xrs->cfg.ddev, "rid %lld is in-use", req->rid);
+		return -EEXIST;
+	}
+
+	snode = create_solver_node(xrs, req);
+	if (IS_ERR(snode))
+		return PTR_ERR(snode);
+
+	fill_load_action(xrs, snode, &load_act);
+	ret = xrs->cfg.actions->load(cb_arg, &load_act);
+	if (ret)
+		goto free_node;
+
+	snode->cb_arg = cb_arg;
+
+	drm_dbg(xrs->cfg.ddev, "start col %d ncols %d\n",
+		snode->pt_node->start_col, snode->pt_node->ncols);
+
+	return 0;
+
+free_node:
+	remove_solver_node(&xrs->rgp, snode);
+
+	return ret;
+}
+
+int xrs_release_resource(void *hdl, u64 rid)
+{
+	struct solver_state *xrs = hdl;
+	struct solver_node *node;
+
+	node = rg_search_node(&xrs->rgp, rid);
+	if (!node) {
+		drm_err(xrs->cfg.ddev, "node not exist");
+		return -ENODEV;
+	}
+
+	xrs->cfg.actions->unload(node->cb_arg);
+	remove_solver_node(&xrs->rgp, node);
+
+	return 0;
+}
+
+void *xrsm_init(struct init_config *cfg)
+{
+	struct solver_rgroup *rgp;
+	struct solver_state *xrs;
+
+	xrs = drmm_kzalloc(cfg->ddev, sizeof(*xrs), GFP_KERNEL);
+	if (!xrs)
+		return NULL;
+
+	memcpy(&xrs->cfg, cfg, sizeof(*cfg));
+
+	rgp = &xrs->rgp;
+	INIT_LIST_HEAD(&rgp->node_list);
+	INIT_LIST_HEAD(&rgp->pt_node_list);
+
+	return xrs;
+}
diff --git a/drivers/accel/amdxdna/aie2_solver.h b/drivers/accel/amdxdna/aie2_solver.h
new file mode 100644
index 000000000000..9b1847bb46a6
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_solver.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#ifndef _AIE2_SOLVER_H
+#define _AIE2_SOLVER_H
+
+#define XRS_MAX_COL 128
+
+/*
+ * Structure used to describe a partition. A partition is a column-based
+ * allocation unit described by its start column and number of columns.
+ */
+struct aie_part {
+	u32 start_col;
+	u32 ncols;
+};
+
+/*
+ * The QoS capabilities of a given AIE partition.
+ */
+struct aie_qos_cap {
+	u32 opc; /* operations per cycle */
+	u32 dma_bw; /* DMA bandwidth */
+};
+
+/*
+ * QoS requirement of a resource allocation.
+ */
+struct aie_qos {
+	u32 gops; /* Giga operations */
+	u32 fps; /* Frames per second */
+	u32 dma_bw; /* DMA bandwidth */
+	u32 latency; /* Frame response latency */
+	u32 exec_time; /* Frame execution time */
+	u32 priority; /* Request priority */
+};
+
+/*
+ * Structure used to describe a relocatable CDO (Configuration Data Object).
+ */
+struct cdo_parts {
+	u32 *start_cols; /* Start column array */
+	u32 cols_len; /* Length of start column array */
+	u32 ncols; /* # of column */
+	struct aie_qos_cap qos_cap; /* CDO QoS capabilities */
+};
+
+/*
+ * Structure used to describe a request to allocate.
+ */
+struct alloc_requests {
+	u64 rid;
+	struct cdo_parts cdo;
+	struct aie_qos rqos; /* Requested QoS */
+};
+
+/*
+ * Load callback argument
+ */
+struct xrs_action_load {
+	u32 rid;
+	struct aie_part part;
+};
+
+/*
+ * Define the power level available
+ *
+ * POWER_LEVEL_MIN:
+ * Lowest power level. Usually set when all actions are unloaded.
+ *
+ * POWER_LEVEL_n
+ * Power levels 0 - n, each a step increase in system frequencies
+ */
+enum power_level {
+	POWER_LEVEL_MIN = 0x0,
+	POWER_LEVEL_0 = 0x1,
+	POWER_LEVEL_1 = 0x2,
+	POWER_LEVEL_2 = 0x3,
+	POWER_LEVEL_3 = 0x4,
+	POWER_LEVEL_4 = 0x5,
+	POWER_LEVEL_5 = 0x6,
+	POWER_LEVEL_6 = 0x7,
+	POWER_LEVEL_7 = 0x8,
+	POWER_LEVEL_NUM,
+};
+
+/*
+ * Structure used to describe the frequency table.
+ * Resource solver chooses the frequency from the table
+ * to meet the QoS requirements.
+ */
+struct clk_list_info {
+	u32 num_levels; /* available power levels */
+	u32 cu_clk_list[POWER_LEVEL_NUM]; /* available aie clock frequencies in MHz */
+};
+
+struct xrs_action_ops {
+	int (*load)(void *cb_arg, struct xrs_action_load *action);
+	int (*unload)(void *cb_arg);
+};
+
+/*
+ * Structure used to describe information for solver during initialization.
+ */
+struct init_config {
+	u32 total_col;
+	u32 sys_eff_factor; /* system efficiency factor */
+	u32 latency_adj; /* latency adjustment in ms */
+	struct clk_list_info clk_list; /* List of frequencies available in system */
+	struct drm_device *ddev;
+	struct xrs_action_ops *actions;
+};
+
+/*
+ * xrsm_init() - Register resource solver. Resource solver client needs
+ * to call this function to register itself.
+ *
+ * @cfg: The system metrics for resource solver to use
+ *
+ * Return: A resource solver handle
+ *
+ * Note: We should only create one handle per AIE array to be managed.
+ */
+void *xrsm_init(struct init_config *cfg);
+
+/*
+ * xrs_allocate_resource() - Request to allocate resources for a given context
+ * and a partition metadata. (See struct part_meta)
+ *
+ * @hdl: Resource solver handle obtained from xrsm_init()
+ * @req: Input to the Resource solver including request id
+ * and partition metadata.
+ * @cb_arg: callback argument pointer
+ *
+ * Return: 0 when successful.
+ * Or standard error number when failing
+ *
+ * Note:
+ * There is no lock mechanism inside resource solver. So it is
+ * the caller's responsibility to lock down XCLBINs and grab
+ * necessary lock.
+ */
+int xrs_allocate_resource(void *hdl, struct alloc_requests *req, void *cb_arg);
+
+/*
+ * xrs_release_resource() - Request to free resources for a given context.
+ *
+ * @hdl: Resource solver handle obtained from xrsm_init()
+ * @rid: The Request ID to identify the requesting context
+ */
+int xrs_release_resource(void *hdl, u64 rid);
+#endif /* _AIE2_SOLVER_H */
diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdna/amdxdna_pci_drv.h
index 64bce970514b..c0710d3130fd 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.h
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h
@@ -58,6 +58,7 @@ struct amdxdna_dev {
 	struct drm_device ddev;
 	struct amdxdna_dev_hdl *dev_handle;
 	const struct amdxdna_dev_info *dev_info;
+	void *xrs_hdl;
 
 	struct mutex dev_lock; /* per device lock */
 	struct amdxdna_fw_ver fw_ver;
-- 
2.34.1

From nobody Wed Nov 27 06:28:33 2024
Server id 15.1.2507.39 via Frontend Transport; Fri, 11 Oct 2024 18:12:59 -0500 From: Lizhi Hou To: , CC: Lizhi Hou , , , , , Subject: [PATCH V4 05/10] accel/amdxdna: Add hardware context Date: Fri, 11 Oct 2024 16:12:39 -0700 Message-ID: <20241011231244.3182625-6-lizhi.hou@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com> References: <20241011231244.3182625-1-lizhi.hou@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000044F6:EE_|SA3PR12MB8437:EE_ X-MS-Office365-Filtering-Correlation-Id: 6635cb03-f3d2-4397-51bc-08dcea4a4076 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|376014|36860700013|82310400026; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?e870VeNwRxRPFTDnACoCDOVuTc7eNlykN2m+NOY4G4GjA/m2oxdZSujYX+BC?= =?us-ascii?Q?FtmfhrcjzMzssF/vp8H1w3HCbtI7K8kTtHre7l/cJoIT1XYMFBCcr5MmIBGs?= =?us-ascii?Q?bSOVtQcOvqUa87e2kKgz4tgjaZdHNQVByo6tCPCH72zMfOmEaIK8fXh8029q?= =?us-ascii?Q?bGBraaNVWbUgJT8bx1z+ZWDsOHX5s6Ua03rzGfai96MfuUTkhXHkjgXk5CIv?= =?us-ascii?Q?cFHnfSUSr047S3wU4Mz2sPPW2JcfIFje+4JCNvs/RdF+7mee+w+o0YzqSOpY?= =?us-ascii?Q?CgnCxWknzxKQQRcOTGNWcbAaa6m/22o4wQodqbTBUENSHey4dLb5zwfbqhwR?= =?us-ascii?Q?d6raoKhqUR72JG2ACqum50eQZenmrDNRlgP9ROwQ/5zCpNann9j7DyX7fyOy?= =?us-ascii?Q?Xt5JzBuH79XLm4+dE0r1gJj8DMNXgPJqMJi3ZYm1hE5DX23ux/zsrlRfcSw+?= =?us-ascii?Q?T0FKhgnxErtv6IM8A27D63Ei3kyj2xLKpp/+v3quLoickU8Suf6ZsgzPpuUk?= =?us-ascii?Q?0cD7z6Qs2r5B4QzFBtxVf80n1xywlLINKE35BimQ6dxfOPDIuLvLyGTIGHWv?= =?us-ascii?Q?yE0R63wRsrkaT8Pk4KgqPFapdXLqgh1SAqkDpVtfE/jBMJG8Ug/nU/YtWC55?= =?us-ascii?Q?5a/yC2LkYhOH2V+PZx3h8CXnypWYpLISGHW1XLYRBYxIckIAJGvQDscglUTs?= =?us-ascii?Q?MD5FLpU421rTeZOt++D84D4y9czV5udGGQH29QHyyjHPfZyEuIxBZqQNdi+z?= 
=?us-ascii?Q?PrTrP9tjR/xbSDGl2jkwzuCEVF+kTTBYyoEuW00vQlyakW0/2wHQBR96fTYm?= =?us-ascii?Q?m2lgFpuc1Rxqw22xLCyHIxFLNGxddjyMtvGwYQ1VHn3kcUFBENzF2EL8Qk5z?= =?us-ascii?Q?Vm5oTX7elEXDbUmmD9EfEmI1c5N2tseIBoYsg09a/FkqI1JkVB/J1r9WXr0q?= =?us-ascii?Q?qI9v4Ys07C11HsMguNR65/yGZq+RzC1KfhbWo+JGmunncapvXo6bhHPk1WGE?= =?us-ascii?Q?2k+3todgaTvWJ1MKH71tEZuQlTTPIDXU3rxb+4HW6aUzFgAgGjf2nHWcdTXy?= =?us-ascii?Q?8Muka3e7vZyiengVufgIGCTtwIigiXBiUAMYexRIs3H3GNLSBReL4URFthwb?= =?us-ascii?Q?2MlAThDtwiFFnrOQ9t17qAyI4heCfJ046tdut3uVsWH+xBCag/9M6V+kC5lw?= =?us-ascii?Q?K3gTaGCA0gKR1v6DkJkS3joqtVXhAv6ca/mCYaRoXVb8zlGhgUkHymVE5uks?= =?us-ascii?Q?51pgsbOpSoxtE35RyvznBz25CQpWJwfGwIrErO1c9y4cCG017QF5ZnMbef40?= =?us-ascii?Q?r+S5TcGLlWGcoXLNt4v5FUJSNAcAntQgfho+bPU2VbdrJk46bW8Ct1zw+QNN?= =?us-ascii?Q?9VP57xY=3D?= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(1800799024)(376014)(36860700013)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Oct 2024 23:13:00.7868 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 6635cb03-f3d2-4397-51bc-08dcea4a4076 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000044F6.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA3PR12MB8437 Content-Type: text/plain; charset="utf-8" The hardware can be shared among multiple user applications. The hardware resources are allocated/freed based on the request from user application via driver IOCTLs. 
DRM_IOCTL_AMDXDNA_CREATE_HWCTX
Allocate tile columns and create a hardware context structure to track the
usage and status of the resources. A hardware context ID is returned for
XDNA command execution.

DRM_IOCTL_AMDXDNA_DESTROY_HWCTX
Release a hardware context based on its ID. The tile columns belonging to
this hardware context will be reclaimed.

DRM_IOCTL_AMDXDNA_CONFIG_HWCTX
Configure a hardware context. Bind the hardware context to the required
resources.

Co-developed-by: Min Ma
Signed-off-by: Min Ma
Signed-off-by: Lizhi Hou
Reviewed-by: Jeffrey Hugo
---
 drivers/accel/amdxdna/Makefile | 2 +
 drivers/accel/amdxdna/aie2_ctx.c | 186 ++++++++++++++++++++
 drivers/accel/amdxdna/aie2_message.c | 90 ++++++++++
 drivers/accel/amdxdna/aie2_pci.c | 43 +++++
 drivers/accel/amdxdna/aie2_pci.h | 13 ++
 drivers/accel/amdxdna/amdxdna_ctx.c | 218 ++++++++++++++++++++++++
 drivers/accel/amdxdna/amdxdna_ctx.h | 38 +++++
 drivers/accel/amdxdna/amdxdna_pci_drv.c | 124 +++++++++++++-
 drivers/accel/amdxdna/amdxdna_pci_drv.h | 20 +++
 include/uapi/drm/amdxdna_accel.h | 131 ++++++++++++++
 10 files changed, 864 insertions(+), 1 deletion(-)
 create mode 100644 drivers/accel/amdxdna/aie2_ctx.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.h
diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile index 39d3404fbc8f..c86c90dfd303 100644 --- a/drivers/accel/amdxdna/Makefile +++ b/drivers/accel/amdxdna/Makefile @@ -1,11 +1,13 @@ # SPDX-License-Identifier: GPL-2.0-only =20 amdxdna-y :=3D \ + aie2_ctx.o \ aie2_message.o \ aie2_pci.o \ aie2_psp.o \ aie2_smu.o \ aie2_solver.o \ + amdxdna_ctx.o \ amdxdna_mailbox.o \ amdxdna_mailbox_helper.o \ amdxdna_pci_drv.o \ diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_= ctx.c new file mode 100644 index 000000000000..022b2b0b015d --- /dev/null +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -0,0 +1,186 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024,
Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "aie2_solver.h" +#include "amdxdna_ctx.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +static int aie2_hwctx_col_list(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + struct amdxdna_dev_hdl *ndev; + int start, end, first, last; + u32 width =3D 1, entries =3D 0; + int i; + + if (!hwctx->num_tiles) { + XDNA_ERR(xdna, "Number of tiles is zero"); + return -EINVAL; + } + + ndev =3D xdna->dev_handle; + if (unlikely(!ndev->metadata.core.row_count)) { + XDNA_WARN(xdna, "Core tile row count is zero"); + return -EINVAL; + } + + hwctx->num_col =3D hwctx->num_tiles / ndev->metadata.core.row_count; + if (!hwctx->num_col || hwctx->num_col > ndev->total_col) { + XDNA_ERR(xdna, "Invalid num_col %d", hwctx->num_col); + return -EINVAL; + } + + if (ndev->priv->col_align =3D=3D COL_ALIGN_NATURE) + width =3D hwctx->num_col; + + /* + * In range [start, end], find the columns that are multiples of width. + * 'first' is the first such column, + * 'last' is the last such column, + * 'entries' is the total number of such columns.
+ */ + start =3D xdna->dev_info->first_col; + end =3D ndev->total_col - hwctx->num_col; + if (start > 0 && end =3D=3D 0) { + XDNA_DBG(xdna, "Force start from col 0"); + start =3D 0; + } + first =3D start + (width - start % width) % width; + last =3D end - end % width; + if (last >=3D first) + entries =3D (last - first) / width + 1; + XDNA_DBG(xdna, "start %d end %d first %d last %d", + start, end, first, last); + + if (unlikely(!entries)) { + XDNA_ERR(xdna, "Start %d end %d width %d", + start, end, width); + return -EINVAL; + } + + hwctx->col_list =3D kmalloc_array(entries, sizeof(*hwctx->col_list), GFP_= KERNEL); + if (!hwctx->col_list) + return -ENOMEM; + + hwctx->col_list_len =3D entries; + hwctx->col_list[0] =3D first; + for (i =3D 1; i < entries; i++) + hwctx->col_list[i] =3D hwctx->col_list[i - 1] + width; + + print_hex_dump_debug("col_list: ", DUMP_PREFIX_OFFSET, 16, 4, hwctx->col_= list, + entries * sizeof(*hwctx->col_list), false); + return 0; +} + +static int aie2_alloc_resource(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + struct alloc_requests *xrs_req; + int ret; + + xrs_req =3D kzalloc(sizeof(*xrs_req), GFP_KERNEL); + if (!xrs_req) + return -ENOMEM; + + xrs_req->cdo.start_cols =3D hwctx->col_list; + xrs_req->cdo.cols_len =3D hwctx->col_list_len; + xrs_req->cdo.ncols =3D hwctx->num_col; + xrs_req->cdo.qos_cap.opc =3D hwctx->max_opc; + + xrs_req->rqos.gops =3D hwctx->qos.gops; + xrs_req->rqos.fps =3D hwctx->qos.fps; + xrs_req->rqos.dma_bw =3D hwctx->qos.dma_bandwidth; + xrs_req->rqos.latency =3D hwctx->qos.latency; + xrs_req->rqos.exec_time =3D hwctx->qos.frame_exec_time; + xrs_req->rqos.priority =3D hwctx->qos.priority; + + xrs_req->rid =3D (uintptr_t)hwctx; + + ret =3D xrs_allocate_resource(xdna->xrs_hdl, xrs_req, hwctx); + if (ret) + XDNA_ERR(xdna, "Allocate AIE resource failed, ret %d", ret); + + kfree(xrs_req); + return ret; +} + +static void aie2_release_resource(struct amdxdna_hwctx *hwctx) +{ + struct 
amdxdna_dev *xdna =3D hwctx->client->xdna; + int ret; + + ret =3D xrs_release_resource(xdna->xrs_hdl, (uintptr_t)hwctx); + if (ret) + XDNA_ERR(xdna, "Release AIE resource failed, ret %d", ret); +} + +int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_client *client =3D hwctx->client; + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx_priv *priv; + int ret; + + priv =3D kzalloc(sizeof(*hwctx->priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + hwctx->priv =3D priv; + + ret =3D aie2_hwctx_col_list(hwctx); + if (ret) { + XDNA_ERR(xdna, "Create col list failed, ret %d", ret); + goto free_priv; + } + + ret =3D aie2_alloc_resource(hwctx); + if (ret) { + XDNA_ERR(xdna, "Alloc hw resource failed, ret %d", ret); + goto free_col_list; + } + + hwctx->status =3D HWCTX_STAT_INIT; + + XDNA_DBG(xdna, "hwctx %s init completed", hwctx->name); + + return 0; + +free_col_list: + kfree(hwctx->col_list); +free_priv: + kfree(priv); + return ret; +} + +void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx) +{ + aie2_release_resource(hwctx); + + kfree(hwctx->col_list); + kfree(hwctx->priv); + kfree(hwctx->cus); +} + +int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, vo= id *buf, u32 size) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + switch (type) { + case DRM_AMDXDNA_HWCTX_CONFIG_CU: + case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF: + case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF: + return -EOPNOTSUPP; + default: + XDNA_DBG(xdna, "Not supported type %d", type); + return -EOPNOTSUPP; + } +} diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/a= ie2_message.c index cbf8ee54c6c2..4b8a71bf4fae 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -3,13 +3,16 @@ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. 
*/ =20 +#include #include #include #include +#include #include =20 #include "aie2_msg_priv.h" #include "aie2_pci.h" +#include "amdxdna_ctx.h" #include "amdxdna_mailbox.h" #include "amdxdna_mailbox_helper.h" #include "amdxdna_pci_drv.h" @@ -192,3 +195,90 @@ int aie2_query_firmware_version(struct amdxdna_dev_hdl= *ndev, =20 return 0; } + +int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx= *hwctx) +{ + DECLARE_AIE2_MSG(create_ctx, MSG_OP_CREATE_CONTEXT); + struct amdxdna_dev *xdna =3D ndev->xdna; + struct xdna_mailbox_chann_res x2i; + struct xdna_mailbox_chann_res i2x; + struct cq_pair *cq_pair; + u32 intr_reg; + int ret; + + req.aie_type =3D 1; + req.start_col =3D hwctx->start_col; + req.num_col =3D hwctx->num_col; + req.num_cq_pairs_requested =3D 1; + req.pasid =3D hwctx->client->pasid; + req.context_priority =3D 2; + + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + hwctx->fw_ctx_id =3D resp.context_id; + WARN_ONCE(hwctx->fw_ctx_id =3D=3D -1, "Unexpected context id"); + + cq_pair =3D &resp.cq_pair[0]; + x2i.mb_head_ptr_reg =3D AIE2_MBOX_OFF(ndev, cq_pair->x2i_q.head_addr); + x2i.mb_tail_ptr_reg =3D AIE2_MBOX_OFF(ndev, cq_pair->x2i_q.tail_addr); + x2i.rb_start_addr =3D AIE2_SRAM_OFF(ndev, cq_pair->x2i_q.buf_addr); + x2i.rb_size =3D cq_pair->x2i_q.buf_size; + + i2x.mb_head_ptr_reg =3D AIE2_MBOX_OFF(ndev, cq_pair->i2x_q.head_addr); + i2x.mb_tail_ptr_reg =3D AIE2_MBOX_OFF(ndev, cq_pair->i2x_q.tail_addr); + i2x.rb_start_addr =3D AIE2_SRAM_OFF(ndev, cq_pair->i2x_q.buf_addr); + i2x.rb_size =3D cq_pair->i2x_q.buf_size; + + ret =3D pci_irq_vector(to_pci_dev(xdna->ddev.dev), resp.msix_id); + if (ret =3D=3D -EINVAL) { + XDNA_ERR(xdna, "not able to create channel"); + goto out_destroy_context; + } + + intr_reg =3D i2x.mb_head_ptr_reg + 4; + hwctx->priv->mbox_chann =3D xdna_mailbox_create_channel(ndev->mbox, &x2i,= &i2x, + intr_reg, ret); + if (!hwctx->priv->mbox_chann) { + XDNA_ERR(xdna, "not able to create channel"); + ret 
=3D -EINVAL; + goto out_destroy_context; + } + + XDNA_DBG(xdna, "%s mailbox channel irq: %d, msix_id: %d", + hwctx->name, ret, resp.msix_id); + XDNA_DBG(xdna, "%s created fw ctx %d pasid %d", hwctx->name, + hwctx->fw_ctx_id, hwctx->client->pasid); + + return 0; + +out_destroy_context: + aie2_destroy_context(ndev, hwctx); + return ret; +} + +int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwct= x *hwctx) +{ + DECLARE_AIE2_MSG(destroy_ctx, MSG_OP_DESTROY_CONTEXT); + struct amdxdna_dev *xdna =3D ndev->xdna; + int ret; + + if (hwctx->fw_ctx_id =3D=3D -1) + return 0; + + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); + + req.context_id =3D hwctx->fw_ctx_id; + ret =3D aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + XDNA_WARN(xdna, "%s destroy context failed, ret %d", hwctx->name, ret); + + xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); + XDNA_DBG(xdna, "%s destroyed fw ctx %d", hwctx->name, + hwctx->fw_ctx_id); + hwctx->priv->mbox_chann =3D NULL; + hwctx->fw_ctx_id =3D -1; + + return ret; +} diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_= pci.c index aa66f401235a..ee9f114bc229 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -3,6 +3,7 @@ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. 
*/ =20 +#include #include #include #include @@ -15,6 +16,7 @@ #include "aie2_msg_priv.h" #include "aie2_pci.h" #include "aie2_solver.h" +#include "amdxdna_ctx.h" #include "amdxdna_mailbox.h" #include "amdxdna_pci_drv.h" =20 @@ -210,6 +212,43 @@ static void aie2_mgmt_fw_fini(struct amdxdna_dev_hdl *= ndev) XDNA_DBG(ndev->xdna, "Firmware suspended"); } =20 +static int aie2_xrs_load(void *cb_arg, struct xrs_action_load *action) +{ + struct amdxdna_hwctx *hwctx =3D cb_arg; + struct amdxdna_dev *xdna; + int ret; + + xdna =3D hwctx->client->xdna; + + hwctx->start_col =3D action->part.start_col; + hwctx->num_col =3D action->part.ncols; + ret =3D aie2_create_context(xdna->dev_handle, hwctx); + if (ret) + XDNA_ERR(xdna, "create context failed, ret %d", ret); + + return ret; +} + +static int aie2_xrs_unload(void *cb_arg) +{ + struct amdxdna_hwctx *hwctx =3D cb_arg; + struct amdxdna_dev *xdna; + int ret; + + xdna =3D hwctx->client->xdna; + + ret =3D aie2_destroy_context(xdna->dev_handle, hwctx); + if (ret) + XDNA_ERR(xdna, "destroy context failed, ret %d", ret); + + return ret; +} + +static struct xrs_action_ops aie2_xrs_actions =3D { + .load =3D aie2_xrs_load, + .unload =3D aie2_xrs_unload, +}; + static void aie2_hw_stop(struct amdxdna_dev *xdna) { struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); @@ -420,6 +459,7 @@ static int aie2_init(struct amdxdna_dev *xdna) xrs_cfg.clk_list.cu_clk_list[2] =3D 1000; xrs_cfg.sys_eff_factor =3D 1; xrs_cfg.ddev =3D &xdna->ddev; + xrs_cfg.actions =3D &aie2_xrs_actions; xrs_cfg.total_col =3D ndev->total_col; =20 xdna->xrs_hdl =3D xrsm_init(&xrs_cfg); @@ -456,4 +496,7 @@ static void aie2_fini(struct amdxdna_dev *xdna) const struct amdxdna_dev_ops aie2_ops =3D { .init =3D aie2_init, .fini =3D aie2_fini, + .hwctx_init =3D aie2_hwctx_init, + .hwctx_fini =3D aie2_hwctx_fini, + .hwctx_config =3D aie2_hwctx_config, }; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_= pci.h index 4c81d10a0998..b789286bc9d4 100644 --- 
a/drivers/accel/amdxdna/aie2_pci.h +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -77,6 +77,7 @@ enum psp_reg_idx { }; =20 struct amdxdna_fw_ver; +struct amdxdna_hwctx; =20 struct psp_config { const void *fw_buf; @@ -117,6 +118,10 @@ struct rt_config { u32 value; }; =20 +struct amdxdna_hwctx_priv { + void *mbox_chann; +}; + struct amdxdna_dev_hdl { struct amdxdna_dev *xdna; const struct amdxdna_dev_priv *priv; @@ -189,4 +194,12 @@ int aie2_query_aie_version(struct amdxdna_dev_hdl *nde= v, struct aie_version *ver int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metad= ata *metadata); int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, struct amdxdna_fw_ver *fw_ver); +int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx= *hwctx); +int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwct= x *hwctx); + +/* aie2_ctx.c */ +int aie2_hwctx_init(struct amdxdna_hwctx *hwctx); +void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx); +int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, vo= id *buf, u32 size); + #endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/am= dxdna_ctx.c new file mode 100644 index 000000000000..8acf8bfe0db9 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -0,0 +1,218 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include + +#include "amdxdna_ctx.h" +#include "amdxdna_pci_drv.h" + +#define MAX_HWCTX_ID 255 + +static void amdxdna_hwctx_destroy(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + + /* At this point, user is not able to submit new commands */ + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->hwctx_fini(hwctx); + mutex_unlock(&xdna->dev_lock); + + kfree(hwctx->name); + kfree(hwctx); +} + +/* + * This should be called in close() and remove().
DO NOT call in other sys= calls. + * This guarantees that the hwctx and its resources are released even if the + * user doesn't call amdxdna_drm_destroy_hwctx_ioctl. + */ +void amdxdna_hwctx_remove_all(struct amdxdna_client *client) +{ + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + XDNA_DBG(client->xdna, "PID %d close HW context %d", + client->pid, hwctx->id); + idr_remove(&client->hwctx_idr, hwctx->id); + mutex_unlock(&client->hwctx_lock); + amdxdna_hwctx_destroy(hwctx); + mutex_lock(&client->hwctx_lock); + } + mutex_unlock(&client->hwctx_lock); +} + +int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_drm_create_hwctx *args =3D data; + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret, idx; + + if (args->ext || args->ext_flags) + return -EINVAL; + + if (!drm_dev_enter(dev, &idx)) + return -ENODEV; + + hwctx =3D kzalloc(sizeof(*hwctx), GFP_KERNEL); + if (!hwctx) { + ret =3D -ENOMEM; + goto exit; + } + + if (copy_from_user(&hwctx->qos, u64_to_user_ptr(args->qos_p), sizeof(hwct= x->qos))) { + XDNA_ERR(xdna, "Access QoS info failed"); + ret =3D -EFAULT; + goto free_hwctx; + } + + hwctx->client =3D client; + hwctx->fw_ctx_id =3D -1; + hwctx->num_tiles =3D args->num_tiles; + hwctx->mem_size =3D args->mem_size; + hwctx->max_opc =3D args->max_opc; + mutex_lock(&client->hwctx_lock); + ret =3D idr_alloc_cyclic(&client->hwctx_idr, hwctx, 0, MAX_HWCTX_ID, GFP_= KERNEL); + if (ret < 0) { + mutex_unlock(&client->hwctx_lock); + XDNA_ERR(xdna, "Allocate hwctx ID failed, ret %d", ret); + goto free_hwctx; + } + hwctx->id =3D ret; + mutex_unlock(&client->hwctx_lock); + + hwctx->name =3D kasprintf(GFP_KERNEL, "hwctx.%d.%d", client->pid, hwctx->= id); + if (!hwctx->name) { + ret =3D -ENOMEM; + goto rm_id; + } + +
mutex_lock(&xdna->dev_lock); + ret =3D xdna->dev_info->ops->hwctx_init(hwctx); + if (ret) { + mutex_unlock(&xdna->dev_lock); + XDNA_ERR(xdna, "Init hwctx failed, ret %d", ret); + goto free_name; + } + args->handle =3D hwctx->id; + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "PID %d create HW context %d, ret %d", client->pid, args->= handle, ret); + drm_dev_exit(idx); + return 0; + +free_name: + kfree(hwctx->name); +rm_id: + mutex_lock(&client->hwctx_lock); + idr_remove(&client->hwctx_idr, hwctx->id); + mutex_unlock(&client->hwctx_lock); +free_hwctx: + kfree(hwctx); +exit: + drm_dev_exit(idx); + return ret; +} + +int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, st= ruct drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_drm_destroy_hwctx *args =3D data; + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret =3D 0, idx; + + if (!drm_dev_enter(dev, &idx)) + return -ENODEV; + + mutex_lock(&client->hwctx_lock); + hwctx =3D idr_find(&client->hwctx_idr, args->handle); + if (!hwctx) { + mutex_unlock(&client->hwctx_lock); + ret =3D -EINVAL; + XDNA_DBG(xdna, "PID %d HW context %d not exist", + client->pid, args->handle); + goto out; + } + idr_remove(&client->hwctx_idr, hwctx->id); + mutex_unlock(&client->hwctx_lock); + + amdxdna_hwctx_destroy(hwctx); + + XDNA_DBG(xdna, "PID %d destroyed HW context %d", client->pid, args->handl= e); +out: + drm_dev_exit(idx); + return ret; +} + +int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_drm_config_hwctx *args =3D data; + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + u32 buf_size; + void *buf; + u64 val; + int ret; + + if (!xdna->dev_info->ops->hwctx_config) + return -EOPNOTSUPP; + + val =3D args->param_val; + buf_size =3D args->param_val_size; + + switch (args->param_type) { + 
case DRM_AMDXDNA_HWCTX_CONFIG_CU: + /* For those types that param_val is pointer */ + if (buf_size > PAGE_SIZE) { + XDNA_ERR(xdna, "Config CU param buffer too large"); + return -E2BIG; + } + + /* Hwctx needs to keep buf */ + buf =3D kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + if (copy_from_user(buf, u64_to_user_ptr(val), buf_size)) { + kfree(buf); + return -EFAULT; + } + + break; + case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF: + case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF: + /* For those types that param_val is a value */ + buf =3D NULL; + buf_size =3D 0; + break; + default: + XDNA_DBG(xdna, "Unknown HW context config type %d", args->param_type); + return -EINVAL; + } + + mutex_lock(&xdna->dev_lock); + hwctx =3D idr_find(&client->hwctx_idr, args->handle); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", client->pid, args->handl= e); + ret =3D -EINVAL; + goto unlock; + } + + ret =3D xdna->dev_info->ops->hwctx_config(hwctx, args->param_type, val, b= uf, buf_size); + +unlock: + mutex_unlock(&xdna->dev_lock); + kfree(buf); + return ret; +} diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/am= dxdna_ctx.h new file mode 100644 index 000000000000..3627770e5a98 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_ctx.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. 
+ */ + +#ifndef _AMDXDNA_CTX_H_ +#define _AMDXDNA_CTX_H_ + +struct amdxdna_hwctx { + struct amdxdna_client *client; + struct amdxdna_hwctx_priv *priv; + char *name; + + u32 id; + u32 max_opc; + u32 num_tiles; + u32 mem_size; + u32 fw_ctx_id; + u32 col_list_len; + u32 *col_list; + u32 start_col; + u32 num_col; +#define HWCTX_STAT_INIT 0 +#define HWCTX_STAT_READY 1 +#define HWCTX_STAT_STOP 2 + u32 status; + u32 old_status; + + struct amdxdna_qos_info qos; + struct amdxdna_hwctx_param_config_cu *cus; +}; + +void amdxdna_hwctx_remove_all(struct amdxdna_client *client); +int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp); +int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp); +int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, st= ruct drm_file *filp); + +#endif /* _AMDXDNA_CTX_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdn= a/amdxdna_pci_drv.c index 7a5945854e26..49e896092209 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -3,13 +3,16 @@ * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. 
*/ =20 +#include #include #include #include #include #include +#include #include =20 +#include "amdxdna_ctx.h" #include "amdxdna_pci_drv.h" =20 /* @@ -33,13 +36,117 @@ static const struct amdxdna_device_id amdxdna_ids[] = =3D { {0} }; =20 -DEFINE_DRM_ACCEL_FOPS(amdxdna_fops); +static int amdxdna_drm_open(struct drm_device *ddev, struct drm_file *filp) +{ + struct amdxdna_dev *xdna =3D to_xdna_dev(ddev); + struct amdxdna_client *client; + int ret; + + client =3D kzalloc(sizeof(*client), GFP_KERNEL); + if (!client) + return -ENOMEM; + + client->pid =3D pid_nr(filp->pid); + client->xdna =3D xdna; + + client->sva =3D iommu_sva_bind_device(xdna->ddev.dev, current->mm); + if (IS_ERR(client->sva)) { + ret =3D PTR_ERR(client->sva); + XDNA_ERR(xdna, "SVA bind device failed, ret %d", ret); + goto failed; + } + client->pasid =3D iommu_sva_get_pasid(client->sva); + if (client->pasid =3D=3D IOMMU_PASID_INVALID) { + XDNA_ERR(xdna, "SVA get pasid failed"); + ret =3D -ENODEV; + goto unbind_sva; + } + mutex_init(&client->hwctx_lock); + idr_init_base(&client->hwctx_idr, AMDXDNA_INVALID_CTX_HANDLE + 1); + + mutex_lock(&xdna->dev_lock); + list_add_tail(&client->node, &xdna->client_list); + mutex_unlock(&xdna->dev_lock); + + filp->driver_priv =3D client; + client->filp =3D filp; + + XDNA_DBG(xdna, "pid %d opened", client->pid); + return 0; + +unbind_sva: + iommu_sva_unbind_device(client->sva); +failed: + kfree(client); + + return ret; +} + +static void amdxdna_drm_close(struct drm_device *ddev, struct drm_file *fi= lp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_dev *xdna =3D to_xdna_dev(ddev); + + XDNA_DBG(xdna, "closing pid %d", client->pid); + + idr_destroy(&client->hwctx_idr); + mutex_destroy(&client->hwctx_lock); + + iommu_sva_unbind_device(client->sva); + + XDNA_DBG(xdna, "pid %d closed", client->pid); + kfree(client); +} + +static int amdxdna_flush(struct file *f, fl_owner_t id) +{ + struct drm_file *filp =3D f->private_data; + struct 
amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_dev *xdna =3D client->xdna; + int idx; + + XDNA_DBG(xdna, "PID %d flushing...", client->pid); + if (!drm_dev_enter(&xdna->ddev, &idx)) + return 0; + + mutex_lock(&xdna->dev_lock); + list_del_init(&client->node); + mutex_unlock(&xdna->dev_lock); + amdxdna_hwctx_remove_all(client); + + drm_dev_exit(idx); + return 0; +} + +static const struct drm_ioctl_desc amdxdna_drm_ioctls[] =3D { + /* Context */ + DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_HWCTX, amdxdna_drm_create_hwctx_ioctl, 0= ), + DRM_IOCTL_DEF_DRV(AMDXDNA_DESTROY_HWCTX, amdxdna_drm_destroy_hwctx_ioctl,= 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_CONFIG_HWCTX, amdxdna_drm_config_hwctx_ioctl, 0= ), +}; + +static const struct file_operations amdxdna_fops =3D { + .owner =3D THIS_MODULE, + .open =3D accel_open, + .release =3D drm_release, + .flush =3D amdxdna_flush, + .unlocked_ioctl =3D drm_ioctl, + .compat_ioctl =3D drm_compat_ioctl, + .poll =3D drm_poll, + .read =3D drm_read, + .llseek =3D noop_llseek, + .mmap =3D drm_gem_mmap, +}; =20 const struct drm_driver amdxdna_drm_drv =3D { .driver_features =3D DRIVER_GEM | DRIVER_COMPUTE_ACCEL, .fops =3D &amdxdna_fops, .name =3D "amdxdna_accel_driver", .desc =3D "AMD XDNA DRM implementation", + .open =3D amdxdna_drm_open, + .postclose =3D amdxdna_drm_close, + .ioctls =3D amdxdna_drm_ioctls, + .num_ioctls =3D ARRAY_SIZE(amdxdna_drm_ioctls), }; =20 static const struct amdxdna_dev_info * @@ -69,6 +176,7 @@ static int amdxdna_probe(struct pci_dev *pdev, const str= uct pci_device_id *id) return -ENODEV; =20 drmm_mutex_init(&xdna->ddev, &xdna->dev_lock); + INIT_LIST_HEAD(&xdna->client_list); pci_set_drvdata(pdev, xdna); =20 mutex_lock(&xdna->dev_lock); @@ -105,11 +213,25 @@ static int amdxdna_probe(struct pci_dev *pdev, const = struct pci_device_id *id) static void amdxdna_remove(struct pci_dev *pdev) { struct amdxdna_dev *xdna =3D pci_get_drvdata(pdev); + struct amdxdna_client *client; =20 drm_dev_unplug(&xdna->ddev); 
 	amdxdna_sysfs_fini(xdna);
 
 	mutex_lock(&xdna->dev_lock);
+	client = list_first_entry_or_null(&xdna->client_list,
+					  struct amdxdna_client, node);
+	while (client) {
+		list_del_init(&client->node);
+		mutex_unlock(&xdna->dev_lock);
+
+		amdxdna_hwctx_remove_all(client);
+
+		mutex_lock(&xdna->dev_lock);
+		client = list_first_entry_or_null(&xdna->client_list,
+						  struct amdxdna_client, node);
+	}
+
 	xdna->dev_info->ops->fini(xdna);
 	mutex_unlock(&xdna->dev_lock);
 }
diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdna/amdxdna_pci_drv.h
index c0710d3130fd..5ec7fe168406 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.h
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h
@@ -18,6 +18,7 @@
 extern const struct drm_driver amdxdna_drm_drv;
 
 struct amdxdna_dev;
+struct amdxdna_hwctx;
 
 /*
  * struct amdxdna_dev_ops - Device hardware operation callbacks
@@ -25,6 +26,9 @@ struct amdxdna_dev;
 struct amdxdna_dev_ops {
 	int (*init)(struct amdxdna_dev *xdna);
 	void (*fini)(struct amdxdna_dev *xdna);
+	int (*hwctx_init)(struct amdxdna_hwctx *hwctx);
+	void (*hwctx_fini)(struct amdxdna_hwctx *hwctx);
+	int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size);
 };
 
 /*
@@ -61,6 +65,7 @@ struct amdxdna_dev {
 	void *xrs_hdl;
 
 	struct mutex dev_lock; /* per device lock */
+	struct list_head client_list;
 	struct amdxdna_fw_ver fw_ver;
 };
 
@@ -73,6 +78,21 @@ struct amdxdna_device_id {
 	const struct amdxdna_dev_info *dev_info;
 };
 
+/*
+ * struct amdxdna_client - amdxdna client
+ * A per fd data structure for managing context and other user process stuffs.
+ */
+struct amdxdna_client {
+	struct list_head node;
+	pid_t pid;
+	struct mutex hwctx_lock; /* protect hwctx */
+	struct idr hwctx_idr;
+	struct amdxdna_dev *xdna;
+	struct drm_file *filp;
+	struct iommu_sva *sva;
+	int pasid;
+};
+
 /* Add device info below */
 extern const struct amdxdna_dev_info dev_npu1_info;
 extern const struct amdxdna_dev_info dev_npu2_info;
diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
index 6d97e8e90cf6..133da8c007d0 100644
--- a/include/uapi/drm/amdxdna_accel.h
+++ b/include/uapi/drm/amdxdna_accel.h
@@ -6,17 +6,148 @@
 #ifndef _UAPI_AMDXDNA_ACCEL_H_
 #define _UAPI_AMDXDNA_ACCEL_H_
 
+#include
 #include "drm.h"
 
 #if defined(__cplusplus)
 extern "C" {
 #endif
 
+#define AMDXDNA_INVALID_CTX_HANDLE 0
+
 enum amdxdna_device_type {
 	AMDXDNA_DEV_TYPE_UNKNOWN = -1,
 	AMDXDNA_DEV_TYPE_KMQ,
 };
 
+enum amdxdna_drm_ioctl_id {
+	DRM_AMDXDNA_CREATE_HWCTX,
+	DRM_AMDXDNA_DESTROY_HWCTX,
+	DRM_AMDXDNA_CONFIG_HWCTX,
+};
+
+/**
+ * struct qos_info - QoS information for driver.
+ * @gops: Giga operations per second.
+ * @fps: Frames per second.
+ * @dma_bandwidth: DMA bandwidtha.
+ * @latency: Frame response latency.
+ * @frame_exec_time: Frame execution time.
+ * @priority: Request priority.
+ *
+ * User program can provide QoS hints to driver.
+ */
+struct amdxdna_qos_info {
+	__u32 gops;
+	__u32 fps;
+	__u32 dma_bandwidth;
+	__u32 latency;
+	__u32 frame_exec_time;
+	__u32 priority;
+};
+
+/**
+ * struct amdxdna_drm_create_hwctx - Create hardware context.
+ * @ext: MBZ.
+ * @ext_flags: MBZ.
+ * @qos_p: Address of QoS info.
+ * @umq_bo: BO handle for user mode queue(UMQ).
+ * @log_buf_bo: BO handle for log buffer.
+ * @max_opc: Maximum operations per cycle.
+ * @num_tiles: Number of AIE tiles.
+ * @mem_size: Size of AIE tile memory.
+ * @umq_doorbell: Returned offset of doorbell associated with UMQ.
+ * @handle: Returned hardware context handle.
+ * @pad: Structure padding.
+ */
+struct amdxdna_drm_create_hwctx {
+	__u64 ext;
+	__u64 ext_flags;
+	__u64 qos_p;
+	__u32 umq_bo;
+	__u32 log_buf_bo;
+	__u32 max_opc;
+	__u32 num_tiles;
+	__u32 mem_size;
+	__u32 umq_doorbell;
+	__u32 handle;
+	__u32 pad;
+};
+
+/**
+ * struct amdxdna_drm_destroy_hwctx - Destroy hardware context.
+ * @handle: Hardware context handle.
+ * @pad: Structure padding.
+ */
+struct amdxdna_drm_destroy_hwctx {
+	__u32 handle;
+	__u32 pad;
+};
+
+/**
+ * struct amdxdna_cu_config - configuration for one CU
+ * @cu_bo: CU configuration buffer bo handle.
+ * @cu_func: Function of a CU.
+ * @pad: Structure padding.
+ */
+struct amdxdna_cu_config {
+	__u32 cu_bo;
+	__u8 cu_func;
+	__u8 pad[3];
+};
+
+/**
+ * struct amdxdna_hwctx_param_config_cu - configuration for CUs in hardware context
+ * @num_cus: Number of CUs to configure.
+ * @pad: Structure padding.
+ * @cu_configs: Array of CU configurations of struct amdxdna_cu_config.
+ */
+struct amdxdna_hwctx_param_config_cu {
+	__u16 num_cus;
+	__u16 pad[3];
+	struct amdxdna_cu_config cu_configs[] __counted_by(num_cus);
+};
+
+enum amdxdna_drm_config_hwctx_param {
+	DRM_AMDXDNA_HWCTX_CONFIG_CU,
+	DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF,
+	DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF,
+	DRM_AMDXDNA_HWCTX_CONFIG_NUM
+};
+
+/**
+ * struct amdxdna_drm_config_hwctx - Configure hardware context.
+ * @handle: hardware context handle.
+ * @param_type: Value in enum amdxdna_drm_config_hwctx_param. Specifies the
+ *              structure passed in via param_val.
+ * @param_val: A structure specified by the param_type struct member.
+ * @param_val_size: Size of the parameter buffer pointed to by the param_val.
+ *		    If param_val is not a pointer, driver can ignore this.
+ * @pad: Structure padding.
+ *
+ * Note: if the param_val is a pointer pointing to a buffer, the maximum size
+ * of the buffer is 4KiB(PAGE_SIZE).
+ */
+struct amdxdna_drm_config_hwctx {
+	__u32 handle;
+	__u32 param_type;
+	__u64 param_val;
+	__u32 param_val_size;
+	__u32 pad;
+};
+
+#define DRM_IOCTL_AMDXDNA_CREATE_HWCTX \
+	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_HWCTX, \
+		 struct amdxdna_drm_create_hwctx)
+
+#define DRM_IOCTL_AMDXDNA_DESTROY_HWCTX \
+	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_DESTROY_HWCTX, \
+		 struct amdxdna_drm_destroy_hwctx)
+
+#define DRM_IOCTL_AMDXDNA_CONFIG_HWCTX \
+	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CONFIG_HWCTX, \
+		 struct amdxdna_drm_config_hwctx)
+
 #if defined(__cplusplus)
 } /* extern c end */
 #endif
-- 
2.34.1

From nobody Wed Nov 27 06:28:33 2024
From: Lizhi Hou
CC: Lizhi Hou
Subject: [PATCH V4 06/10] accel/amdxdna: Add GEM buffer object management
Date: Fri, 11 Oct 2024 16:12:40 -0700
Message-ID: <20241011231244.3182625-7-lizhi.hou@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com>
References: <20241011231244.3182625-1-lizhi.hou@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The following types of BOs are supported:

- shmem
A user application uses shmem BOs as input/output for its workload running
on NPU.

- device memory heap
The fixed size buffer dedicated to the device.

- device buffer
The buffer object allocated from device memory heap.

- command buffer
The buffer object created for delivering commands. The command buffer
object is small and pinned on creation.

New IOCTLs are added: CREATE_BO, GET_BO_INFO, SYNC_BO.
SYNC_BO is used to explicitly flush CPU cache for BO memory.

Co-developed-by: Min Ma
Signed-off-by: Min Ma
Signed-off-by: Lizhi Hou
Reviewed-by: Jeffrey Hugo
---
 drivers/accel/amdxdna/Makefile          |   1 +
 drivers/accel/amdxdna/aie2_ctx.c        |  85 +++-
 drivers/accel/amdxdna/aie2_message.c    |  79 +++
 drivers/accel/amdxdna/aie2_pci.h        |   3 +
 drivers/accel/amdxdna/amdxdna_ctx.h     |  10 +
 drivers/accel/amdxdna/amdxdna_gem.c     | 627 ++++++++++++++++++++++++
 drivers/accel/amdxdna/amdxdna_gem.h     |  65 +++
 drivers/accel/amdxdna/amdxdna_pci_drv.c |  12 +
 drivers/accel/amdxdna/amdxdna_pci_drv.h |   6 +
 include/uapi/drm/amdxdna_accel.h        |  77 +++
 10 files changed, 964 insertions(+), 1 deletion(-)
 create mode 100644 drivers/accel/amdxdna/amdxdna_gem.c
 create mode 100644 drivers/accel/amdxdna/amdxdna_gem.h

diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile
index c86c90dfd303..a688c378761f 100644
--- a/drivers/accel/amdxdna/Makefile
+++ b/drivers/accel/amdxdna/Makefile
@@ -8,6 +8,7 @@ amdxdna-y := \
 	aie2_smu.o \
 	aie2_solver.o \
 	amdxdna_ctx.o \
+	amdxdna_gem.o \
 	amdxdna_mailbox.o \
 	amdxdna_mailbox_helper.o \
 	amdxdna_pci_drv.o \
diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c
index 022b2b0b015d..617fc05077d9 100644
--- a/drivers/accel/amdxdna/aie2_ctx.c
+++ b/drivers/accel/amdxdna/aie2_ctx.c
@@ -5,12 +5,15 @@
 
 #include
 #include
+#include
+#include
 #include
 #include
 
 #include "aie2_pci.h"
 #include "aie2_solver.h"
 #include "amdxdna_ctx.h"
+#include "amdxdna_gem.h"
 #include "amdxdna_mailbox.h"
 #include "amdxdna_pci_drv.h"
 
@@ -128,6 +131,7 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
 	struct amdxdna_client *client = hwctx->client;
 	struct amdxdna_dev *xdna = client->xdna;
 	struct amdxdna_hwctx_priv *priv;
+	struct amdxdna_gem_obj *heap;
 	int ret;
 
 	priv = kzalloc(sizeof(*hwctx->priv), GFP_KERNEL);
@@ -135,10 +139,28 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
 		return -ENOMEM;
 	hwctx->priv = priv;
 
+	mutex_lock(&client->mm_lock);
+	heap = client->dev_heap;
+	if (!heap) {
+		XDNA_ERR(xdna, "The client dev heap object not exist");
+		mutex_unlock(&client->mm_lock);
+		ret = -ENOENT;
+		goto free_priv;
+	}
+	drm_gem_object_get(to_gobj(heap));
+	mutex_unlock(&client->mm_lock);
+	priv->heap = heap;
+
+	ret = amdxdna_gem_pin(heap);
+	if (ret) {
+		XDNA_ERR(xdna, "Dev heap pin failed, ret %d", ret);
+		goto put_heap;
+	}
+
 	ret = aie2_hwctx_col_list(hwctx);
 	if (ret) {
 		XDNA_ERR(xdna, "Create col list failed, ret %d", ret);
-		goto free_priv;
+		goto unpin;
 	}
 
 	ret = aie2_alloc_resource(hwctx);
@@ -147,14 +169,26 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx)
 		goto free_col_list;
 	}
 
+	ret = aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id,
+				heap->mem.userptr, heap->mem.size);
+	if (ret) {
+		XDNA_ERR(xdna, "Map host buffer failed, ret %d", ret);
+		goto release_resource;
+	}
 	hwctx->status = HWCTX_STAT_INIT;
 
 	XDNA_DBG(xdna, "hwctx %s init completed", hwctx->name);
 
 	return 0;
 
+release_resource:
+	aie2_release_resource(hwctx);
 free_col_list:
 	kfree(hwctx->col_list);
+unpin:
+	amdxdna_gem_unpin(heap);
+put_heap:
+	drm_gem_object_put(to_gobj(heap));
 free_priv:
 	kfree(priv);
 	return ret;
@@ -164,11 +198,59 @@ void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx)
 {
 	aie2_release_resource(hwctx);
 
+	amdxdna_gem_unpin(hwctx->priv->heap);
+	drm_gem_object_put(to_gobj(hwctx->priv->heap));
+
 	kfree(hwctx->col_list);
 	kfree(hwctx->priv);
 	kfree(hwctx->cus);
 }
 
+static int aie2_hwctx_cu_config(struct amdxdna_hwctx *hwctx, void *buf, u32 size)
+{
+	struct amdxdna_hwctx_param_config_cu *config = buf;
+	struct amdxdna_dev *xdna = hwctx->client->xdna;
+	u32 total_size;
+	int ret;
+
+	XDNA_DBG(xdna, "Config %d CU to %s", config->num_cus, hwctx->name);
+	if (hwctx->status != HWCTX_STAT_INIT) {
+		XDNA_ERR(xdna, "Not support re-config CU");
+		return -EINVAL;
+	}
+
+	if (!config->num_cus) {
+		XDNA_ERR(xdna, "Number of CU is zero");
+		return -EINVAL;
+	}
+
+	total_size = struct_size(config, cu_configs, config->num_cus);
+	if (total_size > size) {
+		XDNA_ERR(xdna, "CU config larger than size");
+		return -EINVAL;
+	}
+
+	hwctx->cus = kmemdup(config, total_size, GFP_KERNEL);
+	if (!hwctx->cus)
+		return -ENOMEM;
+
+	ret = aie2_config_cu(hwctx);
+	if (ret) {
+		XDNA_ERR(xdna, "Configu CU to firmware failed, ret %d", ret);
+		goto free_cus;
+	}
+
+	wmb(); /* To avoid locking in command submit when check status */
+	hwctx->status = HWCTX_STAT_READY;
+
+	return 0;
+
+free_cus:
+	kfree(hwctx->cus);
+	hwctx->cus = NULL;
+	return ret;
+}
+
 int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size)
 {
 	struct amdxdna_dev *xdna = hwctx->client->xdna;
@@ -176,6 +258,7 @@ int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *bu
 	drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock));
 	switch (type) {
 	case DRM_AMDXDNA_HWCTX_CONFIG_CU:
+		return aie2_hwctx_cu_config(hwctx, buf, size);
 	case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF:
 	case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF:
 		return -EOPNOTSUPP;
diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c
index 4b8a71bf4fae..28bd0560db61 100644
--- a/drivers/accel/amdxdna/aie2_message.c
+++ b/drivers/accel/amdxdna/aie2_message.c
@@ -5,6 +5,8 @@
 
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -13,6 +15,7 @@
 #include "aie2_msg_priv.h"
 #include "aie2_pci.h"
 #include "amdxdna_ctx.h"
+#include "amdxdna_gem.h"
 #include "amdxdna_mailbox.h"
 #include "amdxdna_mailbox_helper.h"
 #include "amdxdna_pci_drv.h"
@@ -282,3 +285,79 @@ int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwc
 
 	return ret;
 }
+
+int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size)
+{
+	DECLARE_AIE2_MSG(map_host_buffer, MSG_OP_MAP_HOST_BUFFER);
+	struct amdxdna_dev *xdna = ndev->xdna;
+	int ret;
+
+	req.context_id = context_id;
+	req.buf_addr = addr;
+	req.buf_size = size;
+	ret = aie2_send_mgmt_msg_wait(ndev, &msg);
+	if (ret)
+		return ret;
+
+	XDNA_DBG(xdna, "fw ctx %d map host buf addr 0x%llx size 0x%llx",
+		 context_id, addr, size);
+
+	return 0;
+}
+
+int aie2_config_cu(struct amdxdna_hwctx *hwctx)
+{
+	struct mailbox_channel *chann = hwctx->priv->mbox_chann;
+	struct amdxdna_dev *xdna = hwctx->client->xdna;
+	u32 shift = xdna->dev_info->dev_mem_buf_shift;
+	DECLARE_AIE2_MSG(config_cu, MSG_OP_CONFIG_CU);
+	struct drm_gem_object *gobj;
+	struct amdxdna_gem_obj *abo;
+	int ret, i;
+
+	if (!chann)
+		return -ENODEV;
+
+	if (hwctx->cus->num_cus > MAX_NUM_CUS) {
+		XDNA_DBG(xdna, "Exceed maximum CU %d", MAX_NUM_CUS);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < hwctx->cus->num_cus; i++) {
+		struct amdxdna_cu_config *cu = &hwctx->cus->cu_configs[i];
+
+		gobj = drm_gem_object_lookup(hwctx->client->filp, cu->cu_bo);
+		if (!gobj) {
+			XDNA_ERR(xdna, "Lookup GEM object failed");
+			return -EINVAL;
+		}
+		abo = to_xdna_obj(gobj);
+
+		if (abo->type != AMDXDNA_BO_DEV) {
+			drm_gem_object_put(gobj);
+			XDNA_ERR(xdna, "Invalid BO type");
+			return -EINVAL;
+		}
+
+		req.cfgs[i] = FIELD_PREP(AIE2_MSG_CFG_CU_PDI_ADDR,
+					 abo->mem.dev_addr >> shift);
+		req.cfgs[i] |= FIELD_PREP(AIE2_MSG_CFG_CU_FUNC, cu->cu_func);
+		XDNA_DBG(xdna, "CU %d full addr 0x%llx, cfg 0x%x", i,
+			 abo->mem.dev_addr, req.cfgs[i]);
+		drm_gem_object_put(gobj);
+	}
+	req.num_cus = hwctx->cus->num_cus;
+
+	ret = xdna_send_msg_wait(xdna, chann, &msg);
+	if (ret == -ETIME)
+		aie2_destroy_context(xdna->dev_handle, hwctx);
+
+	if (resp.status == AIE2_STATUS_SUCCESS) {
+		XDNA_DBG(xdna, "Configure %d CUs, ret %d", req.num_cus, ret);
+		return 0;
+	}
+
+	XDNA_ERR(xdna, "Command opcode 0x%x failed, status 0x%x ret %d",
+		 msg.opcode, resp.status, ret);
+	return ret;
+}
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index b789286bc9d4..3ac936e2c9d1 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -119,6 +119,7 @@ struct rt_config {
 };
 
 struct amdxdna_hwctx_priv {
+	struct amdxdna_gem_obj *heap;
 	void *mbox_chann;
 };
 
@@ -196,6 +197,8 @@ int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev,
 				struct amdxdna_fw_ver *fw_ver);
 int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx);
 int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx);
+int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size);
+int aie2_config_cu(struct amdxdna_hwctx *hwctx);
 
 /* aie2_hwctx.c */
 int aie2_hwctx_init(struct amdxdna_hwctx *hwctx);
diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/amdxdna_ctx.h
index 3627770e5a98..665b3208897d 100644
--- a/drivers/accel/amdxdna/amdxdna_ctx.h
+++ b/drivers/accel/amdxdna/amdxdna_ctx.h
@@ -6,6 +6,16 @@
 #ifndef _AMDXDNA_CTX_H_
 #define _AMDXDNA_CTX_H_
 
+/* Exec buffer command header format */
+#define AMDXDNA_CMD_STATE GENMASK(3, 0)
+#define AMDXDNA_CMD_EXTRA_CU_MASK GENMASK(11, 10)
+#define AMDXDNA_CMD_COUNT GENMASK(22, 12)
+#define AMDXDNA_CMD_OPCODE GENMASK(27, 23)
+struct amdxdna_cmd {
+	u32 header;
+	u32 data[];
+};
+
 struct amdxdna_hwctx {
 	struct amdxdna_client *client;
 	struct amdxdna_hwctx_priv *priv;
diff --git a/drivers/accel/amdxdna/amdxdna_gem.c b/drivers/accel/amdxdna/amdxdna_gem.c
new file mode 100644
index 000000000000..66373baa4600
--- /dev/null
+++ b/drivers/accel/amdxdna/amdxdna_gem.c
@@ -0,0 +1,627 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024, Advanced Micro Devices, Inc.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "amdxdna_ctx.h"
+#include "amdxdna_gem.h"
+#include "amdxdna_pci_drv.h"
+
+#define XDNA_MAX_CMD_BO_SIZE SZ_32K
+
+static int
+amdxdna_gem_insert_node_locked(struct amdxdna_gem_obj *abo, bool use_vmap)
+{
+	struct amdxdna_client *client = abo->client;
+	struct amdxdna_dev *xdna = client->xdna;
+	struct amdxdna_mem *mem = &abo->mem;
+	u64 offset;
+	u32 align;
+	int ret;
+
+	align = 1 << max(PAGE_SHIFT, xdna->dev_info->dev_mem_buf_shift);
+	ret = drm_mm_insert_node_generic(&abo->dev_heap->mm, &abo->mm_node,
+					 mem->size, align,
+					 0, DRM_MM_INSERT_BEST);
+	if (ret) {
+		XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret);
+		return ret;
+	}
+
+	mem->dev_addr = abo->mm_node.start;
+	offset = mem->dev_addr - abo->dev_heap->mem.dev_addr;
+	mem->userptr = abo->dev_heap->mem.userptr + offset;
+	mem->pages = &abo->dev_heap->base.pages[offset >> PAGE_SHIFT];
+	mem->nr_pages = mem->size >> PAGE_SHIFT;
+
+	if (use_vmap) {
+		mem->kva = vmap(mem->pages, mem->nr_pages, VM_MAP, PAGE_KERNEL);
+		if (!mem->kva) {
+			XDNA_ERR(xdna, "Failed to vmap");
+			drm_mm_remove_node(&abo->mm_node);
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
+static void amdxdna_gem_obj_free(struct drm_gem_object *gobj)
+{
+	struct amdxdna_dev *xdna = to_xdna_dev(gobj->dev);
+	struct amdxdna_gem_obj *abo = to_xdna_obj(gobj);
+	struct iosys_map map = IOSYS_MAP_INIT_VADDR(abo->mem.kva);
+
+	XDNA_DBG(xdna, "BO type %d xdna_addr 0x%llx", abo->type, abo->mem.dev_addr);
+	if (abo->pinned)
+		amdxdna_gem_unpin(abo);
+
+	if (abo->type == AMDXDNA_BO_DEV) {
+		mutex_lock(&abo->client->mm_lock);
+		drm_mm_remove_node(&abo->mm_node);
+		mutex_unlock(&abo->client->mm_lock);
+
+		vunmap(abo->mem.kva);
+		drm_gem_object_put(to_gobj(abo->dev_heap));
+		drm_gem_object_release(gobj);
+		mutex_destroy(&abo->lock);
+		kfree(abo);
+		return;
+	}
+
+	if (abo->type == AMDXDNA_BO_DEV_HEAP)
+		drm_mm_takedown(&abo->mm);
+
+	drm_gem_vunmap_unlocked(gobj, &map);
+	mutex_destroy(&abo->lock);
+	drm_gem_shmem_free(&abo->base);
+}
+
+static const struct drm_gem_object_funcs amdxdna_gem_dev_obj_funcs = {
+	.free = amdxdna_gem_obj_free,
+};
+
+static bool amdxdna_hmm_invalidate(struct mmu_interval_notifier *mni,
+				   const struct mmu_notifier_range *range,
+				   unsigned long cur_seq)
+{
+	struct amdxdna_gem_obj *abo = container_of(mni, struct amdxdna_gem_obj,
+						   mem.notifier);
+	struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev);
+
+	XDNA_DBG(xdna, "Invalid range 0x%llx, 0x%lx, type %d",
+		 abo->mem.userptr, abo->mem.size, abo->type);
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	xdna->dev_info->ops->hmm_invalidate(abo, cur_seq);
+
+	return true;
+}
+
+static const struct mmu_interval_notifier_ops amdxdna_hmm_ops = {
+	.invalidate = amdxdna_hmm_invalidate,
+};
+
+static void amdxdna_hmm_unregister(struct amdxdna_gem_obj *abo)
+{
+	struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev);
+
+	if (!xdna->dev_info->ops->hmm_invalidate)
+		return;
+
+	mmu_interval_notifier_remove(&abo->mem.notifier);
+	kvfree(abo->mem.pfns);
+	abo->mem.pfns = NULL;
+}
+
+static int amdxdna_hmm_register(struct amdxdna_gem_obj *abo, unsigned long addr,
+				size_t len)
+{
+	struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev);
+	u32 nr_pages;
+	int ret;
+
+	if (!xdna->dev_info->ops->hmm_invalidate)
+		return 0;
+
+	if (abo->mem.pfns)
+		return -EEXIST;
+
+	nr_pages = (PAGE_ALIGN(addr + len) - (addr & PAGE_MASK)) >> PAGE_SHIFT;
+	abo->mem.pfns = kvcalloc(nr_pages, sizeof(*abo->mem.pfns),
+				 GFP_KERNEL);
+	if (!abo->mem.pfns)
+		return -ENOMEM;
+
+	ret = mmu_interval_notifier_insert_locked(&abo->mem.notifier,
+						  current->mm,
+						  addr,
+						  len,
+						  &amdxdna_hmm_ops);
+	if (ret) {
+		XDNA_ERR(xdna, "Insert mmu notifier failed, ret %d", ret);
+		kvfree(abo->mem.pfns);
+	}
+	abo->mem.userptr = addr;
+
+	return ret;
+}
+
+static int amdxdna_gem_obj_mmap(struct drm_gem_object *gobj,
+				struct vm_area_struct *vma)
+{
+	struct amdxdna_gem_obj *abo = to_xdna_obj(gobj);
+	unsigned long num_pages;
+	int ret;
+
+	ret = amdxdna_hmm_register(abo, vma->vm_start, gobj->size);
+	if (ret)
+		return ret;
+
+	ret = drm_gem_shmem_mmap(&abo->base, vma);
+	if (ret)
+		goto hmm_unreg;
+
+	num_pages = gobj->size >> PAGE_SHIFT;
+	/* Try to insert the pages */
+	vm_flags_mod(vma, VM_MIXEDMAP, VM_PFNMAP);
+	ret = vm_insert_pages(vma, vma->vm_start, abo->base.pages, &num_pages);
+	if (ret)
+		XDNA_ERR(abo->client->xdna, "Failed insert pages, ret %d", ret);
+
+	return 0;
+
+hmm_unreg:
+	amdxdna_hmm_unregister(abo);
+	return ret;
+}
+
+static vm_fault_t amdxdna_gem_vm_fault(struct vm_fault *vmf)
+{
+	return drm_gem_shmem_vm_ops.fault(vmf);
+}
+
+static void amdxdna_gem_vm_open(struct vm_area_struct *vma)
+{
+	drm_gem_shmem_vm_ops.open(vma);
+}
+
+static void amdxdna_gem_vm_close(struct vm_area_struct *vma)
+{
+	struct drm_gem_object *gobj = vma->vm_private_data;
+
+	amdxdna_hmm_unregister(to_xdna_obj(gobj));
+	drm_gem_shmem_vm_ops.close(vma);
+}
+
+static const struct vm_operations_struct amdxdna_gem_vm_ops = {
+	.fault = amdxdna_gem_vm_fault,
+	.open = amdxdna_gem_vm_open,
+	.close = amdxdna_gem_vm_close,
+};
+
+static const struct drm_gem_object_funcs amdxdna_gem_shmem_funcs = {
+	.free = amdxdna_gem_obj_free,
+	.print_info = drm_gem_shmem_object_print_info,
+	.pin = drm_gem_shmem_object_pin,
+	.unpin = drm_gem_shmem_object_unpin,
+	.get_sg_table = drm_gem_shmem_object_get_sg_table,
+	.vmap = drm_gem_shmem_object_vmap,
+	.vunmap = drm_gem_shmem_object_vunmap,
+	.mmap = amdxdna_gem_obj_mmap,
+	.vm_ops = &amdxdna_gem_vm_ops,
+};
+
+static struct amdxdna_gem_obj *
+amdxdna_gem_create_obj(struct drm_device *dev, size_t size)
+{
+	struct amdxdna_gem_obj *abo;
+
+	abo = kzalloc(sizeof(*abo), GFP_KERNEL);
+	if (!abo)
+		return ERR_PTR(-ENOMEM);
+
+	abo->pinned = false;
+	abo->assigned_hwctx = AMDXDNA_INVALID_CTX_HANDLE;
+	mutex_init(&abo->lock);
+
+	abo->mem.userptr = AMDXDNA_INVALID_ADDR;
+	abo->mem.dev_addr = AMDXDNA_INVALID_ADDR;
+	abo->mem.size = size;
+
+	return abo;
+}
+
+/* For drm_driver->gem_create_object callback */
+struct drm_gem_object *
+amdxdna_gem_create_object_cb(struct drm_device *dev, size_t size)
+{
+	struct amdxdna_gem_obj *abo;
+
+	abo = amdxdna_gem_create_obj(dev, size);
+	if (IS_ERR(abo))
+		return ERR_CAST(abo);
+
+	to_gobj(abo)->funcs = &amdxdna_gem_shmem_funcs;
+
+	return to_gobj(abo);
+}
+
+static struct amdxdna_gem_obj *
+amdxdna_drm_alloc_shmem(struct drm_device *dev,
+			struct amdxdna_drm_create_bo *args,
+			struct drm_file *filp)
+{
+	struct amdxdna_client *client = filp->driver_priv;
+	struct drm_gem_shmem_object *shmem;
+	struct amdxdna_gem_obj *abo;
+
+	shmem = drm_gem_shmem_create(dev, args->size);
+	if (IS_ERR(shmem))
+		return ERR_CAST(shmem);
+
+	shmem->map_wc = false;
+
+	abo = to_xdna_obj(&shmem->base);
+	abo->client = client;
+	abo->type = AMDXDNA_BO_SHMEM;
+
+	return abo;
+}
+
+static struct amdxdna_gem_obj *
+amdxdna_drm_create_dev_heap(struct drm_device *dev,
+			    struct amdxdna_drm_create_bo *args,
+			    struct drm_file *filp)
+{
+	struct amdxdna_client *client = filp->driver_priv;
+	struct amdxdna_dev *xdna = to_xdna_dev(dev);
+	struct drm_gem_shmem_object *shmem;
+	struct amdxdna_gem_obj *abo;
+	int ret;
+
+	if (args->size > xdna->dev_info->dev_mem_size) {
+		XDNA_DBG(xdna, "Invalid dev heap size 0x%llx, limit 0x%lx",
+			 args->size, xdna->dev_info->dev_mem_size);
+		return ERR_PTR(-EINVAL);
+	}
+
+	mutex_lock(&client->mm_lock);
+	if (client->dev_heap) {
+		XDNA_DBG(client->xdna, "dev heap is already created");
+		ret = -EBUSY;
+		goto mm_unlock;
+	}
+
+	shmem = drm_gem_shmem_create(dev, args->size);
+	if (IS_ERR(shmem)) {
+		ret = PTR_ERR(shmem);
+		goto mm_unlock;
+	}
+
+	shmem->map_wc = false;
+	abo = to_xdna_obj(&shmem->base);
+
+	abo->type = AMDXDNA_BO_DEV_HEAP;
+	abo->client = client;
+	abo->mem.dev_addr = client->xdna->dev_info->dev_mem_base;
+	drm_mm_init(&abo->mm, abo->mem.dev_addr, abo->mem.size);
+
+	client->dev_heap = abo;
+	drm_gem_object_get(to_gobj(abo));
+	mutex_unlock(&client->mm_lock);
+
+	return abo;
+
+mm_unlock:
+	mutex_unlock(&client->mm_lock);
+	return ERR_PTR(ret);
+}
+
+struct amdxdna_gem_obj *
+amdxdna_drm_alloc_dev_bo(struct drm_device *dev,
+			 struct amdxdna_drm_create_bo *args,
+			 struct drm_file *filp, bool use_vmap)
+{
+	struct amdxdna_client *client = filp->driver_priv;
+	struct amdxdna_dev *xdna = to_xdna_dev(dev);
+	size_t aligned_sz = PAGE_ALIGN(args->size);
+	struct amdxdna_gem_obj *abo, *heap;
+	int ret;
+
+	mutex_lock(&client->mm_lock);
+	heap = client->dev_heap;
+	if (!heap) {
+		ret = -EINVAL;
+		goto mm_unlock;
+	}
+
+	if (heap->mem.userptr == AMDXDNA_INVALID_ADDR) {
+		XDNA_ERR(xdna, "Invalid dev heap userptr");
+		ret = -EINVAL;
+		goto mm_unlock;
+	}
+
+	if (args->size > heap->mem.size) {
+		XDNA_ERR(xdna, "Invalid dev bo size 0x%llx, limit 0x%lx",
+			 args->size, heap->mem.size);
+		ret = -EINVAL;
+		goto mm_unlock;
+	}
+
+	abo = amdxdna_gem_create_obj(&xdna->ddev, aligned_sz);
+	if (IS_ERR(abo)) {
+		ret = PTR_ERR(abo);
+		goto mm_unlock;
+	}
+	to_gobj(abo)->funcs = &amdxdna_gem_dev_obj_funcs;
+	abo->type = AMDXDNA_BO_DEV;
+	abo->client = client;
+	abo->dev_heap = heap;
+	ret = amdxdna_gem_insert_node_locked(abo, use_vmap);
+	if (ret) {
+		XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret);
+		goto mm_unlock;
+	}
+
+	drm_gem_object_get(to_gobj(heap));
+	drm_gem_private_object_init(&xdna->ddev, to_gobj(abo), aligned_sz);
+
+	mutex_unlock(&client->mm_lock);
+	return abo;
+
+mm_unlock:
+	mutex_unlock(&client->mm_lock);
+	return ERR_PTR(ret);
+}
+
+static struct amdxdna_gem_obj *
+amdxdna_drm_create_cmd_bo(struct drm_device *dev,
+			  struct amdxdna_drm_create_bo *args,
+			  struct drm_file *filp)
+{
+	struct amdxdna_dev *xdna = to_xdna_dev(dev);
+	struct drm_gem_shmem_object *shmem;
+	struct amdxdna_gem_obj *abo;
+	struct iosys_map map;
+	int ret;
+
+	if (args->size > XDNA_MAX_CMD_BO_SIZE) {
+		XDNA_ERR(xdna, "Command bo size 0x%llx too large", args->size);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (args->size < sizeof(struct amdxdna_cmd)) {
+		XDNA_DBG(xdna, "Command BO size 0x%llx too small", args->size);
+		return ERR_PTR(-EINVAL);
+	}
+
+	shmem = drm_gem_shmem_create(dev, args->size);
+	if (IS_ERR(shmem))
+		return ERR_CAST(shmem);
+
+	shmem->map_wc = false;
+	abo = to_xdna_obj(&shmem->base);
+
+	abo->type = AMDXDNA_BO_CMD;
+	abo->client = filp->driver_priv;
+
+	ret = drm_gem_vmap_unlocked(to_gobj(abo), &map);
+	if (ret) {
+		XDNA_ERR(xdna, "Vmap cmd bo failed, ret %d", ret);
+		goto release_obj;
+	}
+	abo->mem.kva = map.vaddr;
+
+	return abo;
+
+release_obj:
+	drm_gem_shmem_free(shmem);
+	return ERR_PTR(ret);
+}
+
+int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
+{
+	struct amdxdna_dev *xdna = to_xdna_dev(dev);
+	struct amdxdna_drm_create_bo *args = data;
+	struct amdxdna_gem_obj *abo;
+	int ret;
+
+	if (args->flags || args->vaddr || !args->size)
+		return -EINVAL;
+
+	XDNA_DBG(xdna, "BO arg type %d vaddr 0x%llx size 0x%llx flags 0x%llx",
+		 args->type, args->vaddr, args->size, args->flags);
+	switch (args->type) {
+	case AMDXDNA_BO_SHMEM:
+		abo = amdxdna_drm_alloc_shmem(dev, args, filp);
+		break;
+	case AMDXDNA_BO_DEV_HEAP:
+		abo = amdxdna_drm_create_dev_heap(dev, args, filp);
+		break;
+	case AMDXDNA_BO_DEV:
+		abo = amdxdna_drm_alloc_dev_bo(dev, args, filp, false);
+		break;
+	case AMDXDNA_BO_CMD:
+		abo = amdxdna_drm_create_cmd_bo(dev, args, filp);
+		break;
+	default:
+		return -EINVAL;
+	}
+	if (IS_ERR(abo))
+		return PTR_ERR(abo);
+
+	/* ready to publish object to userspace */
+	ret = drm_gem_handle_create(filp, to_gobj(abo), &args->handle);
+	if (ret) {
+		XDNA_ERR(xdna, "Create handle failed");
+		goto put_obj;
+	}
+
+
XDNA_DBG(xdna, "BO hdl %d type %d userptr 0x%llx xdna_addr 0x%llx size 0x= %lx", + args->handle, args->type, abo->mem.userptr, + abo->mem.dev_addr, abo->mem.size); +put_obj: + /* Dereference object reference. Handle holds it now. */ + drm_gem_object_put(to_gobj(abo)); + return ret; +} + +int amdxdna_gem_pin_nolock(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna =3D to_xdna_dev(to_gobj(abo)->dev); + int ret; + + switch (abo->type) { + case AMDXDNA_BO_SHMEM: + case AMDXDNA_BO_DEV_HEAP: + ret =3D drm_gem_shmem_pin(&abo->base); + break; + case AMDXDNA_BO_DEV: + ret =3D amdxdna_gem_pin(abo->dev_heap); + break; + default: + ret =3D -EOPNOTSUPP; + } + + XDNA_DBG(xdna, "BO type %d ret %d", abo->type, ret); + return ret; +} + +int amdxdna_gem_pin(struct amdxdna_gem_obj *abo) +{ + int ret; + + mutex_lock(&abo->lock); + ret =3D amdxdna_gem_pin_nolock(abo); + mutex_unlock(&abo->lock); + + return ret; +} + +void amdxdna_gem_unpin(struct amdxdna_gem_obj *abo) +{ + mutex_lock(&abo->lock); + XDNA_DBG(abo->client->xdna, "BO type %d", abo->type); + switch (abo->type) { + case AMDXDNA_BO_SHMEM: + case AMDXDNA_BO_DEV_HEAP: + drm_gem_shmem_unpin(&abo->base); + break; + case AMDXDNA_BO_DEV: + amdxdna_gem_unpin(abo->dev_heap); + break; + default: + /* Should never go here */ + WARN_ONCE(1, "Unexpected BO type %d\n", abo->type); + } + mutex_unlock(&abo->lock); +} + +struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, + u32 bo_hdl, u8 bo_type) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + + gobj =3D drm_gem_object_lookup(client->filp, bo_hdl); + if (!gobj) { + XDNA_DBG(xdna, "Can not find bo %d", bo_hdl); + return NULL; + } + + abo =3D to_xdna_obj(gobj); + if (bo_type =3D=3D AMDXDNA_BO_INVALID || abo->type =3D=3D bo_type) + return abo; + + drm_gem_object_put(gobj); + return NULL; +} + +int amdxdna_drm_get_bo_info_ioctl(struct drm_device *dev, void *data, stru= ct drm_file *filp) +{ + struct 
amdxdna_drm_get_bo_info *args =3D data; + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + int ret =3D 0; + + if (args->ext || args->ext_flags) + return -EINVAL; + + gobj =3D drm_gem_object_lookup(filp, args->handle); + if (!gobj) { + XDNA_DBG(xdna, "Lookup GEM object %d failed", args->handle); + return -ENOENT; + } + + abo =3D to_xdna_obj(gobj); + args->vaddr =3D abo->mem.userptr; + args->xdna_addr =3D abo->mem.dev_addr; + + if (abo->type !=3D AMDXDNA_BO_DEV) + args->map_offset =3D drm_vma_node_offset_addr(&gobj->vma_node); + else + args->map_offset =3D AMDXDNA_INVALID_ADDR; + + XDNA_DBG(xdna, "BO hdl %d map_offset 0x%llx vaddr 0x%llx xdna_addr 0x%llx= ", + args->handle, args->map_offset, args->vaddr, args->xdna_addr); + + drm_gem_object_put(gobj); + return ret; +} + +/* + * The sync BO ioctl makes sure the CPU cache is in sync with memory. + * This is required because the NPU is not a cache-coherent device. CPU cache + * flushing/invalidation is expensive, so it is best handled outside + * of the command submission path. This ioctl allows explicit cache + * flushing/invalidation outside of the critical path. 
+ */ +int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, + void *data, struct drm_file *filp) +{ + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_drm_sync_bo *args =3D data; + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + int ret; + + gobj =3D drm_gem_object_lookup(filp, args->handle); + if (!gobj) { + XDNA_ERR(xdna, "Lookup GEM object failed"); + return -ENOENT; + } + abo =3D to_xdna_obj(gobj); + + ret =3D amdxdna_gem_pin(abo); + if (ret) { + XDNA_ERR(xdna, "Pin BO %d failed, ret %d", args->handle, ret); + goto put_obj; + } + + if (abo->type =3D=3D AMDXDNA_BO_DEV) + drm_clflush_pages(abo->mem.pages, abo->mem.nr_pages); + else + drm_clflush_pages(abo->base.pages, gobj->size >> PAGE_SHIFT); + + amdxdna_gem_unpin(abo); + + XDNA_DBG(xdna, "Sync bo %d offset 0x%llx, size 0x%llx\n", + args->handle, args->offset, args->size); + +put_obj: + drm_gem_object_put(gobj); + return ret; +} diff --git a/drivers/accel/amdxdna/amdxdna_gem.h b/drivers/accel/amdxdna/am= dxdna_gem.h new file mode 100644 index 000000000000..8ccc0375dd9d --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_gem.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. 
+ */ + +#ifndef _AMDXDNA_GEM_H_ +#define _AMDXDNA_GEM_H_ + +struct amdxdna_mem { + u64 userptr; + void *kva; + u64 dev_addr; + size_t size; + struct page **pages; + u32 nr_pages; + struct mmu_interval_notifier notifier; + unsigned long *pfns; + bool map_invalid; +}; + +struct amdxdna_gem_obj { + struct drm_gem_shmem_object base; + struct amdxdna_client *client; + u8 type; + bool pinned; + struct mutex lock; /* Protects: pinned */ + struct amdxdna_mem mem; + + /* Below members are initialized only when needed */ + struct drm_mm mm; /* For AMDXDNA_BO_DEV_HEAP */ + struct amdxdna_gem_obj *dev_heap; /* For AMDXDNA_BO_DEV */ + struct drm_mm_node mm_node; /* For AMDXDNA_BO_DEV */ + u32 assigned_hwctx; +}; + +#define to_gobj(obj) (&(obj)->base.base) + +static inline struct amdxdna_gem_obj *to_xdna_obj(struct drm_gem_object *g= obj) +{ + return container_of(gobj, struct amdxdna_gem_obj, base.base); +} + +struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, + u32 bo_hdl, u8 bo_type); +static inline void amdxdna_gem_put_obj(struct amdxdna_gem_obj *abo) +{ + drm_gem_object_put(to_gobj(abo)); +} + +struct drm_gem_object * +amdxdna_gem_create_object_cb(struct drm_device *dev, size_t size); +struct amdxdna_gem_obj * +amdxdna_drm_alloc_dev_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp, bool use_vmap); + +int amdxdna_gem_pin_nolock(struct amdxdna_gem_obj *abo); +int amdxdna_gem_pin(struct amdxdna_gem_obj *abo); +void amdxdna_gem_unpin(struct amdxdna_gem_obj *abo); + +int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct= drm_file *filp); +int amdxdna_drm_get_bo_info_ioctl(struct drm_device *dev, void *data, stru= ct drm_file *filp); +int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, void *data, struct d= rm_file *filp); + +#endif /* _AMDXDNA_GEM_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdn= a/amdxdna_pci_drv.c index 49e896092209..47ea79d4a021 100644 --- 
a/drivers/accel/amdxdna/amdxdna_pci_drv.c +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -7,12 +7,14 @@ #include #include #include +#include #include #include #include #include =20 #include "amdxdna_ctx.h" +#include "amdxdna_gem.h" #include "amdxdna_pci_drv.h" =20 /* @@ -63,6 +65,7 @@ static int amdxdna_drm_open(struct drm_device *ddev, stru= ct drm_file *filp) } mutex_init(&client->hwctx_lock); idr_init_base(&client->hwctx_idr, AMDXDNA_INVALID_CTX_HANDLE + 1); + mutex_init(&client->mm_lock); =20 mutex_lock(&xdna->dev_lock); list_add_tail(&client->node, &xdna->client_list); @@ -91,6 +94,9 @@ static void amdxdna_drm_close(struct drm_device *ddev, st= ruct drm_file *filp) =20 idr_destroy(&client->hwctx_idr); mutex_destroy(&client->hwctx_lock); + mutex_destroy(&client->mm_lock); + if (client->dev_heap) + drm_gem_object_put(to_gobj(client->dev_heap)); =20 iommu_sva_unbind_device(client->sva); =20 @@ -123,6 +129,10 @@ static const struct drm_ioctl_desc amdxdna_drm_ioctls[= ] =3D { DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_HWCTX, amdxdna_drm_create_hwctx_ioctl, 0= ), DRM_IOCTL_DEF_DRV(AMDXDNA_DESTROY_HWCTX, amdxdna_drm_destroy_hwctx_ioctl,= 0), DRM_IOCTL_DEF_DRV(AMDXDNA_CONFIG_HWCTX, amdxdna_drm_config_hwctx_ioctl, 0= ), + /* BO */ + DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_BO, amdxdna_drm_create_bo_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_GET_BO_INFO, amdxdna_drm_get_bo_info_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_SYNC_BO, amdxdna_drm_sync_bo_ioctl, 0), }; =20 static const struct file_operations amdxdna_fops =3D { @@ -147,6 +157,8 @@ const struct drm_driver amdxdna_drm_drv =3D { .postclose =3D amdxdna_drm_close, .ioctls =3D amdxdna_drm_ioctls, .num_ioctls =3D ARRAY_SIZE(amdxdna_drm_ioctls), + + .gem_create_object =3D amdxdna_gem_create_object_cb, }; =20 static const struct amdxdna_dev_info * diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdn= a/amdxdna_pci_drv.h index 5ec7fe168406..3dddde4ac12a 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.h +++ 
b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -18,6 +18,7 @@ extern const struct drm_driver amdxdna_drm_drv; =20 struct amdxdna_dev; +struct amdxdna_gem_obj; struct amdxdna_hwctx; =20 /* @@ -29,6 +30,7 @@ struct amdxdna_dev_ops { int (*hwctx_init)(struct amdxdna_hwctx *hwctx); void (*hwctx_fini)(struct amdxdna_hwctx *hwctx); int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, voi= d *buf, u32 size); + void (*hmm_invalidate)(struct amdxdna_gem_obj *abo, unsigned long cur_seq= ); }; =20 /* @@ -89,6 +91,10 @@ struct amdxdna_client { struct idr hwctx_idr; struct amdxdna_dev *xdna; struct drm_file *filp; + + struct mutex mm_lock; /* protect memory related */ + struct amdxdna_gem_obj *dev_heap; + struct iommu_sva *sva; int pasid; }; diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_ac= cel.h index 133da8c007d0..3792750834b2 100644 --- a/include/uapi/drm/amdxdna_accel.h +++ b/include/uapi/drm/amdxdna_accel.h @@ -13,7 +13,9 @@ extern "C" { #endif =20 +#define AMDXDNA_INVALID_ADDR (~0UL) #define AMDXDNA_INVALID_CTX_HANDLE 0 +#define AMDXDNA_INVALID_BO_HANDLE 0 =20 enum amdxdna_device_type { AMDXDNA_DEV_TYPE_UNKNOWN =3D -1, @@ -24,6 +26,9 @@ enum amdxdna_drm_ioctl_id { DRM_AMDXDNA_CREATE_HWCTX, DRM_AMDXDNA_DESTROY_HWCTX, DRM_AMDXDNA_CONFIG_HWCTX, + DRM_AMDXDNA_CREATE_BO, + DRM_AMDXDNA_GET_BO_INFO, + DRM_AMDXDNA_SYNC_BO, }; =20 /** @@ -136,6 +141,66 @@ struct amdxdna_drm_config_hwctx { __u32 pad; }; =20 +enum amdxdna_bo_type { + AMDXDNA_BO_INVALID =3D 0, + AMDXDNA_BO_SHMEM, + AMDXDNA_BO_DEV_HEAP, + AMDXDNA_BO_DEV, + AMDXDNA_BO_CMD, +}; + +/** + * struct amdxdna_drm_create_bo - Create a buffer object. + * @flags: Buffer flags. MBZ. + * @vaddr: User VA of buffer if applied. MBZ. + * @size: Size in bytes. + * @type: Buffer type. + * @handle: Returned DRM buffer object handle. 
+ */ +struct amdxdna_drm_create_bo { + __u64 flags; + __u64 vaddr; + __u64 size; + __u32 type; + __u32 handle; +}; + +/** + * struct amdxdna_drm_get_bo_info - Get buffer object information. + * @ext: MBZ. + * @ext_flags: MBZ. + * @handle: DRM buffer object handle. + * @pad: Structure padding. + * @map_offset: Returned DRM fake offset for mmap(). + * @vaddr: Returned user VA of buffer. 0 in case user needs mmap(). + * @xdna_addr: Returned XDNA device virtual address. + */ +struct amdxdna_drm_get_bo_info { + __u64 ext; + __u64 ext_flags; + __u32 handle; + __u32 pad; + __u64 map_offset; + __u64 vaddr; + __u64 xdna_addr; +}; + +/** + * struct amdxdna_drm_sync_bo - Sync buffer object. + * @handle: Buffer object handle. + * @direction: Direction of sync, can be from device or to device. + * @offset: Offset in the buffer to sync. + * @size: Size in bytes. + */ +struct amdxdna_drm_sync_bo { + __u32 handle; +#define SYNC_DIRECT_TO_DEVICE 0U +#define SYNC_DIRECT_FROM_DEVICE 1U + __u32 direction; + __u64 offset; + __u64 size; +}; + #define DRM_IOCTL_AMDXDNA_CREATE_HWCTX \ DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_HWCTX, \ struct amdxdna_drm_create_hwctx) @@ -148,6 +213,18 @@ struct amdxdna_drm_config_hwctx { DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CONFIG_HWCTX, \ struct amdxdna_drm_config_hwctx) =20 +#define DRM_IOCTL_AMDXDNA_CREATE_BO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_BO, \ + struct amdxdna_drm_create_bo) + +#define DRM_IOCTL_AMDXDNA_GET_BO_INFO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_GET_BO_INFO, \ + struct amdxdna_drm_get_bo_info) + +#define DRM_IOCTL_AMDXDNA_SYNC_BO \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_SYNC_BO, \ + struct amdxdna_drm_sync_bo) + #if defined(__cplusplus) } /* extern c end */ #endif --=20 2.34.1 From nobody Wed Nov 27 06:28:33 2024 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2045.outbound.protection.outlook.com [40.107.237.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 
bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 724BB1E9082 for ; Fri, 11 Oct 2024 23:13:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.237.45 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688404; cv=fail; b=ld57k749jTE6/oar2SHR+nHTO67S+IEyQAaOpB58nev7kTvmDpveGoEKbAmEu1a11SmDfI/BvrYWlB3l9fVZtbDSYHWWc1w2H+WrmqQl9Vg4CnzZhEzoJ47tmld5TH2OdEqoXg69Ls1HSWKKz/85to+toxjd9gHbCjArA++P/iM= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688404; c=relaxed/simple; bh=+ypV+dPxQqertegqhtbjv+/huMELAo6M7OdXPbt2lk8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=JM9cT+3jTqsJFPXDw3WVVhTCpuVCEushwTZHCK8S0ouxDssajK6cLcYeh2FHaCR/t46zGV8EMzte8OTuIEUq0qQfh0xYnyVe7g0KxNypVLXT4UfKpYzu74ngjBrxpOyMvr7IgySVGOpe9paQMlqTTUlo4oxb0kxGJAHImAq3tls= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=m46OWF9X; arc=fail smtp.client-ip=40.107.237.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="m46OWF9X" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=yLz7LZ3Wrtq2MAgnSoKQfCfvOQckFSlfR75NXtEtL5cIb+Q/gCGmQ63WPbGFTOQB2YQdJ0B0XSYL+NX7gGexGhFNZCcX6ztpfNXYh8RCi5KQMMeE1Zs41t4JR9wjv0ZYUnuzhtOFXYuLfuf/KeK/wUfCQv9Ws9jODWgkqBJaVO3rY0XG28p0dhIDM/KtwAYKRTV2CijSBSLxEqEIEIpmKCisElrW4fHqseLc6ElKA64cps0I8/CzmUE4KFOFaeUr2cbqh11T0Gt5QNbWo8lMcm+cIHrNF8eQNP5uuiApInYO5YIKom4l/KPMTWFH8MEYNEaoW0I16vZaGqvukmtNpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; 
d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Via7aCsI+aqyyTTn9wZbnrb6ePJZMnRIiICKajA4dBE=; b=RayEPaEoW0xO77HYGsmn6zcsXmqYVUYffpsN0Fl3Fnk3x/XE7L8dHRpliozIT22vJ63cqdzAJ4fD40W76qyFiwM2d0WY6id4MAOEwYrujf4mwMeVIBoeg9Uh8sTec5ked9hN+q4CpLDaC/yZvZLL0HFl5yfaISjaWtaoXiMRZyX0N+dMstEgFWm0J+GhqUPXd04q9/DtDKMAN3UGNE/pf4rMhbqsDjMZX2nrILPIX5yGhJJVkgeICZYeQUwPU+7JOrie0RZ/lLkSv/PcDaWMKNlrdO9TaAk7jO6/a0inVgQonLtRvLCZsyKDgBVzC72chGpuBx2rC6Up9IkwlXgP4Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Via7aCsI+aqyyTTn9wZbnrb6ePJZMnRIiICKajA4dBE=; b=m46OWF9XMPxmIFJlj7EY7fqz6d6yjP5uc2OX4pDjbaIO/JO2nDKCfLkktnIfoBUU5ttBPKwuBOsF78618NYaKSorXX36bZCi8JR9HjZS5qLLLZGoIu3ePTT2CAG7mmgeM+DYI0KWRH3eofOKS+f0jpoyTz3EJgd24hW21V0MsK8= Received: from BYAPR05CA0083.namprd05.prod.outlook.com (2603:10b6:a03:e0::24) by PH0PR12MB7471.namprd12.prod.outlook.com (2603:10b6:510:1e9::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.21; Fri, 11 Oct 2024 23:13:10 +0000 Received: from CO1PEPF000044F7.namprd21.prod.outlook.com (2603:10b6:a03:e0:cafe::2b) by BYAPR05CA0083.outlook.office365.com (2603:10b6:a03:e0::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8069.8 via Frontend Transport; Fri, 11 Oct 2024 23:13:09 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) 
header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; pr=C Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1PEPF000044F7.mail.protection.outlook.com (10.167.241.197) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8093.1 via Frontend Transport; Fri, 11 Oct 2024 23:13:09 +0000 Received: from SATLEXMB06.amd.com (10.181.40.147) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:02 -0500 Received: from SATLEXMB04.amd.com (10.181.40.145) by SATLEXMB06.amd.com (10.181.40.147) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:01 -0500 Received: from xsjlizhih51.xilinx.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Fri, 11 Oct 2024 18:13:00 -0500 From: Lizhi Hou To: , CC: Lizhi Hou , , , , , Subject: [PATCH V4 07/10] accel/amdxdna: Add command execution Date: Fri, 11 Oct 2024 16:12:41 -0700 Message-ID: <20241011231244.3182625-8-lizhi.hou@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com> References: <20241011231244.3182625-1-lizhi.hou@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000044F7:EE_|PH0PR12MB7471:EE_ X-MS-Office365-Filtering-Correlation-Id: ada3550a-239d-4c4c-8274-08dcea4a45cd X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: 
BCL:0;ARA:13230040|376014|36860700013|1800799024|82310400026; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?h+LkqTYjj3w1Q1wrZL3C+kVOE1BdUGVXbGqWILkzq38z9cqvp6xPdcW0gkX9?= =?us-ascii?Q?z8CkAYEequHUbGZoyagWn7omw9xIadamsuXdFYYJvd6EP6TxY9PoWHgG38Qk?= =?us-ascii?Q?J4btOqAktNgD11K0cQznpHTH0l8mmb+3SdTSmpa0y4EobgSMLfYw8+84YWlq?= =?us-ascii?Q?79wX5eq0PAQLKPr3OJlD2mTJtQCFfrpq3QbCma06ChZEVZovnDc+Fmhhn59P?= =?us-ascii?Q?97xKfa5kPc4rcX7H1IIona3Pei2fMlCFZ++87r3MyfemEGFKMzrEwY91KO7n?= =?us-ascii?Q?8jWQzrGxF6ssdJC0D8rsjnk2ZoDdYWi0GIRnj7uXgYWBFPuTMyvBgApCoU5r?= =?us-ascii?Q?Tvf8862+Db02DDaVPW5fu5oAykETObwV/leMJIDfYtBMD9RIF9ctnd1GQ7AP?= =?us-ascii?Q?LDwRMFtDj9/CZ5vC6MQCNfD+h21/79+bcIHctaTOUIM/iA5YqwZ7E8v+Rht6?= =?us-ascii?Q?6mQbcAYJDX+qk3QoIy2SCXJ5o3nW29t5bi5KX8+tyiEVJjs3GCsMn0J4mjSk?= =?us-ascii?Q?RAU4ZTUhvxSCUOsXGQPqvWSqXXvQLhkzxQIlTdSUsocU7D9Z+C6Sq83VqkLD?= =?us-ascii?Q?qNxfr0vN6djPc+/JvpFhi5gkhmzk5NbacLRaY3m9egnm+H4LCf4Jd1CPaTtt?= =?us-ascii?Q?Tws16qHsZG6owWAwsnHaEKsNOaMxw4hZ9/TTwl62IAi9CcFSj7P1pTzN+ZiX?= =?us-ascii?Q?cB5y+PDl5d7ijXIqVgnfDW5ZHpD7IhncEvANVl8qQeP1ym/TR3tQCNy5PpB7?= =?us-ascii?Q?Ij3FYp3Ks4k+ASfcQOSI27Z9N88aJBAFKlT/MMSfMhlYbOhDVdRC/iGJhZM7?= =?us-ascii?Q?WVRjq8UNR/oQpe95YT0p0zj21enbZSl4fSvkuJKbwdQWKWkRjIwtWxlu+fFV?= =?us-ascii?Q?ysHOo74zmgtoV3SDMQVrtQrsJExzWCQipdPbmSeWkGu7qjW2wHGbi/HvppQ9?= =?us-ascii?Q?N6/rLQAncJPFaItOynmocl2NUdoNgFfsDpe8pwm2N7kYcC9Alvy7OjR2SsYy?= =?us-ascii?Q?nHzzw0EDiHMs7h7WHDB3+8Q7N48Kg1IWoDhz6EzSIIqgtO7nwinM1TsvzTqo?= =?us-ascii?Q?IVZ6A8VvFD7DtpqAdMFmOeMXOM/CzhL0F9DbFIX9ATtuMrhk0MTfnPgdLYQt?= =?us-ascii?Q?AiqRRxrXcUHxuRefDqvUWQ9jNOdFqE9j0szrO52IUezHZUEuJn2wG7PyV0/O?= =?us-ascii?Q?iKpVBQ9lRyv0x4Yq0URs/LX8jGvv4PIneYqSygV9dyQujxjMjlGbARQsm7ej?= =?us-ascii?Q?x/XfQSN+LiXHCdit/+OIxNJ5YmfUluRqs1cF+O2AMLUzx4f8YvAFpDPuCrIg?= =?us-ascii?Q?jUYaxLa+VLtEPKT5rC524DKT04/wNbYN8vChsXJFUv6qTg=3D=3D?= X-Forefront-Antispam-Report: 
CIP:165.204.84.17;CTRY:;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB03.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(376014)(36860700013)(1800799024)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Oct 2024 23:13:09.7937 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: ada3550a-239d-4c4c-8274-08dcea4a45cd X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000044F7.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR12MB7471 Content-Type: text/plain; charset="utf-8" Add interfaces for user applications to submit commands and wait for their completion. Co-developed-by: Min Ma Signed-off-by: Min Ma Signed-off-by: Lizhi Hou --- drivers/accel/amdxdna/aie2_ctx.c | 624 +++++++++++++++++- drivers/accel/amdxdna/aie2_message.c | 343 ++++++++++ drivers/accel/amdxdna/aie2_pci.c | 6 + drivers/accel/amdxdna/aie2_pci.h | 35 + drivers/accel/amdxdna/aie2_psp.c | 2 + drivers/accel/amdxdna/aie2_smu.c | 2 + drivers/accel/amdxdna/amdxdna_ctx.c | 375 ++++++++++- drivers/accel/amdxdna/amdxdna_ctx.h | 110 +++ drivers/accel/amdxdna/amdxdna_gem.c | 1 + .../accel/amdxdna/amdxdna_mailbox_helper.c | 5 + drivers/accel/amdxdna/amdxdna_pci_drv.c | 6 + drivers/accel/amdxdna/amdxdna_pci_drv.h | 5 + drivers/accel/amdxdna/amdxdna_sysfs.c | 5 + drivers/accel/amdxdna/npu1_regs.c | 1 + drivers/accel/amdxdna/npu2_regs.c | 1 + drivers/accel/amdxdna/npu4_regs.c | 1 + drivers/accel/amdxdna/npu5_regs.c | 1 + include/trace/events/amdxdna.h | 41 ++ include/uapi/drm/amdxdna_accel.h | 59 ++ 19 files changed, 1614 insertions(+), 9 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c 
b/drivers/accel/amdxdna/aie2_= ctx.c index 617fc05077d9..f9010a902c99 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -8,8 +8,11 @@ #include #include #include +#include #include +#include =20 +#include "aie2_msg_priv.h" #include "aie2_pci.h" #include "aie2_solver.h" #include "amdxdna_ctx.h" @@ -17,6 +20,361 @@ #include "amdxdna_mailbox.h" #include "amdxdna_pci_drv.h" =20 +bool force_cmdlist; +module_param(force_cmdlist, bool, 0600); +MODULE_PARM_DESC(force_cmdlist, "Force use command list (Default false)"); + +#define HWCTX_MAX_TIMEOUT 60000 /* milliseconds */ + +static int +aie2_hwctx_add_job(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *= job) +{ + struct amdxdna_sched_job *other; + int idx; + + idx =3D get_job_idx(hwctx->priv->seq); + /* When the pending list is full, hwctx->priv->seq points to the oldest fence */ + other =3D hwctx->priv->pending[idx]; + if (other && other->fence) + return -EAGAIN; + + if (other) { + dma_fence_put(other->out_fence); + amdxdna_job_put(other); + } + + hwctx->priv->pending[idx] =3D job; + job->seq =3D hwctx->priv->seq++; + kref_get(&job->refcnt); + + return 0; +} + +static struct amdxdna_sched_job * +aie2_hwctx_get_job(struct amdxdna_hwctx *hwctx, u64 seq) +{ + int idx; + + /* Special sequence number for the oldest fence, if it exists */ + if (seq =3D=3D AMDXDNA_INVALID_CMD_HANDLE) { + idx =3D get_job_idx(hwctx->priv->seq); + goto out; + } + + if (seq >=3D hwctx->priv->seq) + return ERR_PTR(-EINVAL); + + if (seq + HWCTX_MAX_CMDS < hwctx->priv->seq) + return NULL; + + idx =3D get_job_idx(seq); + +out: + return hwctx->priv->pending[idx]; +} + +/* The bad_job is used in aie2_sched_job_timedout, otherwise, set it to NU= LL */ +static void aie2_hwctx_stop(struct amdxdna_dev *xdna, struct amdxdna_hwctx= *hwctx, + struct drm_sched_job *bad_job) +{ + drm_sched_stop(&hwctx->priv->sched, bad_job); + aie2_destroy_context(xdna->dev_handle, hwctx); +} + +static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct 
amdxdna_hwc= tx *hwctx) +{ + struct amdxdna_gem_obj *heap =3D hwctx->priv->heap; + int ret; + + ret =3D aie2_create_context(xdna->dev_handle, hwctx); + if (ret) { + XDNA_ERR(xdna, "Create hwctx failed, ret %d", ret); + goto out; + } + + ret =3D aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id, + heap->mem.userptr, heap->mem.size); + if (ret) { + XDNA_ERR(xdna, "Map host buf failed, ret %d", ret); + goto out; + } + + if (hwctx->status !=3D HWCTX_STAT_READY) { + XDNA_DBG(xdna, "hwctx is not ready, status %d", hwctx->status); + goto out; + } + + ret =3D aie2_config_cu(hwctx); + if (ret) { + XDNA_ERR(xdna, "Config cu failed, ret %d", ret); + goto out; + } + +out: + drm_sched_start(&hwctx->priv->sched); + XDNA_DBG(xdna, "%s restarted, ret %d", hwctx->name, ret); + return ret; +} + +void aie2_stop_ctx_by_col_map(struct amdxdna_client *client, u32 col_map) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + /* check if the HW context uses the error column */ + if (!(col_map & amdxdna_hwctx_col_map(hwctx))) + continue; + + aie2_hwctx_stop(xdna, hwctx, NULL); + hwctx->old_status =3D hwctx->status; + hwctx->status =3D HWCTX_STAT_STOP; + XDNA_DBG(xdna, "Stop %s", hwctx->name); + } + mutex_unlock(&client->hwctx_lock); +} + +void aie2_restart_ctx(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + if (hwctx->status !=3D HWCTX_STAT_STOP) + continue; + + hwctx->status =3D hwctx->old_status; + XDNA_DBG(xdna, "Resetting %s", hwctx->name); + aie2_hwctx_restart(xdna, hwctx); + } + mutex_unlock(&client->hwctx_lock); +} + +static int 
aie2_hwctx_wait_for_idle(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_sched_job *job; + + mutex_lock(&hwctx->priv->io_lock); + if (!hwctx->priv->seq) { + mutex_unlock(&hwctx->priv->io_lock); + return 0; + } + + job =3D aie2_hwctx_get_job(hwctx, hwctx->priv->seq - 1); + if (IS_ERR_OR_NULL(job)) { + mutex_unlock(&hwctx->priv->io_lock); + XDNA_WARN(hwctx->client->xdna, "Corrupted pending list"); + return 0; + } + mutex_unlock(&hwctx->priv->io_lock); + + wait_event(hwctx->priv->job_free_wq, !job->fence); + + return 0; +} + +static void +aie2_sched_notify(struct amdxdna_sched_job *job) +{ + struct dma_fence *fence =3D job->fence; + + job->hwctx->priv->completed++; + dma_fence_signal(fence); + trace_xdna_job(&job->base, job->hwctx->name, "signaled fence", job->seq); + dma_fence_put(fence); + mmput(job->mm); + amdxdna_job_put(job); +} + +static int +aie2_sched_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job =3D handle; + struct amdxdna_gem_obj *cmd_abo; + u32 ret =3D 0; + u32 status; + + cmd_abo =3D job->cmd_bo; + + if (unlikely(!data)) + goto out; + + if (unlikely(size !=3D sizeof(u32))) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret =3D -EINVAL; + goto out; + } + + status =3D *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + if (status =3D=3D AIE2_STATUS_SUCCESS) + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + else + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ERROR); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_nocmd_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job =3D handle; + u32 ret =3D 0; + u32 status; + + if (unlikely(!data)) + goto out; + + if (unlikely(size !=3D sizeof(u32))) { + ret =3D -EINVAL; + goto out; + } + + status =3D *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + +out: + aie2_sched_notify(job); + return ret; +} + +static int 
+aie2_sched_cmdlist_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + struct amdxdna_gem_obj *cmd_abo; + struct cmd_chain_resp *resp; + struct amdxdna_dev *xdna; + u32 fail_cmd_status; + u32 fail_cmd_idx; + u32 ret = 0; + + cmd_abo = job->cmd_bo; + if (unlikely(!data) || unlikely(size != sizeof(u32) * 3)) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + + resp = (struct cmd_chain_resp *)data; + xdna = job->hwctx->client->xdna; + XDNA_DBG(xdna, "Status 0x%x", resp->status); + if (resp->status == AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + goto out; + } + + /* Slow path to handle error, read from ringbuf on BAR */ + fail_cmd_idx = resp->fail_cmd_idx; + fail_cmd_status = resp->fail_cmd_status; + XDNA_DBG(xdna, "Failed cmd idx %d, status 0x%x", + fail_cmd_idx, fail_cmd_status); + + if (fail_cmd_status == AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + amdxdna_cmd_set_state(cmd_abo, fail_cmd_status); + + if (amdxdna_cmd_get_op(cmd_abo) == ERT_CMD_CHAIN) { + struct amdxdna_cmd_chain *cc = amdxdna_cmd_get_payload(cmd_abo, NULL); + + cc->error_index = fail_cmd_idx; + if (cc->error_index >= cc->command_count) + cc->error_index = 0; + } +out: + aie2_sched_notify(job); + return ret; +} + +static struct dma_fence * +aie2_sched_job_run(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct amdxdna_hwctx *hwctx = job->hwctx; + struct dma_fence *fence; + int ret; + + if (!mmget_not_zero(job->mm)) + return ERR_PTR(-ESRCH); + + kref_get(&job->refcnt); + fence = dma_fence_get(job->fence); + + if (unlikely(!cmd_abo)) { + ret = aie2_sync_bo(hwctx, job, aie2_sched_nocmd_resp_handler); + goto out; + } + + amdxdna_cmd_set_state(cmd_abo, 
ERT_CMD_STATE_NEW); + + if (amdxdna_cmd_get_op(cmd_abo) == ERT_CMD_CHAIN) + ret = aie2_cmdlist_multi_execbuf(hwctx, job, aie2_sched_cmdlist_resp_handler); + else if (force_cmdlist) + ret = aie2_cmdlist_single_execbuf(hwctx, job, aie2_sched_cmdlist_resp_handler); + else + ret = aie2_execbuf(hwctx, job, aie2_sched_resp_handler); + +out: + if (ret) { + dma_fence_put(job->fence); + amdxdna_job_put(job); + mmput(job->mm); + fence = ERR_PTR(ret); + } + trace_xdna_job(sched_job, hwctx->name, "sent to device", job->seq); + + return fence; +} + +static void aie2_sched_job_free(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx = job->hwctx; + + trace_xdna_job(sched_job, hwctx->name, "job free", job->seq); + drm_sched_job_cleanup(sched_job); + job->fence = NULL; + amdxdna_job_put(job); + + wake_up(&hwctx->priv->job_free_wq); +} + +static enum drm_gpu_sched_stat +aie2_sched_job_timedout(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx = job->hwctx; + struct amdxdna_dev *xdna; + + xdna = hwctx->client->xdna; + trace_xdna_job(sched_job, hwctx->name, "job timedout", job->seq); + mutex_lock(&xdna->dev_lock); + aie2_hwctx_stop(xdna, hwctx, sched_job); + + aie2_hwctx_restart(xdna, hwctx); + mutex_unlock(&xdna->dev_lock); + + return DRM_GPU_SCHED_STAT_NOMINAL; +} + +const struct drm_sched_backend_ops sched_ops = { + .run_job = aie2_sched_job_run, + .free_job = aie2_sched_job_free, + .timedout_job = aie2_sched_job_timedout, +}; + static int aie2_hwctx_col_list(struct amdxdna_hwctx *hwctx) { struct amdxdna_dev *xdna = hwctx->client->xdna; @@ -130,9 +488,10 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) { struct amdxdna_client *client = hwctx->client; struct amdxdna_dev *xdna = client->xdna; + struct drm_gpu_scheduler *sched; struct amdxdna_hwctx_priv *priv; struct 
amdxdna_gem_obj *heap; - int ret; + int i, ret; priv = kzalloc(sizeof(*hwctx->priv), GFP_KERNEL); if (!priv) @@ -157,10 +516,48 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) goto put_heap; } + for (i = 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + struct amdxdna_gem_obj *abo; + struct amdxdna_drm_create_bo args = { + .flags = 0, + .type = AMDXDNA_BO_DEV, + .vaddr = 0, + .size = MAX_CHAIN_CMDBUF_SIZE, + }; + + abo = amdxdna_drm_alloc_dev_bo(&xdna->ddev, &args, client->filp, true); + if (IS_ERR(abo)) { + ret = PTR_ERR(abo); + goto free_cmd_bufs; + } + + XDNA_DBG(xdna, "Command buf %d addr 0x%llx size 0x%lx", + i, abo->mem.dev_addr, abo->mem.size); + priv->cmd_buf[i] = abo; + } + + sched = &priv->sched; + mutex_init(&priv->io_lock); + ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT, + HWCTX_MAX_CMDS, 0, msecs_to_jiffies(HWCTX_MAX_TIMEOUT), + NULL, NULL, hwctx->name, xdna->ddev.dev); + if (ret) { + XDNA_ERR(xdna, "Failed to init DRM scheduler. ret %d", ret); + goto free_cmd_bufs; + } + + ret = drm_sched_entity_init(&priv->entity, DRM_SCHED_PRIORITY_NORMAL, + &sched, 1, NULL); + if (ret) { + XDNA_ERR(xdna, "Failed to init sched entity. 
ret %d", ret); + goto free_sched; + } + init_waitqueue_head(&priv->job_free_wq); + ret = aie2_hwctx_col_list(hwctx); if (ret) { XDNA_ERR(xdna, "Create col list failed, ret %d", ret); - goto unpin; + goto free_entity; } ret = aie2_alloc_resource(hwctx); @@ -185,7 +582,16 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) aie2_release_resource(hwctx); free_col_list: kfree(hwctx->col_list); -unpin: +free_entity: + drm_sched_entity_destroy(&priv->entity); +free_sched: + drm_sched_fini(&priv->sched); +free_cmd_bufs: + for (i = 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + if (!priv->cmd_buf[i]) + continue; + drm_gem_object_put(to_gobj(priv->cmd_buf[i])); + } amdxdna_gem_unpin(heap); put_heap: drm_gem_object_put(to_gobj(heap)); @@ -196,11 +602,43 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx) { + struct amdxdna_sched_job *job; + struct amdxdna_dev *xdna; + int idx; + + xdna = hwctx->client->xdna; + drm_sched_wqueue_stop(&hwctx->priv->sched); + + /* Now, scheduler will not send command to device. */ aie2_release_resource(hwctx); + /* + * All submitted commands are aborted. + * Restart scheduler queues to clean up jobs. aie2_sched_job_run() + * will return -ENODEV if it is called. 
+ */ + drm_sched_wqueue_start(&hwctx->priv->sched); + + aie2_hwctx_wait_for_idle(hwctx); + drm_sched_entity_destroy(&hwctx->priv->entity); + drm_sched_fini(&hwctx->priv->sched); + + for (idx = 0; idx < HWCTX_MAX_CMDS; idx++) { + job = hwctx->priv->pending[idx]; + if (!job) + continue; + + dma_fence_put(job->out_fence); + amdxdna_job_put(job); + } + XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq); + + for (idx = 0; idx < ARRAY_SIZE(hwctx->priv->cmd_buf); idx++) + drm_gem_object_put(to_gobj(hwctx->priv->cmd_buf[idx])); amdxdna_gem_unpin(hwctx->priv->heap); drm_gem_object_put(to_gobj(hwctx->priv->heap)); + mutex_destroy(&hwctx->priv->io_lock); kfree(hwctx->col_list); kfree(hwctx->priv); kfree(hwctx->cus); @@ -267,3 +705,183 @@ int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *bu return -EOPNOTSUPP; } } + +static int aie2_populate_range(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + struct mm_struct *mm = abo->mem.notifier.mm; + struct hmm_range range = { 0 }; + unsigned long timeout; + int ret; + + XDNA_INFO_ONCE(xdna, "populate memory range %llx size %lx", + abo->mem.userptr, abo->mem.size); + range.notifier = &abo->mem.notifier; + range.start = abo->mem.userptr; + range.end = abo->mem.userptr + abo->mem.size; + range.hmm_pfns = abo->mem.pfns; + range.default_flags = HMM_PFN_REQ_FAULT; + + if (!mmget_not_zero(mm)) + return -EFAULT; + + timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); +again: + range.notifier_seq = mmu_interval_read_begin(&abo->mem.notifier); + mmap_read_lock(mm); + ret = hmm_range_fault(&range); + mmap_read_unlock(mm); + if (ret) { + if (time_after(jiffies, timeout)) { + ret = -ETIME; + goto put_mm; + } + + if (ret == -EBUSY) + goto again; + + goto put_mm; + } + + dma_resv_lock(to_gobj(abo)->resv, NULL); + if (mmu_interval_read_retry(&abo->mem.notifier, range.notifier_seq)) { + 
dma_resv_unlock(to_gobj(abo)->resv); + goto again; + } + abo->mem.map_invalid = false; + dma_resv_unlock(to_gobj(abo)->resv); + +put_mm: + mmput(mm); + return ret; +} + +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct ww_acquire_ctx acquire_ctx; + struct amdxdna_gem_obj *abo; + unsigned long timeout = 0; + int ret, i; + + ret = drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, hwctx); + if (ret) { + XDNA_ERR(xdna, "DRM job init failed, ret %d", ret); + return ret; + } + + drm_sched_job_arm(&job->base); + job->out_fence = dma_fence_get(&job->base.s_fence->finished); + +retry: + ret = drm_gem_lock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (ret) { + XDNA_WARN(xdna, "Failed to lock BOs, ret %d", ret); + goto put_fence; + } + + for (i = 0; i < job->bo_cnt; i++) { + abo = to_xdna_obj(job->bos[i]); + if (abo->mem.map_invalid) { + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (!timeout) { + timeout = jiffies + + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); + } else if (time_after(jiffies, timeout)) { + ret = -ETIME; + goto put_fence; + } + + ret = aie2_populate_range(abo); + if (ret) + goto put_fence; + goto retry; + } + + ret = dma_resv_reserve_fences(job->bos[i]->resv, 1); + if (ret) { + XDNA_WARN(xdna, "Failed to reserve fences %d", ret); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + goto put_fence; + } + } + + for (i = 0; i < job->bo_cnt; i++) + dma_resv_add_fence(job->bos[i]->resv, job->out_fence, DMA_RESV_USAGE_WRITE); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + + mutex_lock(&hwctx->priv->io_lock); + ret = aie2_hwctx_add_job(hwctx, job); + if (ret) { + mutex_unlock(&hwctx->priv->io_lock); + goto signal_fence; + } + + *seq = job->seq; + drm_sched_entity_push_job(&job->base); + mutex_unlock(&hwctx->priv->io_lock); + + return 0; + 
+signal_fence: + dma_fence_signal(job->out_fence); +put_fence: + dma_fence_put(job->out_fence); + drm_sched_job_cleanup(&job->base); + return ret; +} + +int aie2_cmd_wait(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout) +{ + signed long remaining = MAX_SCHEDULE_TIMEOUT; + struct amdxdna_sched_job *job; + struct dma_fence *out_fence; + long ret; + + mutex_lock(&hwctx->priv->io_lock); + job = aie2_hwctx_get_job(hwctx, seq); + if (IS_ERR(job)) { + mutex_unlock(&hwctx->priv->io_lock); + ret = PTR_ERR(job); + goto out; + } + + if (unlikely(!job)) { + mutex_unlock(&hwctx->priv->io_lock); + ret = 0; + goto out; + } + out_fence = dma_fence_get(job->out_fence); + mutex_unlock(&hwctx->priv->io_lock); + + if (timeout) + remaining = msecs_to_jiffies(timeout); + + ret = dma_fence_wait_timeout(out_fence, true, remaining); + if (!ret) + ret = -ETIME; + else if (ret > 0) + ret = 0; + + dma_fence_put(out_fence); +out: + return ret; +} + +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, + unsigned long cur_seq) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + struct drm_gem_object *gobj = to_gobj(abo); + long ret; + + dma_resv_lock(gobj->resv, NULL); + abo->mem.map_invalid = true; + mmu_interval_set_seq(&abo->mem.notifier, cur_seq); + ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, + true, MAX_SCHEDULE_TIMEOUT); + dma_resv_unlock(gobj->resv); + + if (!ret || ret == -ERESTARTSYS) + XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); +} diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c index 28bd0560db61..3dc4a9a8571e 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -4,10 +4,12 @@ */ #include +#include #include #include #include #include +#include #include #include #include @@ -361,3 +363,344 @@ int aie2_config_cu(struct amdxdna_hwctx *hwctx) msg.opcode, resp.status, ret); return ret; } + +int aie2_execbuf(struct 
amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + union { + struct execute_buffer_req ebuf; + struct exec_dpu_req dpu; + } req; + struct xdna_mailbox_msg msg; + u32 payload_len; + void *payload; + int cu_idx; + int ret; + u32 op; + + if (!chann) + return -ENODEV; + + payload = amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (!payload) { + XDNA_ERR(xdna, "Invalid command, cannot get payload"); + return -EINVAL; + } + + cu_idx = amdxdna_cmd_get_cu_idx(cmd_abo); + if (cu_idx < 0) { + XDNA_DBG(xdna, "Invalid cu idx"); + return -EINVAL; + } + + op = amdxdna_cmd_get_op(cmd_abo); + switch (op) { + case ERT_START_CU: + if (unlikely(payload_len > sizeof(req.ebuf.payload))) + XDNA_DBG(xdna, "Invalid ebuf payload len: %d", payload_len); + req.ebuf.cu_idx = cu_idx; + memcpy(req.ebuf.payload, payload, sizeof(req.ebuf.payload)); + msg.send_size = sizeof(req.ebuf); + msg.opcode = MSG_OP_EXECUTE_BUFFER_CF; + break; + case ERT_START_NPU: { + struct amdxdna_cmd_start_npu *sn = payload; + + if (unlikely(payload_len - sizeof(*sn) > sizeof(req.dpu.payload))) + XDNA_DBG(xdna, "Invalid dpu payload len: %d", payload_len); + req.dpu.inst_buf_addr = sn->buffer; + req.dpu.inst_size = sn->buffer_size; + req.dpu.inst_prop_cnt = sn->prop_count; + req.dpu.cu_idx = cu_idx; + memcpy(req.dpu.payload, sn->prop_args, sizeof(req.dpu.payload)); + msg.send_size = sizeof(req.dpu); + msg.opcode = MSG_OP_EXEC_DPU; + break; + } + default: + XDNA_DBG(xdna, "Invalid ERT cmd op code: %d", op); + return -EINVAL; + } + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + print_hex_dump_debug("cmd: ", DUMP_PREFIX_OFFSET, 16, 4, &req, + 0x40, false); + + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + 
XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_cf(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_execbuf_cf *buf = cmd_buf + offset; + int cu_idx = amdxdna_cmd_get_cu_idx(abo); + u32 payload_len; + void *payload; + + if (cu_idx < 0) + return -EINVAL; + + payload = amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + + if (!slot_cf_has_space(offset, payload_len)) + return -ENOSPC; + + buf->cu_idx = cu_idx; + buf->arg_cnt = payload_len / sizeof(u32); + memcpy(buf->args, payload, payload_len); + /* Accurate buf size to hint firmware to do necessary copy */ + *size = sizeof(*buf) + payload_len; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_dpu(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_dpu *buf = cmd_buf + offset; + int cu_idx = amdxdna_cmd_get_cu_idx(abo); + struct amdxdna_cmd_start_npu *sn; + u32 payload_len; + void *payload; + u32 arg_sz; + + if (cu_idx < 0) + return -EINVAL; + + payload = amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + sn = payload; + arg_sz = payload_len - sizeof(*sn); + if (payload_len < sizeof(*sn) || arg_sz > MAX_DPU_ARGS_SIZE) + return -EINVAL; + + if (!slot_dpu_has_space(offset, arg_sz)) + return -ENOSPC; + + buf->inst_buf_addr = sn->buffer; + buf->inst_size = sn->buffer_size; + buf->inst_prop_cnt = sn->prop_count; + buf->cu_idx = cu_idx; + buf->arg_cnt = arg_sz / sizeof(u32); + memcpy(buf->args, sn->prop_args, arg_sz); + + /* Accurate buf size to hint firmware to do necessary copy */ + *size = sizeof(*buf) + arg_sz; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot(u32 op, struct amdxdna_gem_obj *cmdbuf_abo, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + u32 this_op = amdxdna_cmd_get_op(abo); + void *cmd_buf = cmdbuf_abo->mem.kva; + int ret; + + 
if (this_op != op) { + ret = -EINVAL; + goto done; + } + + switch (op) { + case ERT_START_CU: + ret = aie2_cmdlist_fill_one_slot_cf(cmd_buf, offset, abo, size); + break; + case ERT_START_NPU: + ret = aie2_cmdlist_fill_one_slot_dpu(cmd_buf, offset, abo, size); + break; + default: + ret = -EOPNOTSUPP; + } + +done: + if (ret) { + XDNA_ERR(abo->client->xdna, "Can't fill slot for cmd op %d ret %d", + op, ret); + } + return ret; +} + +static inline struct amdxdna_gem_obj * +aie2_cmdlist_get_cmd_buf(struct amdxdna_sched_job *job) +{ + int idx = get_job_idx(job->seq); + + return job->hwctx->priv->cmd_buf[idx]; +} + +static void +aie2_cmdlist_prepare_request(struct cmd_chain_req *req, + struct amdxdna_gem_obj *cmdbuf_abo, u32 size, u32 cnt) +{ + req->buf_addr = cmdbuf_abo->mem.dev_addr; + req->buf_size = size; + req->count = cnt; + drm_clflush_virt_range(cmdbuf_abo->mem.kva, size); + XDNA_DBG(cmdbuf_abo->client->xdna, "Command buf addr 0x%llx size 0x%x count %d", + req->buf_addr, size, cnt); +} + +static inline u32 +aie2_cmd_op_to_msg_op(u32 op) +{ + switch (op) { + case ERT_START_CU: + return MSG_OP_CHAIN_EXEC_BUFFER_CF; + case ERT_START_NPU: + return MSG_OP_CHAIN_EXEC_DPU; + default: + return MSG_OP_MAX_OPCODE; + } +} + +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo = aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_client *client = hwctx->client; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct amdxdna_cmd_chain *payload; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 payload_len; + u32 offset = 0; + u32 size; + int ret; + u32 op; + u32 i; + + op = amdxdna_cmd_get_op(cmd_abo); + payload = amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (op != ERT_CMD_CHAIN || !payload || + payload_len < struct_size(payload, data, 
payload->command_count)) + return -EINVAL; + + for (i = 0; i < payload->command_count; i++) { + u32 boh = (u32)(payload->data[i]); + struct amdxdna_gem_obj *abo; + + abo = amdxdna_gem_get_obj(client, boh, AMDXDNA_BO_CMD); + if (!abo) { + XDNA_ERR(client->xdna, "Failed to find cmd BO %d", boh); + return -ENOENT; + } + + /* All sub-cmd should have same op, use the first one. */ + if (i == 0) + op = amdxdna_cmd_get_op(abo); + + ret = aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, offset, abo, &size); + amdxdna_gem_put_obj(abo); + if (ret) + return -EINVAL; + + offset += size; + } + + /* The offset is the accumulated total size of the cmd buffer */ + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, offset, payload->command_count); + + msg.opcode = aie2_cmd_op_to_msg_op(op); + if (msg.opcode == MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo = aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 size; + int ret; + u32 op; + + op = amdxdna_cmd_get_op(cmd_abo); + ret = aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, 0, cmd_abo, &size); + if (ret) + return ret; + + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, size, 1); + + msg.opcode = aie2_cmd_op_to_msg_op(op); + if (msg.opcode == MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + ret 
= xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *abo = to_xdna_obj(job->bos[0]); + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct xdna_mailbox_msg msg; + struct sync_bo_req req; + int ret = 0; + + req.src_addr = 0; + req.dst_addr = abo->mem.dev_addr - hwctx->client->dev_heap->mem.dev_addr; + req.size = abo->mem.size; + + /* Device to Host */ + req.type = FIELD_PREP(AIE2_MSG_SYNC_BO_SRC_TYPE, SYNC_BO_DEV_MEM) | + FIELD_PREP(AIE2_MSG_SYNC_BO_DST_TYPE, SYNC_BO_HOST_MEM); + + XDNA_DBG(xdna, "sync %d bytes src(0x%llx) to dst(0x%llx) completed", + req.size, req.src_addr, req.dst_addr); + + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + msg.opcode = MSG_OP_SYNC_BO; + + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index ee9f114bc229..6017826a7104 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -5,8 +5,10 @@ #include #include +#include #include #include +#include #include #include #include @@ -17,6 +19,7 @@ #include "aie2_pci.h" #include "aie2_solver.h" #include "amdxdna_ctx.h" +#include "amdxdna_gem.h" #include "amdxdna_mailbox.h" #include "amdxdna_pci_drv.h" @@ -499,4 +502,7 @@ const struct amdxdna_dev_ops aie2_ops = { .hwctx_init = aie2_hwctx_init, .hwctx_fini = aie2_hwctx_fini, .hwctx_config = aie2_hwctx_config, + .cmd_submit = aie2_cmd_submit, + .cmd_wait = aie2_cmd_wait, + .hmm_invalidate = 
aie2_hmm_invalidate, }; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h index 3ac936e2c9d1..81877d9c0542 100644 --- a/drivers/accel/amdxdna/aie2_pci.h +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -76,6 +76,7 @@ enum psp_reg_idx { PSP_MAX_REGS /* Keep this at the end */ }; +struct amdxdna_client; struct amdxdna_fw_ver; struct amdxdna_hwctx; @@ -118,9 +119,28 @@ struct rt_config { u32 value; }; +/* + * Define the maximum number of pending commands in a hardware context. + * Must be power of 2! + */ +#define HWCTX_MAX_CMDS 4 +#define get_job_idx(seq) ((seq) & (HWCTX_MAX_CMDS - 1)) struct amdxdna_hwctx_priv { struct amdxdna_gem_obj *heap; void *mbox_chann; + + struct drm_gpu_scheduler sched; + struct drm_sched_entity entity; + + struct mutex io_lock; /* protect seq and cmd order */ + struct wait_queue_head job_free_wq; + struct amdxdna_sched_job *pending[HWCTX_MAX_CMDS]; + u32 num_pending; + u64 seq; + /* Completed job counter */ + u64 completed; + + struct amdxdna_gem_obj *cmd_buf[HWCTX_MAX_CMDS]; }; struct amdxdna_dev_hdl { @@ -199,10 +219,25 @@ int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwct int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx); int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size); int aie2_config_cu(struct amdxdna_hwctx *hwctx); +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); /* aie2_hwctx.c */ int 
aie2_hwctx_init(struct amdxdna_hwctx *hwctx); void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx); int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size); +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq); +int aie2_cmd_wait(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout); +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, unsigned long cur_seq); +void aie2_stop_ctx_by_col_map(struct amdxdna_client *client, u32 col_map); +void aie2_restart_ctx(struct amdxdna_client *client); #endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/aie2_psp.c b/drivers/accel/amdxdna/aie2_psp.c index 2efcfd1941bf..35ba55e2ab1e 100644 --- a/drivers/accel/amdxdna/aie2_psp.c +++ b/drivers/accel/amdxdna/aie2_psp.c @@ -4,8 +4,10 @@ */ #include +#include #include #include +#include #include #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c index 3fa7064649aa..91893d438da7 100644 --- a/drivers/accel/amdxdna/aie2_smu.c +++ b/drivers/accel/amdxdna/aie2_smu.c @@ -4,7 +4,9 @@ */ #include +#include #include +#include #include #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/amdxdna_ctx.c index 8acf8bfe0db9..b76640e1fdd0 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.c +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -7,17 +7,65 @@ #include #include #include +#include +#include #include +#include +#include #include "amdxdna_ctx.h" +#include "amdxdna_gem.h" #include "amdxdna_pci_drv.h" #define MAX_HWCTX_ID 255 +#define MAX_ARG_COUNT 4095 -static void amdxdna_hwctx_destroy(struct amdxdna_hwctx *hwctx) +struct amdxdna_fence { + struct dma_fence base; + spinlock_t lock; /* for base */ + struct amdxdna_hwctx *hwctx; +}; + +static const char *amdxdna_fence_get_driver_name(struct dma_fence *fence) +{ + return KBUILD_MODNAME; +} + +static const char 
*amdxdna_fence_get_timeline_name(struct dma_fence *fence) +{ + struct amdxdna_fence *xdna_fence; + + xdna_fence = container_of(fence, struct amdxdna_fence, base); + + return xdna_fence->hwctx->name; +} + +static const struct dma_fence_ops fence_ops = { + .get_driver_name = amdxdna_fence_get_driver_name, + .get_timeline_name = amdxdna_fence_get_timeline_name, +}; + +static struct dma_fence *amdxdna_fence_create(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_fence *fence; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return NULL; + + fence->hwctx = hwctx; + spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &fence_ops, &fence->lock, hwctx->id, 0); + return &fence->base; +} + +static void amdxdna_hwctx_destroy_rcu(struct amdxdna_hwctx *hwctx, + struct srcu_struct *ss) { struct amdxdna_dev *xdna = hwctx->client->xdna; + synchronize_srcu(ss); + /* At this point, user is not able to submit new commands */ mutex_lock(&xdna->dev_lock); xdna->dev_info->ops->hwctx_fini(hwctx); @@ -27,6 +75,46 @@ static void amdxdna_hwctx_destroy(struct amdxdna_hwctx *hwctx) kfree(hwctx); } +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + u32 num_masks, count; + + if (amdxdna_cmd_get_op(abo) == ERT_CMD_CHAIN) + num_masks = 0; + else + num_masks = 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + + if (size) { + count = FIELD_GET(AMDXDNA_CMD_COUNT, cmd->header); + if (unlikely(count <= num_masks)) { + *size = 0; + return NULL; + } + *size = (count - num_masks) * sizeof(u32); + } + return &cmd->data[num_masks]; +} + +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + u32 num_masks, i; + u32 *cu_mask; + + if (amdxdna_cmd_get_op(abo) == ERT_CMD_CHAIN) + return -1; + + num_masks = 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + cu_mask = cmd->data; + for (i = 0; i < num_masks; 
i++) { + if (cu_mask[i]) + return ffs(cu_mask[i]) - 1; + } + + return -1; +} + /* * This should be called in close() and remove(). DO NOT call in other syscalls. * This guarantee that when hwctx and resources will be released, if user @@ -43,7 +131,7 @@ void amdxdna_hwctx_remove_all(struct amdxdna_client *client) client->pid, hwctx->id); idr_remove(&client->hwctx_idr, hwctx->id); mutex_unlock(&client->hwctx_lock); - amdxdna_hwctx_destroy(hwctx); + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); mutex_lock(&client->hwctx_lock); } mutex_unlock(&client->hwctx_lock); @@ -134,6 +222,12 @@ int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, struct d if (!drm_dev_enter(dev, &idx)) return -ENODEV; + /* + * Use hwctx_lock to achieve exclusion with other hwctx writers, + * SRCU to synchronize with exec/wait command ioctls. + * + * The pushed jobs are handled by DRM scheduler during destroy. + */ mutex_lock(&client->hwctx_lock); hwctx = idr_find(&client->hwctx_idr, args->handle); if (!hwctx) { @@ -146,7 +240,7 @@ int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, struct d idr_remove(&client->hwctx_idr, hwctx->id); mutex_unlock(&client->hwctx_lock); - amdxdna_hwctx_destroy(hwctx); + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); XDNA_DBG(xdna, "PID %d destroyed HW context %d", client->pid, args->handle); out: @@ -160,10 +254,10 @@ int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct dr struct amdxdna_drm_config_hwctx *args = data; struct amdxdna_dev *xdna = to_xdna_dev(dev); struct amdxdna_hwctx *hwctx; + int ret, idx; u32 buf_size; void *buf; u64 val; - int ret; if (!xdna->dev_info->ops->hwctx_config) return -EOPNOTSUPP; @@ -202,17 +296,286 @@ int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct dr } mutex_lock(&xdna->dev_lock); + idx = srcu_read_lock(&client->hwctx_srcu); hwctx = idr_find(&client->hwctx_idr, args->handle); if 
(!hwctx) { XDNA_DBG(xdna, "PID %d failed to get hwctx %d", client->pid, args->handle); ret = -EINVAL; - goto unlock; + goto unlock_srcu; } ret = xdna->dev_info->ops->hwctx_config(hwctx, args->param_type, val, buf, buf_size); -unlock: +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); mutex_unlock(&xdna->dev_lock); kfree(buf); return ret; } + +static void +amdxdna_arg_bos_put(struct amdxdna_sched_job *job) +{ + int i; + + for (i = 0; i < job->bo_cnt; i++) { + if (!job->bos[i]) + break; + drm_gem_object_put(job->bos[i]); + } +} + +static int +amdxdna_arg_bos_lookup(struct amdxdna_client *client, + struct amdxdna_sched_job *job, + u32 *bo_hdls, u32 bo_cnt) +{ + struct drm_gem_object *gobj; + int i, ret; + + job->bo_cnt = bo_cnt; + for (i = 0; i < job->bo_cnt; i++) { + struct amdxdna_gem_obj *abo; + + gobj = drm_gem_object_lookup(client->filp, bo_hdls[i]); + if (!gobj) { + ret = -ENOENT; + goto put_shmem_bo; + } + abo = to_xdna_obj(gobj); + + mutex_lock(&abo->lock); + if (abo->pinned) { + mutex_unlock(&abo->lock); + job->bos[i] = gobj; + continue; + } + + ret = amdxdna_gem_pin_nolock(abo); + if (ret) { + mutex_unlock(&abo->lock); + drm_gem_object_put(gobj); + goto put_shmem_bo; + } + abo->pinned = true; + mutex_unlock(&abo->lock); + + job->bos[i] = gobj; + } + + return 0; + +put_shmem_bo: + amdxdna_arg_bos_put(job); + return ret; +} + +static void amdxdna_sched_job_release(struct kref *ref) +{ + struct amdxdna_sched_job *job; + + job = container_of(ref, struct amdxdna_sched_job, refcnt); + + trace_amdxdna_debug_point(job->hwctx->name, job->seq, "job release"); + amdxdna_arg_bos_put(job); + amdxdna_gem_put_obj(job->cmd_bo); + kfree(job); +} + +void amdxdna_job_put(struct amdxdna_sched_job *job) +{ + kref_put(&job->refcnt, amdxdna_sched_job_release); +} + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdl, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq) +{ + struct amdxdna_dev *xdna = 
client->xdna; + struct amdxdna_sched_job *job; + struct amdxdna_hwctx *hwctx; + int ret, idx; + + XDNA_DBG(xdna, "Command BO hdl %d, Arg BO count %d", cmd_bo_hdl, arg_bo_c= nt); + job =3D kzalloc(struct_size(job, bos, arg_bo_cnt), GFP_KERNEL); + if (!job) + return -ENOMEM; + + if (cmd_bo_hdl !=3D AMDXDNA_INVALID_BO_HANDLE) { + job->cmd_bo =3D amdxdna_gem_get_obj(client, cmd_bo_hdl, AMDXDNA_BO_CMD); + if (!job->cmd_bo) { + XDNA_ERR(xdna, "Failed to get cmd bo from %d", cmd_bo_hdl); + ret =3D -EINVAL; + goto free_job; + } + } else { + job->cmd_bo =3D NULL; + } + + ret =3D amdxdna_arg_bos_lookup(client, job, arg_bo_hdls, arg_bo_cnt); + if (ret) { + XDNA_ERR(xdna, "Argument BOs lookup failed, ret %d", ret); + goto cmd_put; + } + + idx =3D srcu_read_lock(&client->hwctx_srcu); + hwctx =3D idr_find(&client->hwctx_idr, hwctx_hdl); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", + client->pid, hwctx_hdl); + ret =3D -EINVAL; + goto unlock_srcu; + } + + if (hwctx->status !=3D HWCTX_STAT_READY) { + XDNA_ERR(xdna, "HW Context is not ready"); + ret =3D -EINVAL; + goto unlock_srcu; + } + + job->hwctx =3D hwctx; + job->mm =3D current->mm; + + job->fence =3D amdxdna_fence_create(hwctx); + if (!job->fence) { + XDNA_ERR(xdna, "Failed to create fence"); + ret =3D -ENOMEM; + goto unlock_srcu; + } + kref_init(&job->refcnt); + + ret =3D xdna->dev_info->ops->cmd_submit(hwctx, job, seq); + if (ret) + goto put_fence; + + /* + * The amdxdna_hwctx_destroy_rcu() will release hwctx and associated + * resource after synchronize_srcu(). The submitted jobs should be + * handled by the queue, for example DRM scheduler, in device layer. + * For here we can unlock SRCU. 
+ */ + srcu_read_unlock(&client->hwctx_srcu, idx); + trace_amdxdna_debug_point(hwctx->name, *seq, "job pushed"); + + return 0; + +put_fence: + dma_fence_put(job->fence); +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + amdxdna_arg_bos_put(job); +cmd_put: + amdxdna_gem_put_obj(job->cmd_bo); +free_job: + kfree(job); + return ret; +} + +/* + * The submit command ioctl submits a command to firmware. One firmware co= mmand + * may contain multiple command BOs for processing as a whole. + * The command sequence number is returned which can be used for wait comm= and ioctl. + */ +static int amdxdna_drm_submit_execbuf(struct amdxdna_client *client, + struct amdxdna_drm_exec_cmd *args) +{ + struct amdxdna_dev *xdna =3D client->xdna; + u32 *arg_bo_hdls; + u32 cmd_bo_hdl; + int ret; + + if (!args->arg_count || args->arg_count > MAX_ARG_COUNT) { + XDNA_ERR(xdna, "Invalid arg bo count %d", args->arg_count); + return -EINVAL; + } + + /* Only support single command for now. */ + if (args->cmd_count !=3D 1) { + XDNA_ERR(xdna, "Invalid cmd bo count %d", args->cmd_count); + return -EINVAL; + } + + cmd_bo_hdl =3D (u32)args->cmd_handles; + arg_bo_hdls =3D kcalloc(args->arg_count, sizeof(u32), GFP_KERNEL); + if (!arg_bo_hdls) + return -ENOMEM; + ret =3D copy_from_user(arg_bo_hdls, u64_to_user_ptr(args->args), + args->arg_count * sizeof(u32)); + if (ret) { + ret =3D -EFAULT; + goto free_cmd_bo_hdls; + } + + ret =3D amdxdna_cmd_submit(client, cmd_bo_hdl, arg_bo_hdls, + args->arg_count, args->hwctx, &args->seq); + if (ret) + XDNA_DBG(xdna, "Submit cmds failed, ret %d", ret); + +free_cmd_bo_hdls: + kfree(arg_bo_hdls); + if (!ret) + XDNA_DBG(xdna, "Pushed cmd %lld to scheduler", args->seq); + return ret; +} + +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struc= t drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_drm_exec_cmd *args =3D data; + + if (args->ext || args->ext_flags) + return -EINVAL; + + switch 
(args->type) { + case AMDXDNA_CMD_SUBMIT_EXEC_BUF: + return amdxdna_drm_submit_execbuf(client, args); + } + + XDNA_ERR(client->xdna, "Invalid command type %d", args->type); + return -EINVAL; +} + +int amdxdna_cmd_wait(struct amdxdna_client *client, u32 hwctx_hdl, + u64 seq, u32 timeout) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int ret, idx; + + if (!xdna->dev_info->ops->cmd_wait) + return -EOPNOTSUPP; + + /* For locking concerns, see amdxdna_drm_exec_cmd_ioctl. */ + idx =3D srcu_read_lock(&client->hwctx_srcu); + hwctx =3D idr_find(&client->hwctx_idr, hwctx_hdl); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", + client->pid, hwctx_hdl); + ret =3D -EINVAL; + goto unlock_hwctx_srcu; + } + + ret =3D xdna->dev_info->ops->cmd_wait(hwctx, seq, timeout); + +unlock_hwctx_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + return ret; +} + +int amdxdna_drm_wait_cmd_ioctl(struct drm_device *dev, void *data, struct = drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_drm_wait_cmd *args =3D data; + int ret; + + XDNA_DBG(xdna, "PID %d hwctx %d timeout set %d ms for cmd %lld", + client->pid, args->hwctx, args->timeout, args->seq); + + ret =3D amdxdna_cmd_wait(client, args->hwctx, args->seq, args->timeout); + + XDNA_DBG(xdna, "PID %d hwctx %d cmd %lld wait finished, ret %d", + client->pid, args->hwctx, args->seq, ret); + + return ret; +} diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/am= dxdna_ctx.h index 665b3208897d..65f9c1dfe32c 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.h +++ b/drivers/accel/amdxdna/amdxdna_ctx.h @@ -6,6 +6,52 @@ #ifndef _AMDXDNA_CTX_H_ #define _AMDXDNA_CTX_H_ =20 +#include "amdxdna_gem.h" + +struct amdxdna_hwctx_priv; + +enum ert_cmd_opcode { + ERT_START_CU =3D 0, + ERT_CMD_CHAIN =3D 19, + ERT_START_NPU =3D 20, +}; + +enum ert_cmd_state { + ERT_CMD_STATE_INVALID, + ERT_CMD_STATE_NEW, 
+ ERT_CMD_STATE_QUEUED, + ERT_CMD_STATE_RUNNING, + ERT_CMD_STATE_COMPLETED, + ERT_CMD_STATE_ERROR, + ERT_CMD_STATE_ABORT, + ERT_CMD_STATE_SUBMITTED, + ERT_CMD_STATE_TIMEOUT, + ERT_CMD_STATE_NORESPONSE, +}; + +/* + * Interpretation of the beginning of data payload for ERT_START_NPU in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is regular kernel a= rgs. + */ +struct amdxdna_cmd_start_npu { + u64 buffer; /* instruction buffer address */ + u32 buffer_size; /* size of buffer in bytes */ + u32 prop_count; /* properties count */ + u32 prop_args[]; /* properties and regular kernel arguments */ +}; + +/* + * Interpretation of the beginning of data payload for ERT_CMD_CHAIN in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is cmd BO handles. + */ +struct amdxdna_cmd_chain { + u32 command_count; + u32 submit_index; + u32 error_index; + u32 reserved[3]; + u64 data[] __counted_by(command_count); +}; + /* Exec buffer command header format */ #define AMDXDNA_CMD_STATE GENMASK(3, 0) #define AMDXDNA_CMD_EXTRA_CU_MASK GENMASK(11, 10) @@ -40,9 +86,73 @@ struct amdxdna_hwctx { struct amdxdna_hwctx_param_config_cu *cus; }; =20 +#define drm_job_to_xdna_job(j) \ + container_of(j, struct amdxdna_sched_job, base) + +struct amdxdna_sched_job { + struct drm_sched_job base; + struct kref refcnt; + struct amdxdna_hwctx *hwctx; + struct mm_struct *mm; + /* The fence to notice DRM scheduler that job is done by hardware */ + struct dma_fence *fence; + /* user can wait on this fence */ + struct dma_fence *out_fence; + u64 seq; + struct amdxdna_gem_obj *cmd_bo; + size_t bo_cnt; + struct drm_gem_object *bos[] __counted_by(bo_cnt); +}; + +static inline u32 +amdxdna_cmd_get_op(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header); +} + +static inline void +amdxdna_cmd_set_state(struct amdxdna_gem_obj *abo, enum ert_cmd_state s) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + + cmd->header &=3D 
~AMDXDNA_CMD_STATE; + cmd->header |=3D FIELD_PREP(AMDXDNA_CMD_STATE, s); +} + +static inline enum ert_cmd_state +amdxdna_cmd_get_state(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_STATE, cmd->header); +} + +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size); +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo); + +static inline u32 amdxdna_hwctx_col_map(struct amdxdna_hwctx *hwctx) +{ + return GENMASK(hwctx->start_col + hwctx->num_col - 1, + hwctx->start_col); +} + +void amdxdna_job_put(struct amdxdna_sched_job *job); + void amdxdna_hwctx_remove_all(struct amdxdna_client *client); + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdls, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq); + +int amdxdna_cmd_wait(struct amdxdna_client *client, u32 hwctx_hdl, + u64 seq, u32 timeout); + int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp); int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp); int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, st= ruct drm_file *filp); +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struc= t drm_file *filp); +int amdxdna_drm_wait_cmd_ioctl(struct drm_device *dev, void *data, struct = drm_file *filp); =20 #endif /* _AMDXDNA_CTX_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_gem.c b/drivers/accel/amdxdna/am= dxdna_gem.c index 66373baa4600..091674a4bbfa 100644 --- a/drivers/accel/amdxdna/amdxdna_gem.c +++ b/drivers/accel/amdxdna/amdxdna_gem.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include =20 diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c b/drivers/accel= /amdxdna/amdxdna_mailbox_helper.c index 42b615394605..5139a9c96a91 100644 --- a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c +++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c @@ -3,10 +3,15 @@ * Copyright 
(C) 2024, Advanced Micro Devices, Inc. */ =20 +#include #include #include +#include +#include +#include #include =20 +#include "amdxdna_gem.h" #include "amdxdna_mailbox.h" #include "amdxdna_mailbox_helper.h" #include "amdxdna_pci_drv.h" diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdn= a/amdxdna_pci_drv.c index 47ea79d4a021..5c1e863825e0 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include =20 @@ -64,6 +65,7 @@ static int amdxdna_drm_open(struct drm_device *ddev, stru= ct drm_file *filp) goto unbind_sva; } mutex_init(&client->hwctx_lock); + init_srcu_struct(&client->hwctx_srcu); idr_init_base(&client->hwctx_idr, AMDXDNA_INVALID_CTX_HANDLE + 1); mutex_init(&client->mm_lock); =20 @@ -93,6 +95,7 @@ static void amdxdna_drm_close(struct drm_device *ddev, st= ruct drm_file *filp) XDNA_DBG(xdna, "closing pid %d", client->pid); =20 idr_destroy(&client->hwctx_idr); + cleanup_srcu_struct(&client->hwctx_srcu); mutex_destroy(&client->hwctx_lock); mutex_destroy(&client->mm_lock); if (client->dev_heap) @@ -133,6 +136,9 @@ static const struct drm_ioctl_desc amdxdna_drm_ioctls[]= =3D { DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_BO, amdxdna_drm_create_bo_ioctl, 0), DRM_IOCTL_DEF_DRV(AMDXDNA_GET_BO_INFO, amdxdna_drm_get_bo_info_ioctl, 0), DRM_IOCTL_DEF_DRV(AMDXDNA_SYNC_BO, amdxdna_drm_sync_bo_ioctl, 0), + /* Execution */ + DRM_IOCTL_DEF_DRV(AMDXDNA_EXEC_CMD, amdxdna_drm_submit_cmd_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_WAIT_CMD, amdxdna_drm_wait_cmd_ioctl, 0), }; =20 static const struct file_operations amdxdna_fops =3D { diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdn= a/amdxdna_pci_drv.h index 3dddde4ac12a..0324e73094b2 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.h +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -20,6 +20,7 @@ extern const struct drm_driver amdxdna_drm_drv; struct amdxdna_dev; struct
amdxdna_gem_obj; struct amdxdna_hwctx; +struct amdxdna_sched_job; =20 /* * struct amdxdna_dev_ops - Device hardware operation callbacks @@ -31,6 +32,8 @@ struct amdxdna_dev_ops { void (*hwctx_fini)(struct amdxdna_hwctx *hwctx); int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, voi= d *buf, u32 size); void (*hmm_invalidate)(struct amdxdna_gem_obj *abo, unsigned long cur_seq= ); + int (*cmd_submit)(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *= job, u64 *seq); + int (*cmd_wait)(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout); }; =20 /* @@ -88,6 +91,8 @@ struct amdxdna_client { struct list_head node; pid_t pid; struct mutex hwctx_lock; /* protect hwctx */ + /* do NOT wait on this SRCU while hwctx_lock is held */ + struct srcu_struct hwctx_srcu; struct idr hwctx_idr; struct amdxdna_dev *xdna; struct drm_file *filp; diff --git a/drivers/accel/amdxdna/amdxdna_sysfs.c b/drivers/accel/amdxdna/= amdxdna_sysfs.c index 668b94b92714..f27e4ee960a0 100644 --- a/drivers/accel/amdxdna/amdxdna_sysfs.c +++ b/drivers/accel/amdxdna/amdxdna_sysfs.c @@ -3,9 +3,14 @@ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
*/ =20 +#include #include +#include #include +#include +#include =20 +#include "amdxdna_gem.h" #include "amdxdna_pci_drv.h" =20 static ssize_t vbnv_show(struct device *dev, struct device_attribute *attr= , char *buf) diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1= _regs.c index 720aab0ed7c4..f00c50461b09 100644 --- a/drivers/accel/amdxdna/npu1_regs.c +++ b/drivers/accel/amdxdna/npu1_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/npu2_regs.c b/drivers/accel/amdxdna/npu2= _regs.c index f3ea18bcf294..00cb381031d2 100644 --- a/drivers/accel/amdxdna/npu2_regs.c +++ b/drivers/accel/amdxdna/npu2_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4= _regs.c index db61142f0d4e..b6dae9667cca 100644 --- a/drivers/accel/amdxdna/npu4_regs.c +++ b/drivers/accel/amdxdna/npu4_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5= _regs.c index debf4e95b9bb..bed1baf8e160 100644 --- a/drivers/accel/amdxdna/npu5_regs.c +++ b/drivers/accel/amdxdna/npu5_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/include/trace/events/amdxdna.h b/include/trace/events/amdxdna.h index 33343d8f0622..c6cb2da7b706 100644 --- a/include/trace/events/amdxdna.h +++ b/include/trace/events/amdxdna.h @@ -9,8 +9,49 @@ #if !defined(_TRACE_AMDXDNA_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_AMDXDNA_H =20 +#include #include =20 +TRACE_EVENT(amdxdna_debug_point, + TP_PROTO(const char *name, u64 number, const char *str), + + TP_ARGS(name, number, str), + + TP_STRUCT__entry(__string(name, name) + __field(u64, number) + __string(str, str)), + + TP_fast_assign(__assign_str(name); + __entry->number =3D number; + 
__assign_str(str);), + + TP_printk("%s:%llu %s", __get_str(name), __entry->number, + __get_str(str)) +); + +TRACE_EVENT(xdna_job, + TP_PROTO(struct drm_sched_job *sched_job, const char *name, const cha= r *str, u64 seq), + + TP_ARGS(sched_job, name, str, seq), + + TP_STRUCT__entry(__string(name, name) + __string(str, str) + __field(u64, fence_context) + __field(u64, fence_seqno) + __field(u64, seq)), + + TP_fast_assign(__assign_str(name); + __assign_str(str); + __entry->fence_context =3D sched_job->s_fence->finished.context; + __entry->fence_seqno =3D sched_job->s_fence->finished.seqno; + __entry->seq =3D seq;), + + TP_printk("fence=3D(context:%llu, seqno:%lld), %s seq#:%lld %s", + __entry->fence_context, __entry->fence_seqno, + __get_str(name), __entry->seq, + __get_str(str)) +); + DECLARE_EVENT_CLASS(xdna_mbox_msg, TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 msg_id), =20 diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_ac= cel.h index 3792750834b2..08f3ec7146ab 100644 --- a/include/uapi/drm/amdxdna_accel.h +++ b/include/uapi/drm/amdxdna_accel.h @@ -13,6 +13,7 @@ extern "C" { #endif =20 +#define AMDXDNA_INVALID_CMD_HANDLE (~0UL) #define AMDXDNA_INVALID_ADDR (~0UL) #define AMDXDNA_INVALID_CTX_HANDLE 0 #define AMDXDNA_INVALID_BO_HANDLE 0 @@ -29,6 +30,8 @@ enum amdxdna_drm_ioctl_id { DRM_AMDXDNA_CREATE_BO, DRM_AMDXDNA_GET_BO_INFO, DRM_AMDXDNA_SYNC_BO, + DRM_AMDXDNA_EXEC_CMD, + DRM_AMDXDNA_WAIT_CMD, }; =20 /** @@ -201,6 +204,54 @@ struct amdxdna_drm_sync_bo { __u64 size; }; =20 +enum amdxdna_cmd_type { + AMDXDNA_CMD_SUBMIT_EXEC_BUF =3D 0, + AMDXDNA_CMD_SUBMIT_DEPENDENCY, + AMDXDNA_CMD_SUBMIT_SIGNAL, +}; + +/** + * struct amdxdna_drm_exec_cmd - Execute command. + * @ext: MBZ. + * @ext_flags: MBZ. + * @hwctx: Hardware context handle. + * @type: One of command type in enum amdxdna_cmd_type. + * @cmd_handles: Array of command handles or the command handle itself + * in case of just one. + * @args: Array of arguments for all command handles. 
+ * @cmd_count: Number of command handles in the cmd_handles array. + * @arg_count: Number of arguments in the args array. + * @seq: Returned sequence number for this command. + */ +struct amdxdna_drm_exec_cmd { + __u64 ext; + __u64 ext_flags; + __u32 hwctx; + __u32 type; + __u64 cmd_handles; + __u64 args; + __u32 cmd_count; + __u32 arg_count; + __u64 seq; +}; + +/** + * struct amdxdna_drm_wait_cmd - Wait for command execution. + * + * @hwctx: hardware context handle. + * @timeout: timeout in ms, 0 implies infinite wait. + * @seq: sequence number of the command returned by execute command. + * + * Wait for the command specified by seq to complete. + * Using AMDXDNA_INVALID_CMD_HANDLE as seq means waiting until there is a free= slot + * to submit a new command. + */ +struct amdxdna_drm_wait_cmd { + __u32 hwctx; + __u32 timeout; + __u64 seq; +}; + #define DRM_IOCTL_AMDXDNA_CREATE_HWCTX \ DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_HWCTX, \ struct amdxdna_drm_create_hwctx) @@ -225,6 +276,14 @@ struct amdxdna_drm_sync_bo { DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_SYNC_BO, \ struct amdxdna_drm_sync_bo) =20 +#define DRM_IOCTL_AMDXDNA_EXEC_CMD \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_EXEC_CMD, \ + struct amdxdna_drm_exec_cmd) + +#define DRM_IOCTL_AMDXDNA_WAIT_CMD \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_WAIT_CMD, \ + struct amdxdna_drm_wait_cmd) + #if defined(__cplusplus) } /* extern c end */ #endif --=20 2.34.1 From nobody Wed Nov 27 06:28:33 2024 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2066.outbound.protection.outlook.com [40.107.236.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9D9C1E2009 for ; Fri, 11 Oct 2024 23:13:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.236.66 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688392; cv=fail;
Received: from
SATLEXMB04.amd.com (165.204.84.17) by CO1PEPF000044FA.mail.protection.outlook.com (10.167.241.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8093.1 via Frontend Transport; Fri, 11 Oct 2024 23:13:04 +0000 Received: from SATLEXMB05.amd.com (10.181.40.146) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:03 -0500 Received: from SATLEXMB04.amd.com (10.181.40.145) by SATLEXMB05.amd.com (10.181.40.146) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:03 -0500 Received: from xsjlizhih51.xilinx.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Fri, 11 Oct 2024 18:13:02 -0500 From: Lizhi Hou To: , CC: Lizhi Hou , , , , , , Narendra Gutta , Xiaoming Ren Subject: [PATCH V4 08/10] accel/amdxdna: Add suspend and resume Date: Fri, 11 Oct 2024 16:12:42 -0700 Message-ID: <20241011231244.3182625-9-lizhi.hou@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com> References: <20241011231244.3182625-1-lizhi.hou@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: None (SATLEXMB05.amd.com: lizhi.hou@amd.com does not designate permitted sender hosts) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000044FA:EE_|PH7PR12MB7455:EE_ X-MS-Office365-Filtering-Correlation-Id: 29791ed2-902c-4c57-aeb7-08dcea4a42ab X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|376014|1800799024|36860700013|82310400026; X-Microsoft-Antispam-Message-Info: 
X-Forefront-Antispam-Report:
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(376014)(1800799024)(36860700013)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Oct 2024 23:13:04.4919 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 29791ed2-902c-4c57-aeb7-08dcea4a42ab X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000044FA.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH7PR12MB7455 Content-Type: text/plain; charset="utf-8" Implement PCI power management suspend and resume callbacks. Co-developed-by: Narendra Gutta Signed-off-by: Narendra Gutta Co-developed-by: Xiaoming Ren Signed-off-by: Xiaoming Ren Co-developed-by: Min Ma Signed-off-by: Min Ma Signed-off-by: Lizhi Hou Reviewed-by: Jeffrey Hugo --- drivers/accel/amdxdna/aie2_ctx.c | 30 ++++++ drivers/accel/amdxdna/aie2_pci.c | 4 + drivers/accel/amdxdna/aie2_pci.h | 2 + drivers/accel/amdxdna/amdxdna_ctx.c | 26 +++++ drivers/accel/amdxdna/amdxdna_ctx.h | 2 + drivers/accel/amdxdna/amdxdna_pci_drv.c | 120 +++++++++++++++++++++++- drivers/accel/amdxdna/amdxdna_pci_drv.h | 4 + 7 files changed, 185 insertions(+), 3 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_= ctx.c index f9010a902c99..8da842731103 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -179,6 +179,36 @@ static int aie2_hwctx_wait_for_idle(struct amdxdna_hwc= tx *hwctx) return 0; } =20 +void aie2_hwctx_suspend(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + + /* + * Command timeout is 
unlikely. But if it happens, it doesn't + * break the system. aie2_hwctx_stop() will destroy the mailbox + * and abort all commands. + */ + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + aie2_hwctx_wait_for_idle(hwctx); + aie2_hwctx_stop(xdna, hwctx, NULL); + hwctx->old_status =3D hwctx->status; + hwctx->status =3D HWCTX_STAT_STOP; +} + +void aie2_hwctx_resume(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + + /* + * The resume path cannot guarantee that the mailbox channel can be + * regenerated. If regeneration fails, submitting a message to this + * mailbox channel returns an error. + */ + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + hwctx->status =3D hwctx->old_status; + aie2_hwctx_restart(xdna, hwctx); +} + static void aie2_sched_notify(struct amdxdna_sched_job *job) { diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_= pci.c index 6017826a7104..7a1729bed62d 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -499,10 +499,14 @@ static void aie2_fini(struct amdxdna_dev *xdna) const struct amdxdna_dev_ops aie2_ops =3D { .init =3D aie2_init, .fini =3D aie2_fini, + .resume =3D aie2_hw_start, + .suspend =3D aie2_hw_stop, .hwctx_init =3D aie2_hwctx_init, .hwctx_fini =3D aie2_hwctx_fini, .hwctx_config =3D aie2_hwctx_config, .cmd_submit =3D aie2_cmd_submit, .cmd_wait =3D aie2_cmd_wait, .hmm_invalidate =3D aie2_hmm_invalidate, + .hwctx_suspend =3D aie2_hwctx_suspend, + .hwctx_resume =3D aie2_hwctx_resume, }; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_= pci.h index 81877d9c0542..bbcc3d85e13c 100644 --- a/drivers/accel/amdxdna/aie2_pci.h +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -234,6 +234,8 @@ int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct am= dxdna_sched_job *job, int aie2_hwctx_init(struct amdxdna_hwctx *hwctx); void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx); int aie2_hwctx_config(struct amdxdna_hwctx *hwctx,
u32 type, u64 value, vo= id *buf, u32 size); +void aie2_hwctx_suspend(struct amdxdna_hwctx *hwctx); +void aie2_hwctx_resume(struct amdxdna_hwctx *hwctx); int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job = *job, u64 *seq); int aie2_cmd_wait(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout); void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, unsigned long cur_se= q); diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/am= dxdna_ctx.c index b76640e1fdd0..6a1db715129e 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.c +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -59,6 +59,32 @@ static struct dma_fence *amdxdna_fence_create(struct amd= xdna_hwctx *hwctx) return &fence->base; } =20 +void amdxdna_hwctx_suspend(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) + xdna->dev_info->ops->hwctx_suspend(hwctx); + mutex_unlock(&client->hwctx_lock); +} + +void amdxdna_hwctx_resume(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) + xdna->dev_info->ops->hwctx_resume(hwctx); + mutex_unlock(&client->hwctx_lock); +} + static void amdxdna_hwctx_destroy_rcu(struct amdxdna_hwctx *hwctx, struct srcu_struct *ss) { diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/am= dxdna_ctx.h index 65f9c1dfe32c..1db8944b1956 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.h +++ b/drivers/accel/amdxdna/amdxdna_ctx.h @@ -141,6 +141,8 @@ static inline u32 amdxdna_hwctx_col_map(struct amdxdna_= hwctx *hwctx) void amdxdna_job_put(struct amdxdna_sched_job *job); =20 
void amdxdna_hwctx_remove_all(struct amdxdna_client *client); +void amdxdna_hwctx_suspend(struct amdxdna_client *client); +void amdxdna_hwctx_resume(struct amdxdna_client *client); =20 int amdxdna_cmd_submit(struct amdxdna_client *client, u32 cmd_bo_hdls, u32 *arg_bo_hdls, u32 arg_bo_cnt, diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdn= a/amdxdna_pci_drv.c index 5c1e863825e0..92113d83e861 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -13,11 +13,14 @@ #include #include #include +#include =20 #include "amdxdna_ctx.h" #include "amdxdna_gem.h" #include "amdxdna_pci_drv.h" =20 +#define AMDXDNA_AUTOSUSPEND_DELAY 5000 /* milliseconds */ + /* * Bind the driver base on (vendor_id, device_id) pair and later use the * (device_id, rev_id) pair as a key to select the devices. The devices wi= th @@ -45,9 +48,17 @@ static int amdxdna_drm_open(struct drm_device *ddev, str= uct drm_file *filp) struct amdxdna_client *client; int ret; =20 + ret =3D pm_runtime_resume_and_get(ddev->dev); + if (ret) { + XDNA_ERR(xdna, "Failed to get rpm, ret %d", ret); + return ret; + } + client =3D kzalloc(sizeof(*client), GFP_KERNEL); - if (!client) - return -ENOMEM; + if (!client) { + ret =3D -ENOMEM; + goto put_rpm; + } =20 client->pid =3D pid_nr(filp->pid); client->xdna =3D xdna; @@ -83,6 +94,9 @@ static int amdxdna_drm_open(struct drm_device *ddev, stru= ct drm_file *filp) iommu_sva_unbind_device(client->sva); failed: kfree(client); +put_rpm: + pm_runtime_mark_last_busy(ddev->dev); + pm_runtime_put_autosuspend(ddev->dev); =20 return ret; } @@ -105,6 +119,8 @@ static void amdxdna_drm_close(struct drm_device *ddev, = struct drm_file *filp) =20 XDNA_DBG(xdna, "pid %d closed", client->pid); kfree(client); + pm_runtime_mark_last_busy(ddev->dev); + pm_runtime_put_autosuspend(ddev->dev); } =20 static int amdxdna_flush(struct file *f, fl_owner_t id) @@ -182,10 +198,11 @@ amdxdna_get_dev_info(struct pci_dev *pdev) =20 static 
int amdxdna_probe(struct pci_dev *pdev, const struct pci_device_id = *id) { + struct device *dev =3D &pdev->dev; struct amdxdna_dev *xdna; int ret; =20 - xdna =3D devm_drm_dev_alloc(&pdev->dev, &amdxdna_drm_drv, typeof(*xdna), = ddev); + xdna =3D devm_drm_dev_alloc(dev, &amdxdna_drm_drv, typeof(*xdna), ddev); if (IS_ERR(xdna)) return PTR_ERR(xdna); =20 @@ -211,12 +228,19 @@ static int amdxdna_probe(struct pci_dev *pdev, const = struct pci_device_id *id) goto failed_dev_fini; } =20 + pm_runtime_set_autosuspend_delay(dev, AMDXDNA_AUTOSUSPEND_DELAY); + pm_runtime_use_autosuspend(dev); + pm_runtime_allow(dev); + ret =3D drm_dev_register(&xdna->ddev, 0); if (ret) { XDNA_ERR(xdna, "DRM register failed, ret %d", ret); + pm_runtime_forbid(dev); goto failed_sysfs_fini; } =20 + pm_runtime_mark_last_busy(dev); + pm_runtime_put_autosuspend(dev); return 0; =20 failed_sysfs_fini: @@ -231,8 +255,12 @@ static int amdxdna_probe(struct pci_dev *pdev, const s= truct pci_device_id *id) static void amdxdna_remove(struct pci_dev *pdev) { struct amdxdna_dev *xdna =3D pci_get_drvdata(pdev); + struct device *dev =3D &pdev->dev; struct amdxdna_client *client; =20 + pm_runtime_get_noresume(dev); + pm_runtime_forbid(dev); + drm_dev_unplug(&xdna->ddev); amdxdna_sysfs_fini(xdna); =20 @@ -254,11 +282,97 @@ static void amdxdna_remove(struct pci_dev *pdev) mutex_unlock(&xdna->dev_lock); } =20 +static int amdxdna_dev_suspend_nolock(struct amdxdna_dev *xdna) +{ + if (xdna->dev_info->ops->suspend) + xdna->dev_info->ops->suspend(xdna); + + return 0; +} + +static int amdxdna_dev_resume_nolock(struct amdxdna_dev *xdna) +{ + if (xdna->dev_info->ops->resume) + return xdna->dev_info->ops->resume(xdna); + + return 0; +} + +static int amdxdna_pmops_suspend(struct device *dev) +{ + struct amdxdna_dev *xdna =3D pci_get_drvdata(to_pci_dev(dev)); + struct amdxdna_client *client; + + mutex_lock(&xdna->dev_lock); + list_for_each_entry(client, &xdna->client_list, node) + amdxdna_hwctx_suspend(client); + + 
amdxdna_dev_suspend_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + return 0; +} + +static int amdxdna_pmops_resume(struct device *dev) +{ + struct amdxdna_dev *xdna =3D pci_get_drvdata(to_pci_dev(dev)); + struct amdxdna_client *client; + int ret; + + XDNA_INFO(xdna, "firmware resuming..."); + mutex_lock(&xdna->dev_lock); + ret =3D amdxdna_dev_resume_nolock(xdna); + if (ret) { + XDNA_ERR(xdna, "resume NPU firmware failed"); + mutex_unlock(&xdna->dev_lock); + return ret; + } + + XDNA_INFO(xdna, "hardware context resuming..."); + list_for_each_entry(client, &xdna->client_list, node) + amdxdna_hwctx_resume(client); + mutex_unlock(&xdna->dev_lock); + + return 0; +} + +static int amdxdna_rpmops_suspend(struct device *dev) +{ + struct amdxdna_dev *xdna =3D pci_get_drvdata(to_pci_dev(dev)); + int ret; + + mutex_lock(&xdna->dev_lock); + ret =3D amdxdna_dev_suspend_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "Runtime suspend done ret: %d", ret); + return ret; +} + +static int amdxdna_rpmops_resume(struct device *dev) +{ + struct amdxdna_dev *xdna =3D pci_get_drvdata(to_pci_dev(dev)); + int ret; + + mutex_lock(&xdna->dev_lock); + ret =3D amdxdna_dev_resume_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "Runtime resume done ret: %d", ret); + return ret; +} + +static const struct dev_pm_ops amdxdna_pm_ops =3D { + SET_SYSTEM_SLEEP_PM_OPS(amdxdna_pmops_suspend, amdxdna_pmops_resume) + SET_RUNTIME_PM_OPS(amdxdna_rpmops_suspend, amdxdna_rpmops_resume, NULL) +}; + static struct pci_driver amdxdna_pci_driver =3D { .name =3D KBUILD_MODNAME, .id_table =3D pci_ids, .probe =3D amdxdna_probe, .remove =3D amdxdna_remove, + .driver.pm =3D &amdxdna_pm_ops, }; =20 module_pci_driver(amdxdna_pci_driver); diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdn= a/amdxdna_pci_drv.h index 0324e73094b2..01b516743a00 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.h +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -28,10 +28,14 @@ 
struct amdxdna_sched_job; struct amdxdna_dev_ops { int (*init)(struct amdxdna_dev *xdna); void (*fini)(struct amdxdna_dev *xdna); + int (*resume)(struct amdxdna_dev *xdna); + void (*suspend)(struct amdxdna_dev *xdna); int (*hwctx_init)(struct amdxdna_hwctx *hwctx); void (*hwctx_fini)(struct amdxdna_hwctx *hwctx); int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, voi= d *buf, u32 size); void (*hmm_invalidate)(struct amdxdna_gem_obj *abo, unsigned long cur_seq= ); + void (*hwctx_suspend)(struct amdxdna_hwctx *hwctx); + void (*hwctx_resume)(struct amdxdna_hwctx *hwctx); int (*cmd_submit)(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *= job, u64 *seq); int (*cmd_wait)(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout); }; --=20 2.34.1 From nobody Wed Nov 27 06:28:33 2024 Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2041.outbound.protection.outlook.com [40.107.220.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53AC61E7C34 for ; Fri, 11 Oct 2024 23:13:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.220.41 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688399; cv=fail; b=Kv43s6oxIzvn4N7joviOjowQIt4s8Fzbnwl8Dl3G0kLd89JkELqpMMj1We2NBqSipJ7CSxeJSJ45ziUwXdoOowYiTeuzfV4y3J0cOo00D5xy2ZNqldN/UcTo2txQoEN1ih5LERb/iOkilFY9Qo4gI3CF+oCGJPCBlLwHt3I3ORQ= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688399; c=relaxed/simple; bh=lonl8F/msZqYA/JhNxy37dIV2+1sbuu/AWTTbz+RdnE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ltQuk7ucBTZp157gzABBM7EBDkFqq4thRmVwlTu/LyonktctulBo7Qu47iw2zxYqTCN26XGQsbAerowhi/4zIe5taFBfRi9A/rcCkJ4/wl316J3PmYAErasphbpFb7v0NhS+thmhpd0cKX2A4aIexH/xowMuh4JedGKSqO0XlgU= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass 
(p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=IebqPj3H; arc=fail smtp.client-ip=40.107.220.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="IebqPj3H" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=C4Af4MSspH8Hh9DBOU/L5RiBrGH8eSGSucirsTHtpSkCZr7312hngq+IB5mLYGkMNBYVn/2zAPYGJhvpW6FdQTbU8gZDVyJm06yW6P0lvdrIypzkUBcOZ6wU85rchOjPTHYXN7o07d51+xS7n7NOoFKcw1CzO5irn5g5d0vKPaWMg7RMxICBQgcdgffb3V7Lkx1ajW6YAhJ5YvUDKNuyvwWQ2YgNqbKTfAoR2u13pyvQ4bsMPVsFP51UXZd8Cpv/Lb3sWr2t8A0wCxK3r8yScm6mjnBiNHXzOBOjT5EukW+3pUHKDaTzPsL1vYuSLlsRtVyI3GeZBSAnev7dlxSPhQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=0omk7BBydGcUDiYmnPH7y4Nv24rvoOY1GICKExfOMKs=; b=T+OGd6mKY7ev5z4NM5RZbRHGKmsGwBOVNvGoxTz0P/WanmyZfkDTkg/y6X1tTr3FhjqB54PoGvU60m5qcq7nDKHhORbusXCQhKxyWwb9/QK3CCsIR6PRpPM5YDHjofFwbeH+De3LRw0gTz3NzTzjA14gDaeHyTsBlufsXGjd8jWezBSSZgULzx1MP2QyRsUCeiVzazJO1uOqFrdBxcHtbc2/DBJcFgB18ZFCU6W2BWfIJZrevM0HHIbwxDzsw1/JQspaF6XruyebXxG27arEWEf/PuAVhLT3OVeT3BBgTQf0Q3aeE/GVl2mFmB8lyB8l/KP5lq+lnA18zGR+Vtqr5g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=0omk7BBydGcUDiYmnPH7y4Nv24rvoOY1GICKExfOMKs=; b=IebqPj3HQJ/NT95U1ngYgCcQk0G3MEBptc9Z7BD20LoWrGqLnVqsls5fHSLAG+4wnGEoPVN1OcVU05qfzQw5PEbStv8uG2ADJ0zjVMb5SSwyi/K+54fa7XhDZq/mbGkve8pL3dNQSBDXlADF7JuDAotWOPKmlcWqWPqZ+9Vno70= Received: from BYAPR05CA0097.namprd05.prod.outlook.com (2603:10b6:a03:e0::38) by DM4PR12MB7743.namprd12.prod.outlook.com (2603:10b6:8:101::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.18; Fri, 11 Oct 2024 23:13:11 +0000 Received: from CO1PEPF000044F7.namprd21.prod.outlook.com (2603:10b6:a03:e0:cafe::ee) by BYAPR05CA0097.outlook.office365.com (2603:10b6:a03:e0::38) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8069.9 via Frontend Transport; Fri, 11 Oct 2024 23:13:10 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; pr=C Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1PEPF000044F7.mail.protection.outlook.com (10.167.241.197) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8093.1 via Frontend Transport; Fri, 11 Oct 2024 23:13:10 +0000 Received: from SATLEXMB04.amd.com (10.181.40.145) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:04 -0500 Received: from xsjlizhih51.xilinx.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Fri, 11 Oct 2024 18:13:03 -0500 From: Lizhi Hou 
To: , CC: Lizhi Hou , , , , , Subject: [PATCH V4 09/10] accel/amdxdna: Add error handling Date: Fri, 11 Oct 2024 16:12:43 -0700 Message-ID: <20241011231244.3182625-10-lizhi.hou@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com> References: <20241011231244.3182625-1-lizhi.hou@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: None (SATLEXMB03.amd.com: lizhi.hou@amd.com does not designate permitted sender hosts) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000044F7:EE_|DM4PR12MB7743:EE_ X-MS-Office365-Filtering-Correlation-Id: 77dc1739-3463-44fb-5478-08dcea4a4661 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|376014|36860700013|82310400026; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?BRcP3oFPuTd/DfqDa4zYlyTYfvr1pHwV/skzRJ35fTsYxclNn3z1CUie8Q79?= =?us-ascii?Q?+44N5IuqWW/Fh2fH0KL7y6vws708HJYYcZWsqpMRXiNwdeQjHBCGGqednmHh?= =?us-ascii?Q?3CAHbfmAp2mnONRRuHPUgnKCRfjvyDDJRRm6fekilwTLPJBzG83+t/sBhkPW?= =?us-ascii?Q?mMBPA2w7Gy8cRQud52uWjMbvQ7+HVmnUSUTxlEovQyXINQR60kBrWN1PiKsM?= =?us-ascii?Q?cmdYQB/sRGOLhYAaFHRfGYyzzirq9XJpoTDbyjrPZb44QhyKtuamY3Ma67B1?= =?us-ascii?Q?Br65Rz7u7ghZ8u47QkBVzA5LhXUJ6Rdr7Qd1JJP87gzZ8GsnNCtBFCTLOAh9?= =?us-ascii?Q?cD1T7aVZZ1hfT2X4R5YTsCYWY3OvhwPQOVfm766nxu74bp6R1pBoTFPiUYcL?= =?us-ascii?Q?ygYM+4lphEAL4TX8uA0SwPpsEccHLD+m9JpHR/swXSI2511AwWD+oet5L5GE?= =?us-ascii?Q?r2MJ8JUkXxjblhnUaRycWDSr437dQAhyGH5JvDf8QLKIwrmAL+cpBVNRJjVC?= =?us-ascii?Q?I75gcccPxGEx2zDvjgnNqKlrnHl36cVymRxVEgXcgF5+XZlo/qUau8+aQTRS?= =?us-ascii?Q?Cn21GT+t0Wl19ycAzUuH9C2Eh3OcXpZEJfUQYDgXBvMOwk88M/9teNwalaUr?= =?us-ascii?Q?/GykGUHNS6xqJYBNpBZJGnqJmT7llruULMe+Azuo/eUqmHkJlRJ5iRuEmsPF?= =?us-ascii?Q?s6hFps+cvTXRPWubeS3W2qfCSXcY8s6NcgIMAi33xql/vYOe3oSznHIWroUb?= 
=?us-ascii?Q?V7bm/KNcJVH+IOIxWHlGZpfL+hGXTs+kvi/XJ/oHss686T48HQScf6+O4gc1?= =?us-ascii?Q?d5566WviZrzJ1tfwXFgl1D15YT+LeCxFu/8StCyQXzh8fXYxZHPoyjbab6oh?= =?us-ascii?Q?PM/D1wVEjMXp32WgTaOqK/n1c4VdaaEyliwnpTPZm+esqgpVDWQKcDm7PeAA?= =?us-ascii?Q?CVKUGVKHg5i3Dek8Dkp5HkU4SX7mb4W9Cg6sXcepuex6roVTxCR+xvE0/p98?= =?us-ascii?Q?3d9caIQQ4j2wsMfakSHP3Vt2QLYKsfoBDjaoTQUFunDZKGZxtRB50FLOIwZD?= =?us-ascii?Q?RvRN3SYxsS1NP+V3hSDAB5R3Cj4ZHaKetTxN1QntP3YSjCP1Q91AW0lnsHVF?= =?us-ascii?Q?njh5l2+4tMEdRItrtitgDaKsXRJsUDZHnXZBfG8Xq/+2Xh9apPJeMK0yPlyi?= =?us-ascii?Q?i++tLQTVLjW1ZJTaz8wO48YaDl6NiMP5GXNmrsKlN+J8NMj9brPqdcgNJigA?= =?us-ascii?Q?z8f8oxg7UsrbAfiJjug9xhsdLWYTTmRmS2df0H+BHHl+LTfpt0s+VfydTuEo?= =?us-ascii?Q?3NdcF2x9NCDRlQsLHD8mQjppVRB3KPglyXpSEOaonXTyKg=3D=3D?= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB03.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(1800799024)(376014)(36860700013)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Oct 2024 23:13:10.7156 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 77dc1739-3463-44fb-5478-08dcea4a4661 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000044F7.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM4PR12MB7743 Content-Type: text/plain; charset="utf-8" When there is a hardware error, the NPU firmware notifies the host through a mailbox message. The message includes details of the error, such as the tile and column indexes where the error occurred. The driver starts a thread to handle the NPU error message. 
The thread stops the clients that are using the column where the error occurred. The driver then resets that column. Co-developed-by: Min Ma Signed-off-by: Min Ma Signed-off-by: Lizhi Hou --- drivers/accel/amdxdna/Makefile | 1 + drivers/accel/amdxdna/aie2_error.c | 356 +++++++++++++++++++++++++++ drivers/accel/amdxdna/aie2_message.c | 19 ++ drivers/accel/amdxdna/aie2_pci.c | 32 +++ drivers/accel/amdxdna/aie2_pci.h | 9 + 5 files changed, 417 insertions(+) create mode 100644 drivers/accel/amdxdna/aie2_error.c diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile index a688c378761f..ed6f87910880 100644 --- a/drivers/accel/amdxdna/Makefile +++ b/drivers/accel/amdxdna/Makefile @@ -2,6 +2,7 @@ =20 amdxdna-y :=3D \ aie2_ctx.o \ + aie2_error.o \ aie2_message.o \ aie2_pci.o \ aie2_psp.o \ diff --git a/drivers/accel/amdxdna/aie2_error.c b/drivers/accel/amdxdna/aie= 2_error.c new file mode 100644 index 000000000000..d2787549f3b7 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_error.c @@ -0,0 +1,356 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +struct async_event { + struct amdxdna_dev_hdl *ndev; + struct async_event_msg_resp resp; + struct workqueue_struct *wq; + struct work_struct work; + u8 *buf; + dma_addr_t addr; + u32 size; +}; + +struct async_events { + struct workqueue_struct *wq; + u8 *buf; + dma_addr_t addr; + u32 size; + u32 event_cnt; + struct async_event event[] __counted_by(event_cnt); +}; + +/* + * The enum, struct and lookup tables below are ported from the XAIE util header = file. + * + * The data below is defined by the AIE device and is used to decode error mes= sages + * from the device. 
+ */ + +enum aie_module_type { + AIE_MEM_MOD =3D 0, + AIE_CORE_MOD, + AIE_PL_MOD, +}; + +enum aie_error_category { + AIE_ERROR_SATURATION =3D 0, + AIE_ERROR_FP, + AIE_ERROR_STREAM, + AIE_ERROR_ACCESS, + AIE_ERROR_BUS, + AIE_ERROR_INSTRUCTION, + AIE_ERROR_ECC, + AIE_ERROR_LOCK, + AIE_ERROR_DMA, + AIE_ERROR_MEM_PARITY, + /* Unknown is not from XAIE, added for better category */ + AIE_ERROR_UNKNOWN, +}; + +/* Don't pack, unless XAIE side changed */ +struct aie_error { + u8 row; + u8 col; + u32 mod_type; + u8 event_id; +}; + +struct aie_err_info { + u32 err_cnt; + u32 ret_code; + u32 rsvd; + struct aie_error payload[] __counted_by(err_cnt); +}; + +struct aie_event_category { + u8 event_id; + enum aie_error_category category; +}; + +#define EVENT_CATEGORY(id, cat) { id, cat } +static const struct aie_event_category aie_ml_mem_event_cat[] =3D { + EVENT_CATEGORY(88U, AIE_ERROR_ECC), + EVENT_CATEGORY(90U, AIE_ERROR_ECC), + EVENT_CATEGORY(91U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(92U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(93U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(94U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(95U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(96U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(97U, AIE_ERROR_DMA), + EVENT_CATEGORY(98U, AIE_ERROR_DMA), + EVENT_CATEGORY(99U, AIE_ERROR_DMA), + EVENT_CATEGORY(100U, AIE_ERROR_DMA), + EVENT_CATEGORY(101U, AIE_ERROR_LOCK), +}; + +static const struct aie_event_category aie_ml_core_event_cat[] =3D { + EVENT_CATEGORY(55U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(56U, AIE_ERROR_STREAM), + EVENT_CATEGORY(57U, AIE_ERROR_STREAM), + EVENT_CATEGORY(58U, AIE_ERROR_BUS), + EVENT_CATEGORY(59U, AIE_ERROR_INSTRUCTION), + EVENT_CATEGORY(60U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(62U, AIE_ERROR_ECC), + EVENT_CATEGORY(64U, AIE_ERROR_ECC), + EVENT_CATEGORY(65U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(66U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(67U, AIE_ERROR_LOCK), + EVENT_CATEGORY(70U, AIE_ERROR_INSTRUCTION), + EVENT_CATEGORY(71U, AIE_ERROR_STREAM), + 
EVENT_CATEGORY(72U, AIE_ERROR_BUS), +}; + +static const struct aie_event_category aie_ml_mem_tile_event_cat[] =3D { + EVENT_CATEGORY(130U, AIE_ERROR_ECC), + EVENT_CATEGORY(132U, AIE_ERROR_ECC), + EVENT_CATEGORY(133U, AIE_ERROR_DMA), + EVENT_CATEGORY(134U, AIE_ERROR_DMA), + EVENT_CATEGORY(135U, AIE_ERROR_STREAM), + EVENT_CATEGORY(136U, AIE_ERROR_STREAM), + EVENT_CATEGORY(137U, AIE_ERROR_STREAM), + EVENT_CATEGORY(138U, AIE_ERROR_BUS), + EVENT_CATEGORY(139U, AIE_ERROR_LOCK), +}; + +static const struct aie_event_category aie_ml_shim_tile_event_cat[] =3D { + EVENT_CATEGORY(64U, AIE_ERROR_BUS), + EVENT_CATEGORY(65U, AIE_ERROR_STREAM), + EVENT_CATEGORY(66U, AIE_ERROR_STREAM), + EVENT_CATEGORY(67U, AIE_ERROR_BUS), + EVENT_CATEGORY(68U, AIE_ERROR_BUS), + EVENT_CATEGORY(69U, AIE_ERROR_BUS), + EVENT_CATEGORY(70U, AIE_ERROR_BUS), + EVENT_CATEGORY(71U, AIE_ERROR_BUS), + EVENT_CATEGORY(72U, AIE_ERROR_DMA), + EVENT_CATEGORY(73U, AIE_ERROR_DMA), + EVENT_CATEGORY(74U, AIE_ERROR_LOCK), +}; + +static enum aie_error_category +aie_get_error_category(u8 row, u8 event_id, enum aie_module_type mod_type) +{ + const struct aie_event_category *lut; + int num_entry; + int i; + + switch (mod_type) { + case AIE_PL_MOD: + lut =3D aie_ml_shim_tile_event_cat; + num_entry =3D ARRAY_SIZE(aie_ml_shim_tile_event_cat); + break; + case AIE_CORE_MOD: + lut =3D aie_ml_core_event_cat; + num_entry =3D ARRAY_SIZE(aie_ml_core_event_cat); + break; + case AIE_MEM_MOD: + if (row =3D=3D 1) { + lut =3D aie_ml_mem_tile_event_cat; + num_entry =3D ARRAY_SIZE(aie_ml_mem_tile_event_cat); + } else { + lut =3D aie_ml_mem_event_cat; + num_entry =3D ARRAY_SIZE(aie_ml_mem_event_cat); + } + break; + default: + return AIE_ERROR_UNKNOWN; + } + + for (i =3D 0; i < num_entry; i++) { + if (event_id !=3D lut[i].event_id) + continue; + + return lut[i].category; + } + + return AIE_ERROR_UNKNOWN; +} + +static u32 aie2_error_backtrack(struct amdxdna_dev_hdl *ndev, void *err_in= fo, u32 num_err) +{ + struct aie_error *errs =3D 
err_info; + u32 err_col =3D 0; /* assume that AIE has less than 32 columns */ + int i; + + /* Get err column bitmap */ + for (i =3D 0; i < num_err; i++) { + struct aie_error *err =3D &errs[i]; + enum aie_error_category cat; + + cat =3D aie_get_error_category(err->row, err->event_id, err->mod_type); + XDNA_ERR(ndev->xdna, "Row: %d, Col: %d, module %d, event ID %d, category= %d", + err->row, err->col, err->mod_type, + err->event_id, cat); + + if (err->col >=3D 32) { + XDNA_WARN(ndev->xdna, "Invalid column number"); + break; + } + + err_col |=3D (1 << err->col); + } + + return err_col; +} + +static int aie2_error_async_cb(void *handle, const u32 *data, size_t size) +{ + struct async_event_msg_resp *resp; + struct async_event *e =3D handle; + + if (data) { + resp =3D (struct async_event_msg_resp *)data; + e->resp.type =3D resp->type; + wmb(); /* Update status in the end, so that no lock for here */ + e->resp.status =3D resp->status; + } + queue_work(e->wq, &e->work); + return 0; +} + +static int aie2_error_event_send(struct async_event *e) +{ + drm_clflush_virt_range(e->buf, e->size); /* device can access */ + return aie2_register_asyn_event_msg(e->ndev, e->addr, e->size, e, + aie2_error_async_cb); +} + +static void aie2_error_worker(struct work_struct *err_work) +{ + struct aie_err_info *info; + struct amdxdna_dev *xdna; + struct async_event *e; + u32 max_err; + u32 err_col; + + e =3D container_of(err_work, struct async_event, work); + + xdna =3D e->ndev->xdna; + + if (e->resp.status =3D=3D MAX_AIE2_STATUS_CODE) + return; + + e->resp.status =3D MAX_AIE2_STATUS_CODE; + + print_hex_dump_debug("AIE error: ", DUMP_PREFIX_OFFSET, 16, 4, + e->buf, 0x100, false); + + info =3D (struct aie_err_info *)e->buf; + XDNA_DBG(xdna, "Error count %d return code %d", info->err_cnt, info->ret_= code); + + max_err =3D (e->size - sizeof(*info)) / sizeof(struct aie_error); + if (unlikely(info->err_cnt > max_err)) { + WARN_ONCE(1, "Error count too large %d\n", info->err_cnt); + return; + } + 
err_col =3D aie2_error_backtrack(e->ndev, info->payload, info->err_cnt); + if (!err_col) { + XDNA_WARN(xdna, "Did not get error column"); + return; + } + + mutex_lock(&xdna->dev_lock); + /* Re-send this event to the firmware */ + if (aie2_error_event_send(e)) + XDNA_WARN(xdna, "Unable to register async event"); + mutex_unlock(&xdna->dev_lock); +} + +int aie2_error_async_events_send(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna =3D ndev->xdna; + struct async_event *e; + int i, ret; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + for (i =3D 0; i < ndev->async_events->event_cnt; i++) { + e =3D &ndev->async_events->event[i]; + ret =3D aie2_error_event_send(e); + if (ret) + return ret; + } + + return 0; +} + +void aie2_error_async_events_free(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna =3D ndev->xdna; + struct async_events *events; + + events =3D ndev->async_events; + destroy_workqueue(events->wq); + dma_free_noncoherent(xdna->ddev.dev, events->size, events->buf, + events->addr, DMA_FROM_DEVICE); + kfree(events); +} + +int aie2_error_async_events_alloc(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna =3D ndev->xdna; + u32 total_col =3D ndev->total_col; + u32 total_size =3D ASYNC_BUF_SIZE * total_col; + struct async_events *events; + int i, ret; + + events =3D kzalloc(struct_size(events, event, total_col), GFP_KERNEL); + if (!events) + return -ENOMEM; + + events->buf =3D dma_alloc_noncoherent(xdna->ddev.dev, total_size, &events= ->addr, + DMA_FROM_DEVICE, GFP_KERNEL); + if (!events->buf) { + ret =3D -ENOMEM; + goto free_events; + } + events->size =3D total_size; + events->event_cnt =3D total_col; + + events->wq =3D alloc_ordered_workqueue("async_wq", 0); + if (!events->wq) { + ret =3D -ENOMEM; + goto free_buf; + } + + for (i =3D 0; i < events->event_cnt; i++) { + struct async_event *e =3D &events->event[i]; + u32 offset =3D i * ASYNC_BUF_SIZE; + + e->ndev =3D ndev; + e->wq =3D events->wq; + e->buf =3D 
&events->buf[offset]; + e->addr =3D events->addr + offset; + e->size =3D ASYNC_BUF_SIZE; + e->resp.status =3D MAX_AIE2_STATUS_CODE; + INIT_WORK(&e->work, aie2_error_worker); + } + + ndev->async_events =3D events; + + XDNA_DBG(xdna, "Async event count %d, buf total size 0x%x", + events->event_cnt, events->size); + return 0; + +free_buf: + dma_free_noncoherent(xdna->ddev.dev, events->size, events->buf, + events->addr, DMA_FROM_DEVICE); +free_events: + kfree(events); + return ret; +} diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/a= ie2_message.c index 3dc4a9a8571e..6e65b611b691 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -307,6 +307,25 @@ int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u3= 2 context_id, u64 addr, u6 return 0; } =20 +int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t = addr, u32 size, + void *handle, int (*cb)(void*, const u32 *, size_t)) +{ + struct async_event_msg_req req =3D { 0 }; + struct xdna_mailbox_msg msg =3D { + .send_data =3D (u8 *)&req, + .send_size =3D sizeof(req), + .handle =3D handle, + .opcode =3D MSG_OP_REGISTER_ASYNC_EVENT_MSG, + .notify_cb =3D cb, + }; + + req.buf_addr =3D addr; + req.buf_size =3D size; + + XDNA_DBG(ndev->xdna, "Register addr 0x%llx size 0x%x", addr, size); + return xdna_mailbox_send_msg(ndev->mgmt_chann, &msg, TX_TIMEOUT); +} + int aie2_config_cu(struct amdxdna_hwctx *hwctx) { struct mailbox_channel *chann =3D hwctx->priv->mbox_chann; diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_= pci.c index 7a1729bed62d..ff872f2389f7 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -180,6 +180,15 @@ static int aie2_mgmt_fw_init(struct amdxdna_dev_hdl *n= dev) return ret; } =20 + if (!ndev->async_events) + return 0; + + ret =3D aie2_error_async_events_send(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Send async events failed"); + return ret; + } + return 0; } 
=20 @@ -472,9 +481,30 @@ static int aie2_init(struct amdxdna_dev *xdna) goto stop_hw; } =20 + ret =3D aie2_error_async_events_alloc(ndev); + if (ret) { + XDNA_ERR(xdna, "Allocate async events failed, ret %d", ret); + goto stop_hw; + } + + ret =3D aie2_error_async_events_send(ndev); + if (ret) { + XDNA_ERR(xdna, "Send async events failed, ret %d", ret); + goto async_event_free; + } + + /* Issue a command to make sure firmware handled async events */ + ret =3D aie2_query_firmware_version(ndev, &ndev->xdna->fw_ver); + if (ret) { + XDNA_ERR(xdna, "Re-query firmware version failed"); + goto async_event_free; + } + release_firmware(fw); return 0; =20 +async_event_free: + aie2_error_async_events_free(ndev); stop_hw: aie2_hw_stop(xdna); disable_sva: @@ -490,8 +520,10 @@ static int aie2_init(struct amdxdna_dev *xdna) static void aie2_fini(struct amdxdna_dev *xdna) { struct pci_dev *pdev =3D to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev =3D xdna->dev_handle; =20 aie2_hw_stop(xdna); + aie2_error_async_events_free(ndev); iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); pci_free_irq_vectors(pdev); } diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_= pci.h index bbcc3d85e13c..9634a7588650 100644 --- a/drivers/accel/amdxdna/aie2_pci.h +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -164,6 +164,7 @@ struct amdxdna_dev_hdl { /* Mailbox and the management channel */ struct mailbox *mbox; struct mailbox_channel *mgmt_chann; + struct async_events *async_events; }; =20 #define DEFINE_BAR_OFFSET(reg_name, bar, reg_addr) \ @@ -204,6 +205,12 @@ struct psp_device *aie2m_psp_create(struct drm_device = *ddev, struct psp_config * int aie2_psp_start(struct psp_device *psp); void aie2_psp_stop(struct psp_device *psp); =20 +/* aie2_error.c */ +int aie2_error_async_events_alloc(struct amdxdna_dev_hdl *ndev); +void aie2_error_async_events_free(struct amdxdna_dev_hdl *ndev); +int aie2_error_async_events_send(struct amdxdna_dev_hdl *ndev); +int 
aie2_error_async_msg_thread(void *data);
+
 /* aie2_message.c */
 int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev);
 int aie2_resume_fw(struct amdxdna_dev_hdl *ndev);
@@ -218,6 +225,8 @@ int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev,
 int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx);
 int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx);
 int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size);
+int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size,
+				 void *handle, int (*cb)(void*, const u32 *, size_t));
 int aie2_config_cu(struct amdxdna_hwctx *hwctx);
 int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job,
 		 int (*notify_cb)(void *, const u32 *, size_t));
-- 
2.34.1

From nobody Wed Nov 27 06:28:33 2024
From: Lizhi Hou
To: ,
CC: Lizhi Hou , , , , ,
Subject: [PATCH V4 10/10] accel/amdxdna: Add query functions
Date: Fri, 11 Oct 2024 16:12:44 -0700
Message-ID: <20241011231244.3182625-11-lizhi.hou@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com>
References: <20241011231244.3182625-1-lizhi.hou@amd.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add the GET_INFO ioctl to retrieve hardware information: AIE status,
metadata and version, clock metadata, and hardware context status.

Co-developed-by: Min Ma
Signed-off-by: Min Ma
Signed-off-by: Lizhi Hou
Reviewed-by: Jeffrey Hugo
---
 drivers/accel/amdxdna/aie2_message.c    |  65 +++++++
 drivers/accel/amdxdna/aie2_pci.c        | 222 ++++++++++++++++++++++++
 drivers/accel/amdxdna/aie2_pci.h        |   1 +
 drivers/accel/amdxdna/amdxdna_pci_drv.c |  19 ++
 drivers/accel/amdxdna/amdxdna_pci_drv.h |   3 +
 include/uapi/drm/amdxdna_accel.h        | 166 ++++++++++++++++++
 6 files changed, 476 insertions(+)

diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c
index 6e65b611b691..11686f3c4464 100644
--- a/drivers/accel/amdxdna/aie2_message.c
+++ b/drivers/accel/amdxdna/aie2_message.c
@@ -307,6 +307,71 @@ int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u6
 	return 0;
 }
 
+int aie2_query_status(struct amdxdna_dev_hdl *ndev, char __user *buf,
+		      u32 size, u32 *cols_filled)
+{
+	DECLARE_AIE2_MSG(aie_column_info, MSG_OP_QUERY_COL_STATUS);
+	struct amdxdna_dev *xdna = ndev->xdna;
+	struct amdxdna_client *client;
+	struct amdxdna_hwctx *hwctx;
+	dma_addr_t dma_addr;
+	u32 aie_bitmap = 0;
+	u8 *buff_addr;
+	int next = 0;
+	int ret, idx;
+
+	buff_addr = dma_alloc_noncoherent(xdna->ddev.dev, size, &dma_addr,
+					  DMA_FROM_DEVICE, GFP_KERNEL);
+	if (!buff_addr)
+		return -ENOMEM;
+
+	/* Go through each hardware context and mark the AIE columns that are active */
+	list_for_each_entry(client, &xdna->client_list, node) {
+		idx = srcu_read_lock(&client->hwctx_srcu);
+		idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next)
+			aie_bitmap |= amdxdna_hwctx_col_map(hwctx);
+		srcu_read_unlock(&client->hwctx_srcu, idx);
+	}
+
+	*cols_filled = 0;
+	req.dump_buff_addr = dma_addr;
+	req.dump_buff_size = size;
+	req.num_cols = hweight32(aie_bitmap);
+	req.aie_bitmap = aie_bitmap;
+
+	drm_clflush_virt_range(buff_addr, size); /* device can access */
+	ret = aie2_send_mgmt_msg_wait(ndev, &msg);
+	if (ret) {
+		XDNA_ERR(xdna, "Error during NPU query, status %d", ret);
+		goto fail;
+	}
+
+	if (resp.status != AIE2_STATUS_SUCCESS) {
+		XDNA_ERR(xdna, "Query NPU status failed, status 0x%x", resp.status);
+		ret = -EINVAL;
+		goto fail;
+	}
+	XDNA_DBG(xdna, "Query NPU status completed");
+
+	if (size < resp.size) {
+		ret = -EINVAL;
+		XDNA_ERR(xdna, "Bad buffer size. Available: %u. Needs: %u", size, resp.size);
+		goto fail;
+	}
+
+	if (copy_to_user(buf, buff_addr, resp.size)) {
+		ret = -EFAULT;
+		XDNA_ERR(xdna, "Failed to copy NPU status to user space");
+		goto fail;
+	}
+
+	*cols_filled = aie_bitmap;
+
+fail:
+	dma_free_noncoherent(xdna->ddev.dev, size, buff_addr, dma_addr, DMA_FROM_DEVICE);
+	return ret;
+}
+
 int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size,
 				 void *handle, int (*cb)(void*, const u32 *, size_t))
 {
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index ff872f2389f7..385a5939944d 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -5,6 +5,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -528,11 +529,232 @@ static void aie2_fini(struct amdxdna_dev *xdna)
 	pci_free_irq_vectors(pdev);
 }
 
+static int aie2_get_aie_status(struct amdxdna_client *client,
+			       struct amdxdna_drm_get_info *args)
+{
+	struct amdxdna_drm_query_aie_status status;
+	struct amdxdna_dev *xdna = client->xdna;
+	struct amdxdna_dev_hdl *ndev;
+	int ret;
+
+	ndev = xdna->dev_handle;
+	if (copy_from_user(&status, u64_to_user_ptr(args->buffer), sizeof(status))) {
+		XDNA_ERR(xdna, "Failed to copy AIE request into kernel");
+		return -EFAULT;
+	}
+
+	if (ndev->metadata.cols * ndev->metadata.size < status.buffer_size) {
+		XDNA_ERR(xdna, "Invalid buffer size. Given Size: %u. Need Size: %u.",
+			 status.buffer_size, ndev->metadata.cols * ndev->metadata.size);
+		return -EINVAL;
+	}
+
+	ret = aie2_query_status(ndev, u64_to_user_ptr(status.buffer),
+				status.buffer_size, &status.cols_filled);
+	if (ret) {
+		XDNA_ERR(xdna, "Failed to get AIE status info. Ret: %d", ret);
+		return ret;
+	}
+
+	if (copy_to_user(u64_to_user_ptr(args->buffer), &status, sizeof(status))) {
+		XDNA_ERR(xdna, "Failed to copy AIE request info to user space");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int aie2_get_aie_metadata(struct amdxdna_client *client,
+				 struct amdxdna_drm_get_info *args)
+{
+	struct amdxdna_drm_query_aie_metadata *meta;
+	struct amdxdna_dev *xdna = client->xdna;
+	struct amdxdna_dev_hdl *ndev;
+	int ret = 0;
+
+	ndev = xdna->dev_handle;
+	meta = kzalloc(sizeof(*meta), GFP_KERNEL);
+	if (!meta)
+		return -ENOMEM;
+
+	meta->col_size = ndev->metadata.size;
+	meta->cols = ndev->metadata.cols;
+	meta->rows = ndev->metadata.rows;
+
+	meta->version.major = ndev->metadata.version.major;
+	meta->version.minor = ndev->metadata.version.minor;
+
+	meta->core.row_count = ndev->metadata.core.row_count;
+	meta->core.row_start = ndev->metadata.core.row_start;
+	meta->core.dma_channel_count = ndev->metadata.core.dma_channel_count;
+	meta->core.lock_count = ndev->metadata.core.lock_count;
+	meta->core.event_reg_count = ndev->metadata.core.event_reg_count;
+
+	meta->mem.row_count = ndev->metadata.mem.row_count;
+	meta->mem.row_start = ndev->metadata.mem.row_start;
+	meta->mem.dma_channel_count = ndev->metadata.mem.dma_channel_count;
+	meta->mem.lock_count = ndev->metadata.mem.lock_count;
+	meta->mem.event_reg_count = ndev->metadata.mem.event_reg_count;
+
+	meta->shim.row_count = ndev->metadata.shim.row_count;
+	meta->shim.row_start = ndev->metadata.shim.row_start;
+	meta->shim.dma_channel_count = ndev->metadata.shim.dma_channel_count;
+	meta->shim.lock_count = ndev->metadata.shim.lock_count;
+	meta->shim.event_reg_count = ndev->metadata.shim.event_reg_count;
+
+	if (copy_to_user(u64_to_user_ptr(args->buffer), meta, sizeof(*meta)))
+		ret = -EFAULT;
+
+	kfree(meta);
+	return ret;
+}
+
+static int aie2_get_aie_version(struct amdxdna_client *client,
+				struct amdxdna_drm_get_info *args)
+{
+	struct amdxdna_drm_query_aie_version version;
+	struct amdxdna_dev *xdna = client->xdna;
+	struct amdxdna_dev_hdl *ndev;
+
+	ndev = xdna->dev_handle;
+	version.major = ndev->version.major;
+	version.minor = ndev->version.minor;
+
+	if (copy_to_user(u64_to_user_ptr(args->buffer), &version, sizeof(version)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int aie2_get_clock_metadata(struct amdxdna_client *client,
+				   struct amdxdna_drm_get_info *args)
+{
+	struct amdxdna_drm_query_clock_metadata *clock;
+	struct amdxdna_dev *xdna = client->xdna;
+	struct amdxdna_dev_hdl *ndev;
+	int ret = 0;
+
+	ndev = xdna->dev_handle;
+	clock = kzalloc(sizeof(*clock), GFP_KERNEL);
+	if (!clock)
+		return -ENOMEM;
+
+	memcpy(clock->mp_npu_clock.name, ndev->mp_npu_clock.name,
+	       sizeof(clock->mp_npu_clock.name));
+	clock->mp_npu_clock.freq_mhz = ndev->mp_npu_clock.freq_mhz;
+	memcpy(clock->h_clock.name, ndev->h_clock.name, sizeof(clock->h_clock.name));
+	clock->h_clock.freq_mhz = ndev->h_clock.freq_mhz;
+
+	if (copy_to_user(u64_to_user_ptr(args->buffer), clock, sizeof(*clock)))
+		ret = -EFAULT;
+
+	kfree(clock);
+	return ret;
+}
+
+static int aie2_get_hwctx_status(struct amdxdna_client *client,
+				 struct amdxdna_drm_get_info *args)
+{
+	struct amdxdna_drm_query_hwctx __user *buf;
+	struct amdxdna_dev *xdna = client->xdna;
+	struct amdxdna_drm_query_hwctx *tmp;
+	struct amdxdna_client *tmp_client;
+	struct amdxdna_hwctx *hwctx;
+	bool overflow = false;
+	u32 req_bytes = 0;
+	u32 hw_i = 0;
+	int ret = 0;
+	int next;
+	int idx;
+
+	drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock));
+
+	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
+	if (!tmp)
+		return -ENOMEM;
+
+	buf = u64_to_user_ptr(args->buffer);
+	list_for_each_entry(tmp_client, &xdna->client_list, node) {
+		idx = srcu_read_lock(&tmp_client->hwctx_srcu);
+		next = 0;
+		idr_for_each_entry_continue(&tmp_client->hwctx_idr, hwctx, next) {
+			req_bytes += sizeof(*tmp);
+			if (args->buffer_size < req_bytes) {
+				/* Continue iterating to get the required size */
+				overflow = true;
+				continue;
+			}
+
+			memset(tmp, 0, sizeof(*tmp));
+			tmp->pid = tmp_client->pid;
+			tmp->context_id = hwctx->id;
+			tmp->start_col = hwctx->start_col;
+			tmp->num_col = hwctx->num_col;
+			tmp->command_submissions = hwctx->priv->seq;
+			tmp->command_completions = hwctx->priv->completed;
+
+			if (copy_to_user(&buf[hw_i], tmp, sizeof(*tmp))) {
+				ret = -EFAULT;
+				srcu_read_unlock(&tmp_client->hwctx_srcu, idx);
+				goto out;
+			}
+			hw_i++;
+		}
+		srcu_read_unlock(&tmp_client->hwctx_srcu, idx);
+	}
+
+	if (overflow) {
+		XDNA_ERR(xdna, "Invalid buffer size. Given: %u Need: %u.",
+			 args->buffer_size, req_bytes);
+		ret = -EINVAL;
+	}
+
+out:
+	kfree(tmp);
+	args->buffer_size = req_bytes;
+	return ret;
+}
+
+static int aie2_get_info(struct amdxdna_client *client, struct amdxdna_drm_get_info *args)
+{
+	struct amdxdna_dev *xdna = client->xdna;
+	int ret, idx;
+
+	if (!drm_dev_enter(&xdna->ddev, &idx))
+		return -ENODEV;
+
+	switch (args->param) {
+	case DRM_AMDXDNA_QUERY_AIE_STATUS:
+		ret = aie2_get_aie_status(client, args);
+		break;
+	case DRM_AMDXDNA_QUERY_AIE_METADATA:
+		ret = aie2_get_aie_metadata(client, args);
+		break;
+	case DRM_AMDXDNA_QUERY_AIE_VERSION:
+		ret = aie2_get_aie_version(client, args);
+		break;
+	case DRM_AMDXDNA_QUERY_CLOCK_METADATA:
+		ret = aie2_get_clock_metadata(client, args);
+		break;
+	case DRM_AMDXDNA_QUERY_HW_CONTEXTS:
+		ret = aie2_get_hwctx_status(client, args);
+		break;
+	default:
+		XDNA_ERR(xdna, "Not supported request parameter %u", args->param);
+		ret = -EOPNOTSUPP;
+	}
+	XDNA_DBG(xdna, "Got param %d", args->param);
+
+	drm_dev_exit(idx);
+	return ret;
+}
+
 const struct amdxdna_dev_ops aie2_ops = {
 	.init = aie2_init,
 	.fini = aie2_fini,
 	.resume = aie2_hw_start,
 	.suspend = aie2_hw_stop,
+	.get_aie_info = aie2_get_info,
 	.hwctx_init = aie2_hwctx_init,
 	.hwctx_fini = aie2_hwctx_fini,
 	.hwctx_config = aie2_hwctx_config,
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index 9634a7588650..734499bfb9f7 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -225,6 +225,7 @@ int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev,
 int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx);
 int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx);
 int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size);
+int aie2_query_status(struct amdxdna_dev_hdl *ndev, char __user *buf, u32 size, u32 *cols_filled);
 int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size,
 				 void *handle, int (*cb)(void*, const u32 *, size_t));
 int aie2_config_cu(struct amdxdna_hwctx *hwctx);
diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdna/amdxdna_pci_drv.c
index 92113d83e861..b8cb666e18c2 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.c
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
@@ -143,6 +143,23 @@ static int amdxdna_flush(struct file *f, fl_owner_t id)
 	return 0;
 }
 
+static int amdxdna_drm_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
+{
+	struct amdxdna_client *client = filp->driver_priv;
+	struct amdxdna_dev *xdna = to_xdna_dev(dev);
+	struct amdxdna_drm_get_info *args = data;
+	int ret;
+
+	if (!xdna->dev_info->ops->get_aie_info)
+		return -EOPNOTSUPP;
+
+	XDNA_DBG(xdna, "Request parameter %u", args->param);
+	mutex_lock(&xdna->dev_lock);
+	ret = xdna->dev_info->ops->get_aie_info(client, args);
+	mutex_unlock(&xdna->dev_lock);
+	return ret;
+}
+
 static const struct drm_ioctl_desc amdxdna_drm_ioctls[] = {
 	/* Context */
 	DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_HWCTX, amdxdna_drm_create_hwctx_ioctl, 0),
@@ -155,6 +172,8 @@ static const struct drm_ioctl_desc amdxdna_drm_ioctls[] = {
 	/* Exectuion */
 	DRM_IOCTL_DEF_DRV(AMDXDNA_EXEC_CMD, amdxdna_drm_submit_cmd_ioctl, 0),
 	DRM_IOCTL_DEF_DRV(AMDXDNA_WAIT_CMD, amdxdna_drm_wait_cmd_ioctl, 0),
+	/* AIE hardware */
+	DRM_IOCTL_DEF_DRV(AMDXDNA_GET_INFO, amdxdna_drm_get_info_ioctl, 0),
 };
 
 static const struct file_operations amdxdna_fops = {
diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdna/amdxdna_pci_drv.h
index 01b516743a00..0e1f566f8ca8 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.h
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h
@@ -17,7 +17,9 @@
 
 extern const struct drm_driver amdxdna_drm_drv;
 
+struct amdxdna_client;
 struct amdxdna_dev;
+struct amdxdna_drm_get_info;
 struct amdxdna_gem_obj;
 struct amdxdna_hwctx;
 struct amdxdna_sched_job;
@@ -38,6 +40,7 @@ struct amdxdna_dev_ops {
 	void (*hwctx_resume)(struct amdxdna_hwctx *hwctx);
 	int (*cmd_submit)(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq);
 	int (*cmd_wait)(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout);
+	int (*get_aie_info)(struct amdxdna_client *client, struct amdxdna_drm_get_info *args);
 };
 
 /*
diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
index 08f3ec7146ab..a7e23c6de3da 100644
--- a/include/uapi/drm/amdxdna_accel.h
+++ b/include/uapi/drm/amdxdna_accel.h
@@ -32,6 +32,7 @@ enum amdxdna_drm_ioctl_id {
 	DRM_AMDXDNA_SYNC_BO,
 	DRM_AMDXDNA_EXEC_CMD,
 	DRM_AMDXDNA_WAIT_CMD,
+	DRM_AMDXDNA_GET_INFO,
 };
 
 /**
@@ -252,6 +253,167 @@ struct amdxdna_drm_wait_cmd {
 	__u64 seq;
 };
 
+/**
+ * struct amdxdna_drm_query_aie_status - Query the status of the AIE hardware
+ * @buffer: The user space buffer that will return the AIE status.
+ * @buffer_size: The size of the user space buffer.
+ * @cols_filled: A bitmap of AIE columns whose data has been returned in the buffer.
+ */
+struct amdxdna_drm_query_aie_status {
+	__u64 buffer; /* out */
+	__u32 buffer_size; /* in */
+	__u32 cols_filled; /* out */
+};
+
+/**
+ * struct amdxdna_drm_query_aie_version - Query the version of the AIE hardware
+ * @major: The major version number.
+ * @minor: The minor version number.
+ */
+struct amdxdna_drm_query_aie_version {
+	__u32 major; /* out */
+	__u32 minor; /* out */
+};
+
+/**
+ * struct amdxdna_drm_query_aie_tile_metadata - Query the metadata of AIE tile (core, mem, shim)
+ * @row_count: The number of rows.
+ * @row_start: The starting row number.
+ * @dma_channel_count: The number of dma channels.
+ * @lock_count: The number of locks.
+ * @event_reg_count: The number of events.
+ * @pad: Structure padding.
+ */
+struct amdxdna_drm_query_aie_tile_metadata {
+	__u16 row_count;
+	__u16 row_start;
+	__u16 dma_channel_count;
+	__u16 lock_count;
+	__u16 event_reg_count;
+	__u16 pad[3];
+};
+
+/**
+ * struct amdxdna_drm_query_aie_metadata - Query the metadata of the AIE hardware
+ * @col_size: The size of a column in bytes.
+ * @cols: The total number of columns.
+ * @rows: The total number of rows.
+ * @version: The version of the AIE hardware.
+ * @core: The metadata for all core tiles.
+ * @mem: The metadata for all mem tiles.
+ * @shim: The metadata for all shim tiles.
+ */
+struct amdxdna_drm_query_aie_metadata {
+	__u32 col_size;
+	__u16 cols;
+	__u16 rows;
+	struct amdxdna_drm_query_aie_version version;
+	struct amdxdna_drm_query_aie_tile_metadata core;
+	struct amdxdna_drm_query_aie_tile_metadata mem;
+	struct amdxdna_drm_query_aie_tile_metadata shim;
+};
+
+/**
+ * struct amdxdna_drm_query_clock - Metadata for a clock
+ * @name: The clock name.
+ * @freq_mhz: The clock frequency.
+ * @pad: Structure padding.
+ */
+struct amdxdna_drm_query_clock {
+	__u8 name[16];
+	__u32 freq_mhz;
+	__u32 pad;
+};
+
+/**
+ * struct amdxdna_drm_query_clock_metadata - Query metadata for clocks
+ * @mp_npu_clock: The metadata for the MP-NPU clock.
+ * @h_clock: The metadata for the H clock.
+ */
+struct amdxdna_drm_query_clock_metadata {
+	struct amdxdna_drm_query_clock mp_npu_clock;
+	struct amdxdna_drm_query_clock h_clock;
+};
+
+enum amdxdna_sensor_type {
+	AMDXDNA_SENSOR_TYPE_POWER
+};
+
+/**
+ * struct amdxdna_drm_query_sensor - The data for a single sensor.
+ * @label: The name for a sensor.
+ * @input: The current value of the sensor.
+ * @max: The maximum value possible for the sensor.
+ * @average: The average value of the sensor.
+ * @highest: The highest sensor value recorded since the driver was loaded.
+ * @status: The sensor status.
+ * @units: The sensor units.
+ * @unitm: Translates value member variables into the correct unit via (pow(10, unitm) * value).
+ * @type: The sensor type from enum amdxdna_sensor_type.
+ * @pad: Structure padding.
+ */
+struct amdxdna_drm_query_sensor {
+	__u8 label[64];
+	__u32 input;
+	__u32 max;
+	__u32 average;
+	__u32 highest;
+	__u8 status[64];
+	__u8 units[16];
+	__s8 unitm;
+	__u8 type;
+	__u8 pad[6];
+};
+
+/**
+ * struct amdxdna_drm_query_hwctx - The data for a single context.
+ * @context_id: The ID for this context.
+ * @start_col: The starting column for the partition assigned to this context.
+ * @num_col: The number of columns in the partition assigned to this context.
+ * @pad: Structure padding.
+ * @pid: The Process ID of the process that created this context.
+ * @command_submissions: The number of commands submitted to this context.
+ * @command_completions: The number of commands completed by this context.
+ * @migrations: The number of times this context has been moved to a different partition.
+ * @preemptions: The number of times this context has been preempted by another context in the
+ *               same partition.
+ * @errors: The errors for this context.
+ */
+struct amdxdna_drm_query_hwctx {
+	__u32 context_id;
+	__u32 start_col;
+	__u32 num_col;
+	__u32 pad;
+	__s64 pid;
+	__u64 command_submissions;
+	__u64 command_completions;
+	__u64 migrations;
+	__u64 preemptions;
+	__u64 errors;
+};
+
+enum amdxdna_drm_get_param {
+	DRM_AMDXDNA_QUERY_AIE_STATUS,
+	DRM_AMDXDNA_QUERY_AIE_METADATA,
+	DRM_AMDXDNA_QUERY_AIE_VERSION,
+	DRM_AMDXDNA_QUERY_CLOCK_METADATA,
+	DRM_AMDXDNA_QUERY_SENSORS,
+	DRM_AMDXDNA_QUERY_HW_CONTEXTS,
+	DRM_AMDXDNA_NUM_GET_PARAM,
+};
+
+/**
+ * struct amdxdna_drm_get_info - Get some information from the AIE hardware.
+ * @param: Value in enum amdxdna_drm_get_param. Specifies the structure passed in the buffer.
+ * @buffer_size: Size of the input buffer. Size needed/written by the kernel.
+ * @buffer: A structure specified by the param struct member.
+ */
+struct amdxdna_drm_get_info {
+	__u32 param; /* in */
+	__u32 buffer_size; /* in/out */
+	__u64 buffer; /* in/out */
+};
+
 #define DRM_IOCTL_AMDXDNA_CREATE_HWCTX \
 	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_HWCTX, \
 		 struct amdxdna_drm_create_hwctx)
@@ -284,6 +446,10 @@ struct amdxdna_drm_wait_cmd {
 	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_WAIT_CMD, \
 		 struct amdxdna_drm_wait_cmd)
 
+#define DRM_IOCTL_AMDXDNA_GET_INFO \
+	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_GET_INFO, \
+		 struct amdxdna_drm_get_info)
+
 #if defined(__cplusplus)
 } /* extern c end */
 #endif
-- 
2.34.1
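
The sizing contract of DRM_AMDXDNA_QUERY_HW_CONTEXTS (copy the entries that
fit, keep counting the rest, report the total through buffer_size) can be
sketched as a small user-space model. This is only an illustration under
stated assumptions, not driver code: struct hwctx_entry and fill_hwctx() are
hypothetical names, the entry mirrors the field layout of
struct amdxdna_drm_query_hwctx, and memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same field layout as amdxdna_drm_query_hwctx. */
struct hwctx_entry {
	uint32_t context_id;
	uint32_t start_col;
	uint32_t num_col;
	uint32_t pad;
	int64_t pid;
	uint64_t command_submissions;
	uint64_t command_completions;
	uint64_t migrations;
	uint64_t preemptions;
	uint64_t errors;
};

/*
 * Model of the fill loop (hypothetical helper): copy as many entries as fit
 * into dst, keep counting the remaining ones, and always report the total
 * number of bytes required through *buffer_size.
 * Returns 0 on success or -EINVAL when the buffer was too small.
 */
int fill_hwctx(const struct hwctx_entry *src, size_t nsrc,
	       struct hwctx_entry *dst, uint32_t *buffer_size,
	       uint32_t *filled)
{
	uint32_t req_bytes = 0;
	int overflow = 0;
	size_t i;

	*filled = 0;
	for (i = 0; i < nsrc; i++) {
		req_bytes += (uint32_t)sizeof(*dst);
		if (*buffer_size < req_bytes) {
			/* Keep iterating so the caller learns the required size. */
			overflow = 1;
			continue;
		}
		memcpy(&dst[*filled], &src[i], sizeof(*dst));
		(*filled)++;
	}

	*buffer_size = req_bytes;	/* in: capacity in bytes, out: bytes needed */
	return overflow ? -EINVAL : 0;
}
```

A caller that gets -EINVAL could allocate the number of bytes reported in
buffer_size and retry. That retry pattern assumes the DRM ioctl core copies
the updated argument struct back to user space even when the handler fails,
which is worth confirming against the DRM core before relying on it.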