From nobody Wed Nov 27 08:26:25 2024 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2045.outbound.protection.outlook.com [40.107.237.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 724BB1E9082 for ; Fri, 11 Oct 2024 23:13:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.237.45 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688404; cv=fail; b=ld57k749jTE6/oar2SHR+nHTO67S+IEyQAaOpB58nev7kTvmDpveGoEKbAmEu1a11SmDfI/BvrYWlB3l9fVZtbDSYHWWc1w2H+WrmqQl9Vg4CnzZhEzoJ47tmld5TH2OdEqoXg69Ls1HSWKKz/85to+toxjd9gHbCjArA++P/iM= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728688404; c=relaxed/simple; bh=+ypV+dPxQqertegqhtbjv+/huMELAo6M7OdXPbt2lk8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=JM9cT+3jTqsJFPXDw3WVVhTCpuVCEushwTZHCK8S0ouxDssajK6cLcYeh2FHaCR/t46zGV8EMzte8OTuIEUq0qQfh0xYnyVe7g0KxNypVLXT4UfKpYzu74ngjBrxpOyMvr7IgySVGOpe9paQMlqTTUlo4oxb0kxGJAHImAq3tls= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=m46OWF9X; arc=fail smtp.client-ip=40.107.237.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="m46OWF9X" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=yLz7LZ3Wrtq2MAgnSoKQfCfvOQckFSlfR75NXtEtL5cIb+Q/gCGmQ63WPbGFTOQB2YQdJ0B0XSYL+NX7gGexGhFNZCcX6ztpfNXYh8RCi5KQMMeE1Zs41t4JR9wjv0ZYUnuzhtOFXYuLfuf/KeK/wUfCQv9Ws9jODWgkqBJaVO3rY0XG28p0dhIDM/KtwAYKRTV2CijSBSLxEqEIEIpmKCisElrW4fHqseLc6ElKA64cps0I8/CzmUE4KFOFaeUr2cbqh11T0Gt5QNbWo8lMcm+cIHrNF8eQNP5uuiApInYO5YIKom4l/KPMTWFH8MEYNEaoW0I16vZaGqvukmtNpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Via7aCsI+aqyyTTn9wZbnrb6ePJZMnRIiICKajA4dBE=; b=RayEPaEoW0xO77HYGsmn6zcsXmqYVUYffpsN0Fl3Fnk3x/XE7L8dHRpliozIT22vJ63cqdzAJ4fD40W76qyFiwM2d0WY6id4MAOEwYrujf4mwMeVIBoeg9Uh8sTec5ked9hN+q4CpLDaC/yZvZLL0HFl5yfaISjaWtaoXiMRZyX0N+dMstEgFWm0J+GhqUPXd04q9/DtDKMAN3UGNE/pf4rMhbqsDjMZX2nrILPIX5yGhJJVkgeICZYeQUwPU+7JOrie0RZ/lLkSv/PcDaWMKNlrdO9TaAk7jO6/a0inVgQonLtRvLCZsyKDgBVzC72chGpuBx2rC6Up9IkwlXgP4Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Via7aCsI+aqyyTTn9wZbnrb6ePJZMnRIiICKajA4dBE=; b=m46OWF9XMPxmIFJlj7EY7fqz6d6yjP5uc2OX4pDjbaIO/JO2nDKCfLkktnIfoBUU5ttBPKwuBOsF78618NYaKSorXX36bZCi8JR9HjZS5qLLLZGoIu3ePTT2CAG7mmgeM+DYI0KWRH3eofOKS+f0jpoyTz3EJgd24hW21V0MsK8= Received: from BYAPR05CA0083.namprd05.prod.outlook.com (2603:10b6:a03:e0::24) by PH0PR12MB7471.namprd12.prod.outlook.com (2603:10b6:510:1e9::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8048.21; Fri, 11 Oct 2024 23:13:10 +0000 Received: from CO1PEPF000044F7.namprd21.prod.outlook.com (2603:10b6:a03:e0:cafe::2b) by BYAPR05CA0083.outlook.office365.com (2603:10b6:a03:e0::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8069.8 via Frontend Transport; Fri, 11 Oct 2024 23:13:09 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; pr=C Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1PEPF000044F7.mail.protection.outlook.com (10.167.241.197) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8093.1 via Frontend Transport; Fri, 11 Oct 2024 23:13:09 +0000 Received: from SATLEXMB06.amd.com (10.181.40.147) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:02 -0500 Received: from SATLEXMB04.amd.com (10.181.40.145) by SATLEXMB06.amd.com (10.181.40.147) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 11 Oct 2024 18:13:01 -0500 Received: from xsjlizhih51.xilinx.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Fri, 11 Oct 2024 18:13:00 -0500 From: Lizhi Hou To: , CC: Lizhi Hou , , , , , Subject: [PATCH V4 07/10] accel/amdxdna: Add command execution Date: Fri, 11 Oct 2024 16:12:41 -0700 Message-ID: <20241011231244.3182625-8-lizhi.hou@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241011231244.3182625-1-lizhi.hou@amd.com> References: <20241011231244.3182625-1-lizhi.hou@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000044F7:EE_|PH0PR12MB7471:EE_ X-MS-Office365-Filtering-Correlation-Id: ada3550a-239d-4c4c-8274-08dcea4a45cd X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|376014|36860700013|1800799024|82310400026; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?h+LkqTYjj3w1Q1wrZL3C+kVOE1BdUGVXbGqWILkzq38z9cqvp6xPdcW0gkX9?= =?us-ascii?Q?z8CkAYEequHUbGZoyagWn7omw9xIadamsuXdFYYJvd6EP6TxY9PoWHgG38Qk?= =?us-ascii?Q?J4btOqAktNgD11K0cQznpHTH0l8mmb+3SdTSmpa0y4EobgSMLfYw8+84YWlq?= =?us-ascii?Q?79wX5eq0PAQLKPr3OJlD2mTJtQCFfrpq3QbCma06ChZEVZovnDc+Fmhhn59P?= =?us-ascii?Q?97xKfa5kPc4rcX7H1IIona3Pei2fMlCFZ++87r3MyfemEGFKMzrEwY91KO7n?= =?us-ascii?Q?8jWQzrGxF6ssdJC0D8rsjnk2ZoDdYWi0GIRnj7uXgYWBFPuTMyvBgApCoU5r?= =?us-ascii?Q?Tvf8862+Db02DDaVPW5fu5oAykETObwV/leMJIDfYtBMD9RIF9ctnd1GQ7AP?= =?us-ascii?Q?LDwRMFtDj9/CZ5vC6MQCNfD+h21/79+bcIHctaTOUIM/iA5YqwZ7E8v+Rht6?= =?us-ascii?Q?6mQbcAYJDX+qk3QoIy2SCXJ5o3nW29t5bi5KX8+tyiEVJjs3GCsMn0J4mjSk?= =?us-ascii?Q?RAU4ZTUhvxSCUOsXGQPqvWSqXXvQLhkzxQIlTdSUsocU7D9Z+C6Sq83VqkLD?= =?us-ascii?Q?qNxfr0vN6djPc+/JvpFhi5gkhmzk5NbacLRaY3m9egnm+H4LCf4Jd1CPaTtt?= =?us-ascii?Q?Tws16qHsZG6owWAwsnHaEKsNOaMxw4hZ9/TTwl62IAi9CcFSj7P1pTzN+ZiX?= =?us-ascii?Q?cB5y+PDl5d7ijXIqVgnfDW5ZHpD7IhncEvANVl8qQeP1ym/TR3tQCNy5PpB7?= =?us-ascii?Q?Ij3FYp3Ks4k+ASfcQOSI27Z9N88aJBAFKlT/MMSfMhlYbOhDVdRC/iGJhZM7?= =?us-ascii?Q?WVRjq8UNR/oQpe95YT0p0zj21enbZSl4fSvkuJKbwdQWKWkRjIwtWxlu+fFV?= =?us-ascii?Q?ysHOo74zmgtoV3SDMQVrtQrsJExzWCQipdPbmSeWkGu7qjW2wHGbi/HvppQ9?= =?us-ascii?Q?N6/rLQAncJPFaItOynmocl2NUdoNgFfsDpe8pwm2N7kYcC9Alvy7OjR2SsYy?= =?us-ascii?Q?nHzzw0EDiHMs7h7WHDB3+8Q7N48Kg1IWoDhz6EzSIIqgtO7nwinM1TsvzTqo?= =?us-ascii?Q?IVZ6A8VvFD7DtpqAdMFmOeMXOM/CzhL0F9DbFIX9ATtuMrhk0MTfnPgdLYQt?= =?us-ascii?Q?AiqRRxrXcUHxuRefDqvUWQ9jNOdFqE9j0szrO52IUezHZUEuJn2wG7PyV0/O?= =?us-ascii?Q?iKpVBQ9lRyv0x4Yq0URs/LX8jGvv4PIneYqSygV9dyQujxjMjlGbARQsm7ej?= =?us-ascii?Q?x/XfQSN+LiXHCdit/+OIxNJ5YmfUluRqs1cF+O2AMLUzx4f8YvAFpDPuCrIg?= =?us-ascii?Q?jUYaxLa+VLtEPKT5rC524DKT04/wNbYN8vChsXJFUv6qTg=3D=3D?= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB03.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(376014)(36860700013)(1800799024)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 11 Oct 2024 23:13:09.7937 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: ada3550a-239d-4c4c-8274-08dcea4a45cd X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000044F7.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR12MB7471 Content-Type: text/plain; charset="utf-8" Add interfaces for user application to submit command and wait for its completion. Co-developed-by: Min Ma Signed-off-by: Min Ma Signed-off-by: Lizhi Hou --- drivers/accel/amdxdna/aie2_ctx.c | 624 +++++++++++++++++- drivers/accel/amdxdna/aie2_message.c | 343 ++++++++++ drivers/accel/amdxdna/aie2_pci.c | 6 + drivers/accel/amdxdna/aie2_pci.h | 35 + drivers/accel/amdxdna/aie2_psp.c | 2 + drivers/accel/amdxdna/aie2_smu.c | 2 + drivers/accel/amdxdna/amdxdna_ctx.c | 375 ++++++++++- drivers/accel/amdxdna/amdxdna_ctx.h | 110 +++ drivers/accel/amdxdna/amdxdna_gem.c | 1 + .../accel/amdxdna/amdxdna_mailbox_helper.c | 5 + drivers/accel/amdxdna/amdxdna_pci_drv.c | 6 + drivers/accel/amdxdna/amdxdna_pci_drv.h | 5 + drivers/accel/amdxdna/amdxdna_sysfs.c | 5 + drivers/accel/amdxdna/npu1_regs.c | 1 + drivers/accel/amdxdna/npu2_regs.c | 1 + drivers/accel/amdxdna/npu4_regs.c | 1 + drivers/accel/amdxdna/npu5_regs.c | 1 + include/trace/events/amdxdna.h | 41 ++ include/uapi/drm/amdxdna_accel.h | 59 ++ 19 files changed, 1614 insertions(+), 9 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_= ctx.c index 617fc05077d9..f9010a902c99 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -8,8 +8,11 @@ #include #include #include +#include #include +#include =20 +#include "aie2_msg_priv.h" #include "aie2_pci.h" #include "aie2_solver.h" #include "amdxdna_ctx.h" @@ -17,6 +20,361 @@ #include "amdxdna_mailbox.h" #include "amdxdna_pci_drv.h" =20 +bool force_cmdlist; +module_param(force_cmdlist, bool, 0600); +MODULE_PARM_DESC(force_cmdlist, "Force use command list (Default false)"); + +#define HWCTX_MAX_TIMEOUT 60000 /* miliseconds */ + +static int +aie2_hwctx_add_job(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *= job) +{ + struct amdxdna_sched_job *other; + int idx; + + idx =3D get_job_idx(hwctx->priv->seq); + /* When pending list full, hwctx->seq points to oldest fence */ + other =3D hwctx->priv->pending[idx]; + if (other && other->fence) + return -EAGAIN; + + if (other) { + dma_fence_put(other->out_fence); + amdxdna_job_put(other); + } + + hwctx->priv->pending[idx] =3D job; + job->seq =3D hwctx->priv->seq++; + kref_get(&job->refcnt); + + return 0; +} + +static struct amdxdna_sched_job * +aie2_hwctx_get_job(struct amdxdna_hwctx *hwctx, u64 seq) +{ + int idx; + + /* Special sequence number for oldest fence if exist */ + if (seq =3D=3D AMDXDNA_INVALID_CMD_HANDLE) { + idx =3D get_job_idx(hwctx->priv->seq); + goto out; + } + + if (seq >=3D hwctx->priv->seq) + return ERR_PTR(-EINVAL); + + if (seq + HWCTX_MAX_CMDS < hwctx->priv->seq) + return NULL; + + idx =3D get_job_idx(seq); + +out: + return hwctx->priv->pending[idx]; +} + +/* The bad_job is used in aie2_sched_job_timedout, otherwise, set it to NU= LL */ +static void aie2_hwctx_stop(struct amdxdna_dev *xdna, struct amdxdna_hwctx= *hwctx, + struct drm_sched_job *bad_job) +{ + drm_sched_stop(&hwctx->priv->sched, bad_job); + aie2_destroy_context(xdna->dev_handle, hwctx); +} + +static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct amdxdna_hwc= tx *hwctx) +{ + struct amdxdna_gem_obj *heap =3D hwctx->priv->heap; + int ret; + + ret =3D aie2_create_context(xdna->dev_handle, hwctx); + if (ret) { + XDNA_ERR(xdna, "Create hwctx failed, ret %d", ret); + goto out; + } + + ret =3D aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id, + heap->mem.userptr, heap->mem.size); + if (ret) { + XDNA_ERR(xdna, "Map host buf failed, ret %d", ret); + goto out; + } + + if (hwctx->status !=3D HWCTX_STAT_READY) { + XDNA_DBG(xdna, "hwctx is not ready, status %d", hwctx->status); + goto out; + } + + ret =3D aie2_config_cu(hwctx); + if (ret) { + XDNA_ERR(xdna, "Config cu failed, ret %d", ret); + goto out; + } + +out: + drm_sched_start(&hwctx->priv->sched); + XDNA_DBG(xdna, "%s restarted, ret %d", hwctx->name, ret); + return ret; +} + +void aie2_stop_ctx_by_col_map(struct amdxdna_client *client, u32 col_map) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + /* check if the HW context uses the error column */ + if (!(col_map & amdxdna_hwctx_col_map(hwctx))) + continue; + + aie2_hwctx_stop(xdna, hwctx, NULL); + hwctx->old_status =3D hwctx->status; + hwctx->status =3D HWCTX_STAT_STOP; + XDNA_DBG(xdna, "Stop %s", hwctx->name); + } + mutex_unlock(&client->hwctx_lock); +} + +void aie2_restart_ctx(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int next =3D 0; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + idr_for_each_entry_continue(&client->hwctx_idr, hwctx, next) { + if (hwctx->status !=3D HWCTX_STAT_STOP) + continue; + + hwctx->status =3D hwctx->old_status; + XDNA_DBG(xdna, "Resetting %s", hwctx->name); + aie2_hwctx_restart(xdna, hwctx); + } + mutex_unlock(&client->hwctx_lock); +} + +static int aie2_hwctx_wait_for_idle(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_sched_job *job; + + mutex_lock(&hwctx->priv->io_lock); + if (!hwctx->priv->seq) { + mutex_unlock(&hwctx->priv->io_lock); + return 0; + } + + job =3D aie2_hwctx_get_job(hwctx, hwctx->priv->seq - 1); + if (IS_ERR_OR_NULL(job)) { + mutex_unlock(&hwctx->priv->io_lock); + XDNA_WARN(hwctx->client->xdna, "Corrupted pending list"); + return 0; + } + mutex_unlock(&hwctx->priv->io_lock); + + wait_event(hwctx->priv->job_free_wq, !job->fence); + + return 0; +} + +static void +aie2_sched_notify(struct amdxdna_sched_job *job) +{ + struct dma_fence *fence =3D job->fence; + + job->hwctx->priv->completed++; + dma_fence_signal(fence); + trace_xdna_job(&job->base, job->hwctx->name, "signaled fence", job->seq); + dma_fence_put(fence); + mmput(job->mm); + amdxdna_job_put(job); +} + +static int +aie2_sched_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job =3D handle; + struct amdxdna_gem_obj *cmd_abo; + u32 ret =3D 0; + u32 status; + + cmd_abo =3D job->cmd_bo; + + if (unlikely(!data)) + goto out; + + if (unlikely(size !=3D sizeof(u32))) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret =3D -EINVAL; + goto out; + } + + status =3D *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + if (status =3D=3D AIE2_STATUS_SUCCESS) + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + else + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ERROR); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_nocmd_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job =3D handle; + u32 ret =3D 0; + u32 status; + + if (unlikely(!data)) + goto out; + + if (unlikely(size !=3D sizeof(u32))) { + ret =3D -EINVAL; + goto out; + } + + status =3D *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_cmdlist_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job =3D handle; + struct amdxdna_gem_obj *cmd_abo; + struct cmd_chain_resp *resp; + struct amdxdna_dev *xdna; + u32 fail_cmd_status; + u32 fail_cmd_idx; + u32 ret =3D 0; + + cmd_abo =3D job->cmd_bo; + if (unlikely(!data) || unlikely(size !=3D sizeof(u32) * 3)) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret =3D -EINVAL; + goto out; + } + + resp =3D (struct cmd_chain_resp *)data; + xdna =3D job->hwctx->client->xdna; + XDNA_DBG(xdna, "Status 0x%x", resp->status); + if (resp->status =3D=3D AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + goto out; + } + + /* Slow path to handle error, read from ringbuf on BAR */ + fail_cmd_idx =3D resp->fail_cmd_idx; + fail_cmd_status =3D resp->fail_cmd_status; + XDNA_DBG(xdna, "Failed cmd idx %d, status 0x%x", + fail_cmd_idx, fail_cmd_status); + + if (fail_cmd_status =3D=3D AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret =3D -EINVAL; + goto out; + } + amdxdna_cmd_set_state(cmd_abo, fail_cmd_status); + + if (amdxdna_cmd_get_op(cmd_abo) =3D=3D ERT_CMD_CHAIN) { + struct amdxdna_cmd_chain *cc =3D amdxdna_cmd_get_payload(cmd_abo, NULL); + + cc->error_index =3D fail_cmd_idx; + if (cc->error_index >=3D cc->command_count) + cc->error_index =3D 0; + } +out: + aie2_sched_notify(job); + return ret; +} + +static struct dma_fence * +aie2_sched_job_run(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job =3D drm_job_to_xdna_job(sched_job); + struct amdxdna_gem_obj *cmd_abo =3D job->cmd_bo; + struct amdxdna_hwctx *hwctx =3D job->hwctx; + struct dma_fence *fence; + int ret; + + if (!mmget_not_zero(job->mm)) + return ERR_PTR(-ESRCH); + + kref_get(&job->refcnt); + fence =3D dma_fence_get(job->fence); + + if (unlikely(!cmd_abo)) { + ret =3D aie2_sync_bo(hwctx, job, aie2_sched_nocmd_resp_handler); + goto out; + } + + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_NEW); + + if (amdxdna_cmd_get_op(cmd_abo) =3D=3D ERT_CMD_CHAIN) + ret =3D aie2_cmdlist_multi_execbuf(hwctx, job, aie2_sched_cmdlist_resp_h= andler); + else if (force_cmdlist) + ret =3D aie2_cmdlist_single_execbuf(hwctx, job, aie2_sched_cmdlist_resp_= handler); + else + ret =3D aie2_execbuf(hwctx, job, aie2_sched_resp_handler); + +out: + if (ret) { + dma_fence_put(job->fence); + amdxdna_job_put(job); + mmput(job->mm); + fence =3D ERR_PTR(ret); + } + trace_xdna_job(sched_job, hwctx->name, "sent to device", job->seq); + + return fence; +} + +static void aie2_sched_job_free(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job =3D drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx =3D job->hwctx; + + trace_xdna_job(sched_job, hwctx->name, "job free", job->seq); + drm_sched_job_cleanup(sched_job); + job->fence =3D NULL; + amdxdna_job_put(job); + + wake_up(&hwctx->priv->job_free_wq); +} + +static enum drm_gpu_sched_stat +aie2_sched_job_timedout(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job =3D drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx =3D job->hwctx; + struct amdxdna_dev *xdna; + + xdna =3D hwctx->client->xdna; + trace_xdna_job(sched_job, hwctx->name, "job timedout", job->seq); + mutex_lock(&xdna->dev_lock); + aie2_hwctx_stop(xdna, hwctx, sched_job); + + aie2_hwctx_restart(xdna, hwctx); + mutex_unlock(&xdna->dev_lock); + + return DRM_GPU_SCHED_STAT_NOMINAL; +} + +const struct drm_sched_backend_ops sched_ops =3D { + .run_job =3D aie2_sched_job_run, + .free_job =3D aie2_sched_job_free, + .timedout_job =3D aie2_sched_job_timedout, +}; + static int aie2_hwctx_col_list(struct amdxdna_hwctx *hwctx) { struct amdxdna_dev *xdna =3D hwctx->client->xdna; @@ -130,9 +488,10 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) { struct amdxdna_client *client =3D hwctx->client; struct amdxdna_dev *xdna =3D client->xdna; + struct drm_gpu_scheduler *sched; struct amdxdna_hwctx_priv *priv; struct amdxdna_gem_obj *heap; - int ret; + int i, ret; =20 priv =3D kzalloc(sizeof(*hwctx->priv), GFP_KERNEL); if (!priv) @@ -157,10 +516,48 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) goto put_heap; } =20 + for (i =3D 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + struct amdxdna_gem_obj *abo; + struct amdxdna_drm_create_bo args =3D { + .flags =3D 0, + .type =3D AMDXDNA_BO_DEV, + .vaddr =3D 0, + .size =3D MAX_CHAIN_CMDBUF_SIZE, + }; + + abo =3D amdxdna_drm_alloc_dev_bo(&xdna->ddev, &args, client->filp, true); + if (IS_ERR(abo)) { + ret =3D PTR_ERR(abo); + goto free_cmd_bufs; + } + + XDNA_DBG(xdna, "Command buf %d addr 0x%llx size 0x%lx", + i, abo->mem.dev_addr, abo->mem.size); + priv->cmd_buf[i] =3D abo; + } + + sched =3D &priv->sched; + mutex_init(&priv->io_lock); + ret =3D drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT, + HWCTX_MAX_CMDS, 0, msecs_to_jiffies(HWCTX_MAX_TIMEOUT), + NULL, NULL, hwctx->name, xdna->ddev.dev); + if (ret) { + XDNA_ERR(xdna, "Failed to init DRM scheduler. ret %d", ret); + goto free_cmd_bufs; + } + + ret =3D drm_sched_entity_init(&priv->entity, DRM_SCHED_PRIORITY_NORMAL, + &sched, 1, NULL); + if (ret) { + XDNA_ERR(xdna, "Failed to initial sched entiry. ret %d", ret); + goto free_sched; + } + init_waitqueue_head(&priv->job_free_wq); + ret =3D aie2_hwctx_col_list(hwctx); if (ret) { XDNA_ERR(xdna, "Create col list failed, ret %d", ret); - goto unpin; + goto free_entity; } =20 ret =3D aie2_alloc_resource(hwctx); @@ -185,7 +582,16 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) aie2_release_resource(hwctx); free_col_list: kfree(hwctx->col_list); -unpin: +free_entity: + drm_sched_entity_destroy(&priv->entity); +free_sched: + drm_sched_fini(&priv->sched); +free_cmd_bufs: + for (i =3D 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + if (!priv->cmd_buf[i]) + continue; + drm_gem_object_put(to_gobj(priv->cmd_buf[i])); + } amdxdna_gem_unpin(heap); put_heap: drm_gem_object_put(to_gobj(heap)); @@ -196,11 +602,43 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) =20 void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx) { + struct amdxdna_sched_job *job; + struct amdxdna_dev *xdna; + int idx; + + xdna =3D hwctx->client->xdna; + drm_sched_wqueue_stop(&hwctx->priv->sched); + + /* Now, scheduler will not send command to device. */ aie2_release_resource(hwctx); =20 + /* + * All submitted commands are aborted. + * Restart scheduler queues to cleanup jobs. The amdxdna_sched_job_run() + * will return NODEV if it is called. + */ + drm_sched_wqueue_start(&hwctx->priv->sched); + + aie2_hwctx_wait_for_idle(hwctx); + drm_sched_entity_destroy(&hwctx->priv->entity); + drm_sched_fini(&hwctx->priv->sched); + + for (idx =3D 0; idx < HWCTX_MAX_CMDS; idx++) { + job =3D hwctx->priv->pending[idx]; + if (!job) + continue; + + dma_fence_put(job->out_fence); + amdxdna_job_put(job); + } + XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq); + + for (idx =3D 0; idx < ARRAY_SIZE(hwctx->priv->cmd_buf); idx++) + drm_gem_object_put(to_gobj(hwctx->priv->cmd_buf[idx])); amdxdna_gem_unpin(hwctx->priv->heap); drm_gem_object_put(to_gobj(hwctx->priv->heap)); =20 + mutex_destroy(&hwctx->priv->io_lock); kfree(hwctx->col_list); kfree(hwctx->priv); kfree(hwctx->cus); @@ -267,3 +705,183 @@ int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u3= 2 type, u64 value, void *bu return -EOPNOTSUPP; } } + +static int aie2_populate_range(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna =3D to_xdna_dev(to_gobj(abo)->dev); + struct mm_struct *mm =3D abo->mem.notifier.mm; + struct hmm_range range =3D { 0 }; + unsigned long timeout; + int ret; + + XDNA_INFO_ONCE(xdna, "populate memory range %llx size %lx", + abo->mem.userptr, abo->mem.size); + range.notifier =3D &abo->mem.notifier; + range.start =3D abo->mem.userptr; + range.end =3D abo->mem.userptr + abo->mem.size; + range.hmm_pfns =3D abo->mem.pfns; + range.default_flags =3D HMM_PFN_REQ_FAULT; + + if (!mmget_not_zero(mm)) + return -EFAULT; + + timeout =3D jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); +again: + range.notifier_seq =3D mmu_interval_read_begin(&abo->mem.notifier); + mmap_read_lock(mm); + ret =3D hmm_range_fault(&range); + mmap_read_unlock(mm); + if (ret) { + if (time_after(jiffies, timeout)) { + ret =3D -ETIME; + goto put_mm; + } + + if (ret =3D=3D -EBUSY) + goto again; + + goto put_mm; + } + + dma_resv_lock(to_gobj(abo)->resv, NULL); + if (mmu_interval_read_retry(&abo->mem.notifier, range.notifier_seq)) { + dma_resv_unlock(to_gobj(abo)->resv); + goto again; + } + abo->mem.map_invalid =3D false; + dma_resv_unlock(to_gobj(abo)->resv); + +put_mm: + mmput(mm); + return ret; +} + +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job = *job, u64 *seq) +{ + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + struct ww_acquire_ctx acquire_ctx; + struct amdxdna_gem_obj *abo; + unsigned long timeout =3D 0; + int ret, i; + + ret =3D drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, hwctx); + if (ret) { + XDNA_ERR(xdna, "DRM job init failed, ret %d", ret); + return ret; + } + + drm_sched_job_arm(&job->base); + job->out_fence =3D dma_fence_get(&job->base.s_fence->finished); + +retry: + ret =3D drm_gem_lock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (ret) { + XDNA_WARN(xdna, "Failed to reverve fence, ret %d", ret); + goto put_fence; + } + + for (i =3D 0; i < job->bo_cnt; i++) { + abo =3D to_xdna_obj(job->bos[i]); + if (abo->mem.map_invalid) { + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (!timeout) { + timeout =3D jiffies + + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); + } else if (time_after(jiffies, timeout)) { + ret =3D -ETIME; + goto put_fence; + } + + ret =3D aie2_populate_range(abo); + if (ret) + goto put_fence; + goto retry; + } + + ret =3D dma_resv_reserve_fences(job->bos[i]->resv, 1); + if (ret) { + XDNA_WARN(xdna, "Failed to reserve fences %d", ret); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + goto put_fence; + } + } + + for (i =3D 0; i < job->bo_cnt; i++) + dma_resv_add_fence(job->bos[i]->resv, job->out_fence, DMA_RESV_USAGE_WRI= TE); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + + mutex_lock(&hwctx->priv->io_lock); + ret =3D aie2_hwctx_add_job(hwctx, job); + if (ret) { + mutex_unlock(&hwctx->priv->io_lock); + goto signal_fence; + } + + *seq =3D job->seq; + drm_sched_entity_push_job(&job->base); + mutex_unlock(&hwctx->priv->io_lock); + + return 0; + +signal_fence: + dma_fence_signal(job->out_fence); +put_fence: + dma_fence_put(job->out_fence); + drm_sched_job_cleanup(&job->base); + return ret; +} + +int aie2_cmd_wait(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout) +{ + signed long remaining =3D MAX_SCHEDULE_TIMEOUT; + struct amdxdna_sched_job *job; + struct dma_fence *out_fence; + long ret; + + mutex_lock(&hwctx->priv->io_lock); + job =3D aie2_hwctx_get_job(hwctx, seq); + if (IS_ERR(job)) { + mutex_unlock(&hwctx->priv->io_lock); + ret =3D PTR_ERR(job); + goto out; + } + + if (unlikely(!job)) { + mutex_unlock(&hwctx->priv->io_lock); + ret =3D 0; + goto out; + } + out_fence =3D dma_fence_get(job->out_fence); + mutex_unlock(&hwctx->priv->io_lock); + + if (timeout) + remaining =3D msecs_to_jiffies(timeout); + + ret =3D dma_fence_wait_timeout(out_fence, true, remaining); + if (!ret) + ret =3D -ETIME; + else if (ret > 0) + ret =3D 0; + + dma_fence_put(out_fence); +out: + return ret; +} + +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, + unsigned long cur_seq) +{ + struct amdxdna_dev *xdna =3D to_xdna_dev(to_gobj(abo)->dev); + struct drm_gem_object *gobj =3D to_gobj(abo); + long ret; + + dma_resv_lock(gobj->resv, NULL); + abo->mem.map_invalid =3D true; + mmu_interval_set_seq(&abo->mem.notifier, cur_seq); + ret =3D dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, + true, MAX_SCHEDULE_TIMEOUT); + dma_resv_unlock(gobj->resv); + + if (!ret || ret =3D=3D -ERESTARTSYS) + XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); +} diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/a= ie2_message.c index 28bd0560db61..3dc4a9a8571e 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -4,10 +4,12 @@ */ =20 #include +#include #include #include #include #include +#include #include #include #include @@ -361,3 +363,344 @@ int aie2_config_cu(struct amdxdna_hwctx *hwctx) msg.opcode, resp.status, ret); return ret; } + +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *jo= b, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann =3D hwctx->priv->mbox_chann; + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + struct amdxdna_gem_obj *cmd_abo =3D job->cmd_bo; + union { + struct execute_buffer_req ebuf; + struct exec_dpu_req dpu; + } req; + struct xdna_mailbox_msg msg; + u32 payload_len; + void *payload; + int cu_idx; + int ret; + u32 op; + + if (!chann) + return -ENODEV; + + payload =3D amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (!payload) { + XDNA_ERR(xdna, "Invalid command, cannot get payload"); + return -EINVAL; + } + + cu_idx =3D amdxdna_cmd_get_cu_idx(cmd_abo); + if (cu_idx < 0) { + XDNA_DBG(xdna, "Invalid cu idx"); + return -EINVAL; + } + + op =3D amdxdna_cmd_get_op(cmd_abo); + switch (op) { + case ERT_START_CU: + if (unlikely(payload_len > sizeof(req.ebuf.payload))) + XDNA_DBG(xdna, "Invalid ebuf payload len: %d", payload_len); + req.ebuf.cu_idx =3D cu_idx; + memcpy(req.ebuf.payload, payload, sizeof(req.ebuf.payload)); + msg.send_size =3D sizeof(req.ebuf); + msg.opcode =3D MSG_OP_EXECUTE_BUFFER_CF; + break; + case ERT_START_NPU: { + struct amdxdna_cmd_start_npu *sn =3D payload; + + if (unlikely(payload_len - sizeof(*sn) > sizeof(req.dpu.payload))) + XDNA_DBG(xdna, "Invalid dpu payload len: %d", payload_len); + req.dpu.inst_buf_addr =3D sn->buffer; + req.dpu.inst_size =3D sn->buffer_size; + req.dpu.inst_prop_cnt =3D sn->prop_count; + req.dpu.cu_idx =3D cu_idx; + memcpy(req.dpu.payload, sn->prop_args, sizeof(req.dpu.payload)); + msg.send_size =3D sizeof(req.dpu); + msg.opcode =3D MSG_OP_EXEC_DPU; + break; + } + default: + XDNA_DBG(xdna, "Invalid ERT cmd op code: %d", op); + return -EINVAL; + } + msg.handle =3D job; + msg.notify_cb =3D notify_cb; + msg.send_data =3D (u8 *)&req; + print_hex_dump_debug("cmd: ", DUMP_PREFIX_OFFSET, 16, 4, &req, + 0x40, false); + + ret =3D xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_cf(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_execbuf_cf *buf =3D cmd_buf + offset; + int cu_idx =3D amdxdna_cmd_get_cu_idx(abo); + u32 payload_len; + void *payload; + + if (cu_idx < 0) + return -EINVAL; + + payload =3D amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + + if (!slot_cf_has_space(offset, payload_len)) + return -ENOSPC; + + buf->cu_idx =3D cu_idx; + buf->arg_cnt =3D payload_len / sizeof(u32); + memcpy(buf->args, payload, payload_len); + /* Accurate buf size to hint firmware to do necessary copy */ + *size =3D sizeof(*buf) + payload_len; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_dpu(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_dpu *buf =3D cmd_buf + offset; + int cu_idx =3D amdxdna_cmd_get_cu_idx(abo); + struct amdxdna_cmd_start_npu *sn; + u32 payload_len; + void *payload; + u32 arg_sz; + + if (cu_idx < 0) + return -EINVAL; + + payload =3D amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + sn =3D payload; + arg_sz =3D payload_len - sizeof(*sn); + if (payload_len < sizeof(*sn) || arg_sz > MAX_DPU_ARGS_SIZE) + return -EINVAL; + + if (!slot_dpu_has_space(offset, arg_sz)) + return -ENOSPC; + + buf->inst_buf_addr =3D sn->buffer; + buf->inst_size =3D sn->buffer_size; + buf->inst_prop_cnt =3D sn->prop_count; + buf->cu_idx =3D cu_idx; + buf->arg_cnt =3D arg_sz / sizeof(u32); + memcpy(buf->args, sn->prop_args, arg_sz); + + /* Accurate buf size to hint firmware to do necessary copy */ + *size +=3D sizeof(*buf) + arg_sz; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot(u32 op, struct amdxdna_gem_obj *cmdbuf_abo, u32= offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + u32 this_op =3D amdxdna_cmd_get_op(abo); + void *cmd_buf =3D cmdbuf_abo->mem.kva; + int ret; + + if (this_op !=3D op) { + ret =3D -EINVAL; + goto done; + } + + switch (op) { + case ERT_START_CU: + ret =3D aie2_cmdlist_fill_one_slot_cf(cmd_buf, offset, abo, size); + break; + case ERT_START_NPU: + ret =3D aie2_cmdlist_fill_one_slot_dpu(cmd_buf, offset, abo, size); + break; + default: + ret =3D -EOPNOTSUPP; + } + +done: + if (ret) { + XDNA_ERR(abo->client->xdna, "Can't fill slot for cmd op %d ret %d", + op, ret); + } + return ret; +} + +static inline struct amdxdna_gem_obj * +aie2_cmdlist_get_cmd_buf(struct amdxdna_sched_job *job) +{ + int idx =3D get_job_idx(job->seq); + + return job->hwctx->priv->cmd_buf[idx]; +} + +static void +aie2_cmdlist_prepare_request(struct cmd_chain_req *req, + struct amdxdna_gem_obj *cmdbuf_abo, u32 size, u32 cnt) +{ + req->buf_addr =3D cmdbuf_abo->mem.dev_addr; + req->buf_size =3D size; + req->count =3D cnt; + drm_clflush_virt_range(cmdbuf_abo->mem.kva, size); + XDNA_DBG(cmdbuf_abo->client->xdna, "Command buf addr 0x%llx size 0x%x cou= nt %d", + req->buf_addr, size, cnt); +} + +static inline u32 +aie2_cmd_op_to_msg_op(u32 op) +{ + switch (op) { + case ERT_START_CU: + return MSG_OP_CHAIN_EXEC_BUFFER_CF; + case ERT_START_NPU: + return MSG_OP_CHAIN_EXEC_DPU; + default: + return MSG_OP_MAX_OPCODE; + } +} + +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo =3D aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann =3D hwctx->priv->mbox_chann; + struct amdxdna_client *client =3D hwctx->client; + struct amdxdna_gem_obj *cmd_abo =3D job->cmd_bo; + struct amdxdna_cmd_chain *payload; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 payload_len; + u32 offset =3D 0; + u32 size; + int ret; + u32 op; + u32 i; + + op =3D amdxdna_cmd_get_op(cmd_abo); + payload =3D amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (op !=3D ERT_CMD_CHAIN || !payload || + payload_len < struct_size(payload, data, payload->command_count)) + return -EINVAL; + + for (i =3D 0; i < payload->command_count; i++) { + u32 boh =3D (u32)(payload->data[i]); + struct amdxdna_gem_obj *abo; + + abo =3D amdxdna_gem_get_obj(client, boh, AMDXDNA_BO_CMD); + if (!abo) { + XDNA_ERR(client->xdna, "Failed to find cmd BO %d", boh); + return -ENOENT; + } + + /* All sub-cmd should have same op, use the first one. */ + if (i =3D=3D 0) + op =3D amdxdna_cmd_get_op(abo); + + ret =3D aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, offset, abo, &size); + amdxdna_gem_put_obj(abo); + if (ret) + return -EINVAL; + + offset +=3D size; + } + + /* The offset is the accumulated total size of the cmd buffer */ + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, offset, payload->command_c= ount); + + msg.opcode =3D aie2_cmd_op_to_msg_op(op); + if (msg.opcode =3D=3D MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle =3D job; + msg.notify_cb =3D notify_cb; + msg.send_data =3D (u8 *)&req; + msg.send_size =3D sizeof(req); + ret =3D xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo =3D aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann =3D hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *cmd_abo =3D job->cmd_bo; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 size; + int ret; + u32 op; + + op =3D amdxdna_cmd_get_op(cmd_abo); + ret =3D aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, 0, cmd_abo, &size); + if (ret) + return ret; + + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, size, 1); + + msg.opcode =3D aie2_cmd_op_to_msg_op(op); + if (msg.opcode =3D=3D MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle =3D job; + msg.notify_cb =3D notify_cb; + msg.send_data =3D (u8 *)&req; + msg.send_size =3D sizeof(req); + ret =3D xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *jo= b, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann =3D hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *abo =3D to_xdna_obj(job->bos[0]); + struct amdxdna_dev *xdna =3D hwctx->client->xdna; + struct xdna_mailbox_msg msg; + struct sync_bo_req req; + int ret =3D 0; + + req.src_addr =3D 0; + req.dst_addr =3D abo->mem.dev_addr - hwctx->client->dev_heap->mem.dev_add= r; + req.size =3D abo->mem.size; + + /* Device to Host */ + req.type =3D FIELD_PREP(AIE2_MSG_SYNC_BO_SRC_TYPE, SYNC_BO_DEV_MEM) | + FIELD_PREP(AIE2_MSG_SYNC_BO_DST_TYPE, SYNC_BO_HOST_MEM); + + XDNA_DBG(xdna, "sync %d bytes src(0x%llx) to dst(0x%llx) completed", + req.size, req.src_addr, req.dst_addr); + + msg.handle =3D job; + msg.notify_cb =3D notify_cb; + msg.send_data =3D (u8 *)&req; + msg.send_size =3D sizeof(req); + msg.opcode =3D MSG_OP_SYNC_BO; + + ret =3D xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_= pci.c index ee9f114bc229..6017826a7104 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -5,8 +5,10 @@ =20 #include #include +#include #include #include +#include #include #include #include @@ -17,6 +19,7 @@ #include "aie2_pci.h" #include "aie2_solver.h" #include "amdxdna_ctx.h" +#include "amdxdna_gem.h" #include "amdxdna_mailbox.h" #include "amdxdna_pci_drv.h" =20 @@ -499,4 +502,7 @@ const struct amdxdna_dev_ops aie2_ops =3D { .hwctx_init =3D aie2_hwctx_init, .hwctx_fini =3D aie2_hwctx_fini, .hwctx_config =3D aie2_hwctx_config, + .cmd_submit =3D aie2_cmd_submit, + .cmd_wait =3D aie2_cmd_wait, + .hmm_invalidate =3D aie2_hmm_invalidate, }; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_= pci.h index 3ac936e2c9d1..81877d9c0542 100644 --- a/drivers/accel/amdxdna/aie2_pci.h +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -76,6 +76,7 @@ enum psp_reg_idx { PSP_MAX_REGS /* Keep this at the end */ }; =20 +struct amdxdna_client; struct amdxdna_fw_ver; struct amdxdna_hwctx; =20 @@ -118,9 +119,28 @@ struct rt_config { u32 value; }; =20 +/* + * Define the maximum number of pending commands in a hardware context. + * Must be power of 2! + */ +#define HWCTX_MAX_CMDS 4 +#define get_job_idx(seq) ((seq) & (HWCTX_MAX_CMDS - 1)) struct amdxdna_hwctx_priv { struct amdxdna_gem_obj *heap; void *mbox_chann; + + struct drm_gpu_scheduler sched; + struct drm_sched_entity entity; + + struct mutex io_lock; /* protect seq and cmd order */ + struct wait_queue_head job_free_wq; + struct amdxdna_sched_job *pending[HWCTX_MAX_CMDS]; + u32 num_pending; + u64 seq; + /* Completed job counter */ + u64 completed; + + struct amdxdna_gem_obj *cmd_buf[HWCTX_MAX_CMDS]; }; =20 struct amdxdna_dev_hdl { @@ -199,10 +219,25 @@ int aie2_create_context(struct amdxdna_dev_hdl *ndev,= struct amdxdna_hwctx *hwct int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwct= x *hwctx); int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 ad= dr, u64 size); int aie2_config_cu(struct amdxdna_hwctx *hwctx); +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *jo= b, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *jo= b, + int (*notify_cb)(void *, const u32 *, size_t)); =20 /* aie2_hwctx.c */ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx); void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx); int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, vo= id *buf, u32 size); +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job = *job, u64 *seq); +int aie2_cmd_wait(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout); +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, unsigned long cur_se= q); +void aie2_stop_ctx_by_col_map(struct amdxdna_client *client, u32 col_map); +void aie2_restart_ctx(struct amdxdna_client *client); =20 #endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/aie2_psp.c b/drivers/accel/amdxdna/aie2_= psp.c index 2efcfd1941bf..35ba55e2ab1e 100644 --- a/drivers/accel/amdxdna/aie2_psp.c +++ b/drivers/accel/amdxdna/aie2_psp.c @@ -4,8 +4,10 @@ */ =20 #include +#include #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_= smu.c index 3fa7064649aa..91893d438da7 100644 --- a/drivers/accel/amdxdna/aie2_smu.c +++ b/drivers/accel/amdxdna/aie2_smu.c @@ -4,7 +4,9 @@ */ =20 #include +#include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/am= dxdna_ctx.c index 8acf8bfe0db9..b76640e1fdd0 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.c +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -7,17 +7,65 @@ #include #include #include +#include +#include #include +#include +#include =20 #include "amdxdna_ctx.h" +#include "amdxdna_gem.h" #include "amdxdna_pci_drv.h" =20 #define MAX_HWCTX_ID 255 +#define MAX_ARG_COUNT 4095 =20 -static void amdxdna_hwctx_destroy(struct amdxdna_hwctx *hwctx) +struct amdxdna_fence { + struct dma_fence base; + spinlock_t lock; /* for base */ + struct amdxdna_hwctx *hwctx; +}; + +static const char *amdxdna_fence_get_driver_name(struct dma_fence *fence) +{ + return KBUILD_MODNAME; +} + +static const char *amdxdna_fence_get_timeline_name(struct dma_fence *fence) +{ + struct amdxdna_fence *xdna_fence; + + xdna_fence =3D container_of(fence, struct amdxdna_fence, base); + + return xdna_fence->hwctx->name; +} + +static const struct dma_fence_ops fence_ops =3D { + .get_driver_name =3D amdxdna_fence_get_driver_name, + .get_timeline_name =3D amdxdna_fence_get_timeline_name, +}; + +static struct dma_fence *amdxdna_fence_create(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_fence *fence; + + fence =3D kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return NULL; + + fence->hwctx =3D hwctx; + spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &fence_ops, &fence->lock, hwctx->id, 0); + return &fence->base; +} + +static void amdxdna_hwctx_destroy_rcu(struct amdxdna_hwctx *hwctx, + struct srcu_struct *ss) { struct amdxdna_dev *xdna =3D hwctx->client->xdna; =20 + synchronize_srcu(ss); + /* At this point, user is not able to submit new commands */ mutex_lock(&xdna->dev_lock); xdna->dev_info->ops->hwctx_fini(hwctx); @@ -27,6 +75,46 @@ static void amdxdna_hwctx_destroy(struct amdxdna_hwctx *= hwctx) kfree(hwctx); } =20 +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + u32 num_masks, count; + + if (amdxdna_cmd_get_op(abo) =3D=3D ERT_CMD_CHAIN) + num_masks =3D 0; + else + num_masks =3D 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + + if (size) { + count =3D FIELD_GET(AMDXDNA_CMD_COUNT, cmd->header); + if (unlikely(count <=3D num_masks)) { + *size =3D 0; + return NULL; + } + *size =3D (count - num_masks) * sizeof(u32); + } + return &cmd->data[num_masks]; +} + +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + u32 num_masks, i; + u32 *cu_mask; + + if (amdxdna_cmd_get_op(abo) =3D=3D ERT_CMD_CHAIN) + return -1; + + num_masks =3D 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + cu_mask =3D cmd->data; + for (i =3D 0; i < num_masks; i++) { + if (cu_mask[i]) + return ffs(cu_mask[i]) - 1; + } + + return -1; +} + /* * This should be called in close() and remove(). DO NOT call in other sys= calls. * This guarantee that when hwctx and resources will be released, if user @@ -43,7 +131,7 @@ void amdxdna_hwctx_remove_all(struct amdxdna_client *cli= ent) client->pid, hwctx->id); idr_remove(&client->hwctx_idr, hwctx->id); mutex_unlock(&client->hwctx_lock); - amdxdna_hwctx_destroy(hwctx); + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); mutex_lock(&client->hwctx_lock); } mutex_unlock(&client->hwctx_lock); @@ -134,6 +222,12 @@ int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device = *dev, void *data, struct d if (!drm_dev_enter(dev, &idx)) return -ENODEV; =20 + /* + * Use hwctx_lock to achieve exclusion with other hwctx writers, + * SRCU to synchronize with exec/wait command ioctls. + * + * The pushed jobs are handled by DRM scheduler during destroy. + */ mutex_lock(&client->hwctx_lock); hwctx =3D idr_find(&client->hwctx_idr, args->handle); if (!hwctx) { @@ -146,7 +240,7 @@ int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *= dev, void *data, struct d idr_remove(&client->hwctx_idr, hwctx->id); mutex_unlock(&client->hwctx_lock); =20 - amdxdna_hwctx_destroy(hwctx); + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); =20 XDNA_DBG(xdna, "PID %d destroyed HW context %d", client->pid, args->handl= e); out: @@ -160,10 +254,10 @@ int amdxdna_drm_config_hwctx_ioctl(struct drm_device = *dev, void *data, struct dr struct amdxdna_drm_config_hwctx *args =3D data; struct amdxdna_dev *xdna =3D to_xdna_dev(dev); struct amdxdna_hwctx *hwctx; + int ret, idx; u32 buf_size; void *buf; u64 val; - int ret; =20 if (!xdna->dev_info->ops->hwctx_config) return -EOPNOTSUPP; @@ -202,17 +296,286 @@ int amdxdna_drm_config_hwctx_ioctl(struct drm_device= *dev, void *data, struct dr } =20 mutex_lock(&xdna->dev_lock); + idx =3D srcu_read_lock(&client->hwctx_srcu); hwctx =3D idr_find(&client->hwctx_idr, args->handle); if (!hwctx) { XDNA_DBG(xdna, "PID %d failed to get hwctx %d", client->pid, args->handl= e); ret =3D -EINVAL; - goto unlock; + goto unlock_srcu; } =20 ret =3D xdna->dev_info->ops->hwctx_config(hwctx, args->param_type, val, b= uf, buf_size); =20 -unlock: +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); mutex_unlock(&xdna->dev_lock); kfree(buf); return ret; } + +static void +amdxdna_arg_bos_put(struct amdxdna_sched_job *job) +{ + int i; + + for (i =3D 0; i < job->bo_cnt; i++) { + if (!job->bos[i]) + break; + drm_gem_object_put(job->bos[i]); + } +} + +static int +amdxdna_arg_bos_lookup(struct amdxdna_client *client, + struct amdxdna_sched_job *job, + u32 *bo_hdls, u32 bo_cnt) +{ + struct drm_gem_object *gobj; + int i, ret; + + job->bo_cnt =3D bo_cnt; + for (i =3D 0; i < job->bo_cnt; i++) { + struct amdxdna_gem_obj *abo; + + gobj =3D drm_gem_object_lookup(client->filp, bo_hdls[i]); + if (!gobj) { + ret =3D -ENOENT; + goto put_shmem_bo; + } + abo =3D to_xdna_obj(gobj); + + mutex_lock(&abo->lock); + if (abo->pinned) { + mutex_unlock(&abo->lock); + job->bos[i] =3D gobj; + continue; + } + + ret =3D amdxdna_gem_pin_nolock(abo); + if (ret) { + mutex_unlock(&abo->lock); + drm_gem_object_put(gobj); + goto put_shmem_bo; + } + abo->pinned =3D true; + mutex_unlock(&abo->lock); + + job->bos[i] =3D gobj; + } + + return 0; + +put_shmem_bo: + amdxdna_arg_bos_put(job); + return ret; +} + +static void amdxdna_sched_job_release(struct kref *ref) +{ + struct amdxdna_sched_job *job; + + job =3D container_of(ref, struct amdxdna_sched_job, refcnt); + + trace_amdxdna_debug_point(job->hwctx->name, job->seq, "job release"); + amdxdna_arg_bos_put(job); + amdxdna_gem_put_obj(job->cmd_bo); + kfree(job); +} + +void amdxdna_job_put(struct amdxdna_sched_job *job) +{ + kref_put(&job->refcnt, amdxdna_sched_job_release); +} + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdl, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_sched_job *job; + struct amdxdna_hwctx *hwctx; + int ret, idx; + + XDNA_DBG(xdna, "Command BO hdl %d, Arg BO count %d", cmd_bo_hdl, arg_bo_c= nt); + job =3D kzalloc(struct_size(job, bos, arg_bo_cnt), GFP_KERNEL); + if (!job) + return -ENOMEM; + + if (cmd_bo_hdl !=3D AMDXDNA_INVALID_BO_HANDLE) { + job->cmd_bo =3D amdxdna_gem_get_obj(client, cmd_bo_hdl, AMDXDNA_BO_CMD); + if (!job->cmd_bo) { + XDNA_ERR(xdna, "Failed to get cmd bo from %d", cmd_bo_hdl); + ret =3D -EINVAL; + goto free_job; + } + } else { + job->cmd_bo =3D NULL; + } + + ret =3D amdxdna_arg_bos_lookup(client, job, arg_bo_hdls, arg_bo_cnt); + if (ret) { + XDNA_ERR(xdna, "Argument BOs lookup failed, ret %d", ret); + goto cmd_put; + } + + idx =3D srcu_read_lock(&client->hwctx_srcu); + hwctx =3D idr_find(&client->hwctx_idr, hwctx_hdl); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", + client->pid, hwctx_hdl); + ret =3D -EINVAL; + goto unlock_srcu; + } + + if (hwctx->status !=3D HWCTX_STAT_READY) { + XDNA_ERR(xdna, "HW Context is not ready"); + ret =3D -EINVAL; + goto unlock_srcu; + } + + job->hwctx =3D hwctx; + job->mm =3D current->mm; + + job->fence =3D amdxdna_fence_create(hwctx); + if (!job->fence) { + XDNA_ERR(xdna, "Failed to create fence"); + ret =3D -ENOMEM; + goto unlock_srcu; + } + kref_init(&job->refcnt); + + ret =3D xdna->dev_info->ops->cmd_submit(hwctx, job, seq); + if (ret) + goto put_fence; + + /* + * The amdxdna_hwctx_destroy_rcu() will release hwctx and associated + * resource after synchronize_srcu(). The submitted jobs should be + * handled by the queue, for example DRM scheduler, in device layer. + * For here we can unlock SRCU. + */ + srcu_read_unlock(&client->hwctx_srcu, idx); + trace_amdxdna_debug_point(hwctx->name, *seq, "job pushed"); + + return 0; + +put_fence: + dma_fence_put(job->fence); +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + amdxdna_arg_bos_put(job); +cmd_put: + amdxdna_gem_put_obj(job->cmd_bo); +free_job: + kfree(job); + return ret; +} + +/* + * The submit command ioctl submits a command to firmware. One firmware co= mmand + * may contain multiple command BOs for processing as a whole. + * The command sequence number is returned which can be used for wait comm= and ioctl. + */ +static int amdxdna_drm_submit_execbuf(struct amdxdna_client *client, + struct amdxdna_drm_exec_cmd *args) +{ + struct amdxdna_dev *xdna =3D client->xdna; + u32 *arg_bo_hdls; + u32 cmd_bo_hdl; + int ret; + + if (!args->arg_count || args->arg_count > MAX_ARG_COUNT) { + XDNA_ERR(xdna, "Invalid arg bo count %d", args->arg_count); + return -EINVAL; + } + + /* Only support single command for now. */ + if (args->cmd_count !=3D 1) { + XDNA_ERR(xdna, "Invalid cmd bo count %d", args->cmd_count); + return -EINVAL; + } + + cmd_bo_hdl =3D (u32)args->cmd_handles; + arg_bo_hdls =3D kcalloc(args->arg_count, sizeof(u32), GFP_KERNEL); + if (!arg_bo_hdls) + return -ENOMEM; + ret =3D copy_from_user(arg_bo_hdls, u64_to_user_ptr(args->args), + args->arg_count * sizeof(u32)); + if (ret) { + ret =3D -EFAULT; + goto free_cmd_bo_hdls; + } + + ret =3D amdxdna_cmd_submit(client, cmd_bo_hdl, arg_bo_hdls, + args->arg_count, args->hwctx, &args->seq); + if (ret) + XDNA_DBG(xdna, "Submit cmds failed, ret %d", ret); + +free_cmd_bo_hdls: + kfree(arg_bo_hdls); + if (!ret) + XDNA_DBG(xdna, "Pushed cmd %lld to scheduler", args->seq); + return ret; +} + +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struc= t drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_drm_exec_cmd *args =3D data; + + if (args->ext || args->ext_flags) + return -EINVAL; + + switch (args->type) { + case AMDXDNA_CMD_SUBMIT_EXEC_BUF: + return amdxdna_drm_submit_execbuf(client, args); + } + + XDNA_ERR(client->xdna, "Invalid command type %d", args->type); + return -EINVAL; +} + +int amdxdna_cmd_wait(struct amdxdna_client *client, u32 hwctx_hdl, + u64 seq, u32 timeout) +{ + struct amdxdna_dev *xdna =3D client->xdna; + struct amdxdna_hwctx *hwctx; + int ret, idx; + + if (!xdna->dev_info->ops->cmd_wait) + return -EOPNOTSUPP; + + /* For locking concerns, see amdxdna_drm_exec_cmd_ioctl. */ + idx =3D srcu_read_lock(&client->hwctx_srcu); + hwctx =3D idr_find(&client->hwctx_idr, hwctx_hdl); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", + client->pid, hwctx_hdl); + ret =3D -EINVAL; + goto unlock_hwctx_srcu; + } + + ret =3D xdna->dev_info->ops->cmd_wait(hwctx, seq, timeout); + +unlock_hwctx_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + return ret; +} + +int amdxdna_drm_wait_cmd_ioctl(struct drm_device *dev, void *data, struct = drm_file *filp) +{ + struct amdxdna_client *client =3D filp->driver_priv; + struct amdxdna_dev *xdna =3D to_xdna_dev(dev); + struct amdxdna_drm_wait_cmd *args =3D data; + int ret; + + XDNA_DBG(xdna, "PID %d hwctx %d timeout set %d ms for cmd %lld", + client->pid, args->hwctx, args->timeout, args->seq); + + ret =3D amdxdna_cmd_wait(client, args->hwctx, args->seq, args->timeout); + + XDNA_DBG(xdna, "PID %d hwctx %d cmd %lld wait finished, ret %d", + client->pid, args->hwctx, args->seq, ret); + + return ret; +} diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/am= dxdna_ctx.h index 665b3208897d..65f9c1dfe32c 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.h +++ b/drivers/accel/amdxdna/amdxdna_ctx.h @@ -6,6 +6,52 @@ #ifndef _AMDXDNA_CTX_H_ #define _AMDXDNA_CTX_H_ =20 +#include "amdxdna_gem.h" + +struct amdxdna_hwctx_priv; + +enum ert_cmd_opcode { + ERT_START_CU =3D 0, + ERT_CMD_CHAIN =3D 19, + ERT_START_NPU =3D 20, +}; + +enum ert_cmd_state { + ERT_CMD_STATE_INVALID, + ERT_CMD_STATE_NEW, + ERT_CMD_STATE_QUEUED, + ERT_CMD_STATE_RUNNING, + ERT_CMD_STATE_COMPLETED, + ERT_CMD_STATE_ERROR, + ERT_CMD_STATE_ABORT, + ERT_CMD_STATE_SUBMITTED, + ERT_CMD_STATE_TIMEOUT, + ERT_CMD_STATE_NORESPONSE, +}; + +/* + * Interpretation of the beginning of data payload for ERT_START_NPU in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is regular kernel a= rgs. + */ +struct amdxdna_cmd_start_npu { + u64 buffer; /* instruction buffer address */ + u32 buffer_size; /* size of buffer in bytes */ + u32 prop_count; /* properties count */ + u32 prop_args[]; /* properties and regular kernel arguments */ +}; + +/* + * Interpretation of the beginning of data payload for ERT_CMD_CHAIN in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is cmd BO handles. + */ +struct amdxdna_cmd_chain { + u32 command_count; + u32 submit_index; + u32 error_index; + u32 reserved[3]; + u64 data[] __counted_by(command_count); +}; + /* Exec buffer command header format */ #define AMDXDNA_CMD_STATE GENMASK(3, 0) #define AMDXDNA_CMD_EXTRA_CU_MASK GENMASK(11, 10) @@ -40,9 +86,73 @@ struct amdxdna_hwctx { struct amdxdna_hwctx_param_config_cu *cus; }; =20 +#define drm_job_to_xdna_job(j) \ + container_of(j, struct amdxdna_sched_job, base) + +struct amdxdna_sched_job { + struct drm_sched_job base; + struct kref refcnt; + struct amdxdna_hwctx *hwctx; + struct mm_struct *mm; + /* The fence to notice DRM scheduler that job is done by hardware */ + struct dma_fence *fence; + /* user can wait on this fence */ + struct dma_fence *out_fence; + u64 seq; + struct amdxdna_gem_obj *cmd_bo; + size_t bo_cnt; + struct drm_gem_object *bos[] __counted_by(bo_cnt); +}; + +static inline u32 +amdxdna_cmd_get_op(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header); +} + +static inline void +amdxdna_cmd_set_state(struct amdxdna_gem_obj *abo, enum ert_cmd_state s) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + + cmd->header &=3D ~AMDXDNA_CMD_STATE; + cmd->header |=3D FIELD_PREP(AMDXDNA_CMD_STATE, s); +} + +static inline enum ert_cmd_state +amdxdna_cmd_get_state(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd =3D abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_STATE, cmd->header); +} + +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size); +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo); + +static inline u32 amdxdna_hwctx_col_map(struct amdxdna_hwctx *hwctx) +{ + return GENMASK(hwctx->start_col + hwctx->num_col - 1, + hwctx->start_col); +} + +void amdxdna_job_put(struct amdxdna_sched_job *job); + void amdxdna_hwctx_remove_all(struct amdxdna_client *client); + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdls, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq); + +int amdxdna_cmd_wait(struct amdxdna_client *client, u32 hwctx_hdl, + u64 seq, u32 timeout); + int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp); int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, str= uct drm_file *filp); int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, st= ruct drm_file *filp); +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struc= t drm_file *filp); +int amdxdna_drm_wait_cmd_ioctl(struct drm_device *dev, void *data, struct = drm_file *filp); =20 #endif /* _AMDXDNA_CTX_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_gem.c b/drivers/accel/amdxdna/am= dxdna_gem.c index 66373baa4600..091674a4bbfa 100644 --- a/drivers/accel/amdxdna/amdxdna_gem.c +++ b/drivers/accel/amdxdna/amdxdna_gem.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include =20 diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c b/drivers/accel= /amdxdna/amdxdna_mailbox_helper.c index 42b615394605..5139a9c96a91 100644 --- a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c +++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c @@ -3,10 +3,15 @@ * Copyright (C) 2024, Advanced Micro Devices, Inc. */ =20 +#include #include #include +#include +#include +#include #include =20 +#include "amdxdna_gem.h" #include "amdxdna_mailbox.h" #include "amdxdna_mailbox_helper.h" #include "amdxdna_pci_drv.h" diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdn= a/amdxdna_pci_drv.c index 47ea79d4a021..5c1e863825e0 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include =20 @@ -64,6 +65,7 @@ static int amdxdna_drm_open(struct drm_device *ddev, stru= ct drm_file *filp) goto unbind_sva; } mutex_init(&client->hwctx_lock); + init_srcu_struct(&client->hwctx_srcu); idr_init_base(&client->hwctx_idr, AMDXDNA_INVALID_CTX_HANDLE + 1); mutex_init(&client->mm_lock); =20 @@ -93,6 +95,7 @@ static void amdxdna_drm_close(struct drm_device *ddev, st= ruct drm_file *filp) XDNA_DBG(xdna, "closing pid %d", client->pid); =20 idr_destroy(&client->hwctx_idr); + cleanup_srcu_struct(&client->hwctx_srcu); mutex_destroy(&client->hwctx_lock); mutex_destroy(&client->mm_lock); if (client->dev_heap) @@ -133,6 +136,9 @@ static const struct drm_ioctl_desc amdxdna_drm_ioctls[]= =3D { DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_BO, amdxdna_drm_create_bo_ioctl, 0), DRM_IOCTL_DEF_DRV(AMDXDNA_GET_BO_INFO, amdxdna_drm_get_bo_info_ioctl, 0), DRM_IOCTL_DEF_DRV(AMDXDNA_SYNC_BO, amdxdna_drm_sync_bo_ioctl, 0), + /* Exectuion */ + DRM_IOCTL_DEF_DRV(AMDXDNA_EXEC_CMD, amdxdna_drm_submit_cmd_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_WAIT_CMD, amdxdna_drm_wait_cmd_ioctl, 0), }; =20 static const struct file_operations amdxdna_fops =3D { diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdn= a/amdxdna_pci_drv.h index 3dddde4ac12a..0324e73094b2 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.h +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -20,6 +20,7 @@ extern const struct drm_driver amdxdna_drm_drv; struct amdxdna_dev; struct amdxdna_gem_obj; struct amdxdna_hwctx; +struct amdxdna_sched_job; =20 /* * struct amdxdna_dev_ops - Device hardware operation callbacks @@ -31,6 +32,8 @@ struct amdxdna_dev_ops { void (*hwctx_fini)(struct amdxdna_hwctx *hwctx); int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, voi= d *buf, u32 size); void (*hmm_invalidate)(struct amdxdna_gem_obj *abo, unsigned long cur_seq= ); + int (*cmd_submit)(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *= job, u64 *seq); + int (*cmd_wait)(struct amdxdna_hwctx *hwctx, u64 seq, u32 timeout); }; =20 /* @@ -88,6 +91,8 @@ struct amdxdna_client { struct list_head node; pid_t pid; struct mutex hwctx_lock; /* protect hwctx */ + /* do NOT wait this srcu when hwctx_lock is hold */ + struct srcu_struct hwctx_srcu; struct idr hwctx_idr; struct amdxdna_dev *xdna; struct drm_file *filp; diff --git a/drivers/accel/amdxdna/amdxdna_sysfs.c b/drivers/accel/amdxdna/= amdxdna_sysfs.c index 668b94b92714..f27e4ee960a0 100644 --- a/drivers/accel/amdxdna/amdxdna_sysfs.c +++ b/drivers/accel/amdxdna/amdxdna_sysfs.c @@ -3,9 +3,14 @@ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. */ =20 +#include #include +#include #include +#include +#include =20 +#include "amdxdna_gem.h" #include "amdxdna_pci_drv.h" =20 static ssize_t vbnv_show(struct device *dev, struct device_attribute *attr= , char *buf) diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1= _regs.c index 720aab0ed7c4..f00c50461b09 100644 --- a/drivers/accel/amdxdna/npu1_regs.c +++ b/drivers/accel/amdxdna/npu1_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/npu2_regs.c b/drivers/accel/amdxdna/npu2= _regs.c index f3ea18bcf294..00cb381031d2 100644 --- a/drivers/accel/amdxdna/npu2_regs.c +++ b/drivers/accel/amdxdna/npu2_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4= _regs.c index db61142f0d4e..b6dae9667cca 100644 --- a/drivers/accel/amdxdna/npu4_regs.c +++ b/drivers/accel/amdxdna/npu4_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5= _regs.c index debf4e95b9bb..bed1baf8e160 100644 --- a/drivers/accel/amdxdna/npu5_regs.c +++ b/drivers/accel/amdxdna/npu5_regs.c @@ -5,6 +5,7 @@ =20 #include #include +#include #include =20 #include "aie2_pci.h" diff --git a/include/trace/events/amdxdna.h b/include/trace/events/amdxdna.h index 33343d8f0622..c6cb2da7b706 100644 --- a/include/trace/events/amdxdna.h +++ b/include/trace/events/amdxdna.h @@ -9,8 +9,49 @@ #if !defined(_TRACE_AMDXDNA_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_AMDXDNA_H =20 +#include #include =20 +TRACE_EVENT(amdxdna_debug_point, + TP_PROTO(const char *name, u64 number, const char *str), + + TP_ARGS(name, number, str), + + TP_STRUCT__entry(__string(name, name) + __field(u64, number) + __string(str, str)), + + TP_fast_assign(__assign_str(name); + __entry->number =3D number; + __assign_str(str);), + + TP_printk("%s:%llu %s", __get_str(name), __entry->number, + __get_str(str)) +); + +TRACE_EVENT(xdna_job, + TP_PROTO(struct drm_sched_job *sched_job, const char *name, const cha= r *str, u64 seq), + + TP_ARGS(sched_job, name, str, seq), + + TP_STRUCT__entry(__string(name, name) + __string(str, str) + __field(u64, fence_context) + __field(u64, fence_seqno) + __field(u64, seq)), + + TP_fast_assign(__assign_str(name); + __assign_str(str); + __entry->fence_context =3D sched_job->s_fence->finished.context; + __entry->fence_seqno =3D sched_job->s_fence->finished.seqno; + __entry->seq =3D seq;), + + TP_printk("fence=3D(context:%llu, seqno:%lld), %s seq#:%lld %s", + __entry->fence_context, __entry->fence_seqno, + __get_str(name), __entry->seq, + __get_str(str)) +); + DECLARE_EVENT_CLASS(xdna_mbox_msg, TP_PROTO(char *name, u8 chann_id, u32 opcode, u32 msg_id), =20 diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_ac= cel.h index 3792750834b2..08f3ec7146ab 100644 --- a/include/uapi/drm/amdxdna_accel.h +++ b/include/uapi/drm/amdxdna_accel.h @@ -13,6 +13,7 @@ extern "C" { #endif =20 +#define AMDXDNA_INVALID_CMD_HANDLE (~0UL) #define AMDXDNA_INVALID_ADDR (~0UL) #define AMDXDNA_INVALID_CTX_HANDLE 0 #define AMDXDNA_INVALID_BO_HANDLE 0 @@ -29,6 +30,8 @@ enum amdxdna_drm_ioctl_id { DRM_AMDXDNA_CREATE_BO, DRM_AMDXDNA_GET_BO_INFO, DRM_AMDXDNA_SYNC_BO, + DRM_AMDXDNA_EXEC_CMD, + DRM_AMDXDNA_WAIT_CMD, }; =20 /** @@ -201,6 +204,54 @@ struct amdxdna_drm_sync_bo { __u64 size; }; =20 +enum amdxdna_cmd_type { + AMDXDNA_CMD_SUBMIT_EXEC_BUF =3D 0, + AMDXDNA_CMD_SUBMIT_DEPENDENCY, + AMDXDNA_CMD_SUBMIT_SIGNAL, +}; + +/** + * struct amdxdna_drm_exec_cmd - Execute command. + * @ext: MBZ. + * @ext_flags: MBZ. + * @hwctx: Hardware context handle. + * @type: One of command type in enum amdxdna_cmd_type. + * @cmd_handles: Array of command handles or the command handle itself + * in case of just one. + * @args: Array of arguments for all command handles. + * @cmd_count: Number of command handles in the cmd_handles array. + * @arg_count: Number of arguments in the args array. + * @seq: Returned sequence number for this command. + */ +struct amdxdna_drm_exec_cmd { + __u64 ext; + __u64 ext_flags; + __u32 hwctx; + __u32 type; + __u64 cmd_handles; + __u64 args; + __u32 cmd_count; + __u32 arg_count; + __u64 seq; +}; + +/** + * struct amdxdna_drm_wait_cmd - Wait exectuion command. + * + * @hwctx: hardware context handle. + * @timeout: timeout in ms, 0 implies infinite wait. + * @seq: sequence number of the command returned by execute command. + * + * Wait a command specified by seq to be completed. + * Using AMDXDNA_INVALID_CMD_HANDLE as seq means wait till there is a free= slot + * to submit a new command. + */ +struct amdxdna_drm_wait_cmd { + __u32 hwctx; + __u32 timeout; + __u64 seq; +}; + #define DRM_IOCTL_AMDXDNA_CREATE_HWCTX \ DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_CREATE_HWCTX, \ struct amdxdna_drm_create_hwctx) @@ -225,6 +276,14 @@ struct amdxdna_drm_sync_bo { DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_SYNC_BO, \ struct amdxdna_drm_sync_bo) =20 +#define DRM_IOCTL_AMDXDNA_EXEC_CMD \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_EXEC_CMD, \ + struct amdxdna_drm_exec_cmd) + +#define DRM_IOCTL_AMDXDNA_WAIT_CMD \ + DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDXDNA_WAIT_CMD, \ + struct amdxdna_drm_wait_cmd) + #if defined(__cplusplus) } /* extern c end */ #endif --=20 2.34.1