From nobody Fri Dec 19 04:19:28 2025
From: Jake Hillion via B4 Relay
Date: Thu, 05 Jun 2025 19:09:26 +0100
Subject: [PATCH v2 1/2] x86/platform/amd: move final timeout check to after final sleep
Message-Id: <20250605-amd-hsmp-v2-1-a811bc3dd74a@hillion.co.uk>
References: <20250605-amd-hsmp-v2-0-a811bc3dd74a@hillion.co.uk>
In-Reply-To: <20250605-amd-hsmp-v2-0-a811bc3dd74a@hillion.co.uk>
To: Naveen Krishna Chatradhi, Carlos Bilbao, Hans de Goede, Ilpo Järvinen
Cc: platform-driver-x86@vger.kernel.org, linux-kernel@vger.kernel.org, sched-ext@meta.com, Jake Hillion, Blaise Sanouillet, Suma Hegde
Reply-To: jake@hillion.co.uk

From: Jake Hillion

__hsmp_send_message sleeps between result read attempts and has a
timeout of 100ms. Under extreme load it's possible for these sleeps to
take a long time, exceeding the 100ms. In this case the current code
does not check the register and fails with ETIMEDOUT.

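The failure mode can be sketched in userspace C. This is an illustrative
model only, not the driver code: a plain counter stands in for jiffies,
every sleep is assumed to overshoot to sleep_ms, and the names
poll_check_first, poll_read_first and TIMEOUT_MS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TIMEOUT_MS 100UL /* assumed to mirror the 100ms timeout above */

/*
 * Deadline tested before each read: a sleep that overshoots the
 * deadline ends the loop without a final look at the mailbox.
 */
int poll_check_first(unsigned long ready_at_ms, unsigned long sleep_ms)
{
	unsigned long now_ms = 0; /* simulated clock */

	assert(sleep_ms > 0);
	while (now_ms < TIMEOUT_MS) {
		if (now_ms >= ready_at_ms)
			return 0; /* mailbox was ready */
		now_ms += sleep_ms; /* the (possibly delayed) sleep */
	}
	return -1; /* stands in for -ETIMEDOUT */
}

/*
 * Read first, then test the deadline: the mailbox is always read at
 * least once after a sleep of any duration.
 */
int poll_read_first(unsigned long ready_at_ms, unsigned long sleep_ms)
{
	unsigned long now_ms = 0; /* simulated clock */

	assert(sleep_ms > 0);
	while (true) {
		if (now_ms >= ready_at_ms)
			return 0; /* mailbox was ready */
		if (now_ms >= TIMEOUT_MS)
			return -1; /* stands in for -ETIMEDOUT */
		now_ms += sleep_ms; /* the (possibly delayed) sleep */
	}
}
```

With the mailbox ready at 50ms and a single sleep that takes 150ms,
poll_check_first(50, 150) returns -1 although the result was available,
while poll_read_first(50, 150) returns 0. A mailbox that never becomes
ready within the timeout still fails in both orderings.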
Refactor the loop to ensure there is at least one read of the register
after a sleep of any duration. This removes instances of ETIMEDOUT with
a single caller, even with a misbehaving scheduler.

Tested on AMD Bergamo machines.

Suggested-by: Blaise Sanouillet
Reviewed-by: Suma Hegde
Tested-by: Suma Hegde
Signed-off-by: Jake Hillion
---
 drivers/platform/x86/amd/hsmp/hsmp.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/amd/hsmp/hsmp.c b/drivers/platform/x86/amd/hsmp/hsmp.c
index e262e8a97b4542a389e09a82dad71f7d2e8b2449..f35c639457ac425e79dead2515c0eddea0759323 100644
--- a/drivers/platform/x86/amd/hsmp/hsmp.c
+++ b/drivers/platform/x86/amd/hsmp/hsmp.c
@@ -99,7 +99,7 @@ static int __hsmp_send_message(struct hsmp_socket *sock, struct hsmp_message *ms
 	short_sleep = jiffies + msecs_to_jiffies(HSMP_SHORT_SLEEP);
 	timeout = jiffies + msecs_to_jiffies(HSMP_MSG_TIMEOUT);
 
-	while (time_before(jiffies, timeout)) {
+	while (true) {
 		ret = sock->amd_hsmp_rdwr(sock, mbinfo->msg_resp_off, &mbox_status, HSMP_RD);
 		if (ret) {
 			dev_err(sock->dev, "Error %d reading mailbox status\n", ret);
@@ -108,6 +108,10 @@ static int __hsmp_send_message(struct hsmp_socket *sock, struct hsmp_message *ms
 
 		if (mbox_status != HSMP_STATUS_NOT_READY)
 			break;
+
+		if (!time_before(jiffies, timeout))
+			break;
+
 		if (time_before(jiffies, short_sleep))
 			usleep_range(50, 100);
 		else
-- 
2.47.2

From nobody Fri Dec 19 04:19:28 2025
From: Jake Hillion via B4 Relay
Date: Thu, 05 Jun 2025 19:09:27 +0100
Subject: [PATCH v2 2/2] x86/platform/amd: replace down_timeout with down_interruptible
Message-Id: <20250605-amd-hsmp-v2-2-a811bc3dd74a@hillion.co.uk>
References: <20250605-amd-hsmp-v2-0-a811bc3dd74a@hillion.co.uk>
In-Reply-To: <20250605-amd-hsmp-v2-0-a811bc3dd74a@hillion.co.uk>
To: Naveen Krishna Chatradhi, Carlos Bilbao, Hans de Goede, Ilpo Järvinen
Cc: platform-driver-x86@vger.kernel.org, linux-kernel@vger.kernel.org, sched-ext@meta.com, Jake Hillion, Suma Hegde
Reply-To: jake@hillion.co.uk

From: Jake Hillion

Currently hsmp_send_message uses down_timeout with a 100ms timeout to
take the semaphore. However __hsmp_send_message, the content of the
critical section, has a sleep in it. On systems with significantly
delayed scheduling behaviour this may take over 100ms.

Convert this method to down_interruptible. Leave the error handling the
same as the documentation currently is not specific about what error is
returned.

Previous behaviour: a caller who competes with another caller stuck in
the critical section due to scheduler delays would receive -ETIME.

New behaviour: a caller who competes with another caller stuck in the
critical section due to scheduler delays will complete successfully.

Reviewed-by: Suma Hegde
Tested-by: Suma Hegde
Signed-off-by: Jake Hillion
---
 drivers/platform/x86/amd/hsmp/hsmp.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/platform/x86/amd/hsmp/hsmp.c b/drivers/platform/x86/amd/hsmp/hsmp.c
index f35c639457ac425e79dead2515c0eddea0759323..6c30bb3edc1d77939b10047b771a5c574e5f2a1e 100644
--- a/drivers/platform/x86/amd/hsmp/hsmp.c
+++ b/drivers/platform/x86/amd/hsmp/hsmp.c
@@ -216,13 +216,7 @@ int hsmp_send_message(struct hsmp_message *msg)
 		return -ENODEV;
 	sock = &hsmp_pdev.sock[msg->sock_ind];
 
-	/*
-	 * The time taken by smu operation to complete is between
-	 * 10us to 1ms. Sometime it may take more time.
-	 * In SMP system timeout of 100 millisecs should
-	 * be enough for the previous thread to finish the operation
-	 */
-	ret = down_timeout(&sock->hsmp_sem, msecs_to_jiffies(HSMP_MSG_TIMEOUT));
+	ret = down_interruptible(&sock->hsmp_sem);
 	if (ret < 0)
 		return ret;
 
-- 
2.47.2
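
The previous/new behaviour described for patch 2/2 can be illustrated
with a simplified, single-threaded model of semaphore contention. This
is a sketch under stated assumptions, not kernel code: all callers are
assumed to arrive at t=0 and to be served in order, signals (which would
make down_interruptible return early) are ignored, and count_timeouts
and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns how many of n callers give up waiting for the semaphore.
 * hold_ms[i] is how long caller i stays in the critical section.
 * timeout_ms == 0 models waiting without a deadline; a non-zero value
 * models a timed wait with that deadline.
 */
size_t count_timeouts(const unsigned long *hold_ms, size_t n,
		      unsigned long timeout_ms)
{
	unsigned long free_at_ms = 0; /* when the semaphore is next released */
	size_t failed = 0;
	size_t i;

	assert(hold_ms != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (timeout_ms && free_at_ms > timeout_ms) {
			/* Deadline passed before this caller's turn. */
			failed++;
			continue;
		}
		/* Caller takes the semaphore and holds it for hold_ms[i]. */
		free_at_ms += hold_ms[i];
	}
	return failed;
}
```

If the first caller is delayed in the critical section for 150ms, a
100ms timed wait makes every caller queued behind it fail, while an
untimed wait lets all of them complete.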