From nobody Mon Jun 15 02:48:15 2026
From: Tony Camuso
To: openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Cc: minyard@acm.org, tcamuso@redhat.com
Subject: [PATCH 1/2] ipmi:watchdog: Reboot cleanly on BMC reset
Date: Tue, 7 Apr 2026 13:51:33 -0400
Message-ID: <20260407175134.3367345-2-tcamuso@redhat.com>
In-Reply-To: <20260407175134.3367345-1-tcamuso@redhat.com>
References: <20260407175134.3367345-1-tcamuso@redhat.com>

When the BMC resets while the IPMI watchdog is active, three problems
can occur:

1. The static smi_msg and recv_msg structures remain queued in the
   IPMI layer after a response timeout. If the watchdog daemon retries,
   the code reuses these structures while they are still on the IPMI
   layer's internal lists, causing:

     list_add double add: new=ffffffffc10063e0, prev=ffffffffc10063e0, ...
     kernel BUG at lib/list_debug.c:29!

2. Both __ipmi_heartbeat() and _ipmi_set_timeout() use
   wait_for_completion() with no timeout, blocking indefinitely if the
   BMC is unresponsive, leaving tasks stuck in D state.

3. When the BMC loses the watchdog timer state, the driver's internal
   state becomes inconsistent, causing subsequent writes to
   /dev/watchdog to return -EINVAL, leaving the system without watchdog
   protection.

Fix all three issues:

- Add an msg_in_flight atomic flag to prevent re-entry into
  __ipmi_heartbeat() and _ipmi_set_timeout() while message structures
  are still queued in the IPMI layer.

- Convert wait_for_completion() to wait_for_completion_timeout() in
  both functions to prevent indefinite blocking.

- Add reinit_completion() before each use to prevent stale completion
  events from allowing premature wakeup.

- Detect BMC communication failure in ipmi_wdog_msg_handler() via
  non-zero completion codes and initiate orderly_reboot() when the
  watchdog is active. This ensures the system reboots cleanly rather
  than being left without watchdog protection. Error classification
  distinguishes TIMER_NOT_INIT (0x80), vendor-specific codes
  (0x81-0xBE), and standard IPMI completion codes.

- Guard all BMC communication paths (_ipmi_set_timeout,
  __ipmi_heartbeat, wdog_reboot_handler) with the bmc_reset_shutdown
  flag to prevent further IPMI operations during shutdown.
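As a rough illustration of the error classification and reboot decision
described above, here is a minimal user-space sketch. It is not the
driver code: the enum and the classify_completion_code() and
should_reboot() helpers are hypothetical names chosen for this example.
Only the numeric ranges (0x80 and 0x81-0xBE) come from the description,
and the category labels simply follow this patch's wording.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical categories; the names are illustrative, not from the driver. */
enum cc_class {
	CC_OK,              /* 0x00: command completed normally */
	CC_TIMER_NOT_INIT,  /* 0x80: BMC lost the watchdog timer state */
	CC_VENDOR_SPECIFIC, /* 0x81-0xBE, labeled as in this patch */
	CC_OTHER_ERROR      /* any other non-zero completion code */
};

/* Map a raw completion code onto one of the categories above. */
static enum cc_class classify_completion_code(uint8_t cc)
{
	if (cc == 0x00)
		return CC_OK;
	if (cc == 0x80)
		return CC_TIMER_NOT_INIT;
	if (cc > 0x80 && cc <= 0xBE)
		return CC_VENDOR_SPECIFIC;
	return CC_OTHER_ERROR;
}

/*
 * Decide whether a response should start a reboot: any non-zero code
 * while the watchdog is armed, unless a reboot is already under way.
 */
static bool should_reboot(uint8_t cc, bool watchdog_armed,
			  bool reboot_in_progress)
{
	return classify_completion_code(cc) != CC_OK &&
	       watchdog_armed && !reboot_in_progress;
}
```

In this sketch every non-zero code leads to the same reboot decision;
the category only selects which message would be logged, which is how
the handler is described above.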
Signed-off-by: Tony Camuso
---
 drivers/char/ipmi/ipmi_watchdog.c | 101 ++++++++++++++++++++++++------
 1 file changed, 83 insertions(+), 18 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index a013ddbf1466..1d8277cbe598 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -123,6 +123,16 @@
 
 #define IPMI_WDOG_TIMER_NOT_INIT_RESP 0x80
 
+/* Timeout for waiting for a heartbeat response (in jiffies). */
+#define IPMI_HEARTBEAT_WAIT_TIMEOUT (HZ * 5)
+
+/*
+ * Set when the BMC becomes unreachable while the watchdog is active.
+ * Once set, all BMC communication is skipped and an orderly reboot
+ * is in progress.
+ */
+static bool bmc_reset_shutdown;
+
 static DEFINE_MUTEX(ipmi_watchdog_mutex);
 static bool nowayout = WATCHDOG_NOWAYOUT;
 
@@ -339,12 +349,14 @@ static int __ipmi_heartbeat(void);
  * and freed when both the send and receive messages are free.
  */
 static atomic_t msg_tofree = ATOMIC_INIT(0);
+static atomic_t msg_in_flight = ATOMIC_INIT(0);
 static DECLARE_COMPLETION(msg_wait);
 static void msg_free_smi(struct ipmi_smi_msg *msg)
 {
 	if (atomic_dec_and_test(&msg_tofree)) {
 		if (!oops_in_progress)
 			complete(&msg_wait);
+		atomic_set(&msg_in_flight, 0);
 	}
 }
 static void msg_free_recv(struct ipmi_recv_msg *msg)
@@ -352,6 +364,7 @@ static void msg_free_recv(struct ipmi_recv_msg *msg)
 	if (atomic_dec_and_test(&msg_tofree)) {
 		if (!oops_in_progress)
 			complete(&msg_wait);
+		atomic_set(&msg_in_flight, 0);
 	}
 }
 static struct ipmi_smi_msg smi_msg = INIT_IPMI_SMI_MSG(msg_free_smi);
@@ -429,19 +442,34 @@ static int _ipmi_set_timeout(int do_heartbeat)
 {
 	int send_heartbeat_now;
 	int rv;
+	unsigned long ret;
 
 	if (!watchdog_user)
 		return -ENODEV;
 
+	if (bmc_reset_shutdown)
+		return -ENODEV;
+
+	if (atomic_read(&msg_in_flight))
+		return -EBUSY;
+
+	reinit_completion(&msg_wait);
+	atomic_set(&msg_in_flight, 1);
 	atomic_set(&msg_tofree, 2);
 
 	rv = __ipmi_set_timeout(&smi_msg, &recv_msg, &send_heartbeat_now);
 	if (rv) {
 		atomic_set(&msg_tofree, 0);
+		atomic_set(&msg_in_flight, 0);
 		return rv;
 	}
 
-	wait_for_completion(&msg_wait);
+	ret = wait_for_completion_timeout(&msg_wait,
+					  IPMI_HEARTBEAT_WAIT_TIMEOUT);
+	if (ret == 0) {
+		atomic_set(&msg_tofree, 0);
+		return -ETIMEDOUT;
+	}
 
 	if ((do_heartbeat == IPMI_SET_TIMEOUT_FORCE_HB)
 	    || ((send_heartbeat_now)
@@ -510,10 +538,17 @@ static int __ipmi_heartbeat(void)
 {
 	struct kernel_ipmi_msg msg;
 	int rv;
+	unsigned long ret;
 	struct ipmi_system_interface_addr addr;
 	int timeout_retries = 0;
 
 restart:
+	if (bmc_reset_shutdown)
+		return -ENODEV;
+
+	if (atomic_read(&msg_in_flight))
+		return -EBUSY;
+
 	/*
 	 * Don't reset the timer if we have the timer turned off, that
 	 * re-enables the watchdog.
@@ -521,6 +556,8 @@ static int __ipmi_heartbeat(void)
 	if (ipmi_watchdog_state == WDOG_TIMEOUT_NONE)
 		return 0;
 
+	reinit_completion(&msg_wait);
+	atomic_set(&msg_in_flight, 1);
 	atomic_set(&msg_tofree, 2);
 
 	addr.addr_type = IPMI_SYSTEM_INTERFACE_ADDR_TYPE;
@@ -541,14 +578,17 @@ static int __ipmi_heartbeat(void)
 				      1);
 	if (rv) {
 		atomic_set(&msg_tofree, 0);
-		pr_warn("heartbeat send failure: %d\n", rv);
+		atomic_set(&msg_in_flight, 0);
 		return rv;
 	}
 
-	/* Wait for the heartbeat to be sent. */
-	wait_for_completion(&msg_wait);
+	ret = wait_for_completion_timeout(&msg_wait, IPMI_HEARTBEAT_WAIT_TIMEOUT);
+	if (ret == 0) {
+		atomic_set(&msg_tofree, 0);
+		return -ETIMEDOUT;
+	}
 
-	if (recv_msg.msg.data[0] == IPMI_WDOG_TIMER_NOT_INIT_RESP) {
+	if (recv_msg.msg.data[0] >= 0x80) {
 		timeout_retries++;
 		if (timeout_retries > 3) {
 			pr_err("Unable to restore the IPMI watchdog's settings, giving up\n");
@@ -557,12 +597,11 @@ static int __ipmi_heartbeat(void)
 		}
 
 		/*
-		 * The timer was not initialized, that means the BMC was
-		 * probably reset and lost the watchdog information. Attempt
-		 * to restore the timer's info. Note that we still hold
-		 * the heartbeat lock, to keep a heartbeat from happening
-		 * in this process, so must say no heartbeat to avoid a
-		 * deadlock on this mutex
+		 * The BMC was probably reset and lost the watchdog
+		 * information. Attempt to restore the timer's info.
+		 * Note that we still hold the heartbeat lock, to keep
+		 * a heartbeat from happening in this process, so must
+		 * say no heartbeat to avoid a deadlock on this mutex.
 		 */
 		rv = _ipmi_set_timeout(IPMI_SET_TIMEOUT_NO_HB);
 		if (rv) {
@@ -876,15 +915,38 @@ static struct miscdevice ipmi_wdog_miscdev = {
 static void ipmi_wdog_msg_handler(struct ipmi_recv_msg *msg,
 				  void *handler_data)
 {
-	if (msg->msg.cmd == IPMI_WDOG_RESET_TIMER &&
-			msg->msg.data[0] == IPMI_WDOG_TIMER_NOT_INIT_RESP)
-		pr_info("response: The IPMI controller appears to have been reset, will attempt to reinitialize the watchdog timer\n");
-	else if (msg->msg.data[0] != 0)
-		pr_err("response: Error %x on cmd %x\n",
-		       msg->msg.data[0],
-		       msg->msg.cmd);
+	if (msg->msg.data[0] != 0) {
+		if (msg->msg.data[0] == IPMI_WDOG_TIMER_NOT_INIT_RESP)
+			pr_crit("BMC error: watchdog timer not initialized "
+				"(0x%02x on cmd 0x%02x)\n",
+				msg->msg.data[0], msg->msg.cmd);
+		else if (msg->msg.data[0] > 0x80 &&
+			 msg->msg.data[0] <= 0xBE)
+			pr_crit("BMC error: vendor-specific completion code "
+				"0x%02x on cmd 0x%02x\n",
+				msg->msg.data[0], msg->msg.cmd);
+		else
+			pr_crit("BMC error: completion code 0x%02x "
+				"on cmd 0x%02x\n",
+				msg->msg.data[0], msg->msg.cmd);
+
+		if (ipmi_watchdog_state != WDOG_TIMEOUT_NONE &&
+		    !bmc_reset_shutdown) {
+			bmc_reset_shutdown = true;
+			pr_crit("BMC communication lost with watchdog active, "
+				"initiating system reboot\n");
+			orderly_reboot();
+		}
+	}
 
 	ipmi_free_recv_msg(msg);
+	/*
+	 * Ensure the in-flight flag is cleared after the message is freed.
+	 * In the normal path this is redundant (already cleared by the
+	 * recv_msg destructor). For late responses arriving after a
+	 * completion timeout, this is the only path that clears the flag.
+	 */
+	atomic_set(&msg_in_flight, 0);
 }
 
 static void ipmi_wdog_pretimeout_handler(void *handler_data)
@@ -1106,6 +1168,9 @@ static int wdog_reboot_handler(struct notifier_block *this,
 		/* Make sure we only do this once. */
 		reboot_event_handled = 1;
 
+		if (bmc_reset_shutdown)
+			return NOTIFY_OK;
+
 		if (code == SYS_POWER_OFF || code == SYS_HALT) {
 			/* Disable the WDT if we are shutting down. */
 			ipmi_watchdog_state = WDOG_TIMEOUT_NONE;
-- 
2.53.0

From nobody Mon Jun 15 02:48:15 2026
From: Tony Camuso
To: openipmi-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Cc: minyard@acm.org, tcamuso@redhat.com
Subject: [PATCH 2/2] Documentation: ipmi: Update BMC reset behavior for watchdog
Date: Tue, 7 Apr 2026 13:51:34 -0400
Message-ID: <20260407175134.3367345-3-tcamuso@redhat.com>
In-Reply-To: <20260407175134.3367345-1-tcamuso@redhat.com>
References: <20260407175134.3367345-1-tcamuso@redhat.com>

Update the IPMI watchdog BMC reset documentation to describe the
current behavior: when the BMC resets while the watchdog is active,
the driver detects the communication failure and initiates an orderly
system reboot rather than attempting to retry and recover.

Document the panic and hang prevention mechanisms, BMC failure
detection via completion code classification in the message handler,
the bmc_reset_shutdown guard that prevents further IPMI operations
during shutdown, and the late response handling for the msg_in_flight
flag.

Signed-off-by: Tony Camuso
---
 Documentation/driver-api/ipmi.rst | 61 +++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/Documentation/driver-api/ipmi.rst b/Documentation/driver-api/ipmi.rst
index f52ab2df2569..dbdc1440d16e 100644
--- a/Documentation/driver-api/ipmi.rst
+++ b/Documentation/driver-api/ipmi.rst
@@ -734,6 +734,67 @@ device to close it, or the timer will not stop. This is a new semantic
 for the driver, but makes it consistent with the rest of the watchdog
 drivers in Linux.
 
+BMC Reset Behavior
+------------------
+
+When the BMC (Baseboard Management Controller) resets while the IPMI
+watchdog is active, the hardware watchdog timer state on the BMC is
+lost. The driver detects this condition and initiates a clean system
+reboot rather than leaving the system running without watchdog
+protection.
+
+The driver handles BMC resets as follows:
+
+1. **Panic prevention:** The static message structures (``smi_msg`` and
+   ``recv_msg``) are guarded by an ``msg_in_flight`` atomic flag. If a
+   previous message is still queued in the IPMI layer, new operations
+   return ``-EBUSY`` instead of reusing the structures (which would cause
+   a ``list_add`` corruption BUG).
+
+2. **Hang prevention:** ``wait_for_completion_timeout()`` with a 5-second
+   timeout replaces the indefinite ``wait_for_completion()`` in both
+   ``__ipmi_heartbeat()`` and ``_ipmi_set_timeout()``. This prevents
+   tasks from blocking in D state when the BMC is unresponsive.
+
+3. **BMC failure detection:** When ``ipmi_wdog_msg_handler()`` receives
+   a non-zero completion code while the watchdog is active, it sets the
+   ``bmc_reset_shutdown`` flag and calls ``orderly_reboot()``. Error
+   classification distinguishes three categories:
+
+   - ``TIMER_NOT_INIT`` (0x80): the BMC lost the watchdog timer state.
+   - Vendor-specific codes (0x81-0xBE): BMC-specific error responses.
+   - Standard IPMI completion codes (0xC0+): general BMC errors.
+
+   All produce a critical-level log message::
+
+     IPMI Watchdog: BMC error: watchdog timer not initialized (0x80 on cmd 0x22)
+     IPMI Watchdog: BMC communication lost with watchdog active, initiating system reboot
+
+4. **Clean shutdown:** Once ``bmc_reset_shutdown`` is set, all BMC
+   communication paths (``_ipmi_set_timeout()``, ``__ipmi_heartbeat()``,
+   ``wdog_reboot_handler()``) return immediately without attempting
+   further IPMI operations. This prevents panics, stack traces, and
+   hangs during the reboot sequence.
+
+5. **Late response handling:** The ``msg_in_flight`` flag is cleared in
+   ``ipmi_wdog_msg_handler()`` after the message is freed. This handles
+   late responses arriving after a completion timeout, ensuring the flag
+   does not remain set permanently.
+
+The system reboot after a BMC reset is the expected and correct
+behavior. The hardware watchdog timer lives on the BMC, and when
+that timer state is lost, the system must be restarted to restore
+watchdog protection.
+
+Administrators performing supervised BMC maintenance (firmware updates,
+manual resets) should disarm the watchdog before the operation::
+
+    systemctl stop watchdog
+
+And restart it after the BMC has fully recovered::
+
+    systemctl start watchdog
+
 
 Panic Timeouts
 --------------
-- 
2.53.0