Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
system suspend/resume support but caused a critical regression where
CMA pages are corrupted during resume.
1. CMA page corruption:
Calling mhi_unprepare_after_power_down() during suspend (via
ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
DMA buffers. When these pages are accessed during resume, the kernel
detects corruption (Bad page state).
To fix this corruption, the driver must skip ATH12K_MHI_DEINIT during
suspend, preserving the DMA buffers. However, implementing this fix
exposes a second issue in the state machine:
2. Resume failure due to MHI state mismatch:
When DEINIT is skipped during suspend to protect the memory, the
ATH12K_MHI_INIT bit remains set. On resume, ath12k_mhi_start()
blindly attempts to set INIT again, but the state machine rejects
the transition:
ath12k_wifi7_pci ...: failed to set mhi state INIT(0) in current
mhi state (0x1)
Fix the corruption and enable the correct suspend flow by:
1. In ath12k_mhi_stop(), skipping ATH12K_MHI_DEINIT if suspending.
This prevents the memory corruption by keeping the device context
valid (MHI_POWER_OFF_KEEP_DEV).
2. In ath12k_mhi_start(), checking if MHI_INIT is already set.
This accommodates the new suspend flow where the device remains
initialized, allowing the driver to proceed directly to POWER_ON.
Tested with suspend/resume cycles on Qualcomm Snapdragon X Elite
(SC8380XP) with WCN7850 WiFi. No CMA corruption observed, WiFi resumes
successfully, and deep sleep works correctly.
Fixes: 8d5f4da8d70b ("wifi: ath12k: support suspend/resume")
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302 (Lenovo Yoga Slim 7x)
Signed-off-by: Saikiran <bjsaikiran@gmail.com>
---
drivers/net/wireless/ath/ath12k/mhi.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/mhi.c b/drivers/net/wireless/ath/ath12k/mhi.c
index 45c0f66dcc5e..1a0b3bcc6bbf 100644
--- a/drivers/net/wireless/ath/ath12k/mhi.c
+++ b/drivers/net/wireless/ath/ath12k/mhi.c
@@ -485,9 +485,14 @@ int ath12k_mhi_start(struct ath12k_pci *ab_pci)
ab_pci->mhi_ctrl->timeout_ms = MHI_TIMEOUT_DEFAULT_MS;
- ret = ath12k_mhi_set_state(ab_pci, ATH12K_MHI_INIT);
- if (ret)
- goto out;
+ /* In case of suspend/resume, MHI INIT is already done.
+ * So check if MHI INIT is set or not.
+ */
+ if (!test_bit(ATH12K_MHI_INIT, &ab_pci->mhi_state)) {
+ ret = ath12k_mhi_set_state(ab_pci, ATH12K_MHI_INIT);
+ if (ret)
+ goto out;
+ }
ret = ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_ON);
if (ret)
@@ -501,16 +506,21 @@ int ath12k_mhi_start(struct ath12k_pci *ab_pci)
void ath12k_mhi_stop(struct ath12k_pci *ab_pci, bool is_suspend)
{
- /* During suspend we need to use mhi_power_down_keep_dev()
- * workaround, otherwise ath12k_core_resume() will timeout
- * during resume.
+ /* During suspend, we need to use mhi_power_down_keep_dev()
+ * and avoid calling MHI_DEINIT. The deinit frees BHIE tables
+ * which causes memory corruption when those pages are
+ * accessed/freed again during resume. We want to keep the
+ * device prepared for resume, otherwise ath12k_core_resume()
+ * will timeout.
*/
if (is_suspend)
ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF_KEEP_DEV);
else
ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF);
- ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+ /* Only deinit when doing full power down, not during suspend */
+ if (!is_suspend)
+ ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
}
void ath12k_mhi_suspend(struct ath12k_pci *ab_pci)
--
2.51.0
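The state-machine rejection described in the commit message, and the effect of the test_bit() guard, can be sketched with a small userspace model. This is only an illustration under stated assumptions: the model_* names are hypothetical, the two-bit state word is a simplification, and none of this is the driver's actual ath12k_mhi_set_state() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the driver's state bits. */
enum model_state { MODEL_INIT, MODEL_POWER_ON };

struct model_mhi {
	unsigned long state; /* one bit per enum model_state value */
};

static bool model_test_bit(int bit, const unsigned long *word)
{
	return (*word >> bit) & 1UL;
}

/* Reject INIT when it is already set, like the failure in the log. */
static int model_set_state(struct model_mhi *m, enum model_state s)
{
	assert(m);
	if (s == MODEL_INIT && model_test_bit(MODEL_INIT, &m->state)) {
		fprintf(stderr,
			"failed to set state INIT in current state (0x%lx)\n",
			m->state);
		return -1;
	}
	/* POWER_ON is only legal once INIT has been done. */
	if (s == MODEL_POWER_ON && !model_test_bit(MODEL_INIT, &m->state))
		return -1;
	m->state |= 1UL << s;
	return 0;
}

/* Suspend path as proposed in the patch: power off, leave INIT set. */
static void model_stop_for_suspend(struct model_mhi *m)
{
	m->state &= ~(1UL << MODEL_POWER_ON);
}

/* Start path; `guard` selects the patched behaviour. */
static int model_start(struct model_mhi *m, bool guard)
{
	if (!guard || !model_test_bit(MODEL_INIT, &m->state)) {
		if (model_set_state(m, MODEL_INIT))
			return -1;
	}
	return model_set_state(m, MODEL_POWER_ON);
}
```

In this model, a start, a suspend-style stop and an unguarded start fail at INIT with the state word at 0x1, while the guarded start goes straight to POWER_ON. The model says nothing about whether skipping INIT is safe on real hardware.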
On 2/2/2026 11:17 PM, Saikiran wrote:
> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
> system suspend/resume support but caused a critical regression where
> CMA pages are corrupted during resume.
>
> 1. CMA page corruption:
> Calling mhi_unprepare_after_power_down() during suspend (via
> ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
> DMA buffers. When these pages are accessed during resume, the kernel
> detects corruption (Bad page state).
How? The FBC image and RDDM image get re-allocated at resume, no?
>
> To fix this corruption, the driver must skip ATH12K_MHI_DEINIT during
> suspend, preserving the DMA buffers. However, implementing this fix
> exposes a second issue in the state machine:
>
> 2. Resume failure due to MHI state mismatch:
> When DEINIT is skipped during suspend to protect the memory, the
> ATH12K_MHI_INIT bit remains set. On resume, ath12k_mhi_start()
> blindly attempts to set INIT again, but the state machine rejects
> the transition:
>
> ath12k_wifi7_pci ...: failed to set mhi state INIT(0) in current
> mhi state (0x1)
>
> Fix the corruption and enable the correct suspend flow by:
>
> 1. In ath12k_mhi_stop(), skipping ATH12K_MHI_DEINIT if suspending.
> This prevents the memory corruption by keeping the device context
> valid (MHI_POWER_OFF_KEEP_DEV).
>
> 2. In ath12k_mhi_start(), checking if MHI_INIT is already set.
> This accommodates the new suspend flow where the device remains
> initialized, allowing the driver to proceed directly to POWER_ON.
>
> Tested with suspend/resume cycles on Qualcomm Snapdragon X Elite
> (SC8380XP) with WCN7850 WiFi. No CMA corruption observed, WiFi resumes
> successfully, and deep sleep works correctly.
>
> Fixes: 8d5f4da8d70b ("wifi: ath12k: support suspend/resume")
> Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302 (Lenovo Yoga Slim 7x)
> Signed-off-by: Saikiran <bjsaikiran@gmail.com>
> ---
> drivers/net/wireless/ath/ath12k/mhi.c | 24 +++++++++++++++++-------
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath12k/mhi.c b/drivers/net/wireless/ath/ath12k/mhi.c
> index 45c0f66dcc5e..1a0b3bcc6bbf 100644
> --- a/drivers/net/wireless/ath/ath12k/mhi.c
> +++ b/drivers/net/wireless/ath/ath12k/mhi.c
> @@ -485,9 +485,14 @@ int ath12k_mhi_start(struct ath12k_pci *ab_pci)
>
> ab_pci->mhi_ctrl->timeout_ms = MHI_TIMEOUT_DEFAULT_MS;
>
> - ret = ath12k_mhi_set_state(ab_pci, ATH12K_MHI_INIT);
> - if (ret)
> - goto out;
> + /* In case of suspend/resume, MHI INIT is already done.
> + * So check if MHI INIT is set or not.
> + */
> + if (!test_bit(ATH12K_MHI_INIT, &ab_pci->mhi_state)) {
> + ret = ath12k_mhi_set_state(ab_pci, ATH12K_MHI_INIT);
> + if (ret)
> + goto out;
> + }
>
> ret = ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_ON);
> if (ret)
> @@ -501,16 +506,21 @@ int ath12k_mhi_start(struct ath12k_pci *ab_pci)
>
> void ath12k_mhi_stop(struct ath12k_pci *ab_pci, bool is_suspend)
> {
> - /* During suspend we need to use mhi_power_down_keep_dev()
> - * workaround, otherwise ath12k_core_resume() will timeout
> - * during resume.
> + /* During suspend, we need to use mhi_power_down_keep_dev()
> + * and avoid calling MHI_DEINIT. The deinit frees BHIE tables
> + * which causes memory corruption when those pages are
> + * accessed/freed again during resume. We want to keep the
> + * device prepared for resume, otherwise ath12k_core_resume()
> + * will timeout.
> */
> if (is_suspend)
> ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF_KEEP_DEV);
> else
> ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF);
>
> - ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
> + /* Only deinit when doing full power down, not during suspend */
> + if (!is_suspend)
> + ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
> }
>
> void ath12k_mhi_suspend(struct ath12k_pci *ab_pci)
On 2/3/26 08:21, Baochen Qiang wrote:
>
> On 2/2/2026 11:17 PM, Saikiran wrote:
>> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
>> system suspend/resume support but caused a critical regression where
>> CMA pages are corrupted during resume.
>>
>> 1. CMA page corruption:
>> Calling mhi_unprepare_after_power_down() during suspend (via
>> ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
>> DMA buffers. When these pages are accessed during resume, the kernel
>> detects corruption (Bad page state).
> How, FBC image and RDDM image get re-allocated at resume, no?
>
> To clarify, the BUG: Bad page state crash actually occurs during the
> suspend phase, specifically when ath12k_mhi_stop() calls
> mhi_unprepare_after_power_down().
>
> The stack trace shows the panic happens inside mhi_free_bhie_table()
> while trying to free the pages:
>
> mhi_free_bhie_table+0x50/0xa0 [mhi]
> mhi_unprepare_after_power_down+0x30/0x70 [mhi]
> ath12k_mhi_stop+0xf8/0x210 [ath12k]
> ath12k_core_suspend_late+0x94/0xc0 [ath12k]
>
> The kernel reports nonzero _refcount when attempting to free the CMA
> pages (fbc_image/rddm_image). This suggests that something is still
> holding a reference to these pages when DEINIT attempts to free them,
> causing the kernel to panic before we reach the resume stage.
>
> Since the pages cannot be safely freed during suspend, skipping DEINIT
> (and using MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation.
> This also aligns with the existing comment in ath12k_mhi_stop which
> suggests using mhi_power_down_keep_dev() for suspend.
>
> Thanks & Regards,
> Saikiran
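The "nonzero _refcount" reading above can be illustrated with a toy model of a reference-counted page. This is a sketch under loud assumptions: struct toy_page and toy_free_page() are hypothetical, and the real kernel free path and CMA release logic are considerably more involved. It only shows why a free attempt is reported as bad when someone else still holds a reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy page descriptor; the kernel's struct page is far richer. */
struct toy_page {
	int refcount;   /* references currently held on the page */
	bool allocated; /* still owned by someone, not yet freed */
};

/*
 * The caller drops its own reference. The page may only return to the
 * allocator if no other reference remains; otherwise report the
 * inconsistency instead of freeing.
 */
static int toy_free_page(struct toy_page *page)
{
	assert(page && page->allocated);
	page->refcount--;
	if (page->refcount != 0) {
		fprintf(stderr, "bad page state: nonzero refcount (%d)\n",
			page->refcount);
		return -1;
	}
	page->allocated = false;
	return 0;
}
```

A page with a single owner frees cleanly in this model; a page with one extra holder is reported and left allocated. Who the extra holder is in the reported crash is exactly what the thread leaves open.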
On 2/3/2026 1:02 PM, Jayasaikiran Banigallapati wrote:
>
> On 2/3/26 08:21, Baochen Qiang wrote:
>>
>> On 2/2/2026 11:17 PM, Saikiran wrote:
>>> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
>>> system suspend/resume support but caused a critical regression where
>>> CMA pages are corrupted during resume.
>>>
>>> 1. CMA page corruption:
>>> Calling mhi_unprepare_after_power_down() during suspend (via
>>> ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
>>> DMA buffers. When these pages are accessed during resume, the kernel
>>> detects corruption (Bad page state).
>> How, FBC image and RDDM image get re-allocated at resume, no?
>>
>> To clarify, the BUG: Bad page state crash actually occurs during the suspend phase,
>> specifically when ath12k_mhi_stop() calls mhi_unprepare_after_power_down().
>>
>> The stack trace shows the panic happens inside mhi_free_bhie_table() while trying to
>> free the pages:
>>
>> mhi_free_bhie_table+0x50/0xa0 [mhi]
>> mhi_unprepare_after_power_down+0x30/0x70 [mhi]
>> ath12k_mhi_stop+0xf8/0x210 [ath12k]
>> ath12k_core_suspend_late+0x94/0xc0 [ath12k]
>>
>> The kernel reports nonzero _refcount when attempting to free the CMA pages (fbc_image/
>> rddm_image). This suggests that something is still holding a reference to these pages
>> when DEINIT attempts to free them, causing the kernel to panic before we reach the
>> resume stage.
This seems like a bug in either the MHI stack or the kernel DMA/MM subsystems, rather
than in ath12k.
>>
>> Since the pages cannot be safely freed during suspend, skipping DEINIT (and using
>> MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation. This also aligns with the
>> existing comment in ath12k_mhi_stop which suggests using mhi_power_down_keep_dev() for
>> suspend.
First of all, this is a workaround rather than a fix. Ideally we should try to root cause
the issue and fix it in the right way.
Secondly, the workaround here seems problematic: you skip INIT during resume. However, note
that several hardware registers need to be re-programmed during this stage. How could the
target work if its power is cut off during suspend and the register context is not restored
during resume?
>>
>> Thanks & Regards,
>> Saikiran
On 2/3/26 11:00, Baochen Qiang wrote:
>
> On 2/3/2026 1:02 PM, Jayasaikiran Banigallapati wrote:
>> On 2/3/26 08:21, Baochen Qiang wrote:
>>> On 2/2/2026 11:17 PM, Saikiran wrote:
>>>> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
>>>> system suspend/resume support but caused a critical regression where
>>>> CMA pages are corrupted during resume.
>>>>
>>>> 1. CMA page corruption:
>>>> Calling mhi_unprepare_after_power_down() during suspend (via
>>>> ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
>>>> DMA buffers. When these pages are accessed during resume, the kernel
>>>> detects corruption (Bad page state).
>>> How, FBC image and RDDM image get re-allocated at resume, no?
>>>
>>> To clarify, the BUG: Bad page state crash actually occurs during the suspend phase,
>>> specifically when ath12k_mhi_stop() calls mhi_unprepare_after_power_down().
>>>
>>> The stack trace shows the panic happens inside mhi_free_bhie_table() while trying to
>>> free the pages:
>>>
>>> mhi_free_bhie_table+0x50/0xa0 [mhi]
>>> mhi_unprepare_after_power_down+0x30/0x70 [mhi]
>>> ath12k_mhi_stop+0xf8/0x210 [ath12k]
>>> ath12k_core_suspend_late+0x94/0xc0 [ath12k]
>>>
>>> The kernel reports nonzero _refcount when attempting to free the CMA pages (fbc_image/
>>> rddm_image). This suggests that something is still holding a reference to these pages
>>> when DEINIT attempts to free them, causing the kernel to panic before we reach the
>>> resume stage.
> this seems like a bug either in MHI stack or in kernel DMA/MM subsystems, rather than in
> ath12k
>
>>> Since the pages cannot be safely freed during suspend, skipping DEINIT (and using
>>> MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation. This also aligns with the
>>> existing comment in ath12k_mhi_stop which suggests using mhi_power_down_keep_dev() for
>>> suspend.
> first of all, this is a workaround rather than fix. Ideally we should try to root cause
> the issue and fix it in the right way.
The original comment in existing code:
/* During suspend we need to use mhi_power_down_keep_dev()
* workaround, otherwise ath12k_core_resume() will timeout
* during resume.
*/
This patch aligns the code with this existing intent. The driver was
previously calling DEINIT (and freeing resources) despite the comment
advising to use keep_dev. If the intention of the driver authors was to
use keep_dev for suspend, then my understanding is that DEINIT is
incorrect here (correct me if I am wrong), regardless of the underlying
MM behavior.
>
> Secondly the workaround here seems problematic: you skip INIT during resume. However note
> several hardware registers need to be re-programmed during this stage, how could the
> target work if its power is cut off during suspend and the register context is not restored
> during resume?
In my testing, WiFi functionality was fully restored after resume.
The device associates and passes traffic immediately.
My understanding is that:
ATH12K_MHI_INIT primarily handles host memory allocation (which we
preserved by skipping DEINIT).
ATH12K_MHI_POWER_ON calls mhi_sync_power_up(). This function triggers
the MHI state machine, which handles the necessary BHI/BHIE programming
and firmware download (SBL) sequence.
Since mhi_sync_power_up() is still called during resume, the target is
correctly re-initialized and registers are programmed, even if we skip
the redundant host memory allocation step (INIT).
Thanks & Regards,
Saikiran
On 2/3/2026 1:51 PM, Jayasaikiran Banigallapati wrote:
>
> On 2/3/26 11:00, Baochen Qiang wrote:
>>
>> On 2/3/2026 1:02 PM, Jayasaikiran Banigallapati wrote:
>>> On 2/3/26 08:21, Baochen Qiang wrote:
>>>> On 2/2/2026 11:17 PM, Saikiran wrote:
>>>>> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
>>>>> system suspend/resume support but caused a critical regression where
>>>>> CMA pages are corrupted during resume.
>>>>>
>>>>> 1. CMA page corruption:
>>>>> Calling mhi_unprepare_after_power_down() during suspend (via
>>>>> ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
>>>>> DMA buffers. When these pages are accessed during resume, the kernel
>>>>> detects corruption (Bad page state).
>>>> How, FBC image and RDDM image get re-allocated at resume, no?
>>>>
>>>> To clarify, the BUG: Bad page state crash actually occurs during the suspend phase,
>>>> specifically when ath12k_mhi_stop() calls mhi_unprepare_after_power_down().
>>>>
>>>> The stack trace shows the panic happens inside mhi_free_bhie_table() while trying to
>>>> free the pages:
>>>>
>>>> mhi_free_bhie_table+0x50/0xa0 [mhi]
>>>> mhi_unprepare_after_power_down+0x30/0x70 [mhi]
>>>> ath12k_mhi_stop+0xf8/0x210 [ath12k]
>>>> ath12k_core_suspend_late+0x94/0xc0 [ath12k]
>>>>
>>>> The kernel reports nonzero _refcount when attempting to free the CMA pages (fbc_image/
>>>> rddm_image). This suggests that something is still holding a reference to these pages
>>>> when DEINIT attempts to free them, causing the kernel to panic before we reach the
>>>> resume stage.
>> this seems like a bug either in MHI stack or in kernel DMA/MM subsystems, rather than in
>> ath12k
>>
>>>> Since the pages cannot be safely freed during suspend, skipping DEINIT (and using
>>>> MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation. This also aligns with the
>>>> existing comment in ath12k_mhi_stop which suggests using mhi_power_down_keep_dev() for
>>>> suspend.
>> first of all, this is a workaround rather than fix. Ideally we should try to root cause
>> the issue and fix it in the right way.
>
>
> The original comment in existing code:
>
>
> /* During suspend we need to use mhi_power_down_keep_dev()
> * workaround, otherwise ath12k_core_resume() will timeout
> * during resume.
> */
>
> This patch aligns the code with this existing intent. The driver was
> previously calling DEINIT (and freeing resources) despite the comment
> advising to use keep_dev. If the intention of the driver authors was to
> use keep_dev for suspend, then my understanding is that DEINIT is
> incorrect here (correct me if I am wrong), regardless of the underlying
> MM behavior.
keep_dev means not to destroy the mhi_device instance while going to suspend. The purpose
is to get rid of the PROBE_DEFER problem in MHI during resume. You may want to check the
upstream discussion to learn about the history.
>
>>
>> Secondly the workaround here seems problematic: you skip INIT during resume. However note
>> several hardware registers need to be re-programmed during this stage, how could the
>> target work if its power is cut off during suspend and the register context is not restored
>> during resume?
>
>
> In my testing, WiFi functionality was fully restored after resume.
>
> The device associates and passes traffic immediately.
I can imagine two reasons: either the WLAN target's power is not cut off during suspend, or
you did not hit the problem scenario. For the latter, I mean you may need to trigger a
firmware crash to see if RDDM works normally, since you skip the RDDM register context
restore during resume.
>
> My understanding is that:
>
> ATH12K_MHI_INIT primarily handles host memory allocation (which we preserved by skipping
> DEINIT).
In addition to memory allocation, there is also register programming. See
mhi_prepare_for_power_up() and mhi_rddm_prepare().
>
> ATH12K_MHI_POWER_ON calls mhi_sync_power_up(). This function triggers the MHI state
> machine, which handles the necessary BHI/BHIE programming and firmware download (SBL)
> sequence.
>
> Since mhi_sync_power_up() is still called during resume, the target is correctly
> re-initialized and registers are programmed, even if we skip the redundant host memory
> allocation step (INIT).
>
> Thanks & Regards,
> Saikiran
>
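The concern about register context can be made concrete with a toy model. It assumes, as the review suggests but the thread does not confirm for this platform, that device power is really cut during suspend and that the INIT step is the only place a given register is programmed. All toy_* names, the single register and the address 0x1000 are hypothetical and stand in for whatever mhi_prepare_for_power_up() and mhi_rddm_prepare() actually program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device with one register that the INIT step programs. */
struct toy_dev {
	uint32_t rddm_table_reg; /* 0 means "not programmed" */
};

/* Host-side context that survives suspend when DEINIT is skipped. */
struct toy_host {
	bool buffers_allocated;
	uint32_t rddm_table_addr;
};

/* INIT in this model: allocate host memory AND program the device. */
static void toy_init(struct toy_host *h, struct toy_dev *d)
{
	assert(h && d);
	h->buffers_allocated = true;
	h->rddm_table_addr = 0x1000; /* arbitrary illustrative address */
	d->rddm_table_reg = h->rddm_table_addr;
}

/* Power removal wipes device registers; host memory is untouched. */
static void toy_power_cut(struct toy_dev *d)
{
	d->rddm_table_reg = 0;
}

/*
 * Resume. With skip_init set, INIT is bypassed whenever host buffers
 * already exist, as in the patched start path. Returns whether the
 * device register matches the host's table address afterwards.
 */
static bool toy_resume_rddm_ready(struct toy_host *h, struct toy_dev *d,
				  bool skip_init)
{
	if (!skip_init || !h->buffers_allocated)
		toy_init(h, d);
	return d->rddm_table_reg == h->rddm_table_addr;
}
```

Under these assumptions, resuming with INIT skipped leaves the register unprogrammed even though the host buffers are intact, which would stay invisible until the path that needs it (here, a crash dump) is exercised. If the target keeps power across suspend, the model's premise does not apply.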