[PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code

Lukas Wunner posted 1 patch 3 weeks ago
arch/x86/pci/fixup.c     | 19 +++++++++++++++++++
drivers/pci/pci-driver.c | 19 -------------------
2 files changed, 19 insertions(+), 19 deletions(-)
[PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Lukas Wunner 3 weeks ago
In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
clearing the Command register if the suspended device is a USB EHCI host
controller.

Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
computers") subsequently amended pci_pm_poweroff_noirq() to do the same.

Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
quirk pass") introduced the ability to execute arbitrary quirks
specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().

This allows moving the ASUS workaround out of generic code and into a
proper quirk to improve maintainability and readability.  Constrain to x86
since the ASUS BIOS doesn't seem to have been used on other arches.

lspci output of affected EHCI host controllers reveals that the only bits
set in the Command register are Memory Space Enable and Bus Master Enable:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778

The latter is cleared by:
  hcd_pci_suspend()
    suspend_common()
      pci_disable_device()

pci_disable_device() does not clear I/O and Memory Space Enable, although
its name suggests otherwise.  The kernel has never disabled these bits
once they're enabled.  Doing so would avoid the need for the quirk, but it
is unclear what will break if this fundamental behavior is changed.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 arch/x86/pci/fixup.c     | 19 +++++++++++++++++++
 drivers/pci/pci-driver.c | 19 -------------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index e7e71490bd25..c34ff72434f2 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -1041,3 +1041,22 @@ static void quirk_tuxeo_rp_d3(struct pci_dev *pdev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1502, quirk_tuxeo_rp_d3);
 #endif /* CONFIG_SUSPEND */
+
+#ifdef CONFIG_PM_SLEEP
+/*
+ * Some BIOSes from ASUS have a bug: If a USB EHCI host controller's Command
+ * register is not 0 on suspend, the BIOS assumes that the controller has not
+ * been quiesced and tries to turn it off.  If the controller is already in D3,
+ * this can hang or cause memory corruption.
+ *
+ * Since the value of the Command register does not matter once the device has
+ * been suspended, it can safely be set to 0.
+ */
+static void quirk_clear_command_reg(struct pci_dev *pdev)
+{
+	pci_write_config_word(pdev, PCI_COMMAND, 0);
+}
+DECLARE_PCI_FIXUP_CLASS_SUSPEND_LATE(PCI_ANY_ID, PCI_ANY_ID,
+				     PCI_CLASS_SERIAL_USB_EHCI, 0,
+				     quirk_clear_command_reg);
+#endif /* CONFIG_PM_SLEEP */
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 63665240ae87..e1089dfeb419 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -914,18 +914,6 @@ static int pci_pm_suspend_noirq(struct device *dev)
 
 	pci_pm_set_unknown_state(pci_dev);
 
-	/*
-	 * Some BIOSes from ASUS have a bug: If a USB EHCI host controller's
-	 * PCI COMMAND register isn't 0, the BIOS assumes that the controller
-	 * hasn't been quiesced and tries to turn it off.  If the controller
-	 * is already in D3, this can hang or cause memory corruption.
-	 *
-	 * Since the value of the COMMAND register doesn't matter once the
-	 * device has been suspended, we can safely set it to 0 here.
-	 */
-	if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		pci_write_config_word(pci_dev, PCI_COMMAND, 0);
-
 Fixup:
 	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 
@@ -1205,13 +1193,6 @@ static int pci_pm_poweroff_noirq(struct device *dev)
 	if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
 		pci_prepare_to_sleep(pci_dev);
 
-	/*
-	 * The reason for doing this here is the same as for the analogous code
-	 * in pci_pm_suspend_noirq().
-	 */
-	if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-		pci_write_config_word(pci_dev, PCI_COMMAND, 0);
-
 	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 
 	return 0;
-- 
2.51.0
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Mario Limonciello 3 weeks ago
On 9/11/25 8:11 AM, Lukas Wunner wrote:
> In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
> computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
> clearing the Command register if the suspended device is a USB EHCI host
> controller.
> 
> Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
> computers") subsequently amended pci_pm_poweroff_noirq() to do the same.
> 
> Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
> quirk pass") introduced the ability to execute arbitrary quirks
> specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().
> 
> This allows moving the ASUS workaround out of generic code and into a
> proper quirk to improve maintainability and readability.  Constrain to x86
> since the ASUS BIOS doesn't seem to have been used on other arches.
> 
> lspci output of affected EHCI host controllers reveals that the only bits
> set in the Command register are Memory Space Enable and Bus Master Enable:
>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778
> 
> The latter is cleared by:
>    hcd_pci_suspend()
>      suspend_common()
>        pci_disable_device()
> 
> pci_disable_device() does not clear I/O and Memory Space Enable, although
> its name suggests otherwise.  

That was my gut reaction as well.

> The kernel has never disabled these bits
> once they're enabled.  Doing so would avoid the need for the quirk, but it
> is unclear what will break if this fundamental behavior is changed.
> 

It's too late for this cycle to do so, but how would you feel about 
making this change at the start of the next cycle so it had a whole 
cycle to bake in linux-next and see if there is a problem in doing so?
If there is it could certainly be moved back to a quirk.

> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>> ---
>   arch/x86/pci/fixup.c     | 19 +++++++++++++++++++
>   drivers/pci/pci-driver.c | 19 -------------------
>   2 files changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index e7e71490bd25..c34ff72434f2 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -1041,3 +1041,22 @@ static void quirk_tuxeo_rp_d3(struct pci_dev *pdev)
>   }
>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1502, quirk_tuxeo_rp_d3);
>   #endif /* CONFIG_SUSPEND */
> +
> +#ifdef CONFIG_PM_SLEEP
> +/*
> + * Some BIOSes from ASUS have a bug: If a USB EHCI host controller's Command
> + * register is not 0 on suspend, the BIOS assumes that the controller has not
> + * been quiesced and tries to turn it off.  If the controller is already in D3,
> + * this can hang or cause memory corruption.
> + *
> + * Since the value of the Command register does not matter once the device has
> + * been suspended, it can safely be set to 0.
> + */
> +static void quirk_clear_command_reg(struct pci_dev *pdev)
> +{
> +	pci_write_config_word(pdev, PCI_COMMAND, 0);
> +}
> +DECLARE_PCI_FIXUP_CLASS_SUSPEND_LATE(PCI_ANY_ID, PCI_ANY_ID,
> +				     PCI_CLASS_SERIAL_USB_EHCI, 0,
> +				     quirk_clear_command_reg);
> +#endif /* CONFIG_PM_SLEEP */
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 63665240ae87..e1089dfeb419 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -914,18 +914,6 @@ static int pci_pm_suspend_noirq(struct device *dev)
>   
>   	pci_pm_set_unknown_state(pci_dev);
>   
> -	/*
> -	 * Some BIOSes from ASUS have a bug: If a USB EHCI host controller's
> -	 * PCI COMMAND register isn't 0, the BIOS assumes that the controller
> -	 * hasn't been quiesced and tries to turn it off.  If the controller
> -	 * is already in D3, this can hang or cause memory corruption.
> -	 *
> -	 * Since the value of the COMMAND register doesn't matter once the
> -	 * device has been suspended, we can safely set it to 0 here.
> -	 */
> -	if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		pci_write_config_word(pci_dev, PCI_COMMAND, 0);
> -
>   Fixup:
>   	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
>   
> @@ -1205,13 +1193,6 @@ static int pci_pm_poweroff_noirq(struct device *dev)
>   	if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
>   		pci_prepare_to_sleep(pci_dev);
>   
> -	/*
> -	 * The reason for doing this here is the same as for the analogous code
> -	 * in pci_pm_suspend_noirq().
> -	 */
> -	if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -		pci_write_config_word(pci_dev, PCI_COMMAND, 0);
> -
>   	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
>   
>   	return 0;
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Lukas Wunner 2 weeks, 6 days ago
On Thu, Sep 11, 2025 at 08:34:56AM -0500, Mario Limonciello wrote:
> On 9/11/25 8:11 AM, Lukas Wunner wrote:
> > pci_disable_device() does not clear I/O and Memory Space Enable, although
> > its name suggests otherwise.  The kernel has never disabled these bits
> > once they're enabled.  Doing so would avoid the need for the quirk, but it
> > is unclear what will break if this fundamental behavior is changed.
> 
> It's too late for this cycle to do so, but how would you feel about making
> this change at the start of the next cycle so it had a whole cycle to bake
> in linux-next and see if there is a problem in doing so?

I can look into it.

The change could be justified as a security enhancement to prevent
unauthorized traffic between devices through peer-to-peer transactions.

pci_disable_device() was introduced with v2.4.3.5 in 2002:
https://git.kernel.org/tglx/history/c/9102e0eb3e9e

I suspect back in the day, clearing Bus Master Enable seemed sufficient
because the only concern was to prevent DMA (and by extension MSIs)
from broken devices.  Attacks *between* devices were probably not
considered realistic.

ACS is meant to prevent such attacks, but is an optional capability
and might be configured incorrectly.  A zero trust, defense in depth
approach as is common today requires not leaving doors open without need.

If the kernel would clear Memory Space Enable, a malicious device could
not re-enable it on its own because "propagation of Configuration Requests
from Downstream to Upstream as well as peer-to-peer are not supported"
(PCIe r7.0 sec 7.3.3).

It seemed too risky to make such a sweeping change only to get rid of
the EHCI quirk.  The present patch is meant as a low-risk refactoring,
but we can consider clearing IO + Memory Space Enable as a long-term
solution.  I've cc'ed all the people who reported issues with ASUS
machines back in 2012 in the hope that some of them still have the
(now 13 years old) hardware to test the patch.  They might also be
able to test whether the long-term change doesn't regress anything.

Thanks,

Lukas
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Rafael J. Wysocki 2 weeks, 6 days ago
On Fri, Sep 12, 2025 at 9:26 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Thu, Sep 11, 2025 at 08:34:56AM -0500, Mario Limonciello wrote:
> > On 9/11/25 8:11 AM, Lukas Wunner wrote:
> > > pci_disable_device() does not clear I/O and Memory Space Enable, although
> > > its name suggests otherwise.  The kernel has never disabled these bits
> > > once they're enabled.  Doing so would avoid the need for the quirk, but it
> > > is unclear what will break if this fundamental behavior is changed.
> >
> > It's too late for this cycle to do so, but how would you feel about making
> > this change at the start of the next cycle so it had a whole cycle to bake
> > in linux-next and see if there is a problem in doing so?
>
> I can look into it.
>
> The change could be justified as a security enhancement to prevent
> unauthorized traffic between devices through peer-to-peer transactions.
>
> pci_disable_device() was introduced with v2.4.3.5 in 2002:
> https://git.kernel.org/tglx/history/c/9102e0eb3e9e
>
> I suspect back in the day, clearing Bus Master Enable seemed sufficient
> because the only concern was to prevent DMA (and by extension MSIs)
> from broken devices.  Attacks *between* devices were probably not
> considered realistic.
>
> ACS is meant to prevent such attacks, but is an optional capability
> and might be configured incorrectly.  A zero trust, defense in depth
> approach as is common today requires not leaving doors open without need.
>
> If the kernel would clear Memory Space Enable, a malicious device could
> not re-enable it on its own because "propagation of Configuration Requests
> from Downstream to Upstream as well as peer-to-peer are not supported"
> (PCIe r7.0 sec 7.3.3).
>
> It seemed too risky to make such a sweeping change only to get rid of
> the EHCI quirk.  The present patch is meant as a low-risk refactoring,
> but we can consider clearing IO + Memory Space Enable as a long-term
> solution.

Yes, we can.

Obviously, the additional reason for doing it, which appears to be
more significant than avoiding the EHCI quirk alone, can be regarded
as sufficient justification IMV.
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Rafael J. Wysocki 3 weeks ago
On Thu, Sep 11, 2025 at 3:34 PM Mario Limonciello <superm1@kernel.org> wrote:
>
> On 9/11/25 8:11 AM, Lukas Wunner wrote:
> > In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
> > computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
> > clearing the Command register if the suspended device is a USB EHCI host
> > controller.
> >
> > Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
> > computers") subsequently amended pci_pm_poweroff_noirq() to do the same.
> >
> > Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
> > quirk pass") introduced the ability to execute arbitrary quirks
> > specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().
> >
> > This allows moving the ASUS workaround out of generic code and into a
> > proper quirk to improve maintainability and readability.  Constrain to x86
> > since the ASUS BIOS doesn't seem to have been used on other arches.
> >
> > lspci output of affected EHCI host controllers reveals that the only bits
> > set in the Command register are Memory Space Enable and Bus Master Enable:
> >    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778
> >
> > The latter is cleared by:
> >    hcd_pci_suspend()
> >      suspend_common()
> >        pci_disable_device()
> >
> > pci_disable_device() does not clear I/O and Memory Space Enable, although
> > its name suggests otherwise.
>
> That was my gut reaction as well.
>
> > The kernel has never disabled these bits
> > once they're enabled.  Doing so would avoid the need for the quirk, but it
> > is unclear what will break if this fundamental behavior is changed.
> >
>
> It's too late for this cycle to do so, but how would you feel about
> making this change at the start of the next cycle so it had a whole
> cycle to bake in linux-next and see if there is a problem in doing so?

One cycle in linux-next may not be sufficient I'm afraid because
linux-next is not tested on the majority of systems running Linux.

We'd probably learn about the breakage from distro vendors.

> If there is it could certainly be moved back to a quirk.

Most likely, it would work on the majority of systems, but there would
be a tail of systems where it would break.  That tail would then need
to be quirked somehow and it may be worse than just one quirk we have
today.
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Mario Limonciello 3 weeks ago
On 9/11/25 8:43 AM, Rafael J. Wysocki wrote:
> On Thu, Sep 11, 2025 at 3:34 PM Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 9/11/25 8:11 AM, Lukas Wunner wrote:
>>> In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
>>> computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
>>> clearing the Command register if the suspended device is a USB EHCI host
>>> controller.
>>>
>>> Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
>>> computers") subsequently amended pci_pm_poweroff_noirq() to do the same.
>>>
>>> Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
>>> quirk pass") introduced the ability to execute arbitrary quirks
>>> specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().
>>>
>>> This allows moving the ASUS workaround out of generic code and into a
>>> proper quirk to improve maintainability and readability.  Constrain to x86
>>> since the ASUS BIOS doesn't seem to have been used on other arches.
>>>
>>> lspci output of affected EHCI host controllers reveals that the only bits
>>> set in the Command register are Memory Space Enable and Bus Master Enable:
>>>     https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778
>>>
>>> The latter is cleared by:
>>>     hcd_pci_suspend()
>>>       suspend_common()
>>>         pci_disable_device()
>>>
>>> pci_disable_device() does not clear I/O and Memory Space Enable, although
>>> its name suggests otherwise.
>>
>> That was my gut reaction as well.
>>
>>> The kernel has never disabled these bits
>>> once they're enabled.  Doing so would avoid the need for the quirk, but it
>>> is unclear what will break if this fundamental behavior is changed.
>>>
>>
>> It's too late for this cycle to do so, but how would you feel about
>> making this change at the start of the next cycle so it had a whole
>> cycle to bake in linux-next and see if there is a problem in doing so?
> 
> One cycle in linux-next may not be sufficient I'm afraid because
> linux-next is not tested on the majority of systems running Linux.
> 
> We'd probably learn about the breakage from distro vendors.
> 
>> If there is it could certainly be moved back to a quirk.
> 
> Most likely, it would work on the majority of systems, but there would
> be a tail of systems where it would break.  That tail would then need
> to be quirked somehow and it may be worse than just one quirk we have
> today.

But is that a reason not to *try* and rid the tech debt?

We could just all agree that *if* there is breakage we revert back to 
the quirk just for EHCI.
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Rafael J. Wysocki 3 weeks ago
On Thu, Sep 11, 2025 at 3:46 PM Mario Limonciello <superm1@kernel.org> wrote:
>
> On 9/11/25 8:43 AM, Rafael J. Wysocki wrote:
> > On Thu, Sep 11, 2025 at 3:34 PM Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 9/11/25 8:11 AM, Lukas Wunner wrote:
> >>> In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
> >>> computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
> >>> clearing the Command register if the suspended device is a USB EHCI host
> >>> controller.
> >>>
> >>> Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
> >>> computers") subsequently amended pci_pm_poweroff_noirq() to do the same.
> >>>
> >>> Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
> >>> quirk pass") introduced the ability to execute arbitrary quirks
> >>> specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().
> >>>
> >>> This allows moving the ASUS workaround out of generic code and into a
> >>> proper quirk to improve maintainability and readability.  Constrain to x86
> >>> since the ASUS BIOS doesn't seem to have been used on other arches.
> >>>
> >>> lspci output of affected EHCI host controllers reveals that the only bits
> >>> set in the Command register are Memory Space Enable and Bus Master Enable:
> >>>     https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778
> >>>
> >>> The latter is cleared by:
> >>>     hcd_pci_suspend()
> >>>       suspend_common()
> >>>         pci_disable_device()
> >>>
> >>> pci_disable_device() does not clear I/O and Memory Space Enable, although
> >>> its name suggests otherwise.
> >>
> >> That was my gut reaction as well.
> >>
> >>> The kernel has never disabled these bits
> >>> once they're enabled.  Doing so would avoid the need for the quirk, but it
> >>> is unclear what will break if this fundamental behavior is changed.
> >>>
> >>
> >> It's too late for this cycle to do so, but how would you feel about
> >> making this change at the start of the next cycle so it had a whole
> >> cycle to bake in linux-next and see if there is a problem in doing so?
> >
> > One cycle in linux-next may not be sufficient I'm afraid because
> > linux-next is not tested on the majority of systems running Linux.
> >
> > We'd probably learn about the breakage from distro vendors.
> >
> >> If there is it could certainly be moved back to a quirk.
> >
> > Most likely, it would work on the majority of systems, but there would
> > be a tail of systems where it would break.  That tail would then need
> > to be quirked somehow and it may be worse than just one quirk we have
> > today.
>
> But is that a reason not to *try* and rid the tech debt?
>
> We could just all agree that *if* there is breakage we revert back to
> the quirk just for EHCI.

Well, it's not that simple because how much time do you want to wait?

The distro installed on the system I'm using right now ships with a
6.4-based kernel, so it potentially sees and may report breakage
introduced into the mainline 2 years ago.

Will you decide to go back to the EHCI quirk if breakage is reported 2
years after dropping it?

IMV, if a decision is made to change the pci_disable_device() behavior
in this respect, we'll need to stick to it unless the breakage is
common and overwhelming (which I don't really expect to be the case).
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Mario Limonciello 3 weeks ago
On 9/11/25 8:56 AM, Rafael J. Wysocki wrote:
> On Thu, Sep 11, 2025 at 3:46 PM Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 9/11/25 8:43 AM, Rafael J. Wysocki wrote:
>>> On Thu, Sep 11, 2025 at 3:34 PM Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> On 9/11/25 8:11 AM, Lukas Wunner wrote:
>>>>> In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
>>>>> computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
>>>>> clearing the Command register if the suspended device is a USB EHCI host
>>>>> controller.
>>>>>
>>>>> Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
>>>>> computers") subsequently amended pci_pm_poweroff_noirq() to do the same.
>>>>>
>>>>> Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
>>>>> quirk pass") introduced the ability to execute arbitrary quirks
>>>>> specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().
>>>>>
>>>>> This allows moving the ASUS workaround out of generic code and into a
>>>>> proper quirk to improve maintainability and readability.  Constrain to x86
>>>>> since the ASUS BIOS doesn't seem to have been used on other arches.
>>>>>
>>>>> lspci output of affected EHCI host controllers reveals that the only bits
>>>>> set in the Command register are Memory Space Enable and Bus Master Enable:
>>>>>      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778
>>>>>
>>>>> The latter is cleared by:
>>>>>      hcd_pci_suspend()
>>>>>        suspend_common()
>>>>>          pci_disable_device()
>>>>>
>>>>> pci_disable_device() does not clear I/O and Memory Space Enable, although
>>>>> its name suggests otherwise.
>>>>
>>>> That was my gut reaction as well.
>>>>
>>>>> The kernel has never disabled these bits
>>>>> once they're enabled.  Doing so would avoid the need for the quirk, but it
>>>>> is unclear what will break if this fundamental behavior is changed.
>>>>>
>>>>
>>>> It's too late for this cycle to do so, but how would you feel about
>>>> making this change at the start of the next cycle so it had a whole
>>>> cycle to bake in linux-next and see if there is a problem in doing so?
>>>
>>> One cycle in linux-next may not be sufficient I'm afraid because
>>> linux-next is not tested on the majority of systems running Linux.
>>>
>>> We'd probably learn about the breakage from distro vendors.
>>>
>>>> If there is it could certainly be moved back to a quirk.
>>>
>>> Most likely, it would work on the majority of systems, but there would
>>> be a tail of systems where it would break.  That tail would then need
>>> to be quirked somehow and it may be worse than just one quirk we have
>>> today.
>>
>> But is that a reason not to *try* and rid the tech debt?
>>
>> We could just all agree that *if* there is breakage we revert back to
>> the quirk just for EHCI.
> 
> Well, it's not that simple because how much time do you want to wait?
> 
> The distro installed on the system I'm using right now ships with a
> 6.4-based kernel, so it potentially sees and may report breakage
> introduced into the mainline 2 years ago.
> 
> Will you decide to go back to the EHCI quirk if breakage is reported 2
> years after dropping it?
> 
> IMV, if a decision is made to change the pci_disable_device() behavior
> in this respect, we'll need to stick to it unless the breakage is
> common and overwhelming (which I don't really expect to be the case).

Thanks, I see your point.  We would only know soon if there was major 
breakage from it and thus it is not an easy call that it was successful 
or not.

Bjorn, what are your thoughts here?
Re: [PATCH] PCI/PM: Move ASUS EHCI workaround out of generic code
Posted by Rafael J. Wysocki 3 weeks ago
On Thu, Sep 11, 2025 at 3:12 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> In 2012, commit dbf0e4c7257f ("PCI: EHCI: fix crash during suspend on ASUS
> computers") amended pci_pm_suspend_noirq() to work around a BIOS issue by
> clearing the Command register if the suspended device is a USB EHCI host
> controller.
>
> Commit 0b68c8e2c3af ("PCI: EHCI: Fix crash during hibernation on ASUS
> computers") subsequently amended pci_pm_poweroff_noirq() to do the same.
>
> Two years later, commit 7d2a01b87f16 ("PCI: Add pci_fixup_suspend_late
> quirk pass") introduced the ability to execute arbitrary quirks
> specifically in pci_pm_suspend_noirq() and pci_pm_poweroff_noirq().
>
> This allows moving the ASUS workaround out of generic code and into a
> proper quirk to improve maintainability and readability.  Constrain to x86
> since the ASUS BIOS doesn't seem to have been used on other arches.
>
> lspci output of affected EHCI host controllers reveals that the only bits
> set in the Command register are Memory Space Enable and Bus Master Enable:
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658778
>
> The latter is cleared by:
>   hcd_pci_suspend()
>     suspend_common()
>       pci_disable_device()
>
> pci_disable_device() does not clear I/O and Memory Space Enable, although
> its name suggests otherwise.  The kernel has never disabled these bits
> once they're enabled.  Doing so would avoid the need for the quirk, but it
> is unclear what will break if this fundamental behavior is changed.
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
>  arch/x86/pci/fixup.c     | 19 +++++++++++++++++++
>  drivers/pci/pci-driver.c | 19 -------------------
>  2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index e7e71490bd25..c34ff72434f2 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -1041,3 +1041,22 @@ static void quirk_tuxeo_rp_d3(struct pci_dev *pdev)
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1502, quirk_tuxeo_rp_d3);
>  #endif /* CONFIG_SUSPEND */
> +
> +#ifdef CONFIG_PM_SLEEP
> +/*
> + * Some BIOSes from ASUS have a bug: If a USB EHCI host controller's Command
> + * register is not 0 on suspend, the BIOS assumes that the controller has not
> + * been quiesced and tries to turn it off.  If the controller is already in D3,
> + * this can hang or cause memory corruption.
> + *
> + * Since the value of the Command register does not matter once the device has
> + * been suspended, it can safely be set to 0.
> + */
> +static void quirk_clear_command_reg(struct pci_dev *pdev)
> +{
> +       pci_write_config_word(pdev, PCI_COMMAND, 0);
> +}
> +DECLARE_PCI_FIXUP_CLASS_SUSPEND_LATE(PCI_ANY_ID, PCI_ANY_ID,
> +                                    PCI_CLASS_SERIAL_USB_EHCI, 0,
> +                                    quirk_clear_command_reg);
> +#endif /* CONFIG_PM_SLEEP */
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 63665240ae87..e1089dfeb419 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -914,18 +914,6 @@ static int pci_pm_suspend_noirq(struct device *dev)
>
>         pci_pm_set_unknown_state(pci_dev);
>
> -       /*
> -        * Some BIOSes from ASUS have a bug: If a USB EHCI host controller's
> -        * PCI COMMAND register isn't 0, the BIOS assumes that the controller
> -        * hasn't been quiesced and tries to turn it off.  If the controller
> -        * is already in D3, this can hang or cause memory corruption.
> -        *
> -        * Since the value of the COMMAND register doesn't matter once the
> -        * device has been suspended, we can safely set it to 0 here.
> -        */
> -       if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -               pci_write_config_word(pci_dev, PCI_COMMAND, 0);
> -
>  Fixup:
>         pci_fixup_device(pci_fixup_suspend_late, pci_dev);
>
> @@ -1205,13 +1193,6 @@ static int pci_pm_poweroff_noirq(struct device *dev)
>         if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
>                 pci_prepare_to_sleep(pci_dev);
>
> -       /*
> -        * The reason for doing this here is the same as for the analogous code
> -        * in pci_pm_suspend_noirq().
> -        */
> -       if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
> -               pci_write_config_word(pci_dev, PCI_COMMAND, 0);
> -
>         pci_fixup_device(pci_fixup_suspend_late, pci_dev);
>
>         return 0;
> --
> 2.51.0
>