[PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV

Jason Andryuk posted 1 patch 2 years, 6 months ago
Failed in applying to current master (apply log)
drivers/pci/msi.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
[PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by Jason Andryuk 2 years, 6 months ago
commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
functions") introduce functions pci_msi_update_mask() and
pci_msix_write_vector_ctrl() that is missing checks for
pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
new mask/unmask functions").  The checks are in place at the high level
__pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
directly to the helpers.

Push the pci_msi_ignore_mask check down to the functions that make
the actual writes.  This keeps the logic local to the writes that need
to be bypassed.

With Xen PV, the hypervisor is responsible for masking and unmasking the
interrupts, which pci_msi_ignore_mask is used to indicate.

This change avoids lockups in amdgpu drivers under Xen during boot.

Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
Reported-by: Josef Johansson <josef@oderland.se>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 drivers/pci/msi.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 4b4792940e86..478536bafc39 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (pci_msi_ignore_mask)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -181,6 +184,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
 {
 	void __iomem *desc_addr = pci_msix_desc_addr(desc);
 
+	if (pci_msi_ignore_mask)
+		return;
+
 	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
@@ -200,7 +206,7 @@ static inline void pci_msix_unmask(struct msi_desc *desc)
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+	if (desc->msi_attrib.is_virtual)
 		return;
 
 	if (desc->msi_attrib.is_msix)
@@ -211,7 +217,7 @@ static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+	if (desc->msi_attrib.is_virtual)
 		return;
 
 	if (desc->msi_attrib.is_msix)
-- 
2.30.2


Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by Jason Andryuk 2 years, 6 months ago
On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
> functions") introduce functions pci_msi_update_mask() and
> pci_msix_write_vector_ctrl() that is missing checks for
> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
> new mask/unmask functions").  The checks are in place at the high level
> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
> directly to the helpers.
>
> Push the pci_msi_ignore_mask check down to the functions that make
> the actual writes.  This keeps the logic local to the writes that need
> to be bypassed.
>
> With Xen PV, the hypervisor is responsible for masking and unmasking the
> interrupts, which pci_msi_ignore_mask is used to indicate.
>
> This change avoids lockups in amdgpu drivers under Xen during boot.
>
> Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
> Reported-by: Josef Johansson <josef@oderland.se>
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> ---

I should have written that this is untested.  If this is the desired
approach, Josef should test that it solves his boot hangs.

Regards,
Jason

Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by Josef Johansson 2 years, 6 months ago
On 10/25/21 14:27, Jason Andryuk wrote:
> On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
>> functions") introduce functions pci_msi_update_mask() and
>> pci_msix_write_vector_ctrl() that is missing checks for
>> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
>> new mask/unmask functions").  The checks are in place at the high level
>> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
>> directly to the helpers.
>>
>> Push the pci_msi_ignore_mask check down to the functions that make
>> the actual writes.  This keeps the logic local to the writes that need
>> to be bypassed.
>>
>> With Xen PV, the hypervisor is responsible for masking and unmasking the
>> interrupts, which pci_msi_ignore_mask is used to indicate.
>>
>> This change avoids lockups in amdgpu drivers under Xen during boot.
>>
>> Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
>> Reported-by: Josef Johansson <josef@oderland.se>
>> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
>> ---
> I should have written that this is untested.  If this is the desired
> approach, Josef should test that it solves his boot hangs.
>
> Regards,
> Jason

I've tested this today, both the above patch, but also my own below
where I'm patching inside __pci_write_msi_msg,
which is the outcome of the patch above.

I found that both solves the boot hang, but both have severe effects
on suspends/resume later on. The below patch without patching
__pci_write_msi_msg seems to be ideal, solves boot problems but not
causing too much others. There seems to me that there's undocumented
dragons here. Doing more test later today.

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 4b4792940e86..e97eea1bc93a 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -186,6 +189,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
 
 static inline void pci_msix_mask(struct msi_desc *desc)
 {
+	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+		return;
+
 	desc->msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	pci_msix_write_vector_ctrl(desc, desc->msix_ctrl);
 	/* Flush write to device */
@@ -194,15 +200,15 @@ static inline void pci_msix_mask(struct msi_desc *desc)
 
 static inline void pci_msix_unmask(struct msi_desc *desc)
 {
+	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
+		return;
+
 	desc->msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	pci_msix_write_vector_ctrl(desc, desc->msix_ctrl);
 }
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_mask(desc);
 	else if (desc->msi_attrib.maskbit)
@@ -211,9 +217,6 @@ static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_unmask(desc);
 	else if (desc->msi_attrib.maskbit)
@@ -307,7 +310,7 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 		 * entry while the entry is unmasked, the result is
 		 * undefined."
 		 */
-		if (unmasked)
+		if (unmasked && !pci_msi_ignore_mask)
 			pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
 
 		writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
@@ -450,8 +453,9 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
 				PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 
 	arch_restore_msi_irqs(dev);
-	for_each_pci_msi_entry(entry, dev)
-		pci_msix_write_vector_ctrl(entry, entry->msix_ctrl);
+	if (!(pci_msi_ignore_mask || entry->msi_attrib.is_virtual))
+		for_each_pci_msi_entry(entry, dev)
+			pci_msix_write_vector_ctrl(entry, entry->msix_ctrl);
 
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 }


Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by Josef Johansson 2 years, 6 months ago
On 10/25/21 18:46, Josef Johansson wrote:
> On 10/25/21 14:27, Jason Andryuk wrote:
>> On Sun, Oct 24, 2021 at 9:26 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>>> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
>>> functions") introduce functions pci_msi_update_mask() and
>>> pci_msix_write_vector_ctrl() that is missing checks for
>>> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
>>> new mask/unmask functions").  The checks are in place at the high level
>>> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
>>> directly to the helpers.
>>>
>>> Push the pci_msi_ignore_mask check down to the functions that make
>>> the actual writes.  This keeps the logic local to the writes that need
>>> to be bypassed.
>>>
>>> With Xen PV, the hypervisor is responsible for masking and unmasking the
>>> interrupts, which pci_msi_ignore_mask is used to indicate.
>>>
>>> This change avoids lockups in amdgpu drivers under Xen during boot.
>>>
>>> Fixes: commit 446a98b19fd6 ("PCI/MSI: Use new mask/unmask functions")
>>> Reported-by: Josef Johansson <josef@oderland.se>
>>> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
>>> ---
>> I should have written that this is untested.  If this is the desired
>> approach, Josef should test that it solves his boot hangs.
>>
>> Regards,
>> Jason
> I've tested this today, both the above patch, but also my own below
> where I'm patching inside __pci_write_msi_msg,
> which is the outcome of the patch above.
I tested a lot of kernels today. To create a good baseline I compiled
without any of our patches here
and with my config flags set.

CONFIG_AMD_PMC=y
# CONFIG_HSA_AMD is not set
# CONFIG_CRYPTO_DEV_CCP is not set

The kernel stopped as before, and hung.

Test number 2 was to boot with amdgpu.msi=0.
This still resulted in a bad boot since all the xhcd drivers complained.
We can be sure that it's not amdgpu per se.

Test number 3 was with Jason's patch. It worked, but suspend/resume is
not working well.
Generally it's not behaving like other kernels do, which makes it
actually change the behavior.
Now with test 4 I tried that thought, maybe this is still a good change?
I'm deprived of a good baseline in all this, so it's very hard to
navigate between all the variables.

Test number 4 was with Jason's patch plus the amdgpu-patch below.
It worked, even suspend/resume, 2 times, but then it all crashed and
burn with quite interesting stacktraces. Are amdgpu doing it wrong here
or is it just me nitpicking?

index cc2e0c9cfe0a..f125597eb991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -279,17 +279,8 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
 
 static void amdgpu_restore_msix(struct amdgpu_device *adev)
 {
-	u16 ctrl;
-
-	pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
-	if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
-		return;
-
-	/* VF FLR */
-	ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
-	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
-	ctrl |= PCI_MSIX_FLAGS_ENABLE;
-	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+	// checks if msix is enabled also
+	pci_restore_msi_state(adev->pdev);
 }
 
 /**

During the tests I fiddled with dpm settings, and it does have an effect
one the graphical output during suspend/resume. So maybe there's
hardware problems at play here as well.

I also looked through the code before and after Thomas' changes, and I
can't see that
this patch should make any functional difference compared to before the
MSI series of patches.
It's even such that is_virtual should be checked withing vector_ctrl. I
find Jason's patch
quite nice since it really places the checks on few places making it
easier not to slip.
Compared to my attempt that even failed because I forgot one more place
to put the checks.

With that said I would really like some more tests on this with
different chipsets, on Xen.
Any takers?

What I'm seeing is that there's no readl() in pci_msix_unmask(), it was
one in the code path before.
I'm very much unsure if there should be one there though.

We can really do a better job at the documentation for
pci_msi_ignore_mask, at least in msi.c,
maybe that should be a different patch adding some comments such that
driver folks really see
the benefits of using the built in restore functions e.g.

This became so much bigger project than I thought, thanks all for
chiming in on it.

Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by Thomas Gleixner 2 years, 6 months ago
On Sun, Oct 24 2021 at 21:25, Jason Andryuk wrote:
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 4b4792940e86..478536bafc39 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
>  	raw_spinlock_t *lock = &desc->dev->msi_lock;
>  	unsigned long flags;
>  
> +	if (pci_msi_ignore_mask)
> +		return;
> +
>  	raw_spin_lock_irqsave(lock, flags);
>  	desc->msi_mask &= ~clear;
>  	desc->msi_mask |= set;
> @@ -181,6 +184,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
>  {
>  	void __iomem *desc_addr = pci_msix_desc_addr(desc);
>  
> +	if (pci_msi_ignore_mask)
> +		return;
> +
>  	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>  }
>  
> @@ -200,7 +206,7 @@ static inline void pci_msix_unmask(struct msi_desc *desc)
>  
>  static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>  {
> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
> +	if (desc->msi_attrib.is_virtual)
>  		return;
>  
>  	if (desc->msi_attrib.is_msix)
> @@ -211,7 +217,7 @@ static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>  
>  static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
>  {
> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
> +	if (desc->msi_attrib.is_virtual)
>  		return;
>  
>  	if (desc->msi_attrib.is_msix)

No, really. This is horrible and incomplete. The right thing to do is to
move the check back into the low level accessors and remove it from the
call sites simply because the low level accessors can be reached not
only from the mask/unmask functions. But the above also fails to respect
msi_attrib.maskbit... I'll send out a proper fix in a few.

Thanks,

        tglx

[PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Thomas Gleixner 2 years, 6 months ago
The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
low level accessors into the higher level mask/unmask functions.

This missed the fact that these accessors can be invoked from other places
as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
also violates the virtual MSIX and the msi_attrib.maskbit protections.

Instead of sprinkling checks all over the place, lift them back into the
low level accessor functions. To avoid checking three different conditions
combine them into one property of msi_desc::msi_attrib.

Reported-by: Josef Johansson <josef@oderland.se>
Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask functions")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Andryuk <jandryuk@gmail.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org
Cc: xen-devel <xen-devel@lists.xenproject.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
---
 drivers/pci/msi.c   |   26 ++++++++++++++------------
 include/linux/msi.h |    2 +-
 2 files changed, 15 insertions(+), 13 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (!desc->msi_attrib.can_mask)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -181,7 +184,8 @@ static void pci_msix_write_vector_ctrl(s
 {
 	void __iomem *desc_addr = pci_msix_desc_addr(desc);
 
-	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
+	if (desc->msi_attrib.can_mask)
+		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
 static inline void pci_msix_mask(struct msi_desc *desc)
@@ -200,23 +204,17 @@ static inline void pci_msix_unmask(struc
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_mask(desc);
-	else if (desc->msi_attrib.maskbit)
+	else
 		pci_msi_mask(desc, mask);
 }
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_unmask(desc);
-	else if (desc->msi_attrib.maskbit)
+	else
 		pci_msi_unmask(desc, mask);
 }
 
@@ -484,7 +482,8 @@ msi_setup_entry(struct pci_dev *dev, int
 	entry->msi_attrib.is_64		= !!(control & PCI_MSI_FLAGS_64BIT);
 	entry->msi_attrib.is_virtual    = 0;
 	entry->msi_attrib.entry_nr	= 0;
-	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
+	entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
+					  !!(control & PCI_MSI_FLAGS_MASKBIT);
 	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
 	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
@@ -495,7 +494,7 @@ msi_setup_entry(struct pci_dev *dev, int
 		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
 
 	/* Save the initial mask status */
-	if (entry->msi_attrib.maskbit)
+	if (entry->msi_attrib.can_mask)
 		pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask);
 
 out:
@@ -638,10 +637,13 @@ static int msix_setup_entries(struct pci
 		entry->msi_attrib.is_virtual =
 			entry->msi_attrib.entry_nr >= vec_count;
 
+		entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
+						  !entry->msi_attrib.is_virtual;
+
 		entry->msi_attrib.default_irq	= dev->irq;
 		entry->mask_base		= base;
 
-		if (!entry->msi_attrib.is_virtual) {
+		if (!entry->msi_attrib.can_mask) {
 			addr = pci_msix_desc_addr(entry);
 			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 		}
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -148,7 +148,7 @@ struct msi_desc {
 				u8	is_msix		: 1;
 				u8	multiple	: 3;
 				u8	multi_cap	: 3;
-				u8	maskbit		: 1;
+				u8	can_mask	: 1;
 				u8	is_64		: 1;
 				u8	is_virtual	: 1;
 				u16	entry_nr;

Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 6 months ago
On 10/27/21 11:50, Thomas Gleixner wrote:
> The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
> low level accessors into the higher level mask/unmask functions.
>
> This missed the fact that these accessors can be invoked from other places
> as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
> also violates the virtual MSIX and the msi_attrib.maskbit protections.
>
> Instead of sprinkling checks all over the place, lift them back into the
> low level accessor functions. To avoid checking three different conditions
> combine them into one property of msi_desc::msi_attrib.
>
> Reported-by: Josef Johansson <josef@oderland.se>
> Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask functions")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jason Andryuk <jandryuk@gmail.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: linux-pci@vger.kernel.org
> Cc: xen-devel <xen-devel@lists.xenproject.org>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> ---
>  drivers/pci/msi.c   |   26 ++++++++++++++------------
>  include/linux/msi.h |    2 +-
>  2 files changed, 15 insertions(+), 13 deletions(-)
>
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask
>  	raw_spinlock_t *lock = &desc->dev->msi_lock;
>  	unsigned long flags;
>  
> +	if (!desc->msi_attrib.can_mask)
> +		return;
> +
>  	raw_spin_lock_irqsave(lock, flags);
>  	desc->msi_mask &= ~clear;
>  	desc->msi_mask |= set;
> @@ -181,7 +184,8 @@ static void pci_msix_write_vector_ctrl(s
>  {
>  	void __iomem *desc_addr = pci_msix_desc_addr(desc);
>  
> -	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
> +	if (desc->msi_attrib.can_mask)
> +		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>  }
>  
>  static inline void pci_msix_mask(struct msi_desc *desc)
> @@ -200,23 +204,17 @@ static inline void pci_msix_unmask(struc
>  
>  static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>  {
> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
> -		return;
> -
>  	if (desc->msi_attrib.is_msix)
>  		pci_msix_mask(desc);
> -	else if (desc->msi_attrib.maskbit)
> +	else
>  		pci_msi_mask(desc, mask);
>  }
>  
>  static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
>  {
> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
> -		return;
> -
>  	if (desc->msi_attrib.is_msix)
>  		pci_msix_unmask(desc);
> -	else if (desc->msi_attrib.maskbit)
> +	else
>  		pci_msi_unmask(desc, mask);
>  }
>  
> @@ -484,7 +482,8 @@ msi_setup_entry(struct pci_dev *dev, int
>  	entry->msi_attrib.is_64		= !!(control & PCI_MSI_FLAGS_64BIT);
>  	entry->msi_attrib.is_virtual    = 0;
>  	entry->msi_attrib.entry_nr	= 0;
> -	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
> +	entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
> +					  !!(control & PCI_MSI_FLAGS_MASKBIT);
>  	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>  	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
>  	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
> @@ -495,7 +494,7 @@ msi_setup_entry(struct pci_dev *dev, int
>  		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
>  
>  	/* Save the initial mask status */
> -	if (entry->msi_attrib.maskbit)
> +	if (entry->msi_attrib.can_mask)
>  		pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask);
>  
>  out:
> @@ -638,10 +637,13 @@ static int msix_setup_entries(struct pci
>  		entry->msi_attrib.is_virtual =
>  			entry->msi_attrib.entry_nr >= vec_count;
>  
> +		entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
> +						  !entry->msi_attrib.is_virtual;
> +
>  		entry->msi_attrib.default_irq	= dev->irq;
>  		entry->mask_base		= base;
>  
> -		if (!entry->msi_attrib.is_virtual) {
> +		if (!entry->msi_attrib.can_mask) {
>  			addr = pci_msix_desc_addr(entry);
>  			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>  		}
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -148,7 +148,7 @@ struct msi_desc {
>  				u8	is_msix		: 1;
>  				u8	multiple	: 3;
>  				u8	multi_cap	: 3;
> -				u8	maskbit		: 1;
> +				u8	can_mask	: 1;
>  				u8	is_64		: 1;
>  				u8	is_virtual	: 1;
>  				u16	entry_nr;
Thanks,
I'll test this out ASAP.

Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 6 months ago
On 10/27/21 11:54, Josef Johansson wrote:
> On 10/27/21 11:50, Thomas Gleixner wrote:
>> The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
>> low level accessors into the higher level mask/unmask functions.
>>
>> This missed the fact that these accessors can be invoked from other places
>> as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
>> also violates the virtual MSIX and the msi_attrib.maskbit protections.
>>
>> Instead of sprinkling checks all over the place, lift them back into the
>> low level accessor functions. To avoid checking three different conditions
>> combine them into one property of msi_desc::msi_attrib.
>>
>> Reported-by: Josef Johansson <josef@oderland.se>
>> Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask functions")
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Jason Andryuk <jandryuk@gmail.com>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Bjorn Helgaas <helgaas@kernel.org>
>> Cc: linux-pci@vger.kernel.org
>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: David Woodhouse <dwmw2@infradead.org>
>> ---
>>  drivers/pci/msi.c   |   26 ++++++++++++++------------
>>  include/linux/msi.h |    2 +-
>>  2 files changed, 15 insertions(+), 13 deletions(-)
>>
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask
>>  	raw_spinlock_t *lock = &desc->dev->msi_lock;
>>  	unsigned long flags;
>>  
>> +	if (!desc->msi_attrib.can_mask)
>> +		return;
>> +
>>  	raw_spin_lock_irqsave(lock, flags);
>>  	desc->msi_mask &= ~clear;
>>  	desc->msi_mask |= set;
>> @@ -181,7 +184,8 @@ static void pci_msix_write_vector_ctrl(s
>>  {
>>  	void __iomem *desc_addr = pci_msix_desc_addr(desc);
>>  
>> -	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>> +	if (desc->msi_attrib.can_mask)
>> +		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>  }
>>  
>>  static inline void pci_msix_mask(struct msi_desc *desc)
>> @@ -200,23 +204,17 @@ static inline void pci_msix_unmask(struc
>>  
>>  static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>>  {
>> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>> -		return;
>> -
>>  	if (desc->msi_attrib.is_msix)
>>  		pci_msix_mask(desc);
>> -	else if (desc->msi_attrib.maskbit)
>> +	else
>>  		pci_msi_mask(desc, mask);
>>  }
>>  
>>  static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
>>  {
>> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>> -		return;
>> -
>>  	if (desc->msi_attrib.is_msix)
>>  		pci_msix_unmask(desc);
>> -	else if (desc->msi_attrib.maskbit)
>> +	else
>>  		pci_msi_unmask(desc, mask);
>>  }
>>  
>> @@ -484,7 +482,8 @@ msi_setup_entry(struct pci_dev *dev, int
>>  	entry->msi_attrib.is_64		= !!(control & PCI_MSI_FLAGS_64BIT);
>>  	entry->msi_attrib.is_virtual    = 0;
>>  	entry->msi_attrib.entry_nr	= 0;
>> -	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
>> +	entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
>> +					  !!(control & PCI_MSI_FLAGS_MASKBIT);
>>  	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>>  	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
>>  	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
>> @@ -495,7 +494,7 @@ msi_setup_entry(struct pci_dev *dev, int
>>  		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
>>  
>>  	/* Save the initial mask status */
>> -	if (entry->msi_attrib.maskbit)
>> +	if (entry->msi_attrib.can_mask)
>>  		pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask);
>>  
>>  out:
>> @@ -638,10 +637,13 @@ static int msix_setup_entries(struct pci
>>  		entry->msi_attrib.is_virtual =
>>  			entry->msi_attrib.entry_nr >= vec_count;
>>  
>> +		entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
>> +						  !entry->msi_attrib.is_virtual;
>> +
>>  		entry->msi_attrib.default_irq	= dev->irq;
>>  		entry->mask_base		= base;
>>  
>> -		if (!entry->msi_attrib.is_virtual) {
>> +		if (!entry->msi_attrib.can_mask) {
>>  			addr = pci_msix_desc_addr(entry);
>>  			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>  		}
>> --- a/include/linux/msi.h
>> +++ b/include/linux/msi.h
>> @@ -148,7 +148,7 @@ struct msi_desc {
>>  				u8	is_msix		: 1;
>>  				u8	multiple	: 3;
>>  				u8	multi_cap	: 3;
>> -				u8	maskbit		: 1;
>> +				u8	can_mask	: 1;
>>  				u8	is_64		: 1;
>>  				u8	is_virtual	: 1;
>>  				u16	entry_nr;
> Thanks,
> I'll test this out ASAP.
I'm adding

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 6a5ecee6e567..28d509452958 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -529,10 +529,10 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
 
 	/*
 	 * Checking the first MSI descriptor is sufficient. MSIX supports
-	 * masking and MSI does so when the maskbit is set.
+	 * masking and MSI does so when the can_mask is set.
 	 */
 	desc = first_msi_entry(dev);
-	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
+	return desc->msi_attrib.is_msix || desc->msi_attrib.can_mask;
 }
 
 int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,


Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 6 months ago
On 10/27/21 14:01, Josef Johansson wrote:
> On 10/27/21 11:54, Josef Johansson wrote:
>> On 10/27/21 11:50, Thomas Gleixner wrote:
>>> The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
>>> low level accessors into the higher level mask/unmask functions.
>>>
>>> This missed the fact that these accessors can be invoked from other places
>>> as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
>>> also violates the virtual MSIX and the msi_attrib.maskbit protections.
>>>
>>> Instead of sprinkling checks all over the place, lift them back into the
>>> low level accessor functions. To avoid checking three different conditions
>>> combine them into one property of msi_desc::msi_attrib.
>>>
>>> Reported-by: Josef Johansson <josef@oderland.se>
>>> Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask functions")
>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Jason Andryuk <jandryuk@gmail.com>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Bjorn Helgaas <helgaas@kernel.org>
>>> Cc: linux-pci@vger.kernel.org
>>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>>> Cc: Juergen Gross <jgross@suse.com>
>>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> Cc: David Woodhouse <dwmw2@infradead.org>
>>> ---
>>>  drivers/pci/msi.c   |   26 ++++++++++++++------------
>>>  include/linux/msi.h |    2 +-
>>>  2 files changed, 15 insertions(+), 13 deletions(-)
>>>
>>> --- a/drivers/pci/msi.c
>>> +++ b/drivers/pci/msi.c
>>> @@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask
>>>  	raw_spinlock_t *lock = &desc->dev->msi_lock;
>>>  	unsigned long flags;
>>>  
>>> +	if (!desc->msi_attrib.can_mask)
>>> +		return;
>>> +
>>>  	raw_spin_lock_irqsave(lock, flags);
>>>  	desc->msi_mask &= ~clear;
>>>  	desc->msi_mask |= set;
>>> @@ -181,7 +184,8 @@ static void pci_msix_write_vector_ctrl(s
>>>  {
>>>  	void __iomem *desc_addr = pci_msix_desc_addr(desc);
>>>  
>>> -	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>> +	if (desc->msi_attrib.can_mask)
>>> +		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>>  }
>>>  
>>>  static inline void pci_msix_mask(struct msi_desc *desc)
>>> @@ -200,23 +204,17 @@ static inline void pci_msix_unmask(struc
>>>  
>>>  static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
>>>  {
>>> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>>> -		return;
>>> -
>>>  	if (desc->msi_attrib.is_msix)
>>>  		pci_msix_mask(desc);
>>> -	else if (desc->msi_attrib.maskbit)
>>> +	else
>>>  		pci_msi_mask(desc, mask);
>>>  }
>>>  
>>>  static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
>>>  {
>>> -	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>>> -		return;
>>> -
>>>  	if (desc->msi_attrib.is_msix)
>>>  		pci_msix_unmask(desc);
>>> -	else if (desc->msi_attrib.maskbit)
>>> +	else
>>>  		pci_msi_unmask(desc, mask);
>>>  }
>>>  
>>> @@ -484,7 +482,8 @@ msi_setup_entry(struct pci_dev *dev, int
>>>  	entry->msi_attrib.is_64		= !!(control & PCI_MSI_FLAGS_64BIT);
>>>  	entry->msi_attrib.is_virtual    = 0;
>>>  	entry->msi_attrib.entry_nr	= 0;
>>> -	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
>>> +	entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
>>> +					  !!(control & PCI_MSI_FLAGS_MASKBIT);
>>>  	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>>>  	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
>>>  	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
>>> @@ -495,7 +494,7 @@ msi_setup_entry(struct pci_dev *dev, int
>>>  		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
>>>  
>>>  	/* Save the initial mask status */
>>> -	if (entry->msi_attrib.maskbit)
>>> +	if (entry->msi_attrib.can_mask)
>>>  		pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask);
>>>  
>>>  out:
>>> @@ -638,10 +637,13 @@ static int msix_setup_entries(struct pci
>>>  		entry->msi_attrib.is_virtual =
>>>  			entry->msi_attrib.entry_nr >= vec_count;
>>>  
>>> +		entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
>>> +						  !entry->msi_attrib.is_virtual;
>>> +
>>>  		entry->msi_attrib.default_irq	= dev->irq;
>>>  		entry->mask_base		= base;
>>>  
>>> -		if (!entry->msi_attrib.is_virtual) {
>>> +		if (!entry->msi_attrib.can_mask) {
>>>  			addr = pci_msix_desc_addr(entry);
>>>  			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>>  		}
>>> --- a/include/linux/msi.h
>>> +++ b/include/linux/msi.h
>>> @@ -148,7 +148,7 @@ struct msi_desc {
>>>  				u8	is_msix		: 1;
>>>  				u8	multiple	: 3;
>>>  				u8	multi_cap	: 3;
>>> -				u8	maskbit		: 1;
>>> +				u8	can_mask	: 1;
>>>  				u8	is_64		: 1;
>>>  				u8	is_virtual	: 1;
>>>  				u16	entry_nr;
>> Thanks,
>> I'll test this out ASAP.
> I'm adding
>
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 6a5ecee6e567..28d509452958 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -529,10 +529,10 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
>  
>  	/*
>  	 * Checking the first MSI descriptor is sufficient. MSIX supports
> -	 * masking and MSI does so when the maskbit is set.
> +	 * masking and MSI does so when the can_mask is set.
>  	 */
>  	desc = first_msi_entry(dev);
> -	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
> +	return desc->msi_attrib.is_msix || desc->msi_attrib.can_mask;
>  }
>  
>  int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>
Hi Thomas,

With the above added the kernel boots fine and I can even suspend it twice.
Which is with my laptop, a good sign.

You can add Tested-By: josef@oderland.se.

Thanks again!

When I suspend I get errors from Xen, including stacktraces below
if anyone has any clue, if this might be related. I get one each time I
suspend
and the third time amdgpu gives up.

rtc_cmos 00:01: registered as rtc0
rtc_cmos 00:01: setting system clock to 2021-10-27T15:04:35 UTC (1635347075)
rtc_cmos 00:01: no alarms, y3k, 114 bytes nvram
device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
efifb: probing for efifb
efifb: cannot reserve video memory at 0x60000000
------------[ cut here ]------------
ioremap on RAM at 0x0000000060000000 - 0x00000000607e8fff
WARNING: CPU: 7 PID: 1 at arch/x86/mm/ioremap.c:210 __ioremap_caller+0x332/0x350
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
RIP: e030:__ioremap_caller+0x332/0x350
Code: e8 c3 ca ff ff 49 09 c6 e9 32 fe ff ff 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 35 f2 5d 82 c6 05 e8 7b a9 01 01 e8 48 39 be 00 <0f> 0b 45 31 e4 e9 ac fe ff ff e8 ff f5 c3 00 66 66 2e 0f 1f 84 00
RSP: e02b:ffffc9004007bb00 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 00000000007e9000 RCX: ffffffff82915ca8
RDX: c0000000ffffdfff RSI: 0000000000000000 RDI: ffffffff82865ca0
RBP: 0000000060000000 R08: 0000000000000000 R09: ffffc9004007b948
R10: ffffc9004007b940 R11: ffffffff82945ce8 R12: 0000000000000001
R13: 00000000007e9000 R14: 00000000007e9000 R15: ffffffff81c8f772
FS:  0000000000000000(0000) GS:ffff8881407c0000(0000) knlGS:0000000000000000
CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002810000 CR4: 0000000000050660
Call Trace:
 efifb_probe.cold+0x2e6/0x688
 platform_probe+0x3f/0x90
 call_driver_probe+0x24/0xc0
 really_probe+0x1e7/0x310
 __driver_probe_device+0xfe/0x180
 driver_probe_device+0x1e/0x90
 __device_attach_driver+0x72/0xe0
 ? driver_allows_async_probing+0x50/0x50
 ? driver_allows_async_probing+0x50/0x50
 bus_for_each_drv+0x8f/0xd0
 __device_attach+0xe9/0x1f0
 bus_probe_device+0x8e/0xa0
 device_add+0x3fb/0x630
 platform_device_add+0x102/0x230
 sysfb_init+0xea/0x141
 ? firmware_map_add_early+0xb8/0xb8
 do_one_initcall+0x57/0x200
 do_initcalls+0x109/0x166
 kernel_init_freeable+0x23c/0x2bd
 ? rest_init+0xc0/0xc0
 kernel_init+0x16/0x120
 ret_from_fork+0x22/0x30
---[ end trace b068d3cd1b7f5f49 ]---
efifb: abort, cannot remap video memory 0x7e9000 @ 0x60000000
efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
--
printk: Suspending console(s) (use no_console_suspend to debug)
[drm] free PSP TMR buffer
PM: suspend devices took 0.428 seconds
ACPI: EC: interrupt blocked
ACPI: PM: Preparing to enter system sleep state S3
ACPI: EC: event blocked
ACPI: EC: EC stopped
ACPI: PM: Saving platform NVS memory
Disabling non-boot CPUs ...
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522 switch_mm_irqs_off+0x3c5/0x400
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
RIP: e030:switch_mm_irqs_off+0x3c5/0x400
Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 60 1d 08 81 e8 a0 f3 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad fc ff ff 0f 0b e9 31 fe ff ff 0f 0b e9
RSP: e02b:ffffc900400f3eb0 EFLAGS: 00010006
RAX: 00000001336c6000 RBX: ffff888140660000 RCX: 0000000000000040
RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff8881b36c6000
RBP: ffffffff829d91c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000000 R12: ffff888104e88440
R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 000060b7d78bf198 CR3: 0000000002810000 CR4: 0000000000050660
Call Trace:
 switch_mm+0x1c/0x30
 idle_task_exit+0x55/0x60
 play_dead_common+0xa/0x20
 xen_pv_play_dead+0xa/0x60
 do_idle+0xd1/0xe0
 cpu_startup_entry+0x19/0x20
 asm_cpu_bringup_and_idle+0x5/0x1000
---[ end trace b068d3cd1b7f5f4b ]---
smpboot: CPU 1 is now offline
smpboot: CPU 2 is now offline
smpboot: CPU 3 is now offline
smpboot: CPU 4 is now offline
smpboot: CPU 5 is now offline
smpboot: CPU 6 is now offline
smpboot: CPU 7 is now offline
ACPI: PM: Low-level resume complete
ACPI: EC: EC started
ACPI: PM: Restoring platform NVS memory
xen_acpi_processor: Uploading Xen processor PM info
xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU9
xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU11
--
CPU2 is up
installing Xen timer for CPU 3
cpu 3 spinlock event irq 79
[Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
ACPI: \_SB_.PLTF.C003: Found 3 idle states
ACPI: FW issue: working around C-state latencies out of order
CPU3 is up
------------[ cut here ]------------
cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
installing Xen timer for CPU 4
WARNING: CPU: 3 PID: 455 at kernel/sched/fair.c:3339 __update_blocked_fair+0x49b/0x4b0
Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
CPU: 3 PID: 455 Comm: kworker/3:2 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
Workqueue:  0x0 (events)
RIP: e030:__update_blocked_fair+0x49b/0x4b0
Code: 6b fd ff ff 49 8b 96 48 01 00 00 48 89 90 50 09 00 00 e9 ff fc ff ff 48 c7 c7 10 7a 5e 82 c6 05 f3 35 9e 01 01 e8 1f f3 b2 00 <0f> 0b 41 8b 86 38 01 00 00 e9 c6 fc ff ff 0f 1f 80 00 00 00 00 0f
RSP: e02b:ffffc900410d7ce0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000018 RCX: ffff8881406d8a08
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8881406d8a00
RBP: ffff8881406e9800 R08: 0000000000000048 R09: ffffc900410d7c78
R10: 0000000000000049 R11: 000000002d2d2d2d R12: ffff8881406e9f80
R13: ffff8881406e9e40 R14: ffff8881406e96c0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8881406c0000(0000) knlGS:0000000000000000
CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000782e51820000 CR3: 0000000002810000 CR4: 0000000000050660
Call Trace:
 update_blocked_averages+0xa8/0x180
 newidle_balance+0x175/0x380
 pick_next_task_fair+0x39/0x3e0
 pick_next_task+0x4c/0xbd0
 ? dequeue_task_fair+0xba/0x390
 __schedule+0x13a/0x570
 schedule+0x44/0xa0
 worker_thread+0xc0/0x320
 ? process_one_work+0x390/0x390
 kthread+0x10f/0x130
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30
---[ end trace b068d3cd1b7f5f4c ]---
cpu 4 spinlock event irq 85
[Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
ACPI: \_SB_.PLTF.C004: Found 3 idle states
ACPI: FW issue: working around C-state latencies out of order
CPU4 is up
installing Xen timer for CPU 5
cpu 5 spinlock event irq 91
[Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
ACPI: \_SB_.PLTF.C005: Found 3 idle states
ACPI: FW issue: working around C-state latencies out of order
CPU5 is up


Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Thomas Gleixner 2 years, 5 months ago
Josef!

On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:
> On 10/27/21 14:01, Josef Johansson wrote:
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 6a5ecee6e567..28d509452958 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -529,10 +529,10 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
>>  
>>  	/*
>>  	 * Checking the first MSI descriptor is sufficient. MSIX supports
>> -	 * masking and MSI does so when the maskbit is set.
>> +	 * masking and MSI does so when the can_mask is set.
>>  	 */
>>  	desc = first_msi_entry(dev);
>> -	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
>> +	return desc->msi_attrib.is_msix || desc->msi_attrib.can_mask;
>>  }
>>  
>>  int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>>
> Hi Thomas,
>
> With the above added the kernel boots fine and I can even suspend it twice.
> Which is with my laptop, a good sign.
>
> You can add Tested-By: josef@oderland.se.

Thank you for fixing my quick hack in vacation mode. I'll send out a v2
in a minute.

Thanks,

        tglx
 

[PATCH v2] PCI/MSI: Move non-mask check back into low level accessors
Posted by Thomas Gleixner 2 years, 5 months ago
From: Thomas Gleixner <tglx@linutronix.de>

The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
low level accessors into the higher level mask/unmask functions.

This missed the fact that these accessors can be invoked from other places
as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
also violates the virtual MSIX and the msi_attrib.maskbit protections.

Instead of sprinkling checks all over the place, lift them back into the
low level accessor functions. To avoid checking three different conditions
combine them into one property of msi_desc::msi_attrib.

[ josef: Fixed the missed conversion in the core code ]

Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask functions")
Reported-by: Josef Johansson <josef@oderland.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Josef Johansson <josef@oderland.se>
Cc: stable@vger.kernel.org
---
V2: Added the missing conversion in the core code - Josef
---
 drivers/pci/msi.c   |   26 ++++++++++++++------------
 include/linux/msi.h |    2 +-
 kernel/irq/msi.c    |    4 ++--
 3 files changed, 17 insertions(+), 15 deletions(-)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (!desc->msi_attrib.can_mask)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -181,7 +184,8 @@ static void pci_msix_write_vector_ctrl(s
 {
 	void __iomem *desc_addr = pci_msix_desc_addr(desc);
 
-	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
+	if (desc->msi_attrib.can_mask)
+		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
 static inline void pci_msix_mask(struct msi_desc *desc)
@@ -200,23 +204,17 @@ static inline void pci_msix_unmask(struc
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_mask(desc);
-	else if (desc->msi_attrib.maskbit)
+	else
 		pci_msi_mask(desc, mask);
 }
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_unmask(desc);
-	else if (desc->msi_attrib.maskbit)
+	else
 		pci_msi_unmask(desc, mask);
 }
 
@@ -484,7 +482,8 @@ msi_setup_entry(struct pci_dev *dev, int
 	entry->msi_attrib.is_64		= !!(control & PCI_MSI_FLAGS_64BIT);
 	entry->msi_attrib.is_virtual    = 0;
 	entry->msi_attrib.entry_nr	= 0;
-	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
+	entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
+					  !!(control & PCI_MSI_FLAGS_MASKBIT);
 	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
 	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
@@ -495,7 +494,7 @@ msi_setup_entry(struct pci_dev *dev, int
 		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
 
 	/* Save the initial mask status */
-	if (entry->msi_attrib.maskbit)
+	if (entry->msi_attrib.can_mask)
 		pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask);
 
 out:
@@ -638,10 +637,13 @@ static int msix_setup_entries(struct pci
 		entry->msi_attrib.is_virtual =
 			entry->msi_attrib.entry_nr >= vec_count;
 
+		entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
+						  !entry->msi_attrib.is_virtual;
+
 		entry->msi_attrib.default_irq	= dev->irq;
 		entry->mask_base		= base;
 
-		if (!entry->msi_attrib.is_virtual) {
+		if (!entry->msi_attrib.can_mask) {
 			addr = pci_msix_desc_addr(entry);
 			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 		}
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -148,7 +148,7 @@ struct msi_desc {
 				u8	is_msix		: 1;
 				u8	multiple	: 3;
 				u8	multi_cap	: 3;
-				u8	maskbit		: 1;
+				u8	can_mask	: 1;
 				u8	is_64		: 1;
 				u8	is_virtual	: 1;
 				u16	entry_nr;
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -529,10 +529,10 @@ static bool msi_check_reservation_mode(s
 
 	/*
 	 * Checking the first MSI descriptor is sufficient. MSIX supports
-	 * masking and MSI does so when the maskbit is set.
+	 * masking and MSI does so when the can_mask attribute is set.
 	 */
 	desc = first_msi_entry(dev);
-	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
+	return desc->msi_attrib.is_msix || desc->msi_attrib.can_mask;
 }
 
 int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,

Re: [PATCH v2] PCI/MSI: Move non-mask check back into low level accessors
Posted by Thomas Gleixner 2 years, 5 months ago
On Thu, Nov 04 2021 at 00:27, Thomas Gleixner wrote:
>  
> -		if (!entry->msi_attrib.is_virtual) {
> +		if (!entry->msi_attrib.can_mask) {

Groan. I'm a moron. This obviously needs to be

		if (entry->msi_attrib.can_mask) {

>  			addr = pci_msix_desc_addr(entry);
>  			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>  		}

Re: [PATCH v2] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 5 months ago
On 11/9/21 15:53, Thomas Gleixner wrote:
> On Thu, Nov 04 2021 at 00:27, Thomas Gleixner wrote:
>>  
>> -		if (!entry->msi_attrib.is_virtual) {
>> +		if (!entry->msi_attrib.can_mask) {
> Groan. I'm a moron. This obviously needs to be
>
> 		if (entry->msi_attrib.can_mask) {
I will compile and check. Thanks for being thorough.
>>  			addr = pci_msix_desc_addr(entry);
>>  			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>  		}


Re: [PATCH v2] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 5 months ago
On 11/10/21 14:31, Josef Johansson wrote:
> On 11/9/21 15:53, Thomas Gleixner wrote:
>> On Thu, Nov 04 2021 at 00:27, Thomas Gleixner wrote:
>>>  
>>> -		if (!entry->msi_attrib.is_virtual) {
>>> +		if (!entry->msi_attrib.can_mask) {
>> Groan. I'm a moron. This obviously needs to be
>>
>> 		if (entry->msi_attrib.can_mask) {
> I will compile and check. Thanks for being thorough.
This worked as well.
>>>  			addr = pci_msix_desc_addr(entry);
>>>  			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>>  		}


Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Thomas Gleixner 2 years, 5 months ago
On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:

CC+: EFIFB and scheduler folks

> On 10/27/21 14:01, Josef Johansson wrote:
> When I suspend I get errors from Xen, including stacktraces below
> if anyone has any clue, if this might be related. I get one each time I
> suspend
> and the third time amdgpu gives up.
>
> rtc_cmos 00:01: registered as rtc0
> rtc_cmos 00:01: setting system clock to 2021-10-27T15:04:35 UTC (1635347075)
> rtc_cmos 00:01: no alarms, y3k, 114 bytes nvram
> device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> device-mapper: uevent: version 1.0.3
> device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
> efifb: probing for efifb
> efifb: cannot reserve video memory at 0x60000000
> ------------[ cut here ]------------
> ioremap on RAM at 0x0000000060000000 - 0x00000000607e8fff
> WARNING: CPU: 7 PID: 1 at arch/x86/mm/ioremap.c:210 __ioremap_caller+0x332/0x350

That's this warning:

	/*
	 * Don't allow anybody to remap normal RAM that we're using..
	 */
	if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
		WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
			  &phys_addr, &last_addr);
		return NULL;
	}


> Modules linked in:
> CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> RIP: e030:__ioremap_caller+0x332/0x350
> Code: e8 c3 ca ff ff 49 09 c6 e9 32 fe ff ff 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 35 f2 5d 82 c6 05 e8 7b a9 01 01 e8 48 39 be 00 <0f> 0b 45 31 e4 e9 ac fe ff ff e8 ff f5 c3 00 66 66 2e 0f 1f 84 00
> RSP: e02b:ffffc9004007bb00 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 00000000007e9000 RCX: ffffffff82915ca8
> RDX: c0000000ffffdfff RSI: 0000000000000000 RDI: ffffffff82865ca0
> RBP: 0000000060000000 R08: 0000000000000000 R09: ffffc9004007b948
> R10: ffffc9004007b940 R11: ffffffff82945ce8 R12: 0000000000000001
> R13: 00000000007e9000 R14: 00000000007e9000 R15: ffffffff81c8f772
> FS:  0000000000000000(0000) GS:ffff8881407c0000(0000) knlGS:0000000000000000
> CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000002810000 CR4: 0000000000050660
> Call Trace:
>  efifb_probe.cold+0x2e6/0x688

Why is this probing EFIFB at resume? Josef is that hibernate or suspend
to RAM?

>  platform_probe+0x3f/0x90
>  call_driver_probe+0x24/0xc0
>  really_probe+0x1e7/0x310
>  __driver_probe_device+0xfe/0x180
>  driver_probe_device+0x1e/0x90
>  __device_attach_driver+0x72/0xe0
>  ? driver_allows_async_probing+0x50/0x50
>  ? driver_allows_async_probing+0x50/0x50
>  bus_for_each_drv+0x8f/0xd0
>  __device_attach+0xe9/0x1f0
>  bus_probe_device+0x8e/0xa0
>  device_add+0x3fb/0x630
>  platform_device_add+0x102/0x230
>  sysfb_init+0xea/0x141
>  ? firmware_map_add_early+0xb8/0xb8
>  do_one_initcall+0x57/0x200
>  do_initcalls+0x109/0x166
>  kernel_init_freeable+0x23c/0x2bd
>  ? rest_init+0xc0/0xc0
>  kernel_init+0x16/0x120
>  ret_from_fork+0x22/0x30
> ---[ end trace b068d3cd1b7f5f49 ]---
> efifb: abort, cannot remap video memory 0x7e9000 @ 0x60000000
> efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
> --
> printk: Suspending console(s) (use no_console_suspend to debug)
> [drm] free PSP TMR buffer
> PM: suspend devices took 0.428 seconds
> ACPI: EC: interrupt blocked
> ACPI: PM: Preparing to enter system sleep state S3
> ACPI: EC: event blocked
> ACPI: EC: EC stopped
> ACPI: PM: Saving platform NVS memory
> Disabling non-boot CPUs ...
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522  switch_mm_irqs_off+0x3c5/0x400

	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {

> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> RIP: e030:switch_mm_irqs_off+0x3c5/0x400
> Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 60 1d 08 81 e8 a0 f3 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad fc ff ff 0f 0b e9 31 fe ff ff 0f 0b e9
> RSP: e02b:ffffc900400f3eb0 EFLAGS: 00010006
> RAX: 00000001336c6000 RBX: ffff888140660000 RCX: 0000000000000040
> RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff8881b36c6000
> RBP: ffffffff829d91c0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000008 R11: 0000000000000000 R12: ffff888104e88440
> R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
> CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
> CR2: 000060b7d78bf198 CR3: 0000000002810000 CR4: 0000000000050660
> Call Trace:
>  switch_mm+0x1c/0x30
>  idle_task_exit+0x55/0x60
>  play_dead_common+0xa/0x20
>  xen_pv_play_dead+0xa/0x60

So this is when bringing the non boot CPUs down and the switch_mm() code
discovers inconsistency between CR3 and the expected value.

Would probably be interesting to print the actual values, but XEN folks
might have an idea.

>  do_idle+0xd1/0xe0
>  cpu_startup_entry+0x19/0x20
>  asm_cpu_bringup_and_idle+0x5/0x1000
> ---[ end trace b068d3cd1b7f5f4b ]---
> smpboot: CPU 1 is now offline
> smpboot: CPU 2 is now offline
> smpboot: CPU 3 is now offline
> smpboot: CPU 4 is now offline
> smpboot: CPU 5 is now offline
> smpboot: CPU 6 is now offline
> smpboot: CPU 7 is now offline
> ACPI: PM: Low-level resume complete
> ACPI: EC: EC started
> ACPI: PM: Restoring platform NVS memory
> xen_acpi_processor: Uploading Xen processor PM info
> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU9
> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU11
> --
> CPU2 is up
> installing Xen timer for CPU 3
> cpu 3 spinlock event irq 79
> [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
> ACPI: \_SB_.PLTF.C003: Found 3 idle states
> ACPI: FW issue: working around C-state latencies out of order
> CPU3 is up
> ------------[ cut here ]------------
> cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
> installing Xen timer for CPU 4
> WARNING: CPU: 3 PID: 455 at kernel/sched/fair.c:3339  __update_blocked_fair+0x49b/0x4b0

	/*
	 * _avg must be null when _sum are null because _avg = _sum / divider
	 * Make sure that rounding and/or propagation of PELT values never
	 * break this.
	 */
	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
		      cfs_rq->avg.util_avg ||
		      cfs_rq->avg.runnable_avg);

PeterZ, does that ring any bell?

> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
> CPU: 3 PID: 455 Comm: kworker/3:2 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> Workqueue:  0x0 (events)
> RIP: e030:__update_blocked_fair+0x49b/0x4b0
> Code: 6b fd ff ff 49 8b 96 48 01 00 00 48 89 90 50 09 00 00 e9 ff fc ff ff 48 c7 c7 10 7a 5e 82 c6 05 f3 35 9e 01 01 e8 1f f3 b2 00 <0f> 0b 41 8b 86 38 01 00 00 e9 c6 fc ff ff 0f 1f 80 00 00 00 00 0f
> RSP: e02b:ffffc900410d7ce0 EFLAGS: 00010082
> RAX: 0000000000000000 RBX: 0000000000000018 RCX: ffff8881406d8a08
> RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8881406d8a00
> RBP: ffff8881406e9800 R08: 0000000000000048 R09: ffffc900410d7c78
> R10: 0000000000000049 R11: 000000002d2d2d2d R12: ffff8881406e9f80
> R13: ffff8881406e9e40 R14: ffff8881406e96c0 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff8881406c0000(0000) knlGS:0000000000000000
> CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000782e51820000 CR3: 0000000002810000 CR4: 0000000000050660
> Call Trace:
>  update_blocked_averages+0xa8/0x180
>  newidle_balance+0x175/0x380
>  pick_next_task_fair+0x39/0x3e0
>  pick_next_task+0x4c/0xbd0
>  ? dequeue_task_fair+0xba/0x390
>  __schedule+0x13a/0x570
>  schedule+0x44/0xa0
>  worker_thread+0xc0/0x320
>  ? process_one_work+0x390/0x390
>  kthread+0x10f/0x130
>  ? set_kthread_struct+0x40/0x40
>  ret_from_fork+0x22/0x30
> ---[ end trace b068d3cd1b7f5f4c ]---
> cpu 4 spinlock event irq 85
> [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
> ACPI: \_SB_.PLTF.C004: Found 3 idle states
> ACPI: FW issue: working around C-state latencies out of order
> CPU4 is up
> installing Xen timer for CPU 5
> cpu 5 spinlock event irq 91
> [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
> ACPI: \_SB_.PLTF.C005: Found 3 idle states
> ACPI: FW issue: working around C-state latencies out of order
> CPU5 is up

Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 5 months ago
On 11/4/21 00:45, Thomas Gleixner wrote:
> On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:
>
> CC+: EFIFB and scheduler folks
>
>> On 10/27/21 14:01, Josef Johansson wrote:
>> When I suspend I get errors from Xen, including stacktraces below
>> if anyone has any clue, if this might be related. I get one each time I
>> suspend
>> and the third time amdgpu gives up.
>>
>> rtc_cmos 00:01: registered as rtc0
>> rtc_cmos 00:01: setting system clock to 2021-10-27T15:04:35 UTC (1635347075)
>> rtc_cmos 00:01: no alarms, y3k, 114 bytes nvram
>> device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
>> device-mapper: uevent: version 1.0.3
>> device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
>> efifb: probing for efifb
>> efifb: cannot reserve video memory at 0x60000000
>> ------------[ cut here ]------------
>> ioremap on RAM at 0x0000000060000000 - 0x00000000607e8fff
>> WARNING: CPU: 7 PID: 1 at arch/x86/mm/ioremap.c:210 __ioremap_caller+0x332/0x350
> That's this warning:
>
> 	/*
> 	 * Don't allow anybody to remap normal RAM that we're using..
> 	 */
> 	if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
> 		WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
> 			  &phys_addr, &last_addr);
> 		return NULL;
> 	}
>
>
>> Modules linked in:
>> CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
>> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
>> RIP: e030:__ioremap_caller+0x332/0x350
>> Code: e8 c3 ca ff ff 49 09 c6 e9 32 fe ff ff 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 35 f2 5d 82 c6 05 e8 7b a9 01 01 e8 48 39 be 00 <0f> 0b 45 31 e4 e9 ac fe ff ff e8 ff f5 c3 00 66 66 2e 0f 1f 84 00
>> RSP: e02b:ffffc9004007bb00 EFLAGS: 00010286
>> RAX: 0000000000000000 RBX: 00000000007e9000 RCX: ffffffff82915ca8
>> RDX: c0000000ffffdfff RSI: 0000000000000000 RDI: ffffffff82865ca0
>> RBP: 0000000060000000 R08: 0000000000000000 R09: ffffc9004007b948
>> R10: ffffc9004007b940 R11: ffffffff82945ce8 R12: 0000000000000001
>> R13: 00000000007e9000 R14: 00000000007e9000 R15: ffffffff81c8f772
>> FS:  0000000000000000(0000) GS:ffff8881407c0000(0000) knlGS:0000000000000000
>> CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 0000000002810000 CR4: 0000000000050660
>> Call Trace:
>>  efifb_probe.cold+0x2e6/0x688
> Why is this probing EFIFB at resume? Josef is that hibernate or suspend
> to RAM?
This is actually on boot, might be a totally ignore able warning though.
Would be nice to actually get rid of!
>
>>  platform_probe+0x3f/0x90
>>  call_driver_probe+0x24/0xc0
>>  really_probe+0x1e7/0x310
>>  __driver_probe_device+0xfe/0x180
>>  driver_probe_device+0x1e/0x90
>>  __device_attach_driver+0x72/0xe0
>>  ? driver_allows_async_probing+0x50/0x50
>>  ? driver_allows_async_probing+0x50/0x50
>>  bus_for_each_drv+0x8f/0xd0
>>  __device_attach+0xe9/0x1f0
>>  bus_probe_device+0x8e/0xa0
>>  device_add+0x3fb/0x630
>>  platform_device_add+0x102/0x230
>>  sysfb_init+0xea/0x141
>>  ? firmware_map_add_early+0xb8/0xb8
>>  do_one_initcall+0x57/0x200
>>  do_initcalls+0x109/0x166
>>  kernel_init_freeable+0x23c/0x2bd
>>  ? rest_init+0xc0/0xc0
>>  kernel_init+0x16/0x120
>>  ret_from_fork+0x22/0x30
>> ---[ end trace b068d3cd1b7f5f49 ]---
>> efifb: abort, cannot remap video memory 0x7e9000 @ 0x60000000
>> efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
>> --
>> printk: Suspending console(s) (use no_console_suspend to debug)
>> [drm] free PSP TMR buffer
>> PM: suspend devices took 0.428 seconds
>> ACPI: EC: interrupt blocked
>> ACPI: PM: Preparing to enter system sleep state S3
>> ACPI: EC: event blocked
>> ACPI: EC: EC stopped
>> ACPI: PM: Saving platform NVS memory
>> Disabling non-boot CPUs ...
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522  switch_mm_irqs_off+0x3c5/0x400
> 	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
>
>> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
>> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
>> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
>> RIP: e030:switch_mm_irqs_off+0x3c5/0x400
>> Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 60 1d 08 81 e8 a0 f3 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad fc ff ff 0f 0b e9 31 fe ff ff 0f 0b e9
>> RSP: e02b:ffffc900400f3eb0 EFLAGS: 00010006
>> RAX: 00000001336c6000 RBX: ffff888140660000 RCX: 0000000000000040
>> RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff8881b36c6000
>> RBP: ffffffff829d91c0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000008 R11: 0000000000000000 R12: ffff888104e88440
>> R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
>> FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
>> CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
>> CR2: 000060b7d78bf198 CR3: 0000000002810000 CR4: 0000000000050660
>> Call Trace:
>>  switch_mm+0x1c/0x30
>>  idle_task_exit+0x55/0x60
>>  play_dead_common+0xa/0x20
>>  xen_pv_play_dead+0xa/0x60
> So this is when bringing the non boot CPUs down and the switch_mm() code
> discovers inconsistency between CR3 and the expected value.
>
> Would probably be interesting to print the actual values, but XEN folks
> might have an idea.
My guess here after digging through the code is that XEN is just (as the
comment above this warn
says), just doing a load_cr3(swapper_pg_dir) without going through
switch_mm first.
I could add a BUG_ON somewhere, or maybe printk, but I'm very unsure to
where I should put them.
Assistance here would be great.
>
>>  do_idle+0xd1/0xe0
>>  cpu_startup_entry+0x19/0x20
>>  asm_cpu_bringup_and_idle+0x5/0x1000
>> ---[ end trace b068d3cd1b7f5f4b ]---
>> smpboot: CPU 1 is now offline
>> smpboot: CPU 2 is now offline
>> smpboot: CPU 3 is now offline
>> smpboot: CPU 4 is now offline
>> smpboot: CPU 5 is now offline
>> smpboot: CPU 6 is now offline
>> smpboot: CPU 7 is now offline
>> ACPI: PM: Low-level resume complete
>> ACPI: EC: EC started
>> ACPI: PM: Restoring platform NVS memory
>> xen_acpi_processor: Uploading Xen processor PM info
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU9
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU11
>> --
>> CPU2 is up
>> installing Xen timer for CPU 3
>> cpu 3 spinlock event irq 79
>> [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
>> ACPI: \_SB_.PLTF.C003: Found 3 idle states
>> ACPI: FW issue: working around C-state latencies out of order
>> CPU3 is up
>> ------------[ cut here ]------------
>> cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
>> installing Xen timer for CPU 4
>> WARNING: CPU: 3 PID: 455 at kernel/sched/fair.c:3339  __update_blocked_fair+0x49b/0x4b0
> 	/*
> 	 * _avg must be null when _sum are null because _avg = _sum / divider
> 	 * Make sure that rounding and/or propagation of PELT values never
> 	 * break this.
> 	 */
> 	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
> 		      cfs_rq->avg.util_avg ||
> 		      cfs_rq->avg.runnable_avg);
>
> PeterZ, does that ring any bell?
I also assume that the first BUG triggers this one.
>
>> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
>> CPU: 3 PID: 455 Comm: kworker/3:2 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
>> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
>> Workqueue:  0x0 (events)
>> RIP: e030:__update_blocked_fair+0x49b/0x4b0
>> Code: 6b fd ff ff 49 8b 96 48 01 00 00 48 89 90 50 09 00 00 e9 ff fc ff ff 48 c7 c7 10 7a 5e 82 c6 05 f3 35 9e 01 01 e8 1f f3 b2 00 <0f> 0b 41 8b 86 38 01 00 00 e9 c6 fc ff ff 0f 1f 80 00 00 00 00 0f
>> RSP: e02b:ffffc900410d7ce0 EFLAGS: 00010082
>> RAX: 0000000000000000 RBX: 0000000000000018 RCX: ffff8881406d8a08
>> RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8881406d8a00
>> RBP: ffff8881406e9800 R08: 0000000000000048 R09: ffffc900410d7c78
>> R10: 0000000000000049 R11: 000000002d2d2d2d R12: ffff8881406e9f80
>> R13: ffff8881406e9e40 R14: ffff8881406e96c0 R15: 0000000000000000
>> FS:  0000000000000000(0000) GS:ffff8881406c0000(0000) knlGS:0000000000000000
>> CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000782e51820000 CR3: 0000000002810000 CR4: 0000000000050660
>> Call Trace:
>>  update_blocked_averages+0xa8/0x180
>>  newidle_balance+0x175/0x380
>>  pick_next_task_fair+0x39/0x3e0
>>  pick_next_task+0x4c/0xbd0
>>  ? dequeue_task_fair+0xba/0x390
>>  __schedule+0x13a/0x570
>>  schedule+0x44/0xa0
>>  worker_thread+0xc0/0x320
>>  ? process_one_work+0x390/0x390
>>  kthread+0x10f/0x130
>>  ? set_kthread_struct+0x40/0x40
>>  ret_from_fork+0x22/0x30
>> ---[ end trace b068d3cd1b7f5f4c ]---
>> cpu 4 spinlock event irq 85
>> [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
>> ACPI: \_SB_.PLTF.C004: Found 3 idle states
>> ACPI: FW issue: working around C-state latencies out of order
>> CPU4 is up
>> installing Xen timer for CPU 5
>> cpu 5 spinlock event irq 91
>> [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
>> ACPI: \_SB_.PLTF.C005: Found 3 idle states
>> ACPI: FW issue: working around C-state latencies out of order
>> CPU5 is up


Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Peter Zijlstra 2 years, 5 months ago
On Thu, Nov 04, 2021 at 12:45:29AM +0100, Thomas Gleixner wrote:
> On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:
> > ------------[ cut here ]------------
> > cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
> > installing Xen timer for CPU 4
> > WARNING: CPU: 3 PID: 455 at kernel/sched/fair.c:3339  __update_blocked_fair+0x49b/0x4b0
> 
> 	/*
> 	 * _avg must be null when _sum are null because _avg = _sum / divider
> 	 * Make sure that rounding and/or propagation of PELT values never
> 	 * break this.
> 	 */
> 	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
> 		      cfs_rq->avg.util_avg ||
> 		      cfs_rq->avg.runnable_avg);
> 
> PeterZ, does that ring any bell?

Hurrr.. I thought we fixed all those. Vincent?

> > Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
> > CPU: 3 PID: 455 Comm: kworker/3:2 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> > Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> > Workqueue:  0x0 (events)
> > RIP: e030:__update_blocked_fair+0x49b/0x4b0
> > Code: 6b fd ff ff 49 8b 96 48 01 00 00 48 89 90 50 09 00 00 e9 ff fc ff ff 48 c7 c7 10 7a 5e 82 c6 05 f3 35 9e 01 01 e8 1f f3 b2 00 <0f> 0b 41 8b 86 38 01 00 00 e9 c6 fc ff ff 0f 1f 80 00 00 00 00 0f
> > RSP: e02b:ffffc900410d7ce0 EFLAGS: 00010082
> > RAX: 0000000000000000 RBX: 0000000000000018 RCX: ffff8881406d8a08
> > RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8881406d8a00
> > RBP: ffff8881406e9800 R08: 0000000000000048 R09: ffffc900410d7c78
> > R10: 0000000000000049 R11: 000000002d2d2d2d R12: ffff8881406e9f80
> > R13: ffff8881406e9e40 R14: ffff8881406e96c0 R15: 0000000000000000
> > FS:  0000000000000000(0000) GS:ffff8881406c0000(0000) knlGS:0000000000000000
> > CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000782e51820000 CR3: 0000000002810000 CR4: 0000000000050660
> > Call Trace:
> >  update_blocked_averages+0xa8/0x180
> >  newidle_balance+0x175/0x380
> >  pick_next_task_fair+0x39/0x3e0
> >  pick_next_task+0x4c/0xbd0
> >  ? dequeue_task_fair+0xba/0x390
> >  __schedule+0x13a/0x570
> >  schedule+0x44/0xa0
> >  worker_thread+0xc0/0x320
> >  ? process_one_work+0x390/0x390
> >  kthread+0x10f/0x130
> >  ? set_kthread_struct+0x40/0x40
> >  ret_from_fork+0x22/0x30

Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Vincent Guittot 2 years, 5 months ago
On Thu, 4 Nov 2021 at 00:45, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:
>
> CC+: EFIFB and scheduler folks
>
> > On 10/27/21 14:01, Josef Johansson wrote:
> > When I suspend I get errors from Xen, including stacktraces below
> > if anyone has any clue, if this might be related. I get one each time I
> > suspend
> > and the third time amdgpu gives up.
> >
> > rtc_cmos 00:01: registered as rtc0
> > rtc_cmos 00:01: setting system clock to 2021-10-27T15:04:35 UTC (1635347075)
> > rtc_cmos 00:01: no alarms, y3k, 114 bytes nvram
> > device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> > device-mapper: uevent: version 1.0.3
> > device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
> > efifb: probing for efifb
> > efifb: cannot reserve video memory at 0x60000000
> > ------------[ cut here ]------------
> > ioremap on RAM at 0x0000000060000000 - 0x00000000607e8fff
> > WARNING: CPU: 7 PID: 1 at arch/x86/mm/ioremap.c:210 __ioremap_caller+0x332/0x350
>
> That's this warning:
>
>         /*
>          * Don't allow anybody to remap normal RAM that we're using..
>          */
>         if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
>                 WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
>                           &phys_addr, &last_addr);
>                 return NULL;
>         }
>
>
> > Modules linked in:
> > CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> > Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> > RIP: e030:__ioremap_caller+0x332/0x350
> > Code: e8 c3 ca ff ff 49 09 c6 e9 32 fe ff ff 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 35 f2 5d 82 c6 05 e8 7b a9 01 01 e8 48 39 be 00 <0f> 0b 45 31 e4 e9 ac fe ff ff e8 ff f5 c3 00 66 66 2e 0f 1f 84 00
> > RSP: e02b:ffffc9004007bb00 EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: 00000000007e9000 RCX: ffffffff82915ca8
> > RDX: c0000000ffffdfff RSI: 0000000000000000 RDI: ffffffff82865ca0
> > RBP: 0000000060000000 R08: 0000000000000000 R09: ffffc9004007b948
> > R10: ffffc9004007b940 R11: ffffffff82945ce8 R12: 0000000000000001
> > R13: 00000000007e9000 R14: 00000000007e9000 R15: ffffffff81c8f772
> > FS:  0000000000000000(0000) GS:ffff8881407c0000(0000) knlGS:0000000000000000
> > CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 0000000002810000 CR4: 0000000000050660
> > Call Trace:
> >  efifb_probe.cold+0x2e6/0x688
>
> Why is this probing EFIFB at resume? Josef is that hibernate or suspend
> to RAM?
>
> >  platform_probe+0x3f/0x90
> >  call_driver_probe+0x24/0xc0
> >  really_probe+0x1e7/0x310
> >  __driver_probe_device+0xfe/0x180
> >  driver_probe_device+0x1e/0x90
> >  __device_attach_driver+0x72/0xe0
> >  ? driver_allows_async_probing+0x50/0x50
> >  ? driver_allows_async_probing+0x50/0x50
> >  bus_for_each_drv+0x8f/0xd0
> >  __device_attach+0xe9/0x1f0
> >  bus_probe_device+0x8e/0xa0
> >  device_add+0x3fb/0x630
> >  platform_device_add+0x102/0x230
> >  sysfb_init+0xea/0x141
> >  ? firmware_map_add_early+0xb8/0xb8
> >  do_one_initcall+0x57/0x200
> >  do_initcalls+0x109/0x166
> >  kernel_init_freeable+0x23c/0x2bd
> >  ? rest_init+0xc0/0xc0
> >  kernel_init+0x16/0x120
> >  ret_from_fork+0x22/0x30
> > ---[ end trace b068d3cd1b7f5f49 ]---
> > efifb: abort, cannot remap video memory 0x7e9000 @ 0x60000000
> > efi-framebuffer: probe of efi-framebuffer.0 failed with error -5
> > --
> > printk: Suspending console(s) (use no_console_suspend to debug)
> > [drm] free PSP TMR buffer
> > PM: suspend devices took 0.428 seconds
> > ACPI: EC: interrupt blocked
> > ACPI: PM: Preparing to enter system sleep state S3
> > ACPI: EC: event blocked
> > ACPI: EC: EC stopped
> > ACPI: PM: Saving platform NVS memory
> > Disabling non-boot CPUs ...
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522  switch_mm_irqs_off+0x3c5/0x400
>
>         if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
>
> > Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
> > CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> > Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> > RIP: e030:switch_mm_irqs_off+0x3c5/0x400
> > Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 60 1d 08 81 e8 a0 f3 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad fc ff ff 0f 0b e9 31 fe ff ff 0f 0b e9
> > RSP: e02b:ffffc900400f3eb0 EFLAGS: 00010006
> > RAX: 00000001336c6000 RBX: ffff888140660000 RCX: 0000000000000040
> > RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff8881b36c6000
> > RBP: ffffffff829d91c0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000008 R11: 0000000000000000 R12: ffff888104e88440
> > R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
> > FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
> > CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
> > CR2: 000060b7d78bf198 CR3: 0000000002810000 CR4: 0000000000050660
> > Call Trace:
> >  switch_mm+0x1c/0x30
> >  idle_task_exit+0x55/0x60
> >  play_dead_common+0xa/0x20
> >  xen_pv_play_dead+0xa/0x60
>
> So this is when bringing the non boot CPUs down and the switch_mm() code
> discovers inconsistency between CR3 and the expected value.
>
> Would probably be interesting to print the actual values, but XEN folks
> might have an idea.
>
> >  do_idle+0xd1/0xe0
> >  cpu_startup_entry+0x19/0x20
> >  asm_cpu_bringup_and_idle+0x5/0x1000
> > ---[ end trace b068d3cd1b7f5f4b ]---
> > smpboot: CPU 1 is now offline
> > smpboot: CPU 2 is now offline
> > smpboot: CPU 3 is now offline
> > smpboot: CPU 4 is now offline
> > smpboot: CPU 5 is now offline
> > smpboot: CPU 6 is now offline
> > smpboot: CPU 7 is now offline
> > ACPI: PM: Low-level resume complete
> > ACPI: EC: EC started
> > ACPI: PM: Restoring platform NVS memory
> > xen_acpi_processor: Uploading Xen processor PM info
> > xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
> > xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
> > xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
> > xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
> > xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU9
> > xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU11
> > --
> > CPU2 is up
> > installing Xen timer for CPU 3
> > cpu 3 spinlock event irq 79
> > [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
> > ACPI: \_SB_.PLTF.C003: Found 3 idle states
> > ACPI: FW issue: working around C-state latencies out of order
> > CPU3 is up
> > ------------[ cut here ]------------
> > cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
> > installing Xen timer for CPU 4
> > WARNING: CPU: 3 PID: 455 at kernel/sched/fair.c:3339  __update_blocked_fair+0x49b/0x4b0
>
>         /*
>          * _avg must be null when _sum are null because _avg = _sum / divider
>          * Make sure that rounding and/or propagation of PELT values never
>          * break this.
>          */
>         SCHED_WARN_ON(cfs_rq->avg.load_avg ||
>                       cfs_rq->avg.util_avg ||
>                       cfs_rq->avg.runnable_avg);
>
> PeterZ, does that ring any bell?

This warning raises when the PELT signal is not propagated correctly
in the cfs hierarchy which can impact the fairness. This is a bit
strange to get this warning during a resume. CPU 3 has just been put
online so no propagation already  happened and everything should be
synced

>
> > Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
> > CPU: 3 PID: 455 Comm: kworker/3:2 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
> > Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
> > Workqueue:  0x0 (events)
> > RIP: e030:__update_blocked_fair+0x49b/0x4b0
> > Code: 6b fd ff ff 49 8b 96 48 01 00 00 48 89 90 50 09 00 00 e9 ff fc ff ff 48 c7 c7 10 7a 5e 82 c6 05 f3 35 9e 01 01 e8 1f f3 b2 00 <0f> 0b 41 8b 86 38 01 00 00 e9 c6 fc ff ff 0f 1f 80 00 00 00 00 0f
> > RSP: e02b:ffffc900410d7ce0 EFLAGS: 00010082
> > RAX: 0000000000000000 RBX: 0000000000000018 RCX: ffff8881406d8a08
> > RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8881406d8a00
> > RBP: ffff8881406e9800 R08: 0000000000000048 R09: ffffc900410d7c78
> > R10: 0000000000000049 R11: 000000002d2d2d2d R12: ffff8881406e9f80
> > R13: ffff8881406e9e40 R14: ffff8881406e96c0 R15: 0000000000000000
> > FS:  0000000000000000(0000) GS:ffff8881406c0000(0000) knlGS:0000000000000000
> > CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000782e51820000 CR3: 0000000002810000 CR4: 0000000000050660
> > Call Trace:
> >  update_blocked_averages+0xa8/0x180
> >  newidle_balance+0x175/0x380
> >  pick_next_task_fair+0x39/0x3e0
> >  pick_next_task+0x4c/0xbd0
> >  ? dequeue_task_fair+0xba/0x390
> >  __schedule+0x13a/0x570
> >  schedule+0x44/0xa0
> >  worker_thread+0xc0/0x320
> >  ? process_one_work+0x390/0x390
> >  kthread+0x10f/0x130
> >  ? set_kthread_struct+0x40/0x40
> >  ret_from_fork+0x22/0x30
> > ---[ end trace b068d3cd1b7f5f4c ]---
> > cpu 4 spinlock event irq 85
> > [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
> > ACPI: \_SB_.PLTF.C004: Found 3 idle states
> > ACPI: FW issue: working around C-state latencies out of order
> > CPU4 is up
> > installing Xen timer for CPU 5
> > cpu 5 spinlock event irq 91
> > [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
> > ACPI: \_SB_.PLTF.C005: Found 3 idle states
> > ACPI: FW issue: working around C-state latencies out of order
> > CPU5 is up

Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 5 months ago
On 11/4/21 00:45, Thomas Gleixner wrote:
> On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:
>
> CC+: EFIFB and scheduler folks
>
>> On 10/27/21 14:01, Josef Johansson wrote:
>>
>> printk: Suspending console(s) (use no_console_suspend to debug)
>> [drm] free PSP TMR buffer
>> PM: suspend devices took 0.428 seconds
>> ACPI: EC: interrupt blocked
>> ACPI: PM: Preparing to enter system sleep state S3
>> ACPI: EC: event blocked
>> ACPI: EC: EC stopped
>> ACPI: PM: Saving platform NVS memory
>> Disabling non-boot CPUs ...
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522  switch_mm_irqs_off+0x3c5/0x400
> 	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
>
>> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
>> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
>> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
>> RIP: e030:switch_mm_irqs_off+0x3c5/0x400
>> Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 60 1d 08 81 e8 a0 f3 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad fc ff ff 0f 0b e9 31 fe ff ff 0f 0b e9
>> RSP: e02b:ffffc900400f3eb0 EFLAGS: 00010006
>> RAX: 00000001336c6000 RBX: ffff888140660000 RCX: 0000000000000040
>> RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff8881b36c6000
>> RBP: ffffffff829d91c0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000008 R11: 0000000000000000 R12: ffff888104e88440
>> R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
>> FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
>> CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
>> CR2: 000060b7d78bf198 CR3: 0000000002810000 CR4: 0000000000050660
>> Call Trace:
>>  switch_mm+0x1c/0x30
>>  idle_task_exit+0x55/0x60
>>  play_dead_common+0xa/0x20
>>  xen_pv_play_dead+0xa/0x60
> So this is when bringing the non boot CPUs down and the switch_mm() code
> discovers inconsistency between CR3 and the expected value.
>
> Would probably be interesting to print the actual values, but XEN folks
> might have an idea.
I can install some print-statements showing some more info here.
I guess I will be getting memory addresses, we already know that CR3 is
0000000002810000

If you have any hints on how to do an effective print statement for this
please do say so :)
I'll try though and see what I find out.
>>  do_idle+0xd1/0xe0
>>  cpu_startup_entry+0x19/0x20
>>  asm_cpu_bringup_and_idle+0x5/0x1000
>> ---[ end trace b068d3cd1b7f5f4b ]---
>> smpboot: CPU 1 is now offline
>> smpboot: CPU 2 is now offline
>> smpboot: CPU 3 is now offline
>> smpboot: CPU 4 is now offline
>> smpboot: CPU 5 is now offline
>> smpboot: CPU 6 is now offline
>> smpboot: CPU 7 is now offline
>> ACPI: PM: Low-level resume complete
>> ACPI: EC: EC started
>> ACPI: PM: Restoring platform NVS memory
>> xen_acpi_processor: Uploading Xen processor PM info
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU9
>>


Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by Josef Johansson 2 years, 5 months ago
On 11/10/21 21:30, Josef Johansson wrote:
> On 11/4/21 00:45, Thomas Gleixner wrote:
>> On Wed, Oct 27 2021 at 17:29, Josef Johansson wrote:
>>
>> CC+: EFIFB and scheduler folks
>>
>>> On 10/27/21 14:01, Josef Johansson wrote:
>>>
>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>> [drm] free PSP TMR buffer
>>> PM: suspend devices took 0.428 seconds
>>> ACPI: EC: interrupt blocked
>>> ACPI: PM: Preparing to enter system sleep state S3
>>> ACPI: EC: event blocked
>>> ACPI: EC: EC stopped
>>> ACPI: PM: Saving platform NVS memory
>>> Disabling non-boot CPUs ...
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522  switch_mm_irqs_off+0x3c5/0x400
>> 	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
>>
>>> Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer nf_tables nfnetlink vfat fat intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_rapl_common pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops joydev videobuf2_v4l2 sp5100_tco k10temp videobuf2_common i2c_piix4 iwlwifi videodev mc cfg80211 thinkpad_acpi ipmi_devintf ucsi_acpi platform_profile typec_ucsi ledtrig_audio ipmi_msghandler r8169 rfkill typec snd wmi soundcore video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ghash_clmulni_intel ttm serio_raw drm_kms_helper cec sdhci_pci cqhci sdhci xhci_pci drm xhci_pci_renesas nvme xhci_hcd ehci_pci mmc_core ehci_hcd nvme_core xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
>>> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W        --------- ---  5.15.0-0.rc7.0.fc32.qubes.x86_64 #1
>>> Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET65W(1.34 ) 06/17/2021
>>> RIP: e030:switch_mm_irqs_off+0x3c5/0x400
>>> Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 60 1d 08 81 e8 a0 f3 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad fc ff ff 0f 0b e9 31 fe ff ff 0f 0b e9
>>> RSP: e02b:ffffc900400f3eb0 EFLAGS: 00010006
>>> RAX: 00000001336c6000 RBX: ffff888140660000 RCX: 0000000000000040
>>> RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff8881b36c6000
>>> RBP: ffffffff829d91c0 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000008 R11: 0000000000000000 R12: ffff888104e88440
>>> R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
>>> FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
>>> CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
>>> CR2: 000060b7d78bf198 CR3: 0000000002810000 CR4: 0000000000050660
>>> Call Trace:
>>>  switch_mm+0x1c/0x30
>>>  idle_task_exit+0x55/0x60
>>>  play_dead_common+0xa/0x20
>>>  xen_pv_play_dead+0xa/0x60
>> So this is when bringing the non boot CPUs down and the switch_mm() code
>> discovers inconsistency between CR3 and the expected value.
>>
>> Would probably be interesting to print the actual values, but XEN folks
>> might have an idea.
> I can install some print-statements showing some more info here.
> I guess I will be getting memory addresses, we already know that CR3 is
> 0000000002810000
>
> If you have any hints on how to do an effective print statement for this
> please do say so :)
> I'll try though and see what I find out.

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 59ba2968af1b..511792852e9e 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -520,6 +520,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 */
 #ifdef CONFIG_DEBUG_VM
 	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
+		printk("josef-debug: cr3: %lx, build_cr3: %lx, (%px, %x)",
+			__read_cr3(),
+			build_cr3(real_prev->pgd, prev_asid),
+			real_prev->pgd, prev_asid);
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.

this patch gave me the three values which where already known,
__read_cr3 = CR3 = 0000000002810000
build_cr3 = RAX = 00000001336c6000
real_prev->pgd = RDI = ffff8881b36c6000
prev_asid = RSI = 0

Not sure what conclusions I should draw though.

>>>  do_idle+0xd1/0xe0
>>>  cpu_startup_entry+0x19/0x20
>>>  asm_cpu_bringup_and_idle+0x5/0x1000
>>> ---[ end trace b068d3cd1b7f5f4b ]---
>>> smpboot: CPU 1 is now offline
>>> smpboot: CPU 2 is now offline
>>> smpboot: CPU 3 is now offline
>>> smpboot: CPU 4 is now offline
>>> smpboot: CPU 5 is now offline
>>> smpboot: CPU 6 is now offline
>>> smpboot: CPU 7 is now offline
>>> ACPI: PM: Low-level resume complete
>>> ACPI: EC: EC started
>>> ACPI: PM: Restoring platform NVS memory
>>> xen_acpi_processor: Uploading Xen processor PM info
>>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
>>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
>>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
>>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
>>> xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU9
>>>


Re: [PATCH] PCI/MSI: Move non-mask check back into low level accessors
Posted by David Woodhouse 2 years, 6 months ago

On 27 October 2021 10:50:15 BST, Thomas Gleixner <tglx@linutronix.de> wrote:
>The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
>low level accessors into the higher level mask/unmask functions.
>
>This missed the fact that these accessors can be invoked from other places
>as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
>also violates the virtual MSIX and the msi_attrib.maskbit protections.

Not just PV. It's Xen HVM guests too.

I'll also give it a spin on both Xen and not-Xen. Thanks.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by David Woodhouse 2 years, 6 months ago
On Sun, 2021-10-24 at 21:25 -0400, Jason Andryuk wrote:
> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
> functions") introduce functions pci_msi_update_mask() and
> pci_msix_write_vector_ctrl() that is missing checks for
> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
> new mask/unmask functions").  The checks are in place at the high level
> __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
> directly to the helpers.
> 
> Push the pci_msi_ignore_mask check down to the functions that make
> the actual writes.  This keeps the logic local to the writes that need
> to be bypassed.
> 
> With Xen PV, the hypervisor is responsible for masking and unmasking the
> interrupts, which pci_msi_ignore_mask is used to indicate.

This isn't just for Xen PV; Xen HVM guests let Xen unmask the MSI for
them too.
Re: [PATCH] PCI/MSI: Fix masking MSI/MSI-X on Xen PV
Posted by Jason Andryuk 2 years, 6 months ago
On Mon, Oct 25, 2021 at 3:44 AM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Sun, 2021-10-24 at 21:25 -0400, Jason Andryuk wrote:
> > commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
> > functions") introduce functions pci_msi_update_mask() and
> > pci_msix_write_vector_ctrl() that is missing checks for
> > pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
> > new mask/unmask functions").  The checks are in place at the high level
> > __pci_msi_mask_desc()/__pci_msi_unmask_desc(), but some functions call
> > directly to the helpers.
> >
> > Push the pci_msi_ignore_mask check down to the functions that make
> > the actual writes.  This keeps the logic local to the writes that need
> > to be bypassed.
> >
> > With Xen PV, the hypervisor is responsible for masking and unmasking the
> > interrupts, which pci_msi_ignore_mask is used to indicate.
>
> This isn't just for Xen PV; Xen HVM guests let Xen unmask the MSI for
> them too.

Ah, that makes sense that Xen handles both.  I was repeating another
commit message's statement.  Oh, it looks like pci_msi_ignore_mask is
PV-specific.

Regards,
Jason