[PATCH V3] PCI/MSI: Fix MSI hwirq truncation

Vidya Sagar posted 1 patch 1 year, 11 months ago
[PATCH V3] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 11 months ago
While calculating the hwirq number for an MSI interrupt, the higher
bits of the PCI domain number (bit 5 onwards, i.e. domain_nr >= 32) get
truncated because the left shift is evaluated in 32-bit arithmetic
(pci_domain_nr() returns 'int') before the result is widened to the
return type. For example, this results in the same hwirq number for
devices 0019:00:00.0 and 0039:00:00.0.

So, cast the PCI domain number to 'irq_hw_number_t' before left shifting
it to calculate hwirq number.

Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain")
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
V3:
* Addressed review comments from Thomas Gleixner
* Added Tested-By: Shanker Donthineni <sdonthineni@nvidia.com>

V2:
* Added Fixes tag

 drivers/pci/msi/irqdomain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index c8be056c248d..cfd84a899c82 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 
 	return (irq_hw_number_t)desc->msi_index |
 		pci_dev_id(dev) << 11 |
-		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+		((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
 }
 
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
-- 
2.25.1
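
For illustration only (not part of the posted patch), the collision
described in the changelog can be reproduced in user space. The sketch
below assumes a 32-bit 'int' and uses uint64_t as a stand-in for
irq_hw_number_t on a 64-bit kernel; calc_hwirq_old() and calc_hwirq_new()
are hypothetical names that mirror the expression before and after the
patch.

```c
#include <stdint.h>

/* Assumption: stand-in for irq_hw_number_t on a 64-bit kernel. */
typedef uint64_t hwirq_t;

/*
 * Pre-patch expression: (domain & 0xFFFFFFFF) has type unsigned int,
 * so the shift by 27 happens in 32 bits and only the low 5 bits of
 * the domain number survive before the value is widened by the OR.
 */
static hwirq_t calc_hwirq_old(int domain, uint16_t devid, uint16_t msi_index)
{
	return (hwirq_t)msi_index |
		devid << 11 |
		(domain & 0xFFFFFFFF) << 27;
}

/* Post-patch expression: widen the domain number first, then shift. */
static hwirq_t calc_hwirq_new(int domain, uint16_t devid, uint16_t msi_index)
{
	return (hwirq_t)msi_index |
		devid << 11 |
		((hwirq_t)(domain & 0xFFFFFFFF)) << 27;
}
```

With devid and msi_index both 0, calc_hwirq_old() returns 0xc8000000 for
domain 0x19 and for domain 0x39, whereas calc_hwirq_new() returns
0xc8000000 and 0x1c8000000 respectively.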
Re: [PATCH V3] PCI/MSI: Fix MSI hwirq truncation
Posted by Thomas Gleixner 1 year, 11 months ago
On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
> While calculating the hwirq number for an MSI interrupt, the higher
> bits of the PCI domain number (bit 5 onwards, i.e. domain_nr >= 32) get
> truncated because the left shift is evaluated in 32-bit arithmetic
> (pci_domain_nr() returns 'int') before the result is widened to the
> return type. For example, this results in the same hwirq number for
> devices 0019:00:00.0 and 0039:00:00.0.
>
> So, cast the PCI domain number to 'irq_hw_number_t' before left shifting
> it to calculate hwirq number.

This still does not explain that this fixes it only on 64-bit platforms
and why we don't care for 32-bit systems.
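
To illustrate the 32-bit point: irq_hw_number_t is 'unsigned long', which
is 32 bits wide on 32-bit platforms, so the cast adds no headroom there.
A minimal sketch, modelling the two widths with fixed-size types (the
helper names are hypothetical and not kernel API):

```c
#include <stdint.h>

/* Patched domain term with the hwirq type modelled as 64 bits wide. */
static uint64_t domain_term_64(int domain)
{
	return ((uint64_t)(domain & 0xFFFFFFFF)) << 27;
}

/*
 * Patched domain term with the hwirq type modelled as 32 bits wide:
 * the cast is a no-op, so bits 5 and up of the domain number are
 * still shifted out.
 */
static uint32_t domain_term_32(int domain)
{
	return ((uint32_t)(domain & 0xFFFFFFFF)) << 27;
}
```

In this model, domains 0x19 and 0x39 still collide in the 32-bit variant
and are distinct in the 64-bit one.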
Re: [PATCH V3] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 11 months ago

On 1/12/2024 9:23 PM, Thomas Gleixner wrote:
> On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
>> While calculating the hwirq number for an MSI interrupt, the higher
>> bits of the PCI domain number (bit 5 onwards, i.e. domain_nr >= 32) get
>> truncated because the left shift is evaluated in 32-bit arithmetic
>> (pci_domain_nr() returns 'int') before the result is widened to the
>> return type. For example, this results in the same hwirq number for
>> devices 0019:00:00.0 and 0039:00:00.0.
>>
>> So, cast the PCI domain number to 'irq_hw_number_t' before left shifting
>> it to calculate hwirq number.
> 
> This still does not explain that this fixes it only on 64-bit platforms
> and why we don't care for 32-bit systems.
Agreed, this fixes the issue only on 64-bit platforms. It doesn't
change the behavior on 32-bit platforms. My understanding is that the
issue surfaces only if there are many PCIe controllers in the system,
which is usually the case in modern server systems, and it is arguable
whether server systems really run 32-bit kernels.

One way to fix it for both 32-bit and 64-bit systems is to change the
type of 'hwirq' to u64. This may cause two memory reads on 32-bit
systems whenever 'hwirq' is accessed, which may in turn have some perf
impact. Is this the way you think I should be handling it?
>
Re: [PATCH V3] PCI/MSI: Fix MSI hwirq truncation
Posted by Thomas Gleixner 1 year, 11 months ago
On Fri, Jan 12 2024 at 23:03, Vidya Sagar wrote:
> On 1/12/2024 9:23 PM, Thomas Gleixner wrote:
>> On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
>>> So, cast the PCI domain number to 'irq_hw_number_t' before left shifting
>>> it to calculate hwirq number.
>> 
>> This still does not explain that this fixes it only on 64-bit platforms
>> and why we don't care for 32-bit systems.
> Agreed, this fixes the issue only on 64-bit platforms. It doesn't
> change the behavior on 32-bit platforms. My understanding is that the
> issue surfaces only if there are many PCIe controllers in the system,
> which is usually the case in modern server systems, and it is arguable
> whether server systems really run 32-bit kernels.

Arguably people who do that can keep the pieces.

> One way to fix it for both 32-bit and 64-bit systems is to change the
> type of 'hwirq' to u64. This may cause two memory reads on 32-bit
> systems whenever 'hwirq' is accessed, which may in turn have some perf
> impact. Is this the way you think I should be handling it?

No. Leave it as is. What I'm asking for is that it's properly documented
in the changelog.

Thanks,

        tglx
Re: [PATCH V3] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 11 months ago

On 1/15/2024 3:31 PM, Thomas Gleixner wrote:
> On Fri, Jan 12 2024 at 23:03, Vidya Sagar wrote:
>> On 1/12/2024 9:23 PM, Thomas Gleixner wrote:
>>> On Thu, Jan 11 2024 at 10:58, Vidya Sagar wrote:
>>>> So, cast the PCI domain number to 'irq_hw_number_t' before left shifting
>>>> it to calculate hwirq number.
>>>
>>> This still does not explain that this fixes it only on 64-bit platforms
>>> and why we don't care for 32-bit systems.
>> Agreed, this fixes the issue only on 64-bit platforms. It doesn't
>> change the behavior on 32-bit platforms. My understanding is that the
>> issue surfaces only if there are many PCIe controllers in the system,
>> which is usually the case in modern server systems, and it is arguable
>> whether server systems really run 32-bit kernels.
> 
> Arguably people who do that can keep the pieces.
> 
>> One way to fix it for both 32-bit and 64-bit systems is to change the
>> type of 'hwirq' to u64. This may cause two memory reads on 32-bit
>> systems whenever 'hwirq' is accessed, which may in turn have some perf
>> impact. Is this the way you think I should be handling it?
> 
> No. Leave it as is. What I'm asking for is that it's properly documented
> in the changelog.
Sure. I'll add this extra information in the change log.

> 
> Thanks,
> 
>          tglx
>
[PATCH V4] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 11 months ago
While calculating the hwirq number for an MSI interrupt, the higher
bits of the PCI domain number (bit 5 onwards, i.e. domain_nr >= 32) get
truncated because the left shift is evaluated in 32-bit arithmetic
(pci_domain_nr() returns 'int') before the result is widened to the
return type. For example, this results in the same hwirq number for
devices 0019:00:00.0 and 0039:00:00.0.

So, cast the PCI domain number to 'irq_hw_number_t' before left shifting
it to calculate hwirq number. Please note that this fixes the issue only
on 64-bit systems and doesn't change the behavior in 32-bit systems, i.e.
the 32-bit systems continue to have the issue. This is acceptable because
the issue surfaces only if there are many PCIe controllers in the system,
which is usually the case in modern server systems, and those don't tend
to run 32-bit kernels.

Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain")
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
V4:
* Added extra information in the change log about the impact of this patch
  in a 32-bit system as suggested by Thomas

V3:
* Addressed review comments from Thomas Gleixner
* Added Tested-By: Shanker Donthineni <sdonthineni@nvidia.com>

V2:
* Added Fixes tag

 drivers/pci/msi/irqdomain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index c8be056c248d..cfd84a899c82 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 
 	return (irq_hw_number_t)desc->msi_index |
 		pci_dev_id(dev) << 11 |
-		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+		((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
 }
 
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
-- 
2.25.1
Re: [PATCH V4] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 11 months ago
Hi Thomas,
Does this patch look fine to you?
If yes, would you mind giving an Ack?

Thanks,
Vidya Sagar

On 1/15/2024 7:26 PM, Vidya Sagar wrote:
> [...]
Re: [PATCH V4] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 10 months ago
Hi Thomas,
Sorry to bother you.
Would you mind giving an Ack to this patch?

Thanks,
Vidya Sagar

On 1/23/2024 9:31 PM, Vidya Sagar wrote:
> Hi Thomas,
> Does this patch look fine to you?
> If yes, would you mind giving an Ack?
> 
> Thanks,
> Vidya Sagar
Re: [PATCH V4] PCI/MSI: Fix MSI hwirq truncation
Posted by Vidya Sagar 1 year, 10 months ago
Hi Thomas / Bjorn,
Can you please guide me on getting this patch merged?

Thanks,
Vidya Sagar

On 1/31/2024 8:45 AM, Vidya Sagar wrote:
> Hi Thomas,
> Sorry to bother you.
> Would you mind giving an Ack to this patch?
> 
> Thanks,
> Vidya Sagar
Re: [PATCH V4] PCI/MSI: Fix MSI hwirq truncation
Posted by Thomas Gleixner 1 year, 10 months ago
On Wed, Feb 07 2024 at 12:29, Vidya Sagar wrote:
> Hi Thomas / Bjorn,
> Can you please guide me on getting this patch merged?

It's in my backlog...
[tip: irq/urgent] PCI/MSI: Prevent MSI hardware interrupt number truncation
Posted by tip-bot2 for Vidya Sagar 1 year, 10 months ago
The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     db744ddd59be798c2627efbfc71f707f5a935a40
Gitweb:        https://git.kernel.org/tip/db744ddd59be798c2627efbfc71f707f5a935a40
Author:        Vidya Sagar <vidyas@nvidia.com>
AuthorDate:    Mon, 15 Jan 2024 19:26:49 +05:30
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Feb 2024 16:11:01 +01:00

PCI/MSI: Prevent MSI hardware interrupt number truncation

While calculating the hardware interrupt number for an MSI interrupt, the
higher bits of the PCI domain number (bit 5 onwards, i.e. domain_nr >= 32)
get truncated because the left shift is evaluated in 32-bit arithmetic
(pci_domain_nr() returns 'int') before the result is widened to the return
type. For example, this results in the same hardware interrupt number for
devices 0019:00:00.0 and 0039:00:00.0.

To address this, cast the PCI domain number to 'irq_hw_number_t' before left
shifting it to calculate the hardware interrupt number.

Please note that this fixes the issue only on 64-bit systems and doesn't
change the behavior for 32-bit systems, i.e. the 32-bit systems continue to
have the issue. This is acceptable because the issue surfaces only if there
are many PCIe controllers in the system, which is usually the case in modern
server systems, and those don't tend to run 32-bit kernels.

Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115135649.708536-1-vidyas@nvidia.com
---
 drivers/pci/msi/irqdomain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index c8be056..cfd84a8 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 
 	return (irq_hw_number_t)desc->msi_index |
 		pci_dev_id(dev) << 11 |
-		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+		((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
 }
 
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
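
As a reading aid, the shifts in the patched expression imply a layout in
which msi_index occupies the low bits, the device ID starts at bit 11 and
the domain number starts at bit 27. The sketch below packs and unpacks
such a value in user space. It assumes a 64-bit hwirq, a 16-bit device ID
and an msi_index below 2048 so that the fields do not overlap; all names
are hypothetical and none of this is kernel API.

```c
#include <stdint.h>

#define HWIRQ_DEVID_SHIFT	11	/* from "pci_dev_id(dev) << 11" */
#define HWIRQ_DOMAIN_SHIFT	27	/* from "... << 27" */

/* Pack the three fields, widening each one before it is shifted. */
static uint64_t hwirq_pack(uint32_t domain, uint16_t devid, uint16_t msi_index)
{
	return (uint64_t)msi_index |
		(uint64_t)devid << HWIRQ_DEVID_SHIFT |
		(uint64_t)domain << HWIRQ_DOMAIN_SHIFT;
}

/* Everything from bit 27 upwards is the domain number. */
static uint32_t hwirq_domain(uint64_t hwirq)
{
	return (uint32_t)(hwirq >> HWIRQ_DOMAIN_SHIFT);
}

/* Bits 11..26 hold the 16-bit device ID. */
static uint16_t hwirq_devid(uint64_t hwirq)
{
	return (uint16_t)((hwirq >> HWIRQ_DEVID_SHIFT) & 0xFFFF);
}

/* Bits 0..10 hold the MSI index (assumed to be below 2048). */
static uint16_t hwirq_msi_index(uint64_t hwirq)
{
	return (uint16_t)(hwirq & 0x7FF);
}
```

Packing domain 0x39, device ID 0x0100 and index 3 and then unpacking the
result returns the same three values, which is the property the 32-bit
shift lost for domain numbers of 32 and above.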