During the construction of dmask value, it gets shifted by
(32 - msi->vectors) bits. If msi->vectors is 0, the result of the shift
becomes undefined due to shifting by a size of the type. While this
works fine on x86, on ARM the resulting mask becomes 0xFFFFFFFF, which
is incorrect.
Fix this by adding an explicit check for msi->vectors == 0.
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
xen/drivers/vpci/msi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 7bda47e7fc..787296fd42 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -172,7 +172,7 @@ static void cf_check mask_write(
struct vpci_msi *msi = data;
uint32_t dmask = msi->mask ^ val;
- if ( !dmask )
+ if ( !dmask || msi->vectors == 0 )
return;
if ( msi->enabled )
--
2.34.1
On 06.11.2024 09:05, Mykyta Poturai wrote: > During the construction of dmask value, it gets shifted by > (32 - msi->vectors) bits. If msi->vectors is 0, the result of the shift > becomes undefined due to shifting by a size of the type. While this > works fine on x86, Oh, also - what made you think this would be fine on x86? Afaict ... > on ARM the resulting mask becomes 0xFFFFFFFF, which > is incorrect. ... the exact same thing would happen (if msi->vectors indeed could ever be zero) there, due to the type of the value shifted being unsigned int, not unsigned long. Jan
On Wed, Nov 06, 2024 at 08:05:19AM +0000, Mykyta Poturai wrote: > During the construction of dmask value, it gets shifted by > (32 - msi->vectors) bits. If msi->vectors is 0, the result of the shift > becomes undefined due to shifting by a size of the type. While this > works fine on x86, on ARM the resulting mask becomes 0xFFFFFFFF, which > is incorrect. > > Fix this by adding an explicit check for msi->vectors == 0. I would also add: Fixes: 188fa82305e7 ('xen/vpci: Improve code generation in mask_write()') > Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com> > --- > xen/drivers/vpci/msi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c > index 7bda47e7fc..787296fd42 100644 > --- a/xen/drivers/vpci/msi.c > +++ b/xen/drivers/vpci/msi.c > @@ -172,7 +172,7 @@ static void cf_check mask_write( > struct vpci_msi *msi = data; > uint32_t dmask = msi->mask ^ val; > > - if ( !dmask ) > + if ( !dmask || msi->vectors == 0 ) > return; I'm afraid returning this early is not correct - the cached mask needs to be updated, even if there are no vectors currently enabled. The adjustment likely needs to be: if ( msi->enabled && msi->vectors ) ... So that the update of msi->mask is not skipped. Thanks, Roger.
On Wed, Nov 06, 2024 at 10:00:07AM +0100, Roger Pau Monné wrote: > On Wed, Nov 06, 2024 at 08:05:19AM +0000, Mykyta Poturai wrote: > > During the construction of dmask value, it gets shifted by > > (32 - msi->vectors) bits. If msi->vectors is 0, the result of the shift > > becomes undefined due to shifting by a size of the type. While this > > works fine on x86, on ARM the resulting mask becomes 0xFFFFFFFF, which > > is incorrect. > > > > Fix this by adding an explicit check for msi->vectors == 0. Wait - how can msi->vectors ever be 0? AFAICT there's no way in the MSI logic to configure 0 vectors, there will always be at least 1 vector enabled. Maybe what you want, if this fix is for compliance reasons, is an assert unreachable that msi->vectors > 0? Thanks, Roger.
On 06.11.2024 10:07, Roger Pau Monné wrote: > On Wed, Nov 06, 2024 at 10:00:07AM +0100, Roger Pau Monné wrote: >> On Wed, Nov 06, 2024 at 08:05:19AM +0000, Mykyta Poturai wrote: >>> During the construction of dmask value, it gets shifted by >>> (32 - msi->vectors) bits. If msi->vectors is 0, the result of the shift >>> becomes undefined due to shifting by a size of the type. While this >>> works fine on x86, on ARM the resulting mask becomes 0xFFFFFFFF, which >>> is incorrect. >>> >>> Fix this by adding an explicit check for msi->vectors == 0. > > Wait - how can msi->vectors ever be 0? AFAICT there's no way in the > MSI logic to configure 0 vectors, there will always be at least 1 vector > enabled. > > Maybe what you want, if this fix is for compliance reasons, is an > assert unreachable that msi->vectors > 0? Which raises a question as to (lack of) context: Was this spotted by mere code inspection? Or by a static analyzer? If so, which one? That may help figure whether some workaround like the one suggested is necessary, or whether it can simply be left alone. Jan
On 06.11.2024 10:07, Roger Pau Monné wrote:> > Wait - how can msi->vectors ever be 0? AFAICT there's no way in the > MSI logic to configure 0 vectors, there will always be at least 1 vector > enabled. > > Maybe what you want, if this fix is for compliance reasons, is an > assert unreachable that msi->vectors > 0? I did some investigation and figured out that the value of 0 is being set by guest writing to msi_control_reg. As far as I understand, the control_write() function only checks that vectors are not greater than the maximum allowed value, but does not check for 0. So I am not sure if this is a valid scenario or not. Is this incorrect guest behavior and it should be forbidden from setting vectors to 0 and enable to 1 at the same time? On 06.11.24 13:31, Jan Beulich wrote: > > Which raises a question as to (lack of) context: Was this spotted by > mere code inspection? Or by a static analyzer? If so, which one? That > may help figure whether some workaround like the one suggested is > necessary, or whether it can simply be left alone. > > Jan I have found this while porting the PCI passthrough patches to Xen 4.20. After checking the previous version which was on 4.18 it seems that on it msi->vectors are also set to 0 but nothing breaks due to it being the explicit end of the loop. So I have assumed that setting it to 0 is a valid scenario. I am testing all of this on Rcar Gen4 boards. Mykyta
On 06.11.2024 13:26, Mykyta Poturai wrote: > On 06.11.2024 10:07, Roger Pau Monné wrote:> >> Wait - how can msi->vectors ever be 0? AFAICT there's no way in the >> MSI logic to configure 0 vectors, there will always be at least 1 vector >> enabled. >> >> Maybe what you want, if this fix is for compliance reasons, is an >> assert unreachable that msi->vectors > 0? > > I did some investigation and figured out that the value of 0 is being > set by guest writing to msi_control_reg. As far as I understand, the > control_write() function only checks that vectors are not greater than > the maximum allowed value, but does not check for 0. How that? How could it even check for 0, when 0 isn't possible? Quoting the code there: unsigned int vectors = min_t(uint8_t, 1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE), pdev->msi_maxvec); "val" in the guest written value. As that's used as a shift count, how could 0 result there? The only way I can see 0 ending up in vectors is when pdev->msi_maxvec was still zero. Yet that's then a bug in device initialization. Jan
On Thu, Nov 07, 2024 at 10:25:02AM +0100, Jan Beulich wrote: > On 06.11.2024 13:26, Mykyta Poturai wrote: > > On 06.11.2024 10:07, Roger Pau Monné wrote:> > >> Wait - how can msi->vectors ever be 0? AFAICT there's no way in the > >> MSI logic to configure 0 vectors, there will always be at least 1 vector > >> enabled. > >> > >> Maybe what you want, if this fix is for compliance reasons, is an > >> assert unreachable that msi->vectors > 0? > > > > I did some investigation and figured out that the value of 0 is being > > set by guest writing to msi_control_reg. As far as I understand, the > > control_write() function only checks that vectors are not greater than > > the maximum allowed value, but does not check for 0. > > How that? How could it even check for 0, when 0 isn't possible? Quoting > the code there: > > unsigned int vectors = min_t(uint8_t, > 1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE), > pdev->msi_maxvec); > > "val" in the guest written value. As that's used as a shift count, how > could 0 result there? The only way I can see 0 ending up in vectors is > when pdev->msi_maxvec was still zero. Yet that's then a bug in device > initialization. See followup emails, I've arrived at the same conclusion and Mykyta confirmed it's msi_maxvec that's indeed 0. Still waiting for them to figure out why msi_maxvec is 0. Thanks, Roger.
On Wed, Nov 06, 2024 at 12:26:55PM +0000, Mykyta Poturai wrote: > On 06.11.2024 10:07, Roger Pau Monné wrote:> > > Wait - how can msi->vectors ever be 0? AFAICT there's no way in the > > MSI logic to configure 0 vectors, there will always be at least 1 vector > > enabled. > > > > Maybe what you want, if this fix is for compliance reasons, is an > > assert unreachable that msi->vectors > 0? > > I did some investigation and figured out that the value of 0 is being > set by guest writing to msi_control_reg. As far as I understand, the > control_write() function only checks that vectors are not greater than > the maximum allowed value, but does not check for 0. control_write() will set vectors to (1UL << val), so even if user provides val == 0, vectors will be 1. Can you provide an example input value of control_write() that will lead to msi->vectors == 0? Is maybe msi_maxvec not set correctly in your use case if you indeed see vectors == 0? Thanks, Roger.
On 06.11.24 14:42, Roger Pau Monné wrote: > On Wed, Nov 06, 2024 at 12:26:55PM +0000, Mykyta Poturai wrote: >> On 06.11.2024 10:07, Roger Pau Monné wrote: >>> Wait - how can msi->vectors ever be 0? AFAICT there's no way in the >>> MSI logic to configure 0 vectors, there will always be at least 1 vector >>> enabled. >>> >>> Maybe what you want, if this fix is for compliance reasons, is an >>> assert unreachable that msi->vectors > 0? >> >> I did some investigation and figured out that the value of 0 is being >> set by guest writing to msi_control_reg. As far as I understand, the >> control_write() function only checks that vectors are not greater than >> the maximum allowed value, but does not check for 0. > > control_write() will set vectors to (1UL << val), so even if user > provides val == 0, vectors will be 1. > > Can you provide an example input value of control_write() that will > lead to msi->vectors == 0? > > Is maybe msi_maxvec not set correctly in your use case if you indeed > see vectors == 0? > > Thanks, Roger. Indeed, I have checked and msi_maxvec is set to 0. Thanks for pointing this out. I will investigate further why this is happening. It is quite strange that it somehow worked on 4.18 with the same problem. I will change the check to an assert then, so if something similar happens again it can be caught earlier. Mykyta
On Wed, Nov 06, 2024 at 02:32:13PM +0000, Mykyta Poturai wrote: > On 06.11.24 14:42, Roger Pau Monné wrote: > > On Wed, Nov 06, 2024 at 12:26:55PM +0000, Mykyta Poturai wrote: > >> On 06.11.2024 10:07, Roger Pau Monné wrote: > >>> Wait - how can msi->vectors ever be 0? AFAICT there's no way in the > >>> MSI logic to configure 0 vectors, there will always be at least 1 vector > >>> enabled. > >>> > >>> Maybe what you want, if this fix is for compliance reasons, is an > >>> assert unreachable that msi->vectors > 0? > >> > >> I did some investigation and figured out that the value of 0 is being > >> set by guest writing to msi_control_reg. As far as I understand, the > >> control_write() function only checks that vectors are not greater than > >> the maximum allowed value, but does not check for 0. > > > > control_write() will set vectors to (1UL << val), so even if user > > provides val == 0, vectors will be 1. > > > > Can you provide an example input value of control_write() that will > > lead to msi->vectors == 0? > > > > Is maybe msi_maxvec not set correctly in your use case if you indeed > > see vectors == 0? > > > > Thanks, Roger. > > Indeed, I have checked and msi_maxvec is set to 0. Thanks for pointing > this out. I will investigate further why this is happening. It is quite > strange that it somehow worked on 4.18 with the same problem. Check whether pdev_msi_init() is called during device addition, as that's what initializes msi_maxvec. Another cause could be memory corruption. > I will change the check to an assert then, so if something similar > happens again it can be caught earlier. Let's try to figure out what causes msi_maxvec to be 0 in your case and then we can see how to better detect this. If msi_maxvec needs to be checked it should likely be done in init_msi(). Regards, Roger.
On 06.11.24 16:57, Roger Pau Monné wrote: > > Let's try to figure out what causes msi_maxvec to be 0 in your case > and then we can see how to better detect this. If msi_maxvec needs to > be checked it should likely be done in init_msi(). > > Regards, Roger. Hi everyone, So I have done some more investigations, and I think it finally makes sense. The real cause of my crashes was a long-standing bug in yet to be upstreamed vpci patches where the register value and offset were swapped by mistake. And this bug was hidden for a long time because mask_write skipped actually doing anything, respecting vectors = 0, so I failed to spot it from the get-go. Regarding msi_maxvec there seems to be an implicit dependency between CONFIG_HAS_VPCI and CONFIG_HAS_PCI_MSI. If HAS_PCI_MSI=n, then pdev_msi_init gets replaced with a stub and msi_maxvec remains 0, but it is still used in control_write unconditionally. I see two possible solutions to this: either adding an explicit dependency or, if msi_maxvec can't be 0 anyway, always initializing it to 1. But I am not sure which one is better. Mykyta
On Mon, Nov 11, 2024 at 12:21:32PM +0000, Mykyta Poturai wrote: > On 06.11.24 16:57, Roger Pau Monné wrote: > > > > Let's try to figure out what causes msi_maxvec to be 0 in your case > > and then we can see how to better detect this. If msi_maxvec needs to > > be checked it should likely be done in init_msi(). > > > > Regards, Roger. > > Hi everyone, > So I have done some more investigations, and I think it finally makes > sense. The real cause of my crashes was a long-standing bug in yet to be > upstreamed vpci patches where the register value and offset were swapped > by mistake. And this bug was hidden for a long time because mask_write > skipped actually doing anything, respecting vectors = 0, so I failed to > spot it from the get-go. > > Regarding msi_maxvec there seems to be an implicit dependency between > CONFIG_HAS_VPCI and CONFIG_HAS_PCI_MSI. If HAS_PCI_MSI=n, then > pdev_msi_init gets replaced with a stub and msi_maxvec remains 0, but it > is still used in control_write unconditionally. Hello, If HAS_PCI_MSI=n then vPCI shouldn't attempt to handle the MSI(-X) capabilities in the first place. However that can lead to incorrect scenarios if vPCI is used for dom0, as vPCI not supporting MSI(-X) grants dom0 unmediated access to the capability registers, which won't result in a functional system at least on x86. I think the more robust solution is for vPCI to require HAS_PCI_MSI=y. Thanks, Roger.
© 2016 - 2025 Red Hat, Inc.