[PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()

Akihiko Odaki posted 15 patches 8 months, 3 weeks ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Akihiko Odaki <akihiko.odaki@daynix.com>, Sriram Yagnaraman <sriram.yagnaraman@est.tech>, Jason Wang <jasowang@redhat.com>, Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>
There is a newer version of this series
[PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Posted by Akihiko Odaki 8 months, 3 weeks ago
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
configurations to know the number of VFs being disabled due to SR-IOV
configuration writes, but the logic was flawed and resulted in
out-of-bound memory access.

It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
VFs, but it actually doesn't in the following cases:
- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.

It is a responsibility of pcie_sriov to interpret SR-IOV configurations
and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
provides, to get the number of enabled VFs before and after SR-IOV
configuration writes.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26328
Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/nvme/ctrl.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e9e..7a56e7b79b4d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
     nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
 }
 
-static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
-                                      uint32_t val, int len)
+static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
 {
     NvmeCtrl *n = NVME(dev);
     NvmeSecCtrlEntry *sctrl;
-    uint16_t sriov_cap = dev->exp.sriov_cap;
-    uint32_t off = address - sriov_cap;
-    int i, num_vfs;
+    int i;
 
-    if (!sriov_cap) {
-        return;
-    }
-
-    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
-        if (!(val & PCI_SRIOV_CTRL_VFE)) {
-            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
-            for (i = 0; i < num_vfs; i++) {
-                sctrl = &n->sec_ctrl_list.sec[i];
-                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
-            }
-        }
+    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
+        sctrl = &n->sec_ctrl_list.sec[i];
+        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
     }
 }
 
 static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
                                   uint32_t val, int len)
 {
-    nvme_sriov_pre_write_ctrl(dev, address, val, len);
+    uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);
+
     pci_default_write_config(dev, address, val, len);
     pcie_cap_flr_write_config(dev, address, val, len);
+    nvme_sriov_post_write_config(dev, old_num_vfs);
 }
 
 static const VMStateDescription nvme_vmstate = {

-- 
2.43.1
Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Posted by Kevin Wolf 8 months, 3 weeks ago
Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> configurations to know the number of VFs being disabled due to SR-IOV
> configuration writes, but the logic was flawed and resulted in
> out-of-bound memory access.
> 
> It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> VFs, but it actually doesn't in the following cases:
> - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> - VFs were only partially enabled because of realization failure.
> 
> It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> provides, to get the number of enabled VFs before and after SR-IOV
> configuration writes.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: CVE-2024-26328
> Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  hw/nvme/ctrl.c | 26 ++++++++------------------
>  1 file changed, 8 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f026245d1e9e..7a56e7b79b4d 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
>      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
>  }
>  
> -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> -                                      uint32_t val, int len)
> +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
>  {
>      NvmeCtrl *n = NVME(dev);
>      NvmeSecCtrlEntry *sctrl;
> -    uint16_t sriov_cap = dev->exp.sriov_cap;
> -    uint32_t off = address - sriov_cap;
> -    int i, num_vfs;
> +    int i;
>  
> -    if (!sriov_cap) {
> -        return;
> -    }
> -
> -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> -            for (i = 0; i < num_vfs; i++) {
> -                sctrl = &n->sec_ctrl_list.sec[i];
> -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> -            }
> -        }
> +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> +        sctrl = &n->sec_ctrl_list.sec[i];
> +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>      }
>  }

Maybe I'm missing something, but if the concern is that 'i' could run
beyond the end of the array, I don't see anything that limits
pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
has. register_vfs() seems to just take whatever 16 bit value the guest
wrote without imposing additional restrictions.

If there is some mechanism that makes register_vf() fail if we exceed
the limit, maybe an assertion with a comment would be in order because
it doesn't seem obvious. I couldn't find any code that enforces it,
sriov_max_vfs only ever seems to be used as a hint for the guest.

If not, do we need another check that fails gracefully in the error
case?

Kevin
Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Posted by Kevin Wolf 8 months, 3 weeks ago
Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
> Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > configurations to know the number of VFs being disabled due to SR-IOV
> > configuration writes, but the logic was flawed and resulted in
> > out-of-bound memory access.
> > 
> > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > VFs, but it actually doesn't in the following cases:
> > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > - VFs were only partially enabled because of realization failure.
> > 
> > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > provides, to get the number of enabled VFs before and after SR-IOV
> > configuration writes.
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: CVE-2024-26328
> > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  hw/nvme/ctrl.c | 26 ++++++++------------------
> >  1 file changed, 8 insertions(+), 18 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index f026245d1e9e..7a56e7b79b4d 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> >      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> >  }
> >  
> > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > -                                      uint32_t val, int len)
> > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> >  {
> >      NvmeCtrl *n = NVME(dev);
> >      NvmeSecCtrlEntry *sctrl;
> > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > -    uint32_t off = address - sriov_cap;
> > -    int i, num_vfs;
> > +    int i;
> >  
> > -    if (!sriov_cap) {
> > -        return;
> > -    }
> > -
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > -            for (i = 0; i < num_vfs; i++) {
> > -                sctrl = &n->sec_ctrl_list.sec[i];
> > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > -            }
> > -        }
> > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > +        sctrl = &n->sec_ctrl_list.sec[i];
> > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> >      }
> >  }
> 
> Maybe I'm missing something, but if the concern is that 'i' could run
> beyond the end of the array, I don't see anything that limits
> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> has. register_vfs() seems to just take whatever 16 bit value the guest
> wrote without imposing additional restrictions.
> 
> If there is some mechanism that makes register_vf() fail if we exceed
> the limit, maybe an assertion with a comment would be in order because
> it doesn't seem obvious. I couldn't find any code that enforces it,
> sriov_max_vfs only ever seems to be used as a hint for the guest.
> 
> If not, do we need another check that fails gracefully in the error
> case?

Ok, I see now that patch 2 fixes this. But then the commit message is
wrong because it implies that this patch is the only thing you need to
fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
of the fix is still missing.

Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
good idea. But looking at this one, it seems to me that numcntl isn't
completely correct either:

    list->numcntl = cpu_to_le16(max_vfs);

Both list->numcntl and max_vfs are uint8_t, so I think this will always
be 0 on big endian hosts?

Kevin
Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Posted by Akihiko Odaki 8 months, 3 weeks ago
On 2024/02/20 23:53, Kevin Wolf wrote:
> Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
>> Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
>>> nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
>>> configurations to know the number of VFs being disabled due to SR-IOV
>>> configuration writes, but the logic was flawed and resulted in
>>> out-of-bound memory access.
>>>
>>> It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
>>> VFs, but it actually doesn't in the following cases:
>>> - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
>>> - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
>>> - VFs were only partially enabled because of realization failure.
>>>
>>> It is a responsibility of pcie_sriov to interpret SR-IOV configurations
>>> and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
>>> provides, to get the number of enabled VFs before and after SR-IOV
>>> configuration writes.
>>>
>>> Cc: qemu-stable@nongnu.org
>>> Fixes: CVE-2024-26328
>>> Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
>>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>> ---
>>>   hw/nvme/ctrl.c | 26 ++++++++------------------
>>>   1 file changed, 8 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>> index f026245d1e9e..7a56e7b79b4d 100644
>>> --- a/hw/nvme/ctrl.c
>>> +++ b/hw/nvme/ctrl.c
>>> @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
>>>       nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
>>>   }
>>>   
>>> -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
>>> -                                      uint32_t val, int len)
>>> +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
>>>   {
>>>       NvmeCtrl *n = NVME(dev);
>>>       NvmeSecCtrlEntry *sctrl;
>>> -    uint16_t sriov_cap = dev->exp.sriov_cap;
>>> -    uint32_t off = address - sriov_cap;
>>> -    int i, num_vfs;
>>> +    int i;
>>>   
>>> -    if (!sriov_cap) {
>>> -        return;
>>> -    }
>>> -
>>> -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
>>> -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
>>> -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
>>> -            for (i = 0; i < num_vfs; i++) {
>>> -                sctrl = &n->sec_ctrl_list.sec[i];
>>> -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>>> -            }
>>> -        }
>>> +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
>>> +        sctrl = &n->sec_ctrl_list.sec[i];
>>> +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>>>       }
>>>   }
>>
>> Maybe I'm missing something, but if the concern is that 'i' could run
>> beyond the end of the array, I don't see anything that limits
>> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
>> has. register_vfs() seems to just take whatever 16 bit value the guest
>> wrote without imposing additional restrictions.
>>
>> If there is some mechanism that makes register_vf() fail if we exceed
>> the limit, maybe an assertion with a comment would be in order because
>> it doesn't seem obvious. I couldn't find any code that enforces it,
>> sriov_max_vfs only ever seems to be used as a hint for the guest.
>>
>> If not, do we need another check that fails gracefully in the error
>> case?
> 
> Ok, I see now that patch 2 fixes this. But then the commit message is
> wrong because it implies that this patch is the only thing you need to
> fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
> of the fix is still missing.

I didn't assign CVE-2024-26328 for the case that the value of 
PCI_SRIOV_NUM_VF is greater than PCI_SRIOV_TOTAL_VF; it's what 
CVE-2024-26327 deals with.

The problem I dealt here is that the value of PCI_SRIOV_NUM_VF may not 
represent the actual number of enabled VFs because another register 
(PCI_SRIOV_CTRL_VFE) is not set, for example.

If an assertion to be added, I think it should be in 
pcie_sriov_num_vfs() and ensure the returning value is less than the 
value of PCI_SRIOV_TOTAL_VF (aka sriov_max_vfs in hw/nvme/ctrl.c), but I 
think it's fine without it.

> 
> Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
> good idea. But looking at this one, it seems to me that numcntl isn't
> completely correct either:
> 
>      list->numcntl = cpu_to_le16(max_vfs);
> 
> Both list->numcntl and max_vfs are uint8_t, so I think this will always
> be 0 on big endian hosts?

Indeed it looks wrong. Will you write a patch?

Regards,
Akihiko Odaki
Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Posted by Klaus Jensen 8 months, 3 weeks ago
On Feb 21 00:33, Akihiko Odaki wrote:
> On 2024/02/20 23:53, Kevin Wolf wrote:
> > Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
> > > Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > > > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > > > configurations to know the number of VFs being disabled due to SR-IOV
> > > > configuration writes, but the logic was flawed and resulted in
> > > > out-of-bound memory access.
> > > > 
> > > > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > > > VFs, but it actually doesn't in the following cases:
> > > > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > > > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > > > - VFs were only partially enabled because of realization failure.
> > > > 
> > > > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > > > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > > > provides, to get the number of enabled VFs before and after SR-IOV
> > > > configuration writes.
> > > > 
> > > > Cc: qemu-stable@nongnu.org
> > > > Fixes: CVE-2024-26328
> > > > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > > > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > ---
> > > >   hw/nvme/ctrl.c | 26 ++++++++------------------
> > > >   1 file changed, 8 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > > index f026245d1e9e..7a56e7b79b4d 100644
> > > > --- a/hw/nvme/ctrl.c
> > > > +++ b/hw/nvme/ctrl.c
> > > > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> > > >       nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> > > >   }
> > > > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > > > -                                      uint32_t val, int len)
> > > > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> > > >   {
> > > >       NvmeCtrl *n = NVME(dev);
> > > >       NvmeSecCtrlEntry *sctrl;
> > > > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > > > -    uint32_t off = address - sriov_cap;
> > > > -    int i, num_vfs;
> > > > +    int i;
> > > > -    if (!sriov_cap) {
> > > > -        return;
> > > > -    }
> > > > -
> > > > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > > > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > > > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > > > -            for (i = 0; i < num_vfs; i++) {
> > > > -                sctrl = &n->sec_ctrl_list.sec[i];
> > > > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > > > -            }
> > > > -        }
> > > > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > > > +        sctrl = &n->sec_ctrl_list.sec[i];
> > > > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > > >       }
> > > >   }
> > > 
> > > Maybe I'm missing something, but if the concern is that 'i' could run
> > > beyond the end of the array, I don't see anything that limits
> > > pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> > > has. register_vfs() seems to just take whatever 16 bit value the guest
> > > wrote without imposing additional restrictions.
> > > 
> > > If there is some mechanism that makes register_vf() fail if we exceed
> > > the limit, maybe an assertion with a comment would be in order because
> > > it doesn't seem obvious. I couldn't find any code that enforces it,
> > > sriov_max_vfs only ever seems to be used as a hint for the guest.
> > > 
> > > If not, do we need another check that fails gracefully in the error
> > > case?
> > 
> > Ok, I see now that patch 2 fixes this. But then the commit message is
> > wrong because it implies that this patch is the only thing you need to
> > fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
> > of the fix is still missing.
> 
> I didn't assign CVE-2024-26328 for the case that the value of
> PCI_SRIOV_NUM_VF is greater than PCI_SRIOV_TOTAL_VF; it's what
> CVE-2024-26327 deals with.
> 
> The problem I dealt here is that the value of PCI_SRIOV_NUM_VF may not
> represent the actual number of enabled VFs because another register
> (PCI_SRIOV_CTRL_VFE) is not set, for example.
> 
> If an assertion to be added, I think it should be in pcie_sriov_num_vfs()
> and ensure the returning value is less than the value of PCI_SRIOV_TOTAL_VF
> (aka sriov_max_vfs in hw/nvme/ctrl.c), but I think it's fine without it.
> 
> > 
> > Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
> > good idea. But looking at this one, it seems to me that numcntl isn't
> > completely correct either:
> > 
> >      list->numcntl = cpu_to_le16(max_vfs);
> > 
> > Both list->numcntl and max_vfs are uint8_t, so I think this will always
> > be 0 on big endian hosts?
> 
> Indeed it looks wrong. Will you write a patch?
> 

I'll fix it. And give the SR-IOV parts of hw/nvme some love all around.
Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
Posted by Klaus Jensen 8 months, 3 weeks ago
On Feb 20 15:29, Kevin Wolf wrote:
> Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > configurations to know the number of VFs being disabled due to SR-IOV
> > configuration writes, but the logic was flawed and resulted in
> > out-of-bound memory access.
> > 
> > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > VFs, but it actually doesn't in the following cases:
> > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > - VFs were only partially enabled because of realization failure.
> > 
> > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > provides, to get the number of enabled VFs before and after SR-IOV
> > configuration writes.
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: CVE-2024-26328
> > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  hw/nvme/ctrl.c | 26 ++++++++------------------
> >  1 file changed, 8 insertions(+), 18 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index f026245d1e9e..7a56e7b79b4d 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> >      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> >  }
> >  
> > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > -                                      uint32_t val, int len)
> > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> >  {
> >      NvmeCtrl *n = NVME(dev);
> >      NvmeSecCtrlEntry *sctrl;
> > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > -    uint32_t off = address - sriov_cap;
> > -    int i, num_vfs;
> > +    int i;
> >  
> > -    if (!sriov_cap) {
> > -        return;
> > -    }
> > -
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > -            for (i = 0; i < num_vfs; i++) {
> > -                sctrl = &n->sec_ctrl_list.sec[i];
> > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > -            }
> > -        }
> > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > +        sctrl = &n->sec_ctrl_list.sec[i];
> > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> >      }
> >  }
> 
> Maybe I'm missing something, but if the concern is that 'i' could run
> beyond the end of the array, I don't see anything that limits
> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> has. register_vfs() seems to just take whatever 16 bit value the guest
> wrote without imposing additional restrictions.
> 

Hi Kevin,

Thanks for reviewing, I believe patch 2 in this series fixes that
missing validation of NumVFs.