[RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public

Vipin Sharma posted 21 patches 3 months, 3 weeks ago
[RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Vipin Sharma 3 months, 3 weeks ago
Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
linux/pci.h so that they are available to code outside of the PCI core.

These structs will be used in subsequent commits to serialize and
deserialize PCI state across Live Update.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 drivers/pci/pci.c   |  5 -----
 drivers/pci/pci.h   |  7 -------
 include/linux/pci.h | 13 +++++++++++++
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..b68bf3e820ce 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1884,11 +1884,6 @@ void pci_restore_state(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_restore_state);
 
-struct pci_saved_state {
-	u32 config_space[16];
-	struct pci_cap_saved_data cap[];
-};
-
 /**
  * pci_store_saved_state - Allocate and return an opaque struct containing
  *			   the device saved state.
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 09476a467cc0..973fcdf7898d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -197,13 +197,6 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
 int pci_bus_error_reset(struct pci_dev *dev);
 int __pci_reset_bus(struct pci_bus *bus);
 
-struct pci_cap_saved_data {
-	u16		cap_nr;
-	bool		cap_extended;
-	unsigned int	size;
-	u32		data[];
-};
-
 struct pci_cap_saved_state {
 	struct hlist_node		next;
 	struct pci_cap_saved_data	cap;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8ce2d4528193..70c9b12c8c02 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1448,6 +1448,19 @@ void pci_disable_rom(struct pci_dev *pdev);
 void __iomem __must_check *pci_map_rom(struct pci_dev *pdev, size_t *size);
 void pci_unmap_rom(struct pci_dev *pdev, void __iomem *rom);
 
+
+struct pci_cap_saved_data {
+	u16		cap_nr;
+	bool		cap_extended;
+	unsigned int	size;
+	u32		data[];
+};
+
+struct pci_saved_state {
+	u32 config_space[16];
+	struct pci_cap_saved_data cap[];
+};
+
 /* Power management related routines */
 int pci_save_state(struct pci_dev *dev);
 void pci_restore_state(struct pci_dev *dev);
-- 
2.51.0.858.gf9c4a03a3a-goog
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Lukas Wunner 3 months, 3 weeks ago
On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
> Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
> linux/pci.h so that they are available to code outside of the PCI core.
> 
> These structs will be used in subsequent commits to serialize and
> deserialize PCI state across Live Update.

That's not sufficient as a justification to make these public in my view.

There are already pci_store_saved_state() and pci_load_saved_state()
helpers to serialize PCI state.  Why do you need anything more?
(Honest question.)

Thanks,

Lukas
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Vipin Sharma 3 months, 3 weeks ago
On 2025-10-18 09:17:33, Lukas Wunner wrote:
> On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
> > Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
> > linux/pci.h so that they are available to code outside of the PCI core.
> > 
> > These structs will be used in subsequent commits to serialize and
> > deserialize PCI state across Live Update.
> 
> That's not sufficient as a justification to make these public in my view.
> 
> There are already pci_store_saved_state() and pci_load_saved_state()
> helpers to serialize PCI state.  Why do you need anything more?
> (Honest question.)
> 

In the LUO ecosystem, we currently do not have a solid solution for
proper serialization/deserialization of structs along with versioning
between different kernel versions. This work is still being discussed.

Here, I created separate structs (exactly the same as the original ones) to
have a little bit of control over what gets saved in the serialized state and
correctly gets deserialized after kexec.

For example, if I am using the existing structs and not creating my own
structs, then I cannot just do a blind memcpy() of the whole PCI state
from before kexec to the PCI state after kexec. In the new kernel, the
layout might have changed, e.g. through the addition or removal of a field.

With __packed in my version of the struct, I can build validation such as
hardcoded offsets of members. I can add a version number (not added in this
series) to the struct for checking compatibility during serialization and
deserialization. Overall, it provides some freedom in how to pass
data to the next kernel without changing or modifying the PCI state structs.
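
To illustrate the kind of guardrail described above, here is a minimal
userspace sketch of a packed serialization struct with a version field and
build-time offset checks. It is not code from this series: struct
pci_state_ser, PCI_SER_VERSION and both helpers are hypothetical names, and
only the 16-dword config space is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: nothing below is taken from the series. */
#define PCI_SER_VERSION 1u

struct pci_state_ser {
	uint32_t version;          /* bumped whenever the layout changes */
	uint32_t config_space[16]; /* first 64 bytes of config space */
} __attribute__((packed));

/* Build-time guardrails: the serialized layout is pinned by these checks. */
static_assert(offsetof(struct pci_state_ser, version) == 0, "version moved");
static_assert(offsetof(struct pci_state_ser, config_space) == 4,
	      "config_space moved");
static_assert(sizeof(struct pci_state_ser) == 68, "size changed");

/* Fill the serialized form from the 16 saved config dwords. */
static void pci_state_ser_fill(struct pci_state_ser *ser,
			       const uint32_t cfg[16])
{
	ser->version = PCI_SER_VERSION;
	memcpy(ser->config_space, cfg, sizeof(ser->config_space));
}

/* Returns 0 on success, -1 if the blob carries an unexpected version. */
static int pci_state_ser_load(const struct pci_state_ser *ser,
			      uint32_t cfg[16])
{
	if (ser->version != PCI_SER_VERSION)
		return -1;
	memcpy(cfg, ser->config_space, sizeof(ser->config_space));
	return 0;
}
```

With this shape, a layout change fails the build via static_assert, and a
blob written by a different version is rejected at load time instead of
being copied blindly.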
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Lukas Wunner 3 months, 3 weeks ago
On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> On 2025-10-18 09:17:33, Lukas Wunner wrote:
> > On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
> > > Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
> > > linux/pci.h so that they are available to code outside of the PCI core.
> > > 
> > > These structs will be used in subsequent commits to serialize and
> > > deserialize PCI state across Live Update.
> > 
> > That's not sufficient as a justification to make these public in my view.
> > 
> > There are already pci_store_saved_state() and pci_load_saved_state()
> > helpers to serialize PCI state.  Why do you need anything more?
> > (Honest question.)
> 
> In LUO ecosystem, currently,  we do not have a solid solution to do
> proper serialization/deserialization of structs along with versioning
> between different kernel versions. This work is still being discussed.
> 
> Here, I created separate structs (exactly same as the original one) to
> have little bit control on what gets saved in serialized state and
> correctly gets deserialized after kexec.
> 
> For example, if I am using existing structs and not creating my own
> structs then I cannot just do a blind memcpy() between whole of the PCI state
> prior to kexec to PCI state after the kexec. In the new kernel
> layout might have changed like addition or removal of a field.

The last time we changed those structs was in 2013 by fd0f7f73ca96.
So changes are extremely rare.

What could change in theory is the layout of the individual
capabilities (the data[] in struct pci_cap_saved_data).
E.g. maybe we decide that we need to save an additional register.
But that's also rare.  Normally we add all the mutable registers
when a new capability is supported and have no need to amend that
afterwards.

So I think you're preparing for an eventuality that's very unlikely
to happen.  Question is whether that justifies the additional
complexity and duplication.  (Probably not.)

Note that struct pci_cap_saved_state was made private in 2021 by
f0ab00174eb7.  We try to prevent other subsystems or drivers fiddling
with structures internal to the PCI core.  For LUO to find acceptance,
it needs to respect subsystems' desire to keep private what's private
and it needs to be as non-intrusive as possible.  If necessary,
helpers needed by LUO (e.g. to determine the size of saved PCI state)
should probably live in the PCI core and be #ifdef'ed to LUO being enabled.

Thanks,

Lukas
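
As an illustration of the size helper mentioned above, here is a minimal
userspace sketch. It assumes the flat saved state is laid out as 16 config
dwords, followed by capability records, followed by a record whose size is
zero. struct cap_saved_data mirrors the fields shown in the patch, and
saved_state_size() is a hypothetical name, not an existing PCI core function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the record header shown in the patch. */
struct cap_saved_data {
	uint16_t cap_nr;
	bool cap_extended;
	unsigned int size; /* bytes of data[] that follow the header */
	uint32_t data[];
};

#define CONFIG_SPACE_BYTES (16 * sizeof(uint32_t))

/*
 * Hypothetical helper: total bytes of a saved-state blob, assuming
 * config dwords, then capability records, then a zero-size terminator.
 */
static size_t saved_state_size(const void *blob)
{
	const unsigned char *p = (const unsigned char *)blob + CONFIG_SPACE_BYTES;
	size_t total = CONFIG_SPACE_BYTES;
	struct cap_saved_data hdr;

	for (;;) {
		/* Copy only the header; data[] stays in the blob. */
		memcpy(&hdr, p, sizeof(hdr));
		total += sizeof(hdr);
		if (hdr.size == 0) /* terminator record */
			break;
		total += hdr.size;
		p += sizeof(hdr) + hdr.size;
	}
	return total;
}
```

Keeping a helper like this next to the struct definitions means callers only
ever see a byte count, never the record layout.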
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by David Matlack 3 months, 1 week ago
On 2025-10-19 10:15 AM, Lukas Wunner wrote:
> On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> > On 2025-10-18 09:17:33, Lukas Wunner wrote:
> > > On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
> > > > Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
> > > > linux/pci.h so that they are available to code outside of the PCI core.
> > > > 
> > > > These structs will be used in subsequent commits to serialize and
> > > > deserialize PCI state across Live Update.
> > > 
> > > That's not sufficient as a justification to make these public in my view.
> > > 
> > > There are already pci_store_saved_state() and pci_load_saved_state()
> > > helpers to serialize PCI state.  Why do you need anything more?
> > > (Honest question.)
> > 
> > In LUO ecosystem, currently,  we do not have a solid solution to do
> > proper serialization/deserialization of structs along with versioning
> > between different kernel versions. This work is still being discussed.
> > 
> > Here, I created separate structs (exactly same as the original one) to
> > have little bit control on what gets saved in serialized state and
> > correctly gets deserialized after kexec.
> > 
> > For example, if I am using existing structs and not creating my own
> > structs then I cannot just do a blind memcpy() between whole of the PCI state
> > prior to kexec to PCI state after the kexec. In the new kernel
> > layout might have changed like addition or removal of a field.
> 
> The last time we changed those structs was in 2013 by fd0f7f73ca96.
> So changes are extremely rare.
> 
> What could change in theory is the layout of the individual
> capabilities (the data[] in struct pci_cap_saved_data).
> E.g. maybe we decide that we need to save an additional register.
> But that's also rare.  Normally we add all the mutable registers
> when a new capability is supported and have no need to amend that
> afterwards.

Yeah that has me worried. A totally innocuous commit that adds, removes,
or reorders a register stashed in data[] could lead to a broken device when
VFIO does pci_restore_state() after a Live Update.

Turning pci_save_state into an actual ABI would probably require adding
the registers into the save state, rather than assuming their
order.
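
One way to read "adding the registers into the save state" is to tag each
saved value with the register it belongs to, so that restore looks values up
by register rather than by position. A minimal sketch of that idea follows.
struct saved_reg and saved_reg_find() are hypothetical names and not part of
the PCI core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical self-describing entry: which register, and its value. */
struct saved_reg {
	uint16_t offset;   /* register offset within the capability */
	uint16_t reserved; /* explicit padding, keeps the entry at 8 bytes */
	uint32_t value;
};

static_assert(sizeof(struct saved_reg) == 8, "unexpected entry size");

/*
 * Look up a register by offset instead of by position.
 * Returns 1 and stores the value in *out if found, 0 otherwise.
 */
static int saved_reg_find(const struct saved_reg *regs, size_t n,
			  uint16_t offset, uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (regs[i].offset == offset) {
			*out = regs[i].value;
			return 1;
		}
	}
	return 0;
}
```

With entries like these, reordering the saved registers does not change
what a lookup returns, and a register missing from the blob is reported
rather than silently misread.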

But... I wonder if we truly need to preserve the PCI save state
across Live Update.

Based on this comment in drivers/vfio/pci/vfio_pci_core.c, the PCI
save/restore stuff in VFIO is for cleaning up devices that do not
support resets:

 648         /*
 649          * If we have saved state, restore it.  If we can reset the device,
 650          * even better.  Resetting with current state seems better than
 651          * nothing, but saving and restoring current state without reset
 652          * is just busy work.
 653          */
 654         if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) {
 655                 pci_info(pdev, "%s: Couldn't reload saved state\n", __func__);
 656
 657                 if (!vdev->reset_works)
 658                         goto out;
 659
 660                 pci_save_state(pdev);
 661         }

So if we just limit Live Update support to devices with reset_works,
then we don't have to deal with preserving the save state.

I will have to double check that reset_works is true for all the devices
we care about supporting for Live Update, but I imagine it will be.
They're all relatively modern PCIe devices.
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by David Matlack 3 months, 1 week ago
On Thu, Oct 30, 2025 at 4:55 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2025-10-19 10:15 AM, Lukas Wunner wrote:
> > On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> > > On 2025-10-18 09:17:33, Lukas Wunner wrote:
> > > > On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
> > > > > Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
> > > > > linux/pci.h so that they are available to code outside of the PCI core.
> > > > >
> > > > > These structs will be used in subsequent commits to serialize and
> > > > > deserialize PCI state across Live Update.
> > > >
> > > > That's not sufficient as a justification to make these public in my view.
> > > >
> > > > There are already pci_store_saved_state() and pci_load_saved_state()
> > > > helpers to serialize PCI state.  Why do you need anything more?
> > > > (Honest question.)
> > >
> > > In LUO ecosystem, currently,  we do not have a solid solution to do
> > > proper serialization/deserialization of structs along with versioning
> > > between different kernel versions. This work is still being discussed.
> > >
> > > Here, I created separate structs (exactly same as the original one) to
> > > have little bit control on what gets saved in serialized state and
> > > correctly gets deserialized after kexec.
> > >
> > > For example, if I am using existing structs and not creating my own
> > > structs then I cannot just do a blind memcpy() between whole of the PCI state
> > > prior to kexec to PCI state after the kexec. In the new kernel
> > > layout might have changed like addition or removal of a field.
> >
> > The last time we changed those structs was in 2013 by fd0f7f73ca96.
> > So changes are extremely rare.
> >
> > What could change in theory is the layout of the individual
> > capabilities (the data[] in struct pci_cap_saved_data).
> > E.g. maybe we decide that we need to save an additional register.
> > But that's also rare.  Normally we add all the mutable registers
> > when a new capability is supported and have no need to amend that
> > afterwards.
>
> Yeah that has me worried. A totally innocuous commit that adds, removes,
> or reorders a register stashed in data[] could lead to a broken device when
> VFIO does pci_restore_state() after a Live Update.
>
> Turning pci_save_state into an actual ABI would require adding the
> registers into the save state probably, rather than assuming their
> order.
>
> But... I wonder if we truly need to preserve the PCI save state
> across Live Update.
>
> Based on this comment in drivers/vfio/pci/vfio_pci_core.c, the PCI
> save/restore stuff in VFIO is for cleaning up devices that do not
> support resets:

Err, no, I misread that comment. But I guess my question still stands
whether we truly need to preserve the pci_save_state across Live
Update. Maybe there is a simpler way for VFIO to clean up the device
in vfio_pci_core_disable() if we make certain restrictions on which
devices we support.
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Vipin Sharma 3 months, 3 weeks ago
On 2025-10-19 10:15:19, Lukas Wunner wrote:
> On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> > On 2025-10-18 09:17:33, Lukas Wunner wrote:
> > > On Fri, Oct 17, 2025 at 05:07:07PM -0700, Vipin Sharma wrote:
> > > > Move struct pci_saved_state{} and struct pci_cap_saved_data{} to
> > > > linux/pci.h so that they are available to code outside of the PCI core.
> > > > 
> > > > These structs will be used in subsequent commits to serialize and
> > > > deserialize PCI state across Live Update.
> > > 
> > > That's not sufficient as a justification to make these public in my view.
> > > 
> > > There are already pci_store_saved_state() and pci_load_saved_state()
> > > helpers to serialize PCI state.  Why do you need anything more?
> > > (Honest question.)
> > 
> > In LUO ecosystem, currently,  we do not have a solid solution to do
> > proper serialization/deserialization of structs along with versioning
> > between different kernel versions. This work is still being discussed.
> > 
> > Here, I created separate structs (exactly same as the original one) to
> > have little bit control on what gets saved in serialized state and
> > correctly gets deserialized after kexec.
> > 
> > For example, if I am using existing structs and not creating my own
> > structs then I cannot just do a blind memcpy() between whole of the PCI state
> > prior to kexec to PCI state after the kexec. In the new kernel
> > layout might have changed like addition or removal of a field.
> 
> The last time we changed those structs was in 2013 by fd0f7f73ca96.
> So changes are extremely rare.
> 
> What could change in theory is the layout of the individual
> capabilities (the data[] in struct pci_cap_saved_data).
> E.g. maybe we decide that we need to save an additional register.
> But that's also rare.  Normally we add all the mutable registers
> when a new capability is supported and have no need to amend that
> afterwards.
> 
> So I think you're preparing for an eventuality that's very unlikely
> to happen.  Question is whether that justifies the additional
> complexity and duplication.  (Probably not.)
> 
> Note that struct pci_cap_saved_state was made private in 2021 by
> f0ab00174eb7.  We try to prevent other subsystems or drivers fiddling
> with structures internal to the PCI core.  For LUO to find acceptance,
> it needs to respect subsystems' desire to keep private what's private
> and it needs to be as non-intrusive as possible.  If necessary,
> helpers needed by LUO (e.g. to determine the size of saved PCI state)
> should probably live in the PCI core and be #ifdef'ed to LUO being enabled.
> 

Sounds good, I will create helpers in the PCI core and ifdef them for the
things we end up agreeing need to be saved.

But I also think we need some guardrails to detect if they change;
otherwise we might end up with some hard-to-catch data corruption. I
think this ties in with what Jason is also saying: we need to define the
LUO ABI.
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Jason Gunthorpe 3 months, 3 weeks ago
On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:

> Having __packed in my version of struct, I can build validation like
> hardcoded offset of members. I can add version number (not added in this
> series) for checking compatibility in the struct for serialization and
> deserialization. Overall, it is providing some freedom to how to pass
> data to next kernel without changing or modifying the PCI state
> structs.

I keep saying this, and this series really strongly shows why, we need
to have a dedicated header directory for LUO "ABI" structs. Putting
this random struct in some random header and then declaring it is part
of the luo ABI is really bad.

All the information in the abi headers needs to have detailed comments
explaining what it is and so on so people can evaluate if it is
suitable or not.

But, it is also not clear why pci serialization structs should leak
out of the PCI layer.

The design of luo was to allow each layer to contribute its own
tags/etc to the serialization so there is no reason to have vfio
piggyback on pci structs or something.

Jason
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Vipin Sharma 3 months, 3 weeks ago
On 2025-10-18 20:11:26, Jason Gunthorpe wrote:
> On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> 
> > Having __packed in my version of struct, I can build validation like
> > hardcoded offset of members. I can add version number (not added in this
> > series) for checking compatibility in the struct for serialization and
> > deserialization. Overall, it is providing some freedom to how to pass
> > data to next kernel without changing or modifying the PCI state
> > structs.
> 
> I keep saying this, and this series really strongly shows why, we need
> to have a dedicated header directory for LUO "ABI" structs. Putting
> this random struct in some random header and then declaring it is part
> of the luo ABI is really bad.

Now that we have the PCI, IOMMU, and VFIO series out, what should be the
strategy for LUO "ABI" structs? I would like some more clarity on how
you are envisioning this.

Are you suggesting that each subsystem create a separate header file for
its serialization structs, or that we have one common header file used
by all subsystems as a dumping ground for their structs?


> 
> All the information in the abi headers needs to have detailed comments
> explaining what it is and so on so people can evaluate if it is
> suitable or not.

I agree. I should have at least written comments in my *_ser structs on
why each particular field is there and what it is enabling. I will do
that in the next version.

> 
> But, it is also not clear why pci serialization structs should leak
> out of the PCI layer.
> 

When a PCI device is opened for the first time, the VFIO driver asks PCI for
this state and saves it in the struct vfio_pci_core_device{.pci_saved_state}
field. It loads this value back into the PCI device after the last device FD
is closed.

The PCI layer will not have access to this value, as it can be changed once
VFIO has started using the device. Therefore, I thought this should be
saved.

Maybe the serialization and deserialization logic can be put in PCI, and
that way it can stay in PCI?

> The design of luo was to allow each layer to contribute its own
> tags/etc to the serialization so there is no reason to have vfio
> piggyback on pci structs or something.
> 
> Jason
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Jason Gunthorpe 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 04:49:34PM -0700, Vipin Sharma wrote:

> Maybe serialization and deserialization logic can be put in PCI and
> that way it can stay in PCI?

This does seem better

vfio should call something and get back a token it can store.

Jason
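
A minimal userspace sketch of the "call something and get back a token"
shape follows. It is an illustration under stated assumptions, not a
proposed kernel API: mock_pci_state_stash(), mock_pci_state_redeem() and the
fixed-size slot table are all hypothetical, and only the 16-dword config
space is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SAVED 4

/* Private to the (mock) PCI layer; callers never see this layout. */
struct saved_slot {
	int in_use;
	uint32_t config_space[16];
};

static struct saved_slot slots[MAX_SAVED];

/* Hypothetical: stash the state and hand back an opaque non-zero token. */
static uint64_t mock_pci_state_stash(const uint32_t cfg[16])
{
	assert(cfg);
	for (size_t i = 0; i < MAX_SAVED; i++) {
		if (!slots[i].in_use) {
			slots[i].in_use = 1;
			memcpy(slots[i].config_space, cfg,
			       sizeof(slots[i].config_space));
			return (uint64_t)i + 1; /* 0 means "no token" */
		}
	}
	return 0; /* table full */
}

/* Hypothetical: redeem a token once; 0 on success, -1 otherwise. */
static int mock_pci_state_redeem(uint64_t token, uint32_t cfg[16])
{
	assert(cfg);
	if (token == 0 || token > MAX_SAVED || !slots[token - 1].in_use)
		return -1;
	memcpy(cfg, slots[token - 1].config_space,
	       sizeof(slots[token - 1].config_space));
	slots[token - 1].in_use = 0;
	return 0;
}
```

The caller (VFIO in this discussion) would only store the 64-bit token, so
the layout of the saved state stays private to the layer that owns it.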
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by David Matlack 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 4:49 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On 2025-10-18 20:11:26, Jason Gunthorpe wrote:
> > On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> >
> > > Having __packed in my version of struct, I can build validation like
> > > hardcoded offset of members. I can add version number (not added in this
> > > series) for checking compatibility in the struct for serialization and
> > > deserialization. Overall, it is providing some freedom to how to pass
> > > data to next kernel without changing or modifying the PCI state
> > > structs.
> >
> > I keep saying this, and this series really strongly shows why, we need
> > to have a dedicated header directory for LUO "ABI" structs. Putting
> > this random struct in some random header and then declaring it is part
> > of the luo ABI is really bad.
>
> Now that we have PCI, IOMMU, and VFIO series out. What should be the
> strategy for LUO "ABI" structs? I would like some more clarity on how
> you are visioning this.
>
> Are you suggesting that each subsystem create a separate header file for
> their serialization structs or we can have one common header file used
> by all subsystems as dumping ground for their structs?

I think we should have multiple header files in one directory; that
way we can assign separate MAINTAINERS entries for each file as needed.

Jason Miu proposed the first such header for KHO in
https://lore.kernel.org/lkml/CALzav=eqwTdzFhZLi_mWWXGuDBRwWQdBxQrzr4tN28ag8Zr_8Q@mail.gmail.com/.

Following that example we can add vfio_pci.h and pci.h to that
directory for VFIO and PCI ABI structs respectively.
Re: [RFC PATCH 15/21] PCI: Make PCI saved state and capability structs public
Posted by Jason Gunthorpe 3 months, 2 weeks ago
On Wed, Oct 22, 2025 at 10:45:31AM -0700, David Matlack wrote:
> On Mon, Oct 20, 2025 at 4:49 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On 2025-10-18 20:11:26, Jason Gunthorpe wrote:
> > > On Sat, Oct 18, 2025 at 03:36:20PM -0700, Vipin Sharma wrote:
> > >
> > > > Having __packed in my version of struct, I can build validation like
> > > > hardcoded offset of members. I can add version number (not added in this
> > > > series) for checking compatibility in the struct for serialization and
> > > > deserialization. Overall, it is providing some freedom to how to pass
> > > > data to next kernel without changing or modifying the PCI state
> > > > structs.
> > >
> > > I keep saying this, and this series really strongly shows why, we need
> > > to have a dedicated header directory for LUO "ABI" structs. Putting
> > > this random struct in some random header and then declaring it is part
> > > of the luo ABI is really bad.
> >
> > Now that we have PCI, IOMMU, and VFIO series out. What should be the
> > strategy for LUO "ABI" structs? I would like some more clarity on how
> > you are visioning this.
> >
> > Are you suggesting that each subsystem create a separate header file for
> > their serialization structs or we can have one common header file used
> > by all subsystems as dumping ground for their structs?
> 
> I think we should have multiple header files in one directory, that
> way we can assign separate MAINTAINERS for each file as needed.
> 
> Jason Miu proposed the first such header for KHO in
> https://lore.kernel.org/lkml/CALzav=eqwTdzFhZLi_mWWXGuDBRwWQdBxQrzr4tN28ag8Zr_8Q@mail.gmail.com/.
> 
> Following that example we can add vfio_pci.h and pci.h to that
> directory for VFIO and PCI ABI structs respectively.

Seems like a good idea to me.

Jason