docs: Document a policy for when to deviate from specifications

[PATCH] docs: Document a policy for when to deviate from specifications

Posted by George Dunlap 7 months, 1 week ago

There is an ongoing disagreement among maintainers for how Xen should
handle deviations to specifications such as ACPI or EFI.

Write up an explicit policy, and include two worked-out examples from
recent discussions.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
NB that the technical descriptions of the costs of the accommodations
or lack thereof I've just gathered from reading the discussions; I'm
not familiar enough with the details to assert things about them.  So
please correct any technical issues.
---
 docs/policy/FollowingSpecifications.md | 219 +++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 docs/policy/FollowingSpecifications.md

diff --git a/docs/policy/FollowingSpecifications.md b/docs/policy/FollowingSpecifications.md
new file mode 100644
index 0000000000..a197f01f65
--- /dev/null
+++ b/docs/policy/FollowingSpecifications.md
@@ -0,0 +1,219 @@
+# Guidelines for following specifications
+
+## In general, follow specifications
+
+In general, specifications such as ACPI and EFI should be followed.
+
+## Accommodate non-compliant systems if it doesn't affect compliant systems
+
+Sometimes, however, there occur situations where real systems "in the
+wild" violate these specifications, or at least our interpretation of
+them (henceforth called "non-compliant").  If we can accommodate
+non-compliant systems without affecting any compliant systems, then we
+should do so.
+
+## If accommodation would affect theoretical compliant systems that are
+   not known to exist, and Linux and/or Windows takes the
+   accommodation, take the accommodation unless there's a
+   reason not to.
+
+Sometimes, however, there occur situations where real, non-compliant
+systems "in the wild" cannot be accommodated without affecting
+theoretical compliant systems; but there are no known theoretical
+compliant systems which exist.  If Linux and/or Windows take the
+accommodation, then from a cost/benefits perspective it's probably best
+for us to take the accommodation as well.
+
+This is really a generalization of the next principle; the "reason not
+to" would be in the form of a cost-benefits analysis as described in
+the next section showing why the "special case" doesn't apply to the
+accommodation in question.
+
+## If things aren't clear, do a cost-benefits analysis
+
+Sometimes, however, things are more complicated or less clear.  In
+that case, we should do a cost-benefits analysis for a particular
+accommodation.  Things which should be factored into the analysis:
+
+N-1: The number of non-compliant systems that require the accommodation
+ N-1a: The number of known current systems
+ N-1b: The probable number of unknown current systems
+ N-1c: The probable number of unknown future systems
+
+N-2 The severity of the effect of non-accommodation on these systems
+
+C-1: The number of compliant systems that would be affected by the accommodation
+ C-1a: The number of known current systems
+ C-1b: The probable number of unknown current systems
+ C-1c: The probable number of unknown future systems
+
+C-2 The severity of the effect of accommodation on these systems
+
+Intuitively, N-1 * N-2 gives us N, the cost of not making the
+accommodation, and C-1 * C-2 gives us C, the cost of taking the
+accommodation.  If N > C, then we should take the accommodation; if C >
+N, then we shouldn't.
+
+The idea isn't to come up with actual numbers to plug in here
+(although that's certainly an option if someone wants to), but to
+explain the general idea we're trying to get at.
+
+A couple of other principles to factor in:
+
+Vendors tend to copy themselves and other vendors.  If one or two
+major vendors are known to create compliant or non-compliant systems
+in a particular way, then there are likely to be more unknown and
+future systems which will be affected by / need a similar accommodation
+respectively; that is, we should raise our estimates of N-1{b,c} and
+C-1{b,c}.
+
+Some downstreams already implement accommodations, and test on a
+variety of hardware.  If downstreams such as QubesOS or XenServer /
+XCP-ng implement the accommodations, then N-1 * N-2 is likely to be
+non-negligible, and C-1 * C-2 is likely to be negligible.
+
+Windows and Linux are widely tested.  If Windows and/or Linux make a
+particular accommodation, and that accommodation has remained stable
+without being reverted, then it's likely that the number of unknown
+current systems that are affected by the accommodation is negligible;
+that is, we should lower the C-1b estimate.
+
+Vendors tend to test server hardware on Windows and Linux.  If Windows
+and/or Linux make a particular accommodation, then it's unlikely that
+future systems will be affected by the accommodation; that is, we
+should lower the C-1c estimate.
+
+# Example applications
+
+Here are some examples of how these principles can be applied.
+
+## ACPI MADT tables containing ~0
+
+Xen disables certain kinds of features on CPU hotplug systems; for
+example, it will avoid using TSC, which is faster and more power
+efficient (since on a hot-pluggable system it won't be reliable), and
+instead fall back to other timer sources which are slower and less
+power efficient.
+
+Some hardware vendors have (it seems) begun making a single ACPI table
+image for a range of similar systems, with MADT entries for the number
+of CPUs based on the system with the most CPUs, and then for the
+systems with fewer CPUs, replacing the APIC IDs in the MADT table with
+~0, to indicate that those entries aren't valid.  These systems are
+not hotplug capable.  Sometimes the invalid slots are on a separate
+socket.
+
+One interpretation of the spec is that a system with such MADT entries
+could actually have an extra socket, and that later the system could
+update the MADT table, populating the APIC IDs with real values.
+
+If Xen finds an MADT where all slots are either populated or filled
+with APICID ~0, , should it consider it a multi-socket hotplug system,
+disable features available on single-socket systems?  Or should it
+accommodate the systems above, treating the system as systems
+incapable of hotplug?
+
+N-1a: People have clearly found a number of systems in the wild, from
+different vendors, that exhibit this property; it's a non-negligible
+number of systems.
+
+N-1b,c: Since these systems are from different vendors, and there seem to
+be a fair number of them, there are likely to be many more that we
+don't know about; and likely to be many more produced in the future.
+
+N-2: Xen will use more expensive (both time and power-wise) clock
+sources unless the user manually modifies the Xen command-line.
+
+C-1a,b: There are no known systems that implement phyical CPU hotplug
+whatsoever, much less a system that uses ~0 for APICIDs.
+
+There are hypervisors that implement *virtual* CPU hotplug; but they
+don't use ~0 for APICIDs.
+
+C-1c: It seems that physical CPU hotplug is an unsolved problem: it was
+worked on for quite a while and then abandoned.  So it seems fairly
+unlikely that any physical CPU hotplug systems will come to exist any
+time in the near future.
+
+If any hotplug systems were created, they would only be affected if
+they happened to use ~0 the APIC ID of the empty slots in the MADT
+table.  This by itself seems unlikely, given the number of vendors who
+are now using that to mean "invalid slot", and the fact that virtual
+hotplug systems don't do this.
+
+Furthermore, Linux has been treating such entries as permanently
+invalid since 2016.  If any system were to implement physical CPU
+hotplug in the future, and use ~0 as a placeholder APIC ID, it's very
+likely they would test it on Linux, discover that it doesn't work, and
+modify the system to enable it to work (perhaps copying QEMU's
+behavior).  It seems likely that Windows will do the same thing,
+further reducing the probability that any system like this will make
+it into production.
+
+So the potential number of future systems affected by this before we
+can implement a fix seems very small indeed.
+
+C-2: If such a system did exist, everything would work fine at boot;
+the only issue would be that when an extra CPU was plugged in, nothing
+would happen.  This could be overridden by a command-line argument.
+
+Adding these all together, there's a widespread, moderate cost to not
+accommodating these systems, and an uncertain and small cost to
+accommodating them.  So it makes sense to apply the accommodation.
+
+## Calling EFI Reboot method
+
+One interpretation of the EFI spec is that operating systems should
+call the EFI ResetSystem method in preference to the ACPI reboot
+method.
+
+However, although the ResetSystem method is required by the EFI spec,
+a large number of different systems doesn't actully work, at least
+when called by Xen: a large number of systems don't cleanly reboot
+after calling the EFI REBOOT method, but rather crash or fail in some
+other random way.
+
+(One reason for this is that the Windows EFI test doesn't call the EFI
+ResetSystem method, but calls the ACPI reboot method.  One possibile
+explanation for the repeated pattern is that vendors smoke-test the
+ResetSystem method from the EFI shell, which has its own memory map;
+but fail to test it when running on the OS memory map.)
+
+Should Xen follow our interpretation of the EFI spec, and call the
+ResetSystem method in preference to the ACPI reboot method?  Or should
+Xen accommodate systems with broken ResetSystem methods, and call the
+ACPI reboot method by default?
+
+N-1a: There are clearly a large number of systems which exhibit this
+property.
+
+N-1b,c: Given the large number of diverse vendors who make this
+mistake, it seems likely that there are even more that we don't know
+about, and this will continue into the future.
+
+N-2: Systems are incapable of rebooting cleanly unless the right runes
+are put into the Xen command line to make it prefer using the ACPI
+reboot method.
+
+C-1a: A system would only be negatively affected if 1) an ACPI reboot
+method exists, 2) an EFI method exists, and 3) calling the ACPI method
+in preference to the EFI method causes some sort of issue.  So far
+nobody has run into such a system.
+
+C-1b,c: The Windows EFI test explicitly tests the ACPI reboot method
+on EFI systems.  Linux also prefers calling the ACPI reboot method
+even when an EFI method is available.  The chance of someone shipping
+a system that had a problem while that was the case is very tiny: it
+basically wouldn't run either of the two most important operating
+systems.
+
+C-2: It seems likely that the worst that could happen is what's
+happening now when calling the EFI method: that the ACPI method would
+cause a weird crash, which then would reboot or hang.
+
+XenServer has shipped this accommodation for several years now.
+
+Adding these altogether, the cost of non-accommodation is widespread
+and moderate; that is to say, non-negligible; and the cost of
+accommodation is theoretical and tiny.  So it makes sense to apply the
+accommodation.
\ No newline at end of file
-- 
2.42.0

Re: [PATCH] docs: Document a policy for when to deviate from specifications

Posted by Marek Marczykowski-Górecki 4 months, 2 weeks ago

On Mon, Sep 18, 2023 at 01:28:16PM +0100, George Dunlap wrote:
> There is an ongoing disagreement among maintainers for how Xen should
> handle deviations to specifications such as ACPI or EFI.
> 
> Write up an explicit policy, and include two worked-out examples from
> recent discussions.

Looks very reasonable to me. While it would be nice for every hardware
(or thing in general) to follow specifications, sadly it isn't reality
and Xen doesn't have enough market share to influence vendors in a
meaningful way. So, yes, the policy described below sounds like a
reasonable approach to make things working for end users.

Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

PS As a downstream distributions, we do ship several workarounds that were
rejected upstream on the grounds that "specification says otherwise"
before...

> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>
> ---
> NB that the technical descriptions of the costs of the accommodations
> or lack thereof I've just gathered from reading the discussions; I'm
> not familiar enough with the details to assert things about them.  So
> please correct any technical issues.
> ---
>  docs/policy/FollowingSpecifications.md | 219 +++++++++++++++++++++++++
>  1 file changed, 219 insertions(+)
>  create mode 100644 docs/policy/FollowingSpecifications.md
> 
> diff --git a/docs/policy/FollowingSpecifications.md b/docs/policy/FollowingSpecifications.md
> new file mode 100644
> index 0000000000..a197f01f65
> --- /dev/null
> +++ b/docs/policy/FollowingSpecifications.md
> @@ -0,0 +1,219 @@
> +# Guidelines for following specifications
> +
> +## In general, follow specifications
> +
> +In general, specifications such as ACPI and EFI should be followed.
> +
> +## Accommodate non-compliant systems if it doesn't affect compliant systems
> +
> +Sometimes, however, there occur situations where real systems "in the
> +wild" violate these specifications, or at least our interpretation of
> +them (henceforth called "non-compliant").  If we can accommodate
> +non-compliant systems without affecting any compliant systems, then we
> +should do so.
> +
> +## If accommodation would affect theoretical compliant systems that are
> +   not known to exist, and Linux and/or Windows takes the
> +   accommodation, take the accommodation unless there's a
> +   reason not to.
> +
> +Sometimes, however, there occur situations where real, non-compliant
> +systems "in the wild" cannot be accommodated without affecting
> +theoretical compliant systems; but there are no known theoretical
> +compliant systems which exist.  If Linux and/or Windows take the
> +accommodation, then from a cost/benefits perspective it's probably best
> +for us to take the accommodation as well.
> +
> +This is really a generalization of the next principle; the "reason not
> +to" would be in the form of a cost-benefits analysis as described in
> +the next section showing why the "special case" doesn't apply to the
> +accommodation in question.
> +
> +## If things aren't clear, do a cost-benefits analysis
> +
> +Sometimes, however, things are more complicated or less clear.  In
> +that case, we should do a cost-benefits analysis for a particular
> +accommodation.  Things which should be factored into the analysis:
> +
> +N-1: The number of non-compliant systems that require the accommodation
> + N-1a: The number of known current systems
> + N-1b: The probable number of unknown current systems
> + N-1c: The probable number of unknown future systems
> +
> +N-2 The severity of the effect of non-accommodation on these systems
> +
> +C-1: The number of compliant systems that would be affected by the accommodation
> + C-1a: The number of known current systems
> + C-1b: The probable number of unknown current systems
> + C-1c: The probable number of unknown future systems
> +
> +C-2 The severity of the effect of accommodation on these systems
> +
> +Intuitively, N-1 * N-2 gives us N, the cost of not making the
> +accommodation, and C-1 * C-2 gives us C, the cost of taking the
> +accommodation.  If N > C, then we should take the accommodation; if C >
> +N, then we shouldn't.
> +
> +The idea isn't to come up with actual numbers to plug in here
> +(although that's certainly an option if someone wants to), but to
> +explain the general idea we're trying to get at.
> +
> +A couple of other principles to factor in:
> +
> +Vendors tend to copy themselves and other vendors.  If one or two
> +major vendors are known to create compliant or non-compliant systems
> +in a particular way, then there are likely to be more unknown and
> +future systems which will be affected by / need a similar accommodation
> +respectively; that is, we should raise our estimates of N-1{b,c} and
> +C-1{b,c}.
> +
> +Some downstreams already implement accommodations, and test on a
> +variety of hardware.  If downstreams such as QubesOS or XenServer /
> +XCP-ng implement the accommodations, then N-1 * N-2 is likely to be
> +non-negligible, and C-1 * C-2 is likely to be negligible.
> +
> +Windows and Linux are widely tested.  If Windows and/or Linux make a
> +particular accommodation, and that accommodation has remained stable
> +without being reverted, then it's likely that the number of unknown
> +current systems that are affected by the accommodation is negligible;
> +that is, we should lower the C-1b estimate.
> +
> +Vendors tend to test server hardware on Windows and Linux.  If Windows
> +and/or Linux make a particular accommodation, then it's unlikely that
> +future systems will be affected by the accommodation; that is, we
> +should lower the C-1c estimate.
> +
> +# Example applications
> +
> +Here are some examples of how these principles can be applied.
> +
> +## ACPI MADT tables containing ~0
> +
> +Xen disables certain kinds of features on CPU hotplug systems; for
> +example, it will avoid using TSC, which is faster and more power
> +efficient (since on a hot-pluggable system it won't be reliable), and
> +instead fall back to other timer sources which are slower and less
> +power efficient.
> +
> +Some hardware vendors have (it seems) begun making a single ACPI table
> +image for a range of similar systems, with MADT entries for the number
> +of CPUs based on the system with the most CPUs, and then for the
> +systems with fewer CPUs, replacing the APIC IDs in the MADT table with
> +~0, to indicate that those entries aren't valid.  These systems are
> +not hotplug capable.  Sometimes the invalid slots are on a separate
> +socket.
> +
> +One interpretation of the spec is that a system with such MADT entries
> +could actually have an extra socket, and that later the system could
> +update the MADT table, populating the APIC IDs with real values.
> +
> +If Xen finds an MADT where all slots are either populated or filled
> +with APICID ~0, , should it consider it a multi-socket hotplug system,
> +disable features available on single-socket systems?  Or should it
> +accommodate the systems above, treating the system as systems
> +incapable of hotplug?
> +
> +N-1a: People have clearly found a number of systems in the wild, from
> +different vendors, that exhibit this property; it's a non-negligible
> +number of systems.
> +
> +N-1b,c: Since these systems are from different vendors, and there seem to
> +be a fair number of them, there are likely to be many more that we
> +don't know about; and likely to be many more produced in the future.
> +
> +N-2: Xen will use more expensive (both time and power-wise) clock
> +sources unless the user manually modifies the Xen command-line.
> +
> +C-1a,b: There are no known systems that implement phyical CPU hotplug
> +whatsoever, much less a system that uses ~0 for APICIDs.
> +
> +There are hypervisors that implement *virtual* CPU hotplug; but they
> +don't use ~0 for APICIDs.
> +
> +C-1c: It seems that physical CPU hotplug is an unsolved problem: it was
> +worked on for quite a while and then abandoned.  So it seems fairly
> +unlikely that any physical CPU hotplug systems will come to exist any
> +time in the near future.
> +
> +If any hotplug systems were created, they would only be affected if
> +they happened to use ~0 the APIC ID of the empty slots in the MADT
> +table.  This by itself seems unlikely, given the number of vendors who
> +are now using that to mean "invalid slot", and the fact that virtual
> +hotplug systems don't do this.
> +
> +Furthermore, Linux has been treating such entries as permanently
> +invalid since 2016.  If any system were to implement physical CPU
> +hotplug in the future, and use ~0 as a placeholder APIC ID, it's very
> +likely they would test it on Linux, discover that it doesn't work, and
> +modify the system to enable it to work (perhaps copying QEMU's
> +behavior).  It seems likely that Windows will do the same thing,
> +further reducing the probability that any system like this will make
> +it into production.
> +
> +So the potential number of future systems affected by this before we
> +can implement a fix seems very small indeed.
> +
> +C-2: If such a system did exist, everything would work fine at boot;
> +the only issue would be that when an extra CPU was plugged in, nothing
> +would happen.  This could be overridden by a command-line argument.
> +
> +Adding these all together, there's a widespread, moderate cost to not
> +accommodating these systems, and an uncertain and small cost to
> +accommodating them.  So it makes sense to apply the accommodation.
> +
> +## Calling EFI Reboot method
> +
> +One interpretation of the EFI spec is that operating systems should
> +call the EFI ResetSystem method in preference to the ACPI reboot
> +method.
> +
> +However, although the ResetSystem method is required by the EFI spec,
> +a large number of different systems doesn't actully work, at least
> +when called by Xen: a large number of systems don't cleanly reboot
> +after calling the EFI REBOOT method, but rather crash or fail in some
> +other random way.
> +
> +(One reason for this is that the Windows EFI test doesn't call the EFI
> +ResetSystem method, but calls the ACPI reboot method.  One possibile
> +explanation for the repeated pattern is that vendors smoke-test the
> +ResetSystem method from the EFI shell, which has its own memory map;
> +but fail to test it when running on the OS memory map.)
> +
> +Should Xen follow our interpretation of the EFI spec, and call the
> +ResetSystem method in preference to the ACPI reboot method?  Or should
> +Xen accommodate systems with broken ResetSystem methods, and call the
> +ACPI reboot method by default?
> +
> +N-1a: There are clearly a large number of systems which exhibit this
> +property.
> +
> +N-1b,c: Given the large number of diverse vendors who make this
> +mistake, it seems likely that there are even more that we don't know
> +about, and this will continue into the future.
> +
> +N-2: Systems are incapable of rebooting cleanly unless the right runes
> +are put into the Xen command line to make it prefer using the ACPI
> +reboot method.
> +
> +C-1a: A system would only be negatively affected if 1) an ACPI reboot
> +method exists, 2) an EFI method exists, and 3) calling the ACPI method
> +in preference to the EFI method causes some sort of issue.  So far
> +nobody has run into such a system.
> +
> +C-1b,c: The Windows EFI test explicitly tests the ACPI reboot method
> +on EFI systems.  Linux also prefers calling the ACPI reboot method
> +even when an EFI method is available.  The chance of someone shipping
> +a system that had a problem while that was the case is very tiny: it
> +basically wouldn't run either of the two most important operating
> +systems.
> +
> +C-2: It seems likely that the worst that could happen is what's
> +happening now when calling the EFI method: that the ACPI method would
> +cause a weird crash, which then would reboot or hang.
> +
> +XenServer has shipped this accommodation for several years now.
> +
> +Adding these altogether, the cost of non-accommodation is widespread
> +and moderate; that is to say, non-negligible; and the cost of
> +accommodation is theoretical and tiny.  So it makes sense to apply the
> +accommodation.
> \ No newline at end of file
> -- 
> 2.42.0
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Re: [PATCH] docs: Document a policy for when to deviate from specifications

Posted by Julien Grall 4 months, 2 weeks ago

Hi George,

Sorry for the late reply.

On 18/09/2023 13:28, George Dunlap wrote:
> There is an ongoing disagreement among maintainers for how Xen should
> handle deviations to specifications such as ACPI or EFI.
> 
> Write up an explicit policy, and include two worked-out examples from
> recent discussions.

NIT: For a more balanced arguments, it would have been good to have one 
example where Xen decides to not accept a deviationg from the spec.

Anyway...

> 
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

... your proposal makes sense to me. So with one typo below:

Acked-by: Julien Grall <jgrall@amazon.com>

> +## Calling EFI Reboot method
> +
> +One interpretation of the EFI spec is that operating systems should
> +call the EFI ResetSystem method in preference to the ACPI reboot
> +method.
> +
> +However, although the ResetSystem method is required by the EFI spec,
> +a large number of different systems doesn't actully work, at least

Typo: s/actully/actually/

> +when called by Xen: a large number of systems don't cleanly reboot
> +after calling the EFI REBOOT method, but rather crash or fail in some
> +other random way.

Cheers,

-- 
Julien Grall