There is an ongoing disagreement among maintainers about how Xen
should handle deviations from specifications such as ACPI or EFI.
Write up an explicit policy, and include two worked examples from
recent discussions.
Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
NB that the technical descriptions of the costs of the accommodations,
or lack thereof, are just gathered from reading the discussions; I'm
not familiar enough with the details to vouch for them, so please
correct any technical issues.
---
docs/policy/FollowingSpecifications.md | 219 +++++++++++++++++++++++++
1 file changed, 219 insertions(+)
create mode 100644 docs/policy/FollowingSpecifications.md
diff --git a/docs/policy/FollowingSpecifications.md b/docs/policy/FollowingSpecifications.md
new file mode 100644
index 0000000000..a197f01f65
--- /dev/null
+++ b/docs/policy/FollowingSpecifications.md
@@ -0,0 +1,219 @@
+# Guidelines for following specifications
+
+## In general, follow specifications
+
+In general, specifications such as ACPI and EFI should be followed.
+
+## Accommodate non-compliant systems if it doesn't affect compliant systems
+
+Sometimes, however, there occur situations where real systems "in the
+wild" violate these specifications, or at least our interpretation of
+them (henceforth called "non-compliant"). If we can accommodate
+non-compliant systems without affecting any compliant systems, then we
+should do so.
+
+## If an accommodation would affect only compliant systems not known to exist, and Linux and/or Windows takes it, take it unless there's a reason not to
+
+Sometimes, however, real non-compliant systems "in the wild" cannot
+be accommodated without affecting theoretically compliant systems,
+but no such compliant systems are known to actually exist.  If Linux
+and/or Windows takes the accommodation, then from a cost/benefit
+perspective it's probably best for us to take the accommodation as
+well.
+
+This is really a special case of the principle in the next section;
+the "reason not to" would be a cost-benefit analysis, as described
+there, showing why this shortcut doesn't apply to the accommodation
+in question.
+
+## If things aren't clear, do a cost-benefit analysis
+
+Sometimes, however, things are more complicated or less clear.  In
+that case, we should do a cost-benefit analysis for the particular
+accommodation.  Things which should be factored into the analysis:
+
+- N-1: The number of non-compliant systems that require the accommodation
+  - N-1a: The number of known current systems
+  - N-1b: The probable number of unknown current systems
+  - N-1c: The probable number of unknown future systems
+- N-2: The severity of the effect of non-accommodation on these systems
+- C-1: The number of compliant systems that would be affected by the accommodation
+  - C-1a: The number of known current systems
+  - C-1b: The probable number of unknown current systems
+  - C-1c: The probable number of unknown future systems
+- C-2: The severity of the effect of accommodation on these systems
+
+Intuitively, N-1 * N-2 gives us N, the cost of not making the
+accommodation, and C-1 * C-2 gives us C, the cost of taking the
+accommodation. If N > C, then we should take the accommodation; if C >
+N, then we shouldn't.
+
+The idea isn't to come up with actual numbers to plug in here
+(although that's certainly an option if someone wants to), but to
+explain the general idea we're trying to get at.
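As a purely illustrative sketch (all numbers invented for the sake of the example, not drawn from any real analysis), the trade-off described above could be written as:

```python
# Hypothetical sketch of the cost-benefit comparison described above.
# The numbers are invented for illustration; the point is the shape of
# the reasoning, not the values.

def cost(n_systems: float, severity: float) -> float:
    """Cost = (estimated number of affected systems) * (severity of effect)."""
    return n_systems * severity

# N: cost of NOT accommodating the non-compliant systems
# (N-1 ~= many known/likely systems, N-2 ~= moderate impact).
N = cost(n_systems=1000, severity=0.5)

# C: cost of accommodating, borne by compliant systems
# (C-1 ~= theoretical systems only, C-2 ~= mild impact).
C = cost(n_systems=1, severity=0.2)

# If the cost of non-accommodation outweighs the cost of
# accommodation, take the accommodation.
take_accommodation = N > C
```

The estimates feeding N-1 and C-1 are what the principles below (vendor copying, downstream testing, Windows/Linux behaviour) adjust up or down.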
+
+A couple of other principles to factor in:
+
+Vendors tend to copy themselves and other vendors.  If one or two
+major vendors are known to build compliant or non-compliant systems
+in a particular way, then there are likely to be more unknown and
+future systems which will need, or be affected by, a similar
+accommodation; that is, we should raise our estimates of N-1{b,c} and
+C-1{b,c} respectively.
+
+Some downstreams already implement accommodations, and test on a
+variety of hardware. If downstreams such as QubesOS or XenServer /
+XCP-ng implement the accommodations, then N-1 * N-2 is likely to be
+non-negligible, and C-1 * C-2 is likely to be negligible.
+
+Windows and Linux are widely tested. If Windows and/or Linux make a
+particular accommodation, and that accommodation has remained stable
+without being reverted, then it's likely that the number of unknown
+current systems that are affected by the accommodation is negligible;
+that is, we should lower the C-1b estimate.
+
+Vendors tend to test server hardware on Windows and Linux. If Windows
+and/or Linux make a particular accommodation, then it's unlikely that
+future systems will be affected by the accommodation; that is, we
+should lower the C-1c estimate.
+
+# Example applications
+
+Here are some examples of how these principles can be applied.
+
+## ACPI MADT tables containing ~0
+
+Xen disables certain kinds of features on CPU hotplug systems; for
+example, it will avoid using the TSC, which is faster and more power
+efficient but unreliable on a hot-pluggable system, and instead fall
+back to other timer sources which are slower and less power
+efficient.
+
+Some hardware vendors have (it seems) begun making a single ACPI table
+image for a range of similar systems, with MADT entries for the number
+of CPUs based on the system with the most CPUs, and then for the
+systems with fewer CPUs, replacing the APIC IDs in the MADT table with
+~0, to indicate that those entries aren't valid. These systems are
+not hotplug capable. Sometimes the invalid slots are on a separate
+socket.
+
+One interpretation of the spec is that a system with such MADT entries
+could actually have an extra socket, and that later the system could
+update the MADT table, populating the APIC IDs with real values.
+
+If Xen finds an MADT where all slots are either populated or filled
+with an APIC ID of ~0, should it consider the system a multi-socket
+hotplug system, and disable features available on single-socket
+systems?  Or should it accommodate the systems described above,
+treating them as incapable of hotplug?
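The two readings can be sketched as follows (a hypothetical illustration: the function and flag names are invented, and real MADT parsing in Xen is C code and considerably more involved):

```python
# Hypothetical sketch of the two interpretations of a MADT local-APIC
# entry. "enabled" stands in for the entry's enabled flag; names are
# invented for illustration.

INVALID_APICID = 0xFFFFFFFF  # ~0 in a 32-bit x2APIC ID field

def hotplug_slot_strict(enabled: bool, apic_id: int) -> bool:
    # Strict reading: any disabled entry, even one whose APIC ID is ~0,
    # might later be populated, so it counts as a hotplug slot.
    return not enabled

def hotplug_slot_accommodated(enabled: bool, apic_id: int) -> bool:
    # Accommodated reading: an entry with APIC ID ~0 is permanently
    # invalid and never indicates a hotplug slot.
    return not enabled and apic_id != INVALID_APICID
```

Under the accommodated reading, a system whose empty slots all carry ~0 is treated as incapable of hotplug, so the cheaper timer sources stay available.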
+
+N-1a: People have clearly found a number of systems in the wild, from
+different vendors, that exhibit this property; it's a non-negligible
+number of systems.
+
+N-1b,c: Since these systems are from different vendors, and there seem to
+be a fair number of them, there are likely to be many more that we
+don't know about; and likely to be many more produced in the future.
+
+N-2: Xen will use more expensive (both time and power-wise) clock
+sources unless the user manually modifies the Xen command-line.
+
+C-1a,b: There are no known systems that implement physical CPU hotplug
+whatsoever, much less ones that use ~0 for APIC IDs.
+
+There are hypervisors that implement *virtual* CPU hotplug; but they
+don't use ~0 for APICIDs.
+
+C-1c: It seems that physical CPU hotplug is an unsolved problem: it was
+worked on for quite a while and then abandoned. So it seems fairly
+unlikely that any physical CPU hotplug systems will come to exist any
+time in the near future.
+
+If any hotplug systems were created, they would only be affected if
+they happened to use ~0 as the APIC ID of the empty slots in the MADT
+table.  That by itself seems unlikely, given the number of vendors
+who now use ~0 to mean "invalid slot", and the fact that virtual
+hotplug systems don't do this.
+
+Furthermore, Linux has been treating such entries as permanently
+invalid since 2016. If any system were to implement physical CPU
+hotplug in the future, and use ~0 as a placeholder APIC ID, it's very
+likely they would test it on Linux, discover that it doesn't work, and
+modify the system to enable it to work (perhaps copying QEMU's
+behavior). It seems likely that Windows will do the same thing,
+further reducing the probability that any system like this will make
+it into production.
+
+So the potential number of future systems affected by this before we
+can implement a fix seems very small indeed.
+
+C-2: If such a system did exist, everything would work fine at boot;
+the only issue would be that when an extra CPU was plugged in, nothing
+would happen. This could be overridden by a command-line argument.
+
+Adding these all together, there's a widespread, moderate cost to not
+accommodating these systems, and an uncertain and small cost to
+accommodating them. So it makes sense to apply the accommodation.
+
+## Calling EFI Reboot method
+
+One interpretation of the EFI spec is that operating systems should
+call the EFI ResetSystem method in preference to the ACPI reboot
+method.
+
+However, although the ResetSystem method is required by the EFI spec,
+on a large number of different systems it doesn't actually work, at
+least when called by Xen: such systems don't cleanly reboot after Xen
+calls the EFI ResetSystem method, but rather crash or fail in some
+other random way.
+
+(One reason for this is that the Windows EFI test doesn't call the EFI
+ResetSystem method, but calls the ACPI reboot method.  One possible
+explanation for the repeated pattern is that vendors smoke-test the
+ResetSystem method from the EFI shell, which has its own memory map,
+but fail to test it when running on the OS memory map.)
+
+Should Xen follow our interpretation of the EFI spec, and call the
+ResetSystem method in preference to the ACPI reboot method? Or should
+Xen accommodate systems with broken ResetSystem methods, and call the
+ACPI reboot method by default?
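The accommodation amounts to changing the default preference order between the two methods. As a hypothetical sketch (the method names and function are invented for illustration; Xen's actual logic is C code and also honours command-line overrides):

```python
# Hypothetical sketch of the default reboot-method preference, with and
# without the accommodation. Names are invented for illustration.

def reboot_order(accommodate: bool, have_acpi: bool, have_efi: bool):
    """Return the order in which reboot methods would be tried."""
    methods = []
    if accommodate:
        # Accommodation: prefer the ACPI reboot method, as Linux and
        # Windows do, falling back to EFI ResetSystem.
        if have_acpi:
            methods.append("acpi")
        if have_efi:
            methods.append("efi-resetsystem")
    else:
        # Strict reading of the EFI spec: prefer ResetSystem.
        if have_efi:
            methods.append("efi-resetsystem")
        if have_acpi:
            methods.append("acpi")
    return methods
```

Either way both methods remain available; only the default changes, which is why the cost to a hypothetical compliant-but-ACPI-broken system (C-2 below) is bounded.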
+
+N-1a: There are clearly a large number of systems which exhibit this
+property.
+
+N-1b,c: Given the large number of diverse vendors who make this
+mistake, it seems likely that there are even more that we don't know
+about, and this will continue into the future.
+
+N-2: Systems are incapable of rebooting cleanly unless the right runes
+are put into the Xen command line to make it prefer using the ACPI
+reboot method.
+
+C-1a: A system would only be negatively affected if 1) an ACPI reboot
+method exists, 2) an EFI method exists, and 3) calling the ACPI method
+in preference to the EFI method causes some sort of issue. So far
+nobody has run into such a system.
+
+C-1b,c: The Windows EFI test explicitly tests the ACPI reboot method
+on EFI systems. Linux also prefers calling the ACPI reboot method
+even when an EFI method is available. The chance of someone shipping
+a system that had a problem while that was the case is very tiny: it
+basically wouldn't run either of the two most important operating
+systems.
+
+C-2: It seems likely that the worst that could happen is what's
+happening now when calling the EFI method: that the ACPI method would
+cause a weird crash, which then would reboot or hang.
+
+XenServer has shipped this accommodation for several years now.
+
+Adding this all together, the cost of non-accommodation is widespread
+and moderate, that is to say non-negligible, while the cost of
+accommodation is theoretical and tiny.  So it makes sense to apply
+the accommodation.
\ No newline at end of file
--
2.42.0