docs: add a best practices coding guide

[PATCH] docs: add a best practices coding guide

Posted by Juergen Gross 5 months, 2 weeks ago

Today the CODING_STYLE contains a section "Handling unexpected
conditions" specific to the hypervisor. This section is kind of
misplaced for a coding style. It should rather be part of a "Coding
best practices" guide.

Add such a guide as docs/process/coding-best-practices.pandoc and
move the mentioned section from CODING_STYLE to the new file, while
converting the format to pandoc.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 CODING_STYLE                              | 102 ----------------------
 docs/process/coding-best-practices.pandoc |  92 +++++++++++++++++++
 2 files changed, 92 insertions(+), 102 deletions(-)
 create mode 100644 docs/process/coding-best-practices.pandoc

diff --git a/CODING_STYLE b/CODING_STYLE
index ed13ee2b66..7f6e9ad065 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -167,105 +167,3 @@ the end of files.  It should be:
  * indent-tabs-mode: nil
  * End:
  */
-
-Handling unexpected conditions
-------------------------------
-
-GUIDELINES:
-
-Passing errors up the stack should be used when the caller is already
-expecting to handle errors, and the state when the error was
-discovered isn’t broken, or isn't too hard to fix.
-
-domain_crash() should be used when passing errors up the stack is too
-difficult, and/or when fixing up state of a guest is impractical, but
-where fixing up the state of Xen will allow Xen to continue running.
-This is particularly appropriate when the guest is exhibiting behavior
-well-behaved guests shouldn't.
-
-BUG_ON() should be used when you can’t pass errors up the stack, and
-either continuing or crashing the guest would likely cause an
-information leak or privilege escalation vulnerability.
-
-ASSERT() IS NOT AN ERROR HANDLING MECHANISM.  ASSERT is a way to move
-detection of a bug earlier in the programming cycle; it is a
-more-noticeable printk.  It should only be added after one of the
-other three error-handling mechanisms has been evaluated for
-reliability and security.
-
-RATIONALE:
-
-It's frequently the case that code is written with the assumption that
-certain conditions can never happen.  There are several possible
-actions programmers can take in these situations:
-
-* Programmers can simply not handle those cases in any way, other than
-perhaps to write a comment documenting what the assumption is.
-
-* Programmers can try to handle the case gracefully -- fixing up
-in-progress state and returning an error to the user.
-
-* Programmers can crash the guest.
-
-* Programmers can use ASSERT(), which will cause the check to be
-executed in DEBUG builds, and cause the hypervisor to crash if it's
-violated
-
-* Programmers can use BUG_ON(), which will cause the check to be
-executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
-to crash if it's violated.
-
-In selecting which response to use, we want to achieve several goals:
-
-- To minimize risk of introducing security vulnerabilities,
-  particularly as the code evolves over time
-
-- To efficiently spend programmer time
-
-- To detect violations of assumptions as early as possible
-
-- To minimize the impact of bugs on production use cases
-
-The guidelines above attempt to balance these:
-
-- When the caller is expecting to handle errors, and there is no
-broken state at the time the unexpected condition is discovered, or
-when fixing the state is straightforward, then fixing up the state and
-returning an error is the most robust thing to do.  However, if the
-caller isn't expecting to handle errors, or if the state is difficult
-to fix, then returning an error may require extensive refactoring,
-which is not a good use of programmer time when they're certain that
-this condition cannot occur.
-
-- BUG_ON() will stop all hypervisor action immediately.  In situations
-where continuing might allow an attacker to escalate privilege, a
-BUG_ON() can change a privilege escalation or information leak into a
-denial-of-service (an improvement).  But in situations where
-continuing (say, returning an error) might be safe, then BUG_ON() can
-change a benign failure into denial-of-service (a degradation).
-
-- domain_crash() is similar to BUG_ON(), but with a more limited
-effect: it stops that domain immediately.  In situations where
-continuing might cause guest or hypervisor corruption, but destroying
-the guest allows the hypervisor to continue, this can change a more
-serious bug into a guest denial-of-service.  But in situations where
-returning an error might be safe, then domain_crash() can change a
-benign failure into a guest denial-of-service.
-
-- ASSERT() will stop the hypervisor during development, but allow
-hypervisor action to continue during production.  In situations where
-continuing will at worst result in a denial-of-service, and at best
-may have little effect other than perhaps quirky behavior, using an
-ASSERT() will allow violation of assumptions to be detected as soon as
-possible, while not causing undue degradation in production
-hypervisors.  However, in situations where continuing could cause
-privilege escalation or information leaks, using an ASSERT() can
-introduce security vulnerabilities.
-
-Note however that domain_crash() has its own traps: callers far up the
-call stack may not realize that the domain is now dying as a result of
-an innocuous-looking operation, particularly if somewhere on the
-callstack between the initial function call and the failure, no error
-is returned.  Using domain_crash() requires careful inspection and
-documentation of the code to make sure all callers at the stack handle
-a newly-dead domain gracefully.
diff --git a/docs/process/coding-best-practices.pandoc b/docs/process/coding-best-practices.pandoc
new file mode 100644
index 0000000000..f611aa9a55
--- /dev/null
+++ b/docs/process/coding-best-practices.pandoc
@@ -0,0 +1,92 @@
+# Best Practices in the Hypervisor
+
+## Handling unexpected conditions
+
+### Guidelines
+
+Passing errors up the stack should be used when the caller is already
+expecting to handle errors, and the state when the error was
+discovered isn’t broken, or isn't too hard to fix.
+
+domain_crash() should be used when passing errors up the stack is too
+difficult, and/or when fixing up state of a guest is impractical, but
+where fixing up the state of Xen will allow Xen to continue running.
+This is particularly appropriate when the guest is exhibiting behavior
+well-behaved guests shouldn't.
+
+BUG_ON() should be used when you can’t pass errors up the stack, and
+either continuing or crashing the guest would likely cause an
+information leak or privilege escalation vulnerability.
+
+ASSERT() IS NOT AN ERROR HANDLING MECHANISM.  ASSERT is a way to move
+detection of a bug earlier in the programming cycle; it is a
+more-noticeable printk.  It should only be added after one of the
+other three error-handling mechanisms has been evaluated for
+reliability and security.
+
+### Rationale
+
+It's frequently the case that code is written with the assumption that
+certain conditions can never happen.  There are several possible
+actions programmers can take in these situations:
+
+ * Programmers can simply not handle those cases in any way, other than
+   perhaps to write a comment documenting what the assumption is.
+ * Programmers can try to handle the case gracefully -- fixing up
+   in-progress state and returning an error to the user.
+ * Programmers can crash the guest.
+ * Programmers can use ASSERT(), which will cause the check to be
+   executed in DEBUG builds, and cause the hypervisor to crash if it's
+   violated
+ * Programmers can use BUG_ON(), which will cause the check to be
+   executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
+   to crash if it's violated.
+
+In selecting which response to use, we want to achieve several goals:
+
+ * To minimize risk of introducing security vulnerabilities,
+   particularly as the code evolves over time
+ * To efficiently spend programmer time
+ * To detect violations of assumptions as early as possible
+ * To minimize the impact of bugs on production use cases
+
+The guidelines above attempt to balance these:
+
+ * When the caller is expecting to handle errors, and there is no
+   broken state at the time the unexpected condition is discovered, or
+   when fixing the state is straightforward, then fixing up the state and
+   returning an error is the most robust thing to do.  However, if the
+   caller isn't expecting to handle errors, or if the state is difficult
+   to fix, then returning an error may require extensive refactoring,
+   which is not a good use of programmer time when they're certain that
+   this condition cannot occur.
+ * BUG_ON() will stop all hypervisor action immediately.  In situations
+   where continuing might allow an attacker to escalate privilege, a
+   BUG_ON() can change a privilege escalation or information leak into a
+   denial-of-service (an improvement).  But in situations where
+   continuing (say, returning an error) might be safe, then BUG_ON() can
+   change a benign failure into denial-of-service (a degradation).
+ * domain_crash() is similar to BUG_ON(), but with a more limited
+   effect: it stops that domain immediately.  In situations where
+   continuing might cause guest or hypervisor corruption, but destroying
+   the guest allows the hypervisor to continue, this can change a more
+   serious bug into a guest denial-of-service.  But in situations where
+   returning an error might be safe, then domain_crash() can change a
+   benign failure into a guest denial-of-service.
+ * ASSERT() will stop the hypervisor during development, but allow
+   hypervisor action to continue during production.  In situations where
+   continuing will at worst result in a denial-of-service, and at best
+   may have little effect other than perhaps quirky behavior, using an
+   ASSERT() will allow violation of assumptions to be detected as soon as
+   possible, while not causing undue degradation in production
+   hypervisors.  However, in situations where continuing could cause
+   privilege escalation or information leaks, using an ASSERT() can
+   introduce security vulnerabilities.
+
+Note however that domain_crash() has its own traps: callers far up the
+call stack may not realize that the domain is now dying as a result of
+an innocuous-looking operation, particularly if somewhere on the
+callstack between the initial function call and the failure, no error
+is returned.  Using domain_crash() requires careful inspection and
+documentation of the code to make sure all callers at the stack handle
+a newly-dead domain gracefully.
-- 
2.35.3

Re: [PATCH] docs: add a best practices coding guide

Posted by Julien Grall 5 months, 2 weeks ago

Hi Juergen,

On 08/02/2024 16:05, Juergen Gross wrote:
> Today the CODING_STYLE contains a section "Handling unexpected
> conditions" specific to the hypervisor. This section is kind of
> misplaced for a coding style. It should rather be part of a "Coding
> best practices" guide.
> 
> Add such a guide as docs/process/coding-best-practices.pandoc and
> move the mentioned section from CODING_STYLE to the new file, while
> converting the format to pandoc.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall

Re: [PATCH] docs: add a best practices coding guide

Posted by Julien Grall 5 months, 1 week ago


On 08/02/2024 16:27, Julien Grall wrote:
> Hi Juergen,
> 
> On 08/02/2024 16:05, Juergen Gross wrote:
>> Today the CODING_STYLE contains a section "Handling unexpected
>> conditions" specific to the hypervisor. This section is kind of
>> misplaced for a coding style. It should rather be part of a "Coding
>> best practices" guide.
>>
>> Add such a guide as docs/process/coding-best-practices.pandoc and
>> move the mentioned section from CODING_STYLE to the new file, while
>> converting the format to pandoc.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Julien Grall <jgrall@amazon.com>

There was no other feedbacks. So I have committed it.

Cheers,

> 
> Cheers,
> 

-- 
Julien Grall