[XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

Roberto Bagnara posted 1 patch 10 months, 2 weeks ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/db6e7432f92657c1386a475895c3b334e1c53693.1686839154.git.roberto.bagnara@bugseng.com
There is a newer version of this series
[XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 2 weeks ago
This document specifies the C language dialect used by Xen and
the assumptions Xen makes on the translation toolchain.

Signed-off-by: Roberto Bagnara <roberto.bagnara@bugseng.com>
---
 docs/misra/C-language-toolchain.rst | 465 ++++++++++++++++++++++++++++
 1 file changed, 465 insertions(+)
 create mode 100644 docs/misra/C-language-toolchain.rst

diff --git a/docs/misra/C-language-toolchain.rst b/docs/misra/C-language-toolchain.rst
new file mode 100644
index 0000000000..013cef071c
--- /dev/null
+++ b/docs/misra/C-language-toolchain.rst
@@ -0,0 +1,465 @@
+=============================================
+C Dialect and Translation Assumptions for Xen
+=============================================
+
+This document specifies the C language dialect used by Xen and
+the assumptions Xen makes on the translation toolchain.
+It covers, in particular:
+
+1. the language extensions used;
+2. the translation limits that the translation toolchains must be able
+   to accommodate;
+3. the implementation-defined behaviors upon which Xen may depend.
+
+All points are of course relevant for portability.  In addition,
+programming in C is impossible without a detailed knowledge of the
+implementation-defined behaviors.  For this reason, it is recommended
+that Xen developers be familiar with this document and the
+documentation referenced therein.
+
+This document needs maintenance and adaptation in the following
+circumstances:
+
+- whenever the compiler is changed or updated;
+- whenever the use of a certain language extension is added or removed;
+- whenever code modifications cause the stated translation limits to be exceeded.
+
+
+Applicable C Language Standard
+______________________________
+
+Xen is written in C99 with extensions.  The relevant ISO standard is
+
+    *ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
+    Technical Corrigendum 3.
+    ISO/IEC, Geneva, Switzerland, 2007.
+
+
+Reference Documentation
+_______________________
+
+The following documents are referred to in the sequel:
+
+GCC_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
+CPP_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
+ARM64_ABI_MANUAL:
+  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
+X86_64_ABI_MANUAL:
+  https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
+ARM64_LIBC_MANUAL:
+  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
+X86_64_LIBC_MANUAL:
+  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
+
+
+C Language Extensions
+_____________________
+
+
+The following table lists the extensions currently used in Xen.
+The table columns are as follows:
+
+   Extension
+      a terse description of the extension;
+   Architectures
+      a set of Xen architectures making use of the extension;
+   References
+      when available, references to the documentation explaining
+      the syntax and semantics of (each instance of) the extension.
+
+
+.. list-table::
+   :widths: 30 15 55
+   :header-rows: 1
+
+   * - Extension
+     - Architectures
+     - References
+
+   * - Non-standard tokens
+     - ARM64, X86_64
+     - _Static_assert:
+          see Section "2.1 C Language" of GCC_MANUAL.
+       asm, __asm__:
+          see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline Assembly Language in C Code" of GCC_MANUAL.
+       __volatile__:
+          see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of GCC_MANUAL.
+       __const__, __inline__, __inline:
+          see Section "6.48 Alternate Keywords" of GCC_MANUAL.
+       typeof, __typeof__:
+          see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
+       __alignof__, __alignof:
+          see Sections "6.48 Alternate Keywords" and "6.44 Determining the Alignment of Functions, Types or Variables" of GCC_MANUAL.
+       __attribute__:
+          see Section "6.39 Attribute Syntax" of GCC_MANUAL.
+       __builtin_types_compatible_p:
+          see Section "6.59 Other Built-in Functions Provided by GCC" of GCC_MANUAL.
+       __builtin_va_arg:
+          non-documented GCC extension.
+       __builtin_offsetof:
+          see Section "6.53 Support for offsetof" of GCC_MANUAL.
+       __signed__:
+          non-documented GCC extension.
+
+   * - Empty initialization list
+     - ARM64, X86_64
+     - Non-documented GCC extension.
+
+   * - Arithmetic operator on void type
+     - ARM64, X86_64
+     - See Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
+
+   * - GNU statement expression
+     - ARM64, X86_64
+     - See Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
+
+   * - Structure or union definition with no members
+     - ARM64, X86_64
+     - See Section "6.19 Structures with No Members" of GCC_MANUAL.
+
+   * - Zero size array type
+     - ARM64, X86_64
+     - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
+
+   * - Binary conditional expression
+     - ARM64, X86_64
+     - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.
+
+   * - 'Case' label with upper/lower values
+     - ARM64, X86_64
+     - See Section "6.30 Case Ranges" of GCC_MANUAL.
+
+   * - Unnamed field that is not a bit-field
+     - ARM64, X86_64
+     - See Section "6.63 Unnamed Structure and Union Fields" of GCC_MANUAL.
+
+   * - Empty declaration
+     - ARM64, X86_64
+     - Non-documented GCC extension.
+
+   * - Incomplete enum declaration
+     - ARM64
+     - Non-documented GCC extension.
+
+   * - Implicit conversion from a pointer to an incompatible pointer
+     - ARM64, X86_64
+     - Non-documented GCC extension.
+
+   * - Pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function
+     - X86_64
+     - Non-documented GCC extension.
+
+   * - Ill-formed source detected by the parser
+     - ARM64, X86_64
+     - token pasting of ',' and __VA_ARGS__ is a GNU extension:
+          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
+       must specify at least one argument for '...' parameter of variadic macro:
+          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
+       void function should not return void expression:
+          see the documentation for -Wreturn-type in Section "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL.
+       use of GNU statement expression extension from macro expansion:
+          see Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
+       invalid application of sizeof to a void type:
+          see Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
+       redeclaration of already-defined enum is a GNU extension:
+          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
+       static function is used in an inline function with external linkage:
+          non-documented GCC extension.
+       struct may not be nested in a struct due to flexible array member:
+          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
+       struct may not be used as an array element due to flexible array member:
+          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
+       ISO C restricts enumerator values to the range of int:
+          non-documented GCC extension.
+
+   * - Unspecified escape sequence is encountered in a character constant or a string literal token
+     - X86_64
+     - \\m:
+          non-documented GCC extension.
+
+   * - Non-standard type
+     - X86_64
+     - See Section "6.9 128-bit Integers" of GCC_MANUAL.
+
+
+Translation Limits
+__________________
+
+The following table lists the translation limits that a toolchain has
+to satisfy in order to translate Xen.  The numbers given are a
+compromise: on the one hand, many modern compilers have very generous
+limits (in several cases, the only limitation is the amount of
+available memory); on the other hand, we prefer setting limits that are
+not too high, because compilers are under no obligation to diagnose
+when a limit has been exceeded, and not too low, so as to
+avoid frequently updating this document.  In the table, only the
+limits that go beyond the minima specified by the relevant C Standard
+are listed.
+
+The table columns are as follows:
+
+   Limit
+      a terse description of the translation limit;
+   Architectures
+      a set of relevant Xen architectures;
+   Threshold
+      a value that the Xen project does not wish to exceed for that limit
+      (this is typically below, often well below, what the translation
+      toolchain supports);
+   References
+      when available, references to the documentation providing evidence
+      that the translation toolchain honors the threshold (and more).
+
+.. list-table::
+   :widths: 30 15 10 45
+   :header-rows: 1
+
+   * - Limit
+     - Architectures
+     - Threshold
+     - References
+
+   * - Size of an object
+     - ARM64, X86_64
+     - 8388608
+     - The maximum size of an object is defined by the MAX_SIZE macro and, on a 32-bit architecture, is 8MB.
+       The maximum size of an array is defined by the PTRDIFF_MAX macro and, on a 32-bit architecture, is 2^31-1.
+       See occurrences of these macros in GCC_MANUAL.
+
+   * - Characters in one logical source line
+     - ARM64
+     - 5000
+     - See Section "11.2 Implementation limits" of CPP_MANUAL.
+
+   * - Characters in one logical source line
+     - X86_64
+     - 12000
+     - See Section "11.2 Implementation limits" of CPP_MANUAL.
+
+   * - Nesting levels for #include files
+     - ARM64
+     - 24
+     - See Section "11.2 Implementation limits" of CPP_MANUAL.
+
+   * - Nesting levels for #include files
+     - X86_64
+     - 32
+     - See Section "11.2 Implementation limits" of CPP_MANUAL.
+
+   * - case labels for a switch statement (excluding those for any nested switch statements)
+     - X86_64
+     - 1500
+     - See Section "4.12 Statements" of GCC_MANUAL.
+
+   * - Number of significant initial characters in an external identifier
+     - ARM64, X86_64
+     - 63
+     - See Section "4.3 Identifiers" of GCC_MANUAL.
+
+
+Implementation-Defined Behaviors
+________________________________
+
+The following table lists the C language implementation-defined behaviors
+relevant for MISRA C:2012 Dir 1.1 upon which Xen may possibly depend.
+
+The table columns are as follows:
+
+   I.-D.B.
+      a terse description of the implementation-defined behavior;
+   Architectures
+      a set of relevant Xen architectures;
+   Value(s)
+      for i.-d.b.s with values, the allowed values;
+   References
+      when available, references to the documentation providing details
+      about how the i.-d.b. is resolved by the translation toolchain.
+
+.. list-table::
+   :widths: 30 15 10 45
+   :header-rows: 1
+
+   * - I.-D.B.
+     - Architectures
+     - Value(s)
+     - References
+
+   * - Allowable bit-field types other than _Bool, signed int, and unsigned int
+     - ARM64, X86_64
+     - All explicitly signed integer types, all unsigned integer types,
+       and enumerations.
+     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL.
+
+   * - #pragma preprocessing directive that is documented as causing translation failure or some other form of undefined behavior is encountered
+     - ARM64, X86_64
+     - pack, GCC visibility
+     - #pragma pack:
+          see Section "6.62.11 Structure-Layout Pragmas" of GCC_MANUAL.
+       #pragma GCC visibility:
+          see Section "6.62.14 Visibility Pragmas" of GCC_MANUAL.
+
+   * - The number of bits in a byte
+     - ARM64
+     - 8
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
+
+   * - The number of bits in a byte
+     - X86_64
+     - 8
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - Whether signed integer types are represented using sign and magnitude, two's complement, or one's complement, and whether the extraordinary value is a trap representation or an ordinary value
+     - ARM64, X86_64
+     - Two's complement
+     - See Section "4.5 Integers" of GCC_MANUAL.
+
+   * - Any extended integer types that exist in the implementation
+     - X86_64
+     - __uint128_t
+     - See Section "6.9 128-bit Integers" of GCC_MANUAL.
+
+   * - The number, order, and encoding of bytes in any object
+     - ARM64
+     -
+     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.
+
+   * - The number, order, and encoding of bytes in any object
+     - X86_64
+     -
+     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - Whether a bit-field can straddle a storage-unit boundary
+     - ARM64
+     -
+     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.
+
+   * - Whether a bit-field can straddle a storage-unit boundary
+     - X86_64
+     -
+     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - The order of allocation of bit-fields within a unit
+     - ARM64
+     -
+     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.
+
+   * - The order of allocation of bit-fields within a unit
+     - X86_64
+     -
+     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - What constitutes an access to an object that has volatile-qualified type
+     - ARM64, X86_64
+     -
+     - See Section "4.10 Qualifiers" of GCC_MANUAL.
+
+   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
+     - ARM64
+     -
+     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.
+
+   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
+     - X86_64
+     -
+     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - Character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
+     - ARM64
+     - UTF-8
+     - See Section "1.1 Character sets" of CPP_MANUAL.
+       We assume the locale does not prevent any UTF-8 character from being part of the source character set.
+
+   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
+     - ARM64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
+
+   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
+     - X86_64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
+     - ARM64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
+
+   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
+     - X86_64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
+
+   * - The mapping of members of the source character set
+     - ARM64, X86_64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -finput-charset=charset in the same manual.
+
+   * - The members of the source and execution character sets, except as explicitly specified in the Standard
+     - ARM64, X86_64
+     - UTF-8
+     - See Section "4.4 Characters" of GCC_MANUAL.
+
+   * - The values of the members of the execution character set
+     - ARM64, X86_64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -fexec-charset=charset in the same manual.
+
+   * - How a diagnostic is identified
+     - ARM64, X86_64
+     -
+     - See Section "4.1 Translation" of GCC_MANUAL.
+
+   * - The termination status returned to the host environment by the abort, exit, or _Exit function
+     - ARM64
+     -
+     - See Section "25.7 Program Termination" of ARM64_LIBC_MANUAL.
+
+   * - The termination status returned to the host environment by the abort, exit, or _Exit function
+     - X86_64
+     -
+     - See Section "25.7 Program Termination" of X86_64_LIBC_MANUAL.
+
+   * - The places that are searched for an included < > delimited header, and how the places are specified or the header is identified
+     - ARM64, X86_64
+     -
+     - See Chapter "2 Header Files" of CPP_MANUAL.
+
+   * - How the named source file is searched for in an included " " delimited header
+     - ARM64, X86_64
+     -
+     - See Chapter "2 Header Files" of CPP_MANUAL.
+
+   * - How sequences in both forms of header names are mapped to headers or external source file names
+     - ARM64, X86_64
+     -
+     - See Chapter "2 Header Files" of CPP_MANUAL.
+
+   * - Whether the # operator inserts a \ character before the \ character that begins a universal character name in a character constant or string literal
+     - ARM64, X86_64
+     -
+     - See Section "3.4 Stringizing" of CPP_MANUAL.
+
+   * - The current locale used to convert a wide string literal into corresponding wide character codes
+     - ARM64, X86_64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
+
+   * - The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set
+     - X86_64
+     -
+     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
+
+   * - The behavior on each recognized #pragma directive
+     - ARM64, X86_64
+     - pack, GCC visibility
+     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "7 Pragmas" of CPP_MANUAL.
+
+   * - The method by which preprocessing tokens (possibly resulting from macro expansion) in a #include directive are combined into a header name
+     - X86_64
+     -
+     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
+
+
+END OF DOCUMENT.
-- 
2.34.1
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Stefano Stabellini 10 months, 2 weeks ago
On Thu, 15 Jun 2023, Roberto Bagnara wrote:
> This document specifies the C language dialect used by Xen and
> the assumptions Xen makes on the translation toolchain.
> 
> Signed-off-by: Roberto Bagnara <roberto.bagnara@bugseng.com>

Thanks Roberto for the amazing work of research and archaeology.

I have a few comments below, mostly to clarify the description of some
of the less documented GCC extensions, so that all community members
can understand what they can and cannot use.


> ---
>  docs/misra/C-language-toolchain.rst | 465 ++++++++++++++++++++++++++++
>  1 file changed, 465 insertions(+)
>  create mode 100644 docs/misra/C-language-toolchain.rst
> 
> diff --git a/docs/misra/C-language-toolchain.rst b/docs/misra/C-language-toolchain.rst
> new file mode 100644
> index 0000000000..013cef071c
> --- /dev/null
> +++ b/docs/misra/C-language-toolchain.rst
> @@ -0,0 +1,465 @@
> +=============================================
> +C Dialect and Translation Assumptions for Xen
> +=============================================
> +
> +This document specifies the C language dialect used by Xen and
> +the assumptions Xen makes on the translation toolchain.
> +It covers, in particular:
> +
> +1. the language extensions used;
> +2. the translation limits that the translation toolchains must be able
> +   to accommodate;
> +3. the implementation-defined behaviors upon which Xen may depend.
> +
> +All points are of course relevant for portability.  In addition,
> +programming in C is impossible without a detailed knowledge of the
> +implementation-defined behaviors.  For this reason, it is recommended
> +that Xen developers be familiar with this document and the
> +documentation referenced therein.
> +
> +This document needs maintenance and adaptation in the following
> +circumstances:
> +
> +- whenever the compiler is changed or updated;
> +- whenever the use of a certain language extension is added or removed;
> +- whenever code modifications cause the stated translation limits to be exceeded.
> +
> +
> +Applicable C Language Standard
> +______________________________
> +
> +Xen is written in C99 with extensions.  The relevant ISO standard is
> +
> +    *ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
> +    Technical Corrigendum 3.
> +    ISO/IEC, Geneva, Switzerland, 2007.
> +
> +
> +Reference Documentation
> +_______________________
> +
> +The following documents are referred to in the sequel:
> +
> +GCC_MANUAL:
> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
> +CPP_MANUAL:
> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
> +ARM64_ABI_MANUAL:
> +  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
> +X86_64_ABI_MANUAL:
> +  https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
> +ARM64_LIBC_MANUAL:
> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
> +X86_64_LIBC_MANUAL:
> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
> +
> +
> +C Language Extensions
> +_____________________
> +
> +
> +The following table lists the extensions currently used in Xen.
> +The table columns are as follows:
> +
> +   Extension
> +      a terse description of the extension;
> +   Architectures
> +      a set of Xen architectures making use of the extension;
> +   References
> +      when available, references to the documentation explaining
> +      the syntax and semantics of (each instance of) the extension.
> +
> +
> +.. list-table::
> +   :widths: 30 15 55
> +   :header-rows: 1
> +
> +   * - Extension
> +     - Architectures
> +     - References
> +
> +   * - Non-standard tokens
> +     - ARM64, X86_64
> +     - _Static_assert:
> +          see Section "2.1 C Language" of GCC_MANUAL.
> +       asm, __asm__:
> +          see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline Assembly Language in C Code" of GCC_MANUAL.
> +       __volatile__:
> +          see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of GCC_MANUAL.
> +       __const__, __inline__, __inline:
> +          see Section "6.48 Alternate Keywords" of GCC_MANUAL.
> +       typeof, __typeof__:
> +          see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
> +       __alignof__, __alignof:
> +          see Sections "6.48 Alternate Keywords" and "6.44 Determining the Alignment of Functions, Types or Variables" of GCC_MANUAL.
> +       __attribute__:
> +          see Section "6.39 Attribute Syntax" of GCC_MANUAL.
> +       __builtin_types_compatible_p:
> +          see Section "6.59 Other Built-in Functions Provided by GCC" of GCC_MANUAL.
> +       __builtin_va_arg:
> +          non-documented GCC extension.
> +       __builtin_offsetof:
> +          see Section "6.53 Support for offsetof" of GCC_MANUAL.
> +       __signed__:
> +          non-documented GCC extension.
> +
> +   * - Empty initialization list
> +     - ARM64, X86_64
> +     - Non-documented GCC extension.
> +
> +   * - Arithmetic operator on void type
> +     - ARM64, X86_64
> +     - See Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
> +
> +   * - GNU statement expression

"GNU statement expression" is not very clear, at least for me. I would
call it "Statements and Declarations in Expressions".
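For reference, here is a minimal sketch of what this extension looks like, assuming GCC in its default GNU C mode. The macro and function names are hypothetical and not taken from Xen. The ({ ... }) form lets a macro declare locals, so each argument is evaluated only once. The value of the last statement becomes the value of the whole expression.

```c
/*
 * Hypothetical max() macro built on a GNU statement expression.
 * The locals a_ and b_ ensure each argument is evaluated once; the
 * final expression statement is the value of the whole ({ ... }).
 */
#define max(a, b) ({            \
    __typeof__(a) a_ = (a);     \
    __typeof__(b) b_ = (b);     \
    a_ > b_ ? a_ : b_;          \
})

static int calls;

/* Counts its invocations, to show that max() evaluates it only once. */
static int next_value(void)
{
    return ++calls;
}

int max_of_next_and(int limit)
{
    return max(next_value(), limit);
}
```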


> +     - ARM64, X86_64
> +     - See Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
> +
> +   * - Structure or union definition with no members
> +     - ARM64, X86_64
> +     - See Section "6.19 Structures with No Members" of GCC_MANUAL.
> +
> +   * - Zero size array type
> +     - ARM64, X86_64
> +     - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> +
> +   * - Binary conditional expression
> +     - ARM64, X86_64
> +     - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.
> +
> +   * - 'Case' label with upper/lower values
> +     - ARM64, X86_64
> +     - See Section "6.30 Case Ranges" of GCC_MANUAL.
> +
> +   * - Unnamed field that is not a bit-field
> +     - ARM64, X86_64
> +     - See Section "6.63 Unnamed Structure and Union Fields" of GCC_MANUAL.
> +
> +   * - Empty declaration
> +     - ARM64, X86_64
> +     - Non-documented GCC extension.

For the non-documented GCC extensions, would it be possible to add a
very brief example or a couple of words in the "References" sections?
Otherwise I think people might not understand what we are talking about.

For instance in this case I would say:

An empty declaration is a semicolon with nothing before it.
Non-documented GCC extension.
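A sketch of how such an empty declaration typically arises, assuming GCC in GNU C mode (no -pedantic). The DEFINE_COUNTER macro is hypothetical. Its expansion already ends in ';', so the extra ';' after the invocation at file scope is left standing alone.

```c
/*
 * Hypothetical macro whose expansion already ends in ';'. The ';'
 * written after the invocation at file scope becomes an empty
 * declaration, which ISO C does not allow but GCC accepts.
 */
#define DEFINE_COUNTER(name) static int name;

DEFINE_COUNTER(hits);   /* expands to: static int hits;; */

int record_hit(void)
{
    return ++hits;
}
```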


> +   * - Incomplete enum declaration
> +     - ARM64
> +     - Non-documented GCC extension.

Is this 6.49 of the GCC manual perhaps?
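If it is the construct described in 6.49, it would look roughly like the sketch below. This assumes GCC in GNU C mode; the enum and function names are hypothetical. The enum tag is declared without its enumerators, and only pointers to it can be used until the definition appears.

```c
/*
 * Forward declaration of an enum tag without enumerators: an
 * incomplete type, like "struct foo;" (GCC extension).
 */
enum color;

/* Only pointers to the incomplete enum can be used at this point. */
static int is_red(const enum color *c);

/* The definition that completes the type. */
enum color { RED, GREEN, BLUE };

static int is_red(const enum color *c)
{
    return *c == RED;
}

int color_is_red(int value)
{
    enum color c = (enum color)value;
    return is_red(&c);
}
```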


> +   * - Implicit conversion from a pointer to an incompatible pointer
> +     - ARM64, X86_64
> +     - Non-documented GCC extension.

Is this related to -Wincompatible-pointer-types?


> +   * - Pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function
> +     - X86_64
> +     - Non-documented GCC extension.

Is this J.5.7 of n1570?
https://www.iso-9899.info/n1570.html

Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584
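Whatever reference we pick, a short sketch of the construct may help readers; the names are hypothetical. ISO C does not define conversions between function pointers and object pointers. GCC accepts explicit casts in both directions and diagnoses them only under -pedantic. Here a function pointer is round-tripped through void *.

```c
typedef int (*handler_t)(int);

static int twice(int x)
{
    return 2 * x;
}

int call_via_void_pointer(int arg)
{
    void *opaque = (void *)twice;       /* function pointer -> object pointer */
    handler_t fn = (handler_t)opaque;   /* object pointer -> function pointer */

    return fn(arg);
}
```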


> +   * - Ill-formed source detected by the parser

As we are documenting compiler extensions that we are using, I am a bit
confused by the name of this category of compiler extensions, and the
reason why they are bundled together. After all, aren't they all
separate compiler extensions? Should each of them have its own row?
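For instance, the first two constructs in this group can be sketched as follows, assuming GCC in GNU C mode. The LOG macro and the helper functions are hypothetical. The first is token pasting of ',' and __VA_ARGS__; the second is invoking a variadic macro with no argument for '...'. When no variadic arguments are given, ", ##__VA_ARGS__" makes GCC drop the preceding comma.

```c
#include <stdio.h>

/*
 * Hypothetical formatting macro: with ", ##__VA_ARGS__" the comma is
 * removed when the variadic part is empty, so LOG(b, n, "boot")
 * expands to snprintf(b, n, "boot") instead of snprintf(b, n, "boot", ).
 */
#define LOG(buf, size, fmt, ...) snprintf(buf, size, fmt, ##__VA_ARGS__)

int format_plain(char *buf, size_t size)
{
    return LOG(buf, size, "boot");      /* no argument for '...' */
}

int format_with_args(char *buf, size_t size, int cpu)
{
    return LOG(buf, size, "cpu%d up", cpu);
}
```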


> +     - ARM64, X86_64
> +     - token pasting of ',' and __VA_ARGS__ is a GNU extension:
> +          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
> +       must specify at least one argument for '...' parameter of variadic macro:
> +          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
> +       void function should not return void expression:

I understand that GCC does a poor job at documenting several of these
extensions. In fact a few of them are not even documented at all.
However, if they are extensions, they should be described for what they
do, not for the rule they violate. What do you think?

For example, in this case maybe we should say "void function can return
a void expression"?
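A sketch of that behavior, assuming GCC in GNU C mode (names hypothetical). ISO C forbids a return statement with an expression in a function returning void. GCC accepts it when the expression itself has type void, and complains only under -pedantic.

```c
static int resets;

static void do_reset(void)
{
    resets++;
}

/* Returns the void result of do_reset(): accepted by GCC as an extension. */
void reset_wrapper(void)
{
    return do_reset();
}
```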


> +          see the documentation for -Wreturn-type in Section "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL.
> +       use of GNU statement expression extension from macro expansion:
> +          see Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
> +       invalid application of sizeof to a void type:
> +          see Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
> +       redeclaration of already-defined enum is a GNU extension:
> +          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
> +       static function is used in an inline function with external linkage:
> +          non-documented GCC extension.

I am not sure I follow this one. Did you mean "static is used in an
inline function with external linkage"?


> +       struct may not be nested in a struct due to flexible array member:
> +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> +       struct may not be used as an array element due to flexible array member:
> +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> +       ISO C restricts enumerator values to the range of int:
> +          non-documented GCC extension.

Should we call it instead "enumerator values can be larger than int"?
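A sketch of the construct, assuming GCC in GNU C mode on a 64-bit target; the enum and function names are hypothetical. C99 restricts enumerator values to the range of int. GCC instead picks a wider underlying type for the enumeration and diagnoses the overflow only under -pedantic.

```c
#include <stdint.h>

/* FLAG_HIGH exceeds INT_MAX, so GCC widens the enum's underlying type. */
enum big_flags {
    FLAG_LOW  = 1,
    FLAG_HIGH = (uint64_t)1 << 32,
};

uint64_t high_flag(void)
{
    return FLAG_HIGH;
}
```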


> +
> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
> +     - X86_64
> +     - \\m:
> +          non-documented GCC extension.

Are you saying that we are using \m and \m is not allowed by the C
standard?
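For context, this is what such an escape sequence does under GCC, as far as we know (the function name is hypothetical). \m is not one of the escape sequences defined by the C standard. GCC emits an "unknown escape sequence" warning and uses the character itself, so '\m' has the value of 'm'.

```c
/* '\m' is not a standard escape sequence; GCC treats it as 'm' (with a warning). */
int unknown_escape_value(void)
{
    return '\m';
}
```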


> +   * - Non-standard type

Should we call it "128-bit Integers"?
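A typical use of the type, sketched under the assumption of GCC on a 64-bit target such as x86_64 or AArch64; the function name is hypothetical. It computes the high 64 bits of a full 64x64-bit product with __uint128_t, the type listed in the implementation-defined behaviors table.

```c
#include <stdint.h>

/* High half of the 128-bit product of two 64-bit values. */
uint64_t mul_high(uint64_t a, uint64_t b)
{
    __uint128_t product = (__uint128_t)a * b;

    return (uint64_t)(product >> 64);
}
```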


> +     - X86_64
> +     - See Section "6.9 128-bit Integers" of GCC_MANUAL.




> +Translation Limits
> +__________________
> +
> +The following table lists the translation limits that a toolchain has
> +to satisfy in order to translate Xen.  The numbers given are a
> +compromise: on the one hand, many modern compilers have very generous
> +limits (in several cases, the only limitation is the amount of
> +available memory); on the other hand, we prefer setting limits that are
> +not too high, because compilers are under no obligation to diagnose
> +when a limit has been exceeded, and not too low, so as to
> +avoid frequently updating this document.  In the table, only the
> +limits that go beyond the minima specified by the relevant C Standard
> +are listed.
> +
> +The table columns are as follows:
> +
> +   Limit
> +      a terse description of the translation limit;
> +   Architectures
> +      a set of relevant Xen architectures;
> +   Threshold
> +      a value that the Xen project does not wish to exceed for that limit
> +      (this is typically below, often well below, what the translation
> +      toolchain supports);
> +   References
> +      when available, references to the documentation providing evidence
> +      that the translation toolchain honors the threshold (and more).
> +
> +.. list-table::
> +   :widths: 30 15 10 45
> +   :header-rows: 1
> +
> +   * - Limit
> +     - Architectures
> +     - Threshold
> +     - References
> +
> +   * - Size of an object
> +     - ARM64, X86_64
> +     - 8388608
> +     - The maximum size of an object is defined by the MAX_SIZE macro and, on a 32-bit architecture, is 8MB.
> +       The maximum size of an array is defined by the PTRDIFF_MAX macro and, on a 32-bit architecture, is 2^31-1.
> +       See occurrences of these macros in GCC_MANUAL.
> +
> +   * - Characters in one logical source line
> +     - ARM64
> +     - 5000
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Characters in one logical source line
> +     - X86_64
> +     - 12000
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Nesting levels for #include files
> +     - ARM64
> +     - 24
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Nesting levels for #include files
> +     - X86_64
> +     - 32
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Case labels for a switch statement (excluding those for any nested switch statements)
> +     - X86_64
> +     - 1500
> +     - See Section "4.12 Statements" of GCC_MANUAL.
> +
> +   * - Number of significant initial characters in an external identifier
> +     - ARM64, X86_64
> +     - 63
> +     - See Section "4.3 Identifiers" of GCC_MANUAL.
> +
> +
> +Implementation-Defined Behaviors
> +________________________________
> +
> +The following table lists the C language implementation-defined behaviors
> +relevant for MISRA C:2012 Dir 1.1 upon which Xen may possibly depend.
> +
> +The table columns are as follows:
> +
> +   I.-D.B.
> +      a terse description of the implementation-defined behavior;
> +   Architectures
> +      the relevant set of Xen architectures;
> +   Value(s)
> +      for i.-d.b.'s with values, the values allowed;
> +   References
> +      when available, references to the documentation providing details
> +      about how the i.-d.b. is resolved by the translation toolchain.
> +
> +.. list-table::
> +   :widths: 30 15 10 45
> +   :header-rows: 1
> +
> +   * - I.-D.B.
> +     - Architectures
> +     - Value(s)
> +     - References
> +
> +   * - Allowable bit-field types other than _Bool, signed int, and unsigned int
> +     - ARM64, X86_64
> +     - All explicitly signed integer types, all unsigned integer types,
> +       and enumerations.
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL.
> +
> +   * - #pragma preprocessing directive that is documented as causing translation failure or some other form of undefined behavior is encountered
> +     - ARM64, X86_64
> +     - pack, GCC visibility
> +     - #pragma pack:
> +          see Section "6.62.11 Structure-Layout Pragmas" of GCC_MANUAL.
> +       #pragma GCC visibility:
> +          see Section "6.62.14 Visibility Pragmas" of GCC_MANUAL.
> +
> +   * - The number of bits in a byte
> +     - ARM64
> +     - 8
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
> +
> +   * - The number of bits in a byte
> +     - X86_64
> +     - 8
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - Whether signed integer types are represented using sign and magnitude, two's complement, or one's complement, and whether the extraordinary value is a trap representation or an ordinary value
> +     - ARM64, X86_64
> +     - Two's complement
> +     - See Section "4.5 Integers" of GCC_MANUAL.
> +
> +   * - Any extended integer types that exist in the implementation
> +     - X86_64
> +     - __uint128_t
> +     - See Section "6.9 128-bit Integers" of GCC_MANUAL.
> +
> +   * - The number, order, and encoding of bytes in any object
> +     - ARM64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.
> +
> +   * - The number, order, and encoding of bytes in any object
> +     - X86_64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - Whether a bit-field can straddle a storage-unit boundary
> +     - ARM64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.
> +
> +   * - Whether a bit-field can straddle a storage-unit boundary
> +     - X86_64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - The order of allocation of bit-fields within a unit
> +     - ARM64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.
> +
> +   * - The order of allocation of bit-fields within a unit
> +     - X86_64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - What constitutes an access to an object that has volatile-qualified type
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.10 Qualifiers" of GCC_MANUAL.
> +
> +   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
> +     - ARM64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.
> +
> +   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
> +     - X86_64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - Character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
> +     - ARM64
> +     - UTF-8
> +     - See Section "1.1 Character sets" of CPP_MANUAL.
> +       We assume the locale does not prevent any UTF-8 character from being part of the source character set.
> +
> +   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
> +     - ARM64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
> +
> +   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
> +     - X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
> +     - ARM64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
> +
> +   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
> +     - X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - The mapping of members of the source character set
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -finput-charset=charset in the same manual.
> +
> +   * - The members of the source and execution character sets, except as explicitly specified in the Standard
> +     - ARM64, X86_64
> +     - UTF-8
> +     - See Section "4.4 Characters" of GCC_MANUAL.
> +
> +   * - The values of the members of the execution character set
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -fexec-charset=charset in the same manual.
> +
> +   * - How a diagnostic is identified
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.1 Translation" of GCC_MANUAL.
> +
> +   * - The termination status returned to the host environment by the abort, exit, or _Exit function
> +     - ARM64
> +     -
> +     - See Section "25.7 Program Termination" of ARM64_LIBC_MANUAL.
> +
> +   * - The termination status returned to the host environment by the abort, exit, or _Exit function
> +     - X86_64
> +     -
> +     - See Section "25.7 Program Termination" of X86_64_LIBC_MANUAL.
> +
> +   * - The places that are searched for an included < > delimited header, and how the places are specified or the header is identified
> +     - ARM64, X86_64
> +     -
> +     - See Chapter "2 Header Files" of CPP_MANUAL.
> +
> +   * - How the named source file is searched for in an included " " delimited header
> +     - ARM64, X86_64
> +     -
> +     - See Chapter "2 Header Files" of CPP_MANUAL.
> +
> +   * - How sequences in both forms of header names are mapped to headers or external source file names
> +     - ARM64, X86_64
> +     -
> +     - See Chapter "2 Header Files" of CPP_MANUAL.
> +
> +   * - Whether the # operator inserts a \ character before the \ character that begins a universal character name in a character constant or string literal
> +     - ARM64, X86_64
> +     -
> +     - See Section "3.4 Stringizing" of CPP_MANUAL.
> +
> +   * - The current locale used to convert a wide string literal into corresponding wide character codes
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
> +
> +   * - The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set
> +     - X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
> +
> +   * - The behavior on each recognized #pragma directive
> +     - ARM64, X86_64
> +     - pack, GCC visibility
> +     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "7 Pragmas" of CPP_MANUAL.
> +
> +   * - The method by which preprocessing tokens (possibly resulting from macro expansion) in a #include directive are combined into a header name
> +     - X86_64
> +     -
> +     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
> +
> +
> +END OF DOCUMENT.

END OF DOCUMENT is unnecessary

Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 2 weeks ago
On 16/06/23 01:26, Stefano Stabellini wrote:
> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>> This document specifies the C language dialect used by Xen and
>> the assumptions Xen makes on the translation toolchain.
>>
>> Signed-off-by: Roberto Bagnara <roberto.bagnara@bugseng.com>
> 
> Thanks Roberto for the amazing work of research and archaeology.
> 
> I have a few comments below, mostly to clarify the description of some
> of the less documented GCC extensions, for the purpose of having all
> community members be able to understand what they can and cannot use.
>> +   * - Arithmetic operator on void type
>> +     - ARM64, X86_64
>> +     - See Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
>> +
>> +   * - GNU statement expression
> 
> "GNU statement expression" is not very clear, at least for me. I would
> call it "Statements and Declarations in Expressions".

Agreed.

>> +   * - Empty declaration
>> +     - ARM64, X86_64
>> +     - Non-documented GCC extension.
> 
> For the non-documented GCC extensions, would it be possible to add a
> very brief example or a couple of words in the "References" sections?
> Otherwise I think people might not understand what we are talking about.

Ok.

>> +   * - Incomplete enum declaration
>> +     - ARM64
>> +     - Non-documented GCC extension.
> 
> Is this 6.49 of the GCC manual perhaps?

Indeed, on a second reading, I think that section also covers the case
of an enum declaration that is never completed in the course of the
translation unit.

>> +   * - Implicit conversion from a pointer to an incompatible pointer
>> +     - ARM64, X86_64
>> +     - Non-documented GCC extension.
> 
> Is this related to -Wincompatible-pointer-types?

In my opinion, this does not specify what the result of the
conversion is.

>> +   * - Pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function
>> +     - X86_64
>> +     - Non-documented GCC extension.
> 
> Is this J.5.7 of n1570?
> https://www.iso-9899.info/n1570.html

This says that function pointer casts are a common extension.
What we need here is documentation for GCC that assures us
that the extension is implemented and what its semantics is.

> Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584

My opinion is that this might not be accepted by an assessor:
if I were an assessor, I would not accept it.

>> +   * - Ill-formed source detected by the parser
> 
> As we are documenting compiler extensions that we are using, I am a bit
> confused by the name of this category of compiler extensions, and the
> reason why they are bundled together. After all, they are all separate
> compiler extensions? Should each of them have their own row?

Agreed.

>> +     - ARM64, X86_64
>> +     - token pasting of ',' and __VA_ARGS__ is a GNU extension:
>> +          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
>> +       must specify at least one argument for '...' parameter of variadic macro:
>> +          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
>> +       void function should not return void expression:
> 
> I understand that GCC does a poor job at documenting several of these
> extensions. In fact a few of them are not even documented at all.
> However, if they are extensions, they should be described for what they
> do, not for the rule they violate. What do you think?

The point is that we don't know what they do.  We might make observations,
and our observations might substantiate what we believe they do.
But this would not allow us to generalize them.

> For example, in this case maybe we should say "void function can return
> a void expression" ?

We can certainly say that, but this might not convince an assessor.
One possibility would be to submit patches to the GCC manual and see
whether they are accepted.

>> +          see the documentation for -Wreturn-type in Section "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL.
>> +       use of GNU statement expression extension from macro expansion:
>> +          see Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
>> +       invalid application of sizeof to a void type:
>> +          see Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
>> +       redeclaration of already-defined enum is a GNU extension:
>> +          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
>> +       static function is used in an inline function with external linkage:
>> +          non-documented GCC extension.
> 
> I am not sure if I follow about this one. Did you mean "static is used
> in an inline function with external linkage" ?

An inline function with external linkage can be inlined everywhere.
If it calls a static function, which is not available everywhere,
the behavior is not defined.

>> +       struct may not be nested in a struct due to flexible array member:
>> +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
>> +       struct may not be used as an array element due to flexible array member:
>> +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
>> +       ISO C restricts enumerator values to the range of int:
>> +          non-documented GCC extension.
> 
> Should we call it instead "enumerator values can be larger than int" ?

Yes, I have rephrased that entry.

>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>> +     - X86_64
>> +     - \\m:
>> +          non-documented GCC extension.
> 
> Are you saying that we are using \m and \m is not allowed by the C
> standard?

The C standard does not specify that escape sequence, so what is
done with it, in particular by the preprocessor, is not specified.

>> +   * - Non-standard type
> 
> Should we call it "128-bit Integers" ?

I have rephrased this as suggested by Jan.

Thanks for your review.  I will submit a revised patch
on Monday.
Kind regards,

     Roberto
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Jan Beulich 10 months, 2 weeks ago
On 16.06.2023 17:54, Roberto Bagnara wrote:
> On 16/06/23 01:26, Stefano Stabellini wrote:
>> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>>> +          see the documentation for -Wreturn-type in Section "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL.
>>> +       use of GNU statement expression extension from macro expansion:
>>> +          see Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
>>> +       invalid application of sizeof to a void type:
>>> +          see Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
>>> +       redeclaration of already-defined enum is a GNU extension:
>>> +          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
>>> +       static function is used in an inline function with external linkage:
>>> +          non-documented GCC extension.
>>
>> I am not sure if I follow about this one. Did you mean "static is used
>> in an inline function with external linkage" ?
> 
> An inline function with external linkage can be inlined everywhere.
> If it calls a static function, which is not available everywhere,
> the behavior is not defined.

I guess I could do with an example where this leads to UB. What I'd expect
is that it leads to a compilation error.

>>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>>> +     - X86_64
>>> +     - \\m:
>>> +          non-documented GCC extension.
>>
>> Are you saying that we are using \m and \m is not allowed by the C
>> standard?
> 
> The C standard does not specify that escape sequence, so what is
> done with it, in particular by the preprocessor, is not specified.

Isn't it rather that gcc doesn't follow the spec to the word here?
As per what preprocessing-token can be, anything that isn't (among
other things) a string-literal or a character-constant falls under
"each non-white-space character that cannot be one of the above".
Hence since "\mode" doesn't form a valid string literal, it would
need to become (using '' notation for separation purposes, not to
indicate character constants) '"' '\' 'mode'. Which of course would
break what subsequently are string literals, as the supposedly
closing double-quote would now be an opening one. Which in turn is
presumably the reason why gcc (and probably other compilers as well)
behaves the way it does.

Jan
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 1 week ago
On 19/06/23 09:54, Jan Beulich wrote:
> On 16.06.2023 17:54, Roberto Bagnara wrote:
>> On 16/06/23 01:26, Stefano Stabellini wrote:
>>> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>>>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>>>> +     - X86_64
>>>> +     - \\m:
>>>> +          non-documented GCC extension.
>>>
>>> Are you saying that we are using \m and \m is not allowed by the C
>>> standard?
>>
>> The C standard does not specify that escape sequence, so what is
>> done with it, in particular by the preprocessor, is not specified.
> 
> Isn't it rather that gcc doesn't follow the spec to the word here?
> As per what preprocessing-token can be, anything that isn't (among
> other things) a string-literal or a character-constant falls under
> "each non-white-space character that cannot be one of the above".
> Hence since "\mode" doesn't form a valid string literal, it would
> need to become (using '' notation for separation purposes, not to
> indicate character constants) '"' '\' 'mode'. Which of course would
> break what subsequently are string literals, as the supposedly
> closing double-quote would now be an opening one. Which in turn is
> presumably the reason why gcc (and probably other compilers as well)
> behaves the way it does.

After a significant amount of work on the matter, we came to the
following conclusions:

1) In this matter, the C Standard is not at all clear regarding
    the conditions under which it is legitimate to place undefined
    escape sequences in the sources.
2) The GNU C preprocessor manual says nothing in this regard.
3) Experimenting with a lot of compilers, it seems all implementers
    have filled in the gaps in the same way, that is: during translation
    phase 3, escape sequences are considered for the sole purpose
    of getting preprocessing tokens right; escape sequences, whether
    defined or undefined, are left untouched and passed over to translation
    phase 4.

Summarizing, we are now convinced that what we are facing is one
of the (many) cases where the C Standard is not clear, and not a
case of undefined behavior.  Xen's use of \m guarded by
__ASSEMBLY__ is thus correct and not problematic.  Indeed, the
check for undefined escape sequences can only be done after
preprocessing.  I have asked that ECLAIR be suitably amended.

Thanks to all of you, and particularly to Jan, for the perseverance.
Kind regards,

    Roberto
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 1 week ago
On 19/06/23 09:54, Jan Beulich wrote:
> On 16.06.2023 17:54, Roberto Bagnara wrote:
>> On 16/06/23 01:26, Stefano Stabellini wrote:
>>> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>>>> +       static function is used in an inline function with external linkage:
>>>> +          non-documented GCC extension.
>>>
>>> I am not sure if I follow about this one. Did you mean "static is used
>>> in an inline function with external linkage" ?
>>
>> An inline function with external linkage can be inlined everywhere.
>> If that calls a static functions, which is not available everywhere,
>> the behavior is not defined.
> 
> I guess I could do with an example where this leads to UB. What I'd expect
> is that it leads to a compilation error.

Here are the two occurrences we have in ARM64 code:

violation for rule MC3R1.R1.1: (required) The program shall contain no violations of the standard C syntax and constraints, and shall not exceed the implementation's translation limits.
xen/common/spinlock.c:316.29-316.40: Loc #1 [culprit: static function `observe_head(spinlock_tickets_t*)' is used in an inline function with external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
xen/common/spinlock.c:301.26-301.37: Loc #2 [evidence: `observe_head(spinlock_tickets_t*)' declared here]
xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal linkage]

violation for rule MC3R1.R1.1: (required) The program shall contain no violations of the standard C syntax and constraints, and shall not exceed the implementation's translation limits.
xen/common/spinlock.c:324.5-324.12: Loc #1 [culprit: static function `got_lock(union lock_debug*)' is used in an inline function with external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
xen/common/spinlock.c:227.13-227.20: Loc #2 [evidence: `got_lock(union lock_debug*)' declared here]
xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal linkage]
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Jan Beulich 10 months, 1 week ago
On 19.06.2023 12:53, Roberto Bagnara wrote:
> On 19/06/23 09:54, Jan Beulich wrote:
>> On 16.06.2023 17:54, Roberto Bagnara wrote:
>>> On 16/06/23 01:26, Stefano Stabellini wrote:
>>>> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>>>>> +       static function is used in an inline function with external linkage:
>>>>> +          non-documented GCC extension.
>>>>
>>>> I am not sure if I follow about this one. Did you mean "static is used
>>>> in an inline function with external linkage" ?
>>>
>>> An inline function with external linkage can be inlined everywhere.
>>> If it calls a static function, which is not available everywhere,
>>> the behavior is not defined.
>>
>> I guess I could do with an example where this leads to UB. What I'd expect
>> is that it leads to a compilation error.
> 
> Here are the two occurrences we have in ARM64 code:
> 
> violation for rule MC3R1.R1.1: (required) The program shall contain no violations of the standard C syntax and constraints, and shall not exceed the implementation's translation limits.
> xen/common/spinlock.c:316.29-316.40: Loc #1 [culprit: static function `observe_head(spinlock_tickets_t*)' is used in an inline function with external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
> xen/common/spinlock.c:301.26-301.37: Loc #2 [evidence: `observe_head(spinlock_tickets_t*)' declared here]
> xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal linkage]
> 
> violation for rule MC3R1.R1.1: (required) The program shall contain no violations of the standard C syntax and constraints, and shall not exceed the implementation's translation limits.
> xen/common/spinlock.c:324.5-324.12: Loc #1 [culprit: static function `got_lock(union lock_debug*)' is used in an inline function with external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
> xen/common/spinlock.c:227.13-227.20: Loc #2 [evidence: `got_lock(union lock_debug*)' declared here]
> xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal linkage]

I know _spin_lock_cb() was an example of a violation (it isn't anymore),
but this does not serve as an example for the UB you claim may occur.
The "inline" there was in a .c file, and hence the function could only
be inlined with its (static) helper also in scope.

Jan
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 1 week ago
On 19/06/23 13:47, Jan Beulich wrote:
> On 19.06.2023 12:53, Roberto Bagnara wrote:
>> On 19/06/23 09:54, Jan Beulich wrote:
>>> On 16.06.2023 17:54, Roberto Bagnara wrote:
>>>> On 16/06/23 01:26, Stefano Stabellini wrote:
>>>>> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>>>>>> +       static function is used in an inline function with external linkage:
>>>>>> +          non-documented GCC extension.
>>>>>
>>>>> I am not sure if I follow about this one. Did you mean "static is used
>>>>> in an inline function with external linkage" ?
>>>>
>>>> An inline function with external linkage can be inlined everywhere.
>>>> If it calls a static function, which is not available everywhere,
>>>> the behavior is not defined.
>>>
>>> I guess I could do with an example where this leads to UB. What I'd expect
>>> is that it leads to a compilation error.
>>
>> Here are the two occurrences we have in ARM64 code:
>>
>> violation for rule MC3R1.R1.1: (required) The program shall contain no violations of the standard C syntax and constraints, and shall not exceed the implementation's translation limits.
>> xen/common/spinlock.c:316.29-316.40: Loc #1 [culprit: static function `observe_head(spinlock_tickets_t*)' is used in an inline function with external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
>> xen/common/spinlock.c:301.26-301.37: Loc #2 [evidence: `observe_head(spinlock_tickets_t*)' declared here]
>> xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal linkage]
>>
>> violation for rule MC3R1.R1.1: (required) The program shall contain no violations of the standard C syntax and constraints, and shall not exceed the implementation's translation limits.
>> xen/common/spinlock.c:324.5-324.12: Loc #1 [culprit: static function `got_lock(union lock_debug*)' is used in an inline function with external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
>> xen/common/spinlock.c:227.13-227.20: Loc #2 [evidence: `got_lock(union lock_debug*)' declared here]
>> xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal linkage]
> 
> I know _spin_lock_cb() was an example of a violation (it isn't anymore),
> but this does not serve as an example for the UB you claim may occur.
> The "inline" there was in a .c file, and hence the function could only
> be inlined with its (static) helper also in scope.

This is a constraint violation according to C99 6.7.4p3: "An inline definition
of a function with external linkage shall not contain a definition of a modifiable
object with static storage duration, and shall not contain a reference to an identifier
with internal linkage."  A standard-compliant C compiler ought to diagnose all
constraint violations: when it does not, as is the case for GCC in these specific
examples, the behavior is implicitly undefined.
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Stefano Stabellini 10 months, 2 weeks ago
On Fri, 16 Jun 2023, Roberto Bagnara wrote:
> > > +   * - Implicit conversion from a pointer to an incompatible pointer
> > > +     - ARM64, X86_64
> > > +     - Non-documented GCC extension.
> > 
> > Is this related to -Wincompatible-pointer-types?
> 
> In my opinion, this does not specify what the result of the
> conversion is.

Fair enough. However, if -Wincompatible-pointer-types and "Implicit
conversion from a pointer to an incompatible pointer" are related, I
would add -Wincompatible-pointer-types as extra info about it. See also
below.


> > > +   * - Pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function
> > > +     - X86_64
> > > +     - Non-documented GCC extension.
> > 
> > Is this J.5.7 of n1570?
> > https://www.iso-9899.info/n1570.html
> 
> This says that function pointer casts are a common extension.
> What we need here is documentation for GCC that assures us
> that the extension is implemented and what its semantics is.
> 
> > Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584
> 
> My opinion is that this might not be accepted by an assessor:
> if I were an assessor, I would not accept it.

I understand your point and I think it is valid. My observation was
that it is better to provide as much information for these undocumented
extensions as we can. Not necessarily to help an assessor, but to help
a new engineer working on this project who reads this document to
understand what can be done.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584 might not be an
official documentation of the extension but it is better than no
documentation at all. Even better might be a code example.

I am not saying we should document ourselves what GCC failed to
document. I am only saying we should add enough description to
understand what we are talking about.

For instance, I read "Pointer to a function is converted to a pointer to
an object or a pointer to an object is converted to a pointer to a
function" and I have an idea about what this is but I am not really
sure. I googled the sentence and found information on Stack Overflow. I
think it is better to link
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584 or a couple of
sentences from it, although it might not be official.
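For illustration only, here is a minimal sketch of the kind of conversion this entry refers to. It assumes GCC on x86_64 (or arm64), where function and object pointers have the same size and representation. The names "int_fn", "twice", "fn_to_obj", "obj_to_fn" and "call_via_object_pointer" are hypothetical and are not taken from Xen. The round trip works in practice with GCC, but, as discussed above, GCC does not document it.

```c
/* Hypothetical function type used only for this illustration. */
typedef int (*int_fn)(int);

static int twice(int x)
{
    return 2 * x;
}

/*
 * Function pointer -> object pointer: not defined by ISO C99 (J.5.7
 * lists it only as a common extension). To our knowledge GCC accepts
 * the explicit cast and diagnoses it only with -pedantic.
 */
static void *fn_to_obj(int_fn fn)
{
    return (void *)fn;
}

/* Object pointer -> function pointer: the reverse conversion. */
static int_fn obj_to_fn(void *p)
{
    return (int_fn)p;
}

/* Round trip: call twice() through a pointer that was stored as void *. */
int call_via_object_pointer(int x)
{
    void *stored = fn_to_obj(twice);
    int_fn fn = obj_to_fn(stored);

    return fn(x);
}
```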


> > > +   * - Ill-formed source detected by the parser
> > 
> > As we are documenting compiler extensions that we are using, I am a bit
> > confused by the name of this category of compiler extensions, and the
> > reason why they are bundled together. After all, they are all separate
> > compiler extensions? Should each of them have their own row?
> 
> Agreed.
> 
> > > +     - ARM64, X86_64
> > > +     - token pasting of ',' and __VA_ARGS__ is a GNU extension:
> > > +          see Section "6.21 Macros with a Variable Number of Arguments"
> > > of GCC_MANUAL.
> > > +       must specify at least one argument for '...' parameter of variadic
> > > macro:
> > > +          see Section "6.21 Macros with a Variable Number of Arguments"
> > > of GCC_MANUAL.
> > > +       void function should not return void expression:
> > 
> > I understand that GCC does a poor job at documenting several of these
> > extensions. In fact a few of them are not even documented at all.
> > However, if they are extensions, they should be described for what they
> > do, not for the rule they violate. What do you think?
> 
> The point is that we don't know what they do.  We might make observations,
> and our observations might substantiate what we believe they do.
> But this would not allow us to generalize them.
>
> > For example, in this case maybe we should say "void function can return
> > a void expression" ?
> 
> We can certainly say that, but this might not convince an assessor.
> One possibility would be to submit patches to the GCC manual and see
> whether they are accepted.
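As an example of the two variadic-macro items quoted above (both covered by Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL), here is a minimal sketch assuming GCC. "FMT", "format_without_args" and "format_with_args" are hypothetical names.

```c
#include <stdio.h>

/*
 * GNU extension: in ", ## __VA_ARGS__" the "##" deletes the preceding
 * comma when the variable arguments are omitted, so the expansion never
 * ends with a dangling comma.
 */
#define FMT(buf, size, fmt, ...) snprintf(buf, size, fmt, ## __VA_ARGS__)

/*
 * No argument at all for '...': C99 6.10.3p12 requires at least one,
 * GCC accepts zero. Expands to snprintf(buf, size, "boot").
 */
int format_without_args(char *buf, size_t size)
{
    return FMT(buf, size, "boot");
}

/* Ordinary use: expands to snprintf(buf, size, "cpu%d", cpu). */
int format_with_args(char *buf, size_t size, int cpu)
{
    return FMT(buf, size, "cpu%d", cpu);
}
```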

I think we have two different target audiences for this document. One
target is assessors, and I understand that extra unofficial
information might not help there.

However another target is the community. This document should help the
Xen community write better code, not just help assessors raise red flags.
Right? It should help us achieve better compiler compatibility and make
sure that we are clear about the C dialect we use. Actually, I think
this document could be of great help. Do you agree?

From that point of view "void function should not return void
expression" is not understandable. At least I don't understand it.

A different approach would be to say:

- this is a MISRA C violation or compiler warning/error
- it is not C99 compliant
- it is not behavior documented by GCC

That is, we would not try to describe what the extension is at all, and
would instead focus on what the MISRA C violation or compiler warning is.

I think it is OK to go down that route, but in that case we need to
reorganize the document so that:
- all documented extensions are referred to as extensions
- all undocumented extensions are referred to by the warning they
  trigger

I think that we would be OK but honestly I prefer the current approach
and we just need to add a few extra words to better explain what the
undocumented extensions are. Not to replace the GCC manual but simply
because otherwise we do not understand each other (at least, I do not
understand it).
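To make the "void function should not return void expression" item concrete, here is a minimal sketch assuming GCC; "counter", "bump", "bump_wrapper" and "get_counter" are hypothetical names. C99 6.8.6.4p1 forbids a return statement with any expression in a function returning void, while the GCC documentation of -Wreturn-type exempts expressions whose type is also void.

```c
/* Hypothetical state, so that the effect of the call is observable. */
static int counter;

static void bump(void)
{
    counter++;
}

/*
 * A function returning void whose return statement carries an
 * expression of type void. Not valid ISO C99; GCC accepts it (it is
 * diagnosed only with -pedantic) and simply evaluates the call.
 */
void bump_wrapper(void)
{
    return bump();
}

int get_counter(void)
{
    return counter;
}
```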


> > > +          see the documentation for -Wreturn-type in Section "3.8 Options
> > > to Request or Suppress Warnings" of GCC_MANUAL.
> > > +       use of GNU statement expression extension from macro expansion:
> > > +          see Section "6.1 Statements and Declarations in Expressions" of
> > > GCC_MANUAL.
> > > +       invalid application of sizeof to a void type:
> > > +          see Section "6.24 Arithmetic on void- and Function-Pointers" of
> > > GCC_MANUAL.
> > > +       redeclaration of already-defined enum is a GNU extension:
> > > +          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
> > > +       static function is used in an inline function with external
> > > linkage:
> > > +          non-documented GCC extension.
> > 
> > I am not sure if I follow about this one. Did you mean "static is used
> > in an inline function with external linkage" ?
> 
> An inline function with external linkage can be inlined everywhere.
> If that calls a static function, which is not available everywhere,
> the behavior is not defined.

Got it. Can we add this sentence you wrote to the doc?
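A minimal sketch of the construct described above, with hypothetical names "helper" and "use_helper". The problematic form is kept inside a comment so that the block stays valid C99: as we read C99 6.7.4, an inline definition with external linkage must not refer to an identifier with internal linkage, and GCC accepts it only with a warning. The compiled code shows the conforming alternative, which is to give the inline function internal linkage.

```c
/* A function with internal linkage: visible only in this translation unit. */
static int helper(int x)
{
    return x + 1;
}

/*
 * Problematic form (not compiled here):
 *
 *     inline int use_helper(int x) { return helper(x); }
 *
 * Another translation unit may call use_helper() through its external
 * definition, where this particular helper() does not exist.
 */

/* Conforming alternative: the inline function has internal linkage too. */
static inline int use_helper(int x)
{
    return helper(x);
}
```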


> > > +       struct may not be nested in a struct due to flexible array member:
> > > +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> > > +       struct may not be used as an array element due to flexible array
> > > member:
> > > +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> > > +       ISO C restricts enumerator values to the range of int:
> > > +          non-documented GCC extension.
> > 
> > Should we call it instead "enumerator values can be larger than int" ?
> 
> Yes, I have rephrased that entry.
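A sketch of the rephrased entry ("enumerator values can be larger than int"), assuming GCC on an LP64 target such as x86_64 or arm64. "wide_flags", "WF_LOW", "WF_HIGH" and the two accessor functions are hypothetical. C99 6.7.2.2p2 requires each enumerator value to be representable as an int; GCC accepts larger values (to our knowledge diagnosing them only with -pedantic) and, in our understanding, picks a wider integer type for the enumeration.

```c
#include <stddef.h>

/* Hypothetical flag set that needs bit 32, which does not fit in int. */
enum wide_flags {
    WF_LOW  = 1,
    WF_HIGH = 0x100000000ULL,   /* > INT_MAX: not valid ISO C99 */
};

unsigned long long wide_high_value(void)
{
    return WF_HIGH;
}

/* Size of the enumeration type GCC selected for the values above. */
size_t wide_enum_size(void)
{
    return sizeof(enum wide_flags);
}
```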
> 
> > > +   * - Unspecified escape sequence is encountered in a character constant
> > > or a string literal token
> > > +     - X86_64
> > > +     - \\m:
> > > +          non-documented GCC extension.
> > 
> > Are you saying that we are using \m and \m is not allowed by the C
> > standard?
> 
> The C standard does not specify that escape sequence, so what is
> done with it, in particular by the preprocessor, is not specified.
> 
> > > +   * - Non-standard type
> > 
> > Should we call it "128-bit Integers" ?
> 
> I have rephrased this as suggested by Jan.
> 
> Thanks for your review.  I will submit a revised patch
> on Monday.
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 1 week ago
On 16/06/23 22:43, Stefano Stabellini wrote:
> On Fri, 16 Jun 2023, Roberto Bagnara wrote:
>>>> +   * - Implicit conversion from a pointer to an incompatible pointer
>>>> +     - ARM64, X86_64
>>>> +     - Non-documented GCC extension.
>>>
>>> Is this related to -Wincompatible-pointer-types?
>>
>> In my opinion, this does not specify what the result of the
>> conversion is.
> 
> Fair enough. However, if -Wincompatible-pointer-types and "Implicit
> conversion from a pointer to an incompatible pointer" are related, it
> would add -Wincompatible-pointer-types as extra info about it. See also
> below.
> 
> 
>>>> +   * - Pointer to a function is converted to a pointer to an object or a
>>>> pointer to an object is converted to a pointer to a function
>>>> +     - X86_64
>>>> +     - Non-documented GCC extension.
>>>
>>> Is this J.5.7 of n1570?
>>> https://www.iso-9899.info/n1570.html
>>
>> This says that function pointer casts are a common extension.
>> What we need here is documentation for GCC that assures us
>> that the extension is implemented and what its semantics is.
>>
>>> Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584
>>
>> My opinion is that this might not be accepted by an assessor:
>> if I was an assessor, I would not accept it.
> 
> I understand your point and I think it is valid. My observation was
> that it is better to provide as much information for these undocumented
> extensions as we can. Not necessarily to help an assessor, but to help
> a new engineer working on this project, reading this document, to
> understand what can be done.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584 might not be an
> official documentation of the extension but it is better than no
> documentation at all. Even better might be a code example.
> 
> I am not saying we should document ourselves what GCC failed to
> document. I am only saying we should add enough description to
> understand what we are talking about.
> 
> For instance, I read "Pointer to a function is converted to a pointer to
> an object or a pointer to an object is converted to a pointer to a
> function" and I have an idea about what this is but I am not really
> sure. I googled the sentence and found information on Stackoverflow. I
> think it is better to link
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584 or a couple of
> sentences from it, although it might not be official.
> 
> 
>>>> +   * - Ill-formed source detected by the parser
>>>
>>> As we are documenting compiler extensions that we are using, I am a bit
>>> confused by the name of this category of compiler extensions, and the
>>> reason why they are bundled together. After all, they are all separate
>>> compiler extensions? Should each of them have their own row?
>>
>> Agreed.
>>
>>>> +     - ARM64, X86_64
>>>> +     - token pasting of ',' and __VA_ARGS__ is a GNU extension:
>>>> +          see Section "6.21 Macros with a Variable Number of Arguments"
>>>> of GCC_MANUAL.
>>>> +       must specify at least one argument for '...' parameter of variadic
>>>> macro:
>>>> +          see Section "6.21 Macros with a Variable Number of Arguments"
>>>> of GCC_MANUAL.
>>>> +       void function should not return void expression:
>>>
>>> I understand that GCC does a poor job at documenting several of these
>>> extensions. In fact a few of them are not even documented at all.
>>> However, if they are extensions, they should be described for what they
>>> do, not for the rule they violate. What do you think?
>>
>> The point is that we don't know what they do.  We might make observations,
>> and our observations might substantiate what we believe they do.
>> But this would not allow us to generalize them.
>>
>>> For example, in this case maybe we should say "void function can return
>>> a void expression" ?
>>
>> We can certainly say that, but this might not convince an assessor.
>> One possibility would be to submit patches to the GCC manual and see
>> whether they are accepted.
> 
> I think we have two different target audiences for this document. One
> target is assessors, and I understand that extra unofficial
> information might not help there.
> 
> However another target is the community. This document should help the
> Xen community write better code, not just help assessors raise red flags.
> Right? It should help us achieve better compiler compatibility and make
> sure that we are clear about the C dialect we use. Actually, I think
> this document could be of great help. Do you agree?
> 
> From that point of view "void function should not return void
> expression" is not understandable. At least I don't understand it.
> 
> A different approach would be to say:
> 
> - this is a MISRA C violation or compiler warning/error
> - it is not C99 compliant
> - it is not behavior documented by GCC
>
> That is, we would not try to describe what the extension is at all, and
> would instead focus on what the MISRA C violation or compiler warning is.
> 
> I think it is OK to go down that route, but in that case we need to
> reorganize the document so that:
> - all documented extensions are referred to as extensions
> - all undocumented extensions are referred to by the warning they
>    trigger
> 
> I think that we would be OK but honestly I prefer the current approach
> and we just need to add a few extra words to better explain what the
> undocumented extensions are. Not to replace the GCC manual but simply
> because otherwise we do not understand each other (at least, I do not
> understand it).
> 
> 
>>>> +          see the documentation for -Wreturn-type in Section "3.8 Options
>>>> to Request or Suppress Warnings" of GCC_MANUAL.
>>>> +       use of GNU statement expression extension from macro expansion:
>>>> +          see Section "6.1 Statements and Declarations in Expressions" of
>>>> GCC_MANUAL.
>>>> +       invalid application of sizeof to a void type:
>>>> +          see Section "6.24 Arithmetic on void- and Function-Pointers" of
>>>> GCC_MANUAL.
>>>> +       redeclaration of already-defined enum is a GNU extension:
>>>> +          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
>>>> +       static function is used in an inline function with external
>>>> linkage:
>>>> +          non-documented GCC extension.
>>>
>>> I am not sure if I follow about this one. Did you mean "static is used
>>> in an inline function with external linkage" ?
>>
>> An inline function with external linkage can be inlined everywhere.
>> If that calls a static function, which is not available everywhere,
>> the behavior is not defined.
> 
> Got it. Can we add this sentence you wrote to the doc?

Hi Stefano.

I think all the feedback received has been taken into account.
I will send a revised patch soon.
Kind regards,

    Roberto
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Jan Beulich 10 months, 2 weeks ago
On 16.06.2023 01:26, Stefano Stabellini wrote:
> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
> I have a few comments below, mostly to clarify the description of some
> of the less documented GCC extensions, for the purpose of having all
> community members be able to understand what they can and cannot use.

What do you mean by "can and cannot use"? Is this document intended to
forbid the use of any extensions we may not currently use, or we use
but which aren't enumerated here?

One of the reasons that kept me from replying to this submission is
that the full purpose of this new doc isn't stated in the description.
Which in turn leaves open whether certain items actually need to be
here (see e.g. the libc related remark below). Another is that it's
hard to tell how to convince oneself of this being an exhaustive
enumeration. One extension we use extensively yet iirc is missing here
is omission of the middle operand of the ternary operator.

>> --- /dev/null
>> +++ b/docs/misra/C-language-toolchain.rst
>> @@ -0,0 +1,465 @@
>> +=============================================
>> +C Dialect and Translation Assumptions for Xen
>> +=============================================
>> +
>> +This document specifies the C language dialect used by Xen and
>> +the assumptions Xen makes on the translation toolchain.
>> +It covers, in particular:
>> +
>> +1. the used language extensions;
>> +2. the translation limits that the translation toolchains must be able
>> +   to accommodate;
>> +3. the implementation-defined behaviors upon which Xen may depend.
>> +
>> +All points are of course relevant for portability.  In addition,
>> +programming in C is impossible without a detailed knowledge of the
>> +implementation-defined behaviors.  For this reason, it is recommended
>> +that Xen developers have familiarity with this document and the
>> +documentation referenced therein.
>> +
>> +This document needs maintenance and adaptation in the following
>> +circumstances:
>> +
>> +- whenever the compiler is changed or updated;
>> +- whenever the use of a certain language extension is added or removed;
>> +- whenever code modifications cause exceeding the stated translation limits.
>> +
>> +
>> +Applicable C Language Standard
>> +______________________________
>> +
>> +Xen is written in C99 with extensions.  The relevant ISO standard is
>> +
>> +    *ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
>> +    Technical Corrigendum 3.
>> +    ISO/IEC, Geneva, Switzerland, 2007.
>> +
>> +
>> +Reference Documentation
>> +_______________________
>> +
>> +The following documents are referred to in the sequel:
>> +
>> +GCC_MANUAL:
>> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
>> +CPP_MANUAL:
>> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf

Why 12.1 when meanwhile there's 12.3 and 13.1?

>> +ARM64_ABI_MANUAL:
>> +  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
>> +X86_64_ABI_MANUAL:
>> +  https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
>> +ARM64_LIBC_MANUAL:
>> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
>> +X86_64_LIBC_MANUAL:
>> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf

How is libc relevant to the hypervisor?

>> +   * - Empty declaration
>> +     - ARM64, X86_64
>> +     - Non-documented GCC extension.
> 
> For the non-documented GCC extensions, would it be possible to add a
> very brief example or a couple of words in the "References" sections?
> Otherwise I think people might not understand what we are talking about.
> 
> For instance in this case I would say:
> 
> An empty declaration is a semicolon with nothing before it.
> Non-documented GCC extension.

Which then could be confused with empty statements. I think in a document
like this language needs to be very precise, to avoid ambiguities and
confusion as much as possible. (Iirc from going over this doc yesterday
this applies elsewhere as well.)
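To keep the two notions apart, here is a minimal sketch. It assumes that the entry refers to a stray ';' at file scope; the entry itself does not say so, which makes this an assumption. The macro "DECLARE_COUNTER" and the function "count_hit" are hypothetical.

```c
/*
 * Hypothetical macro whose expansion already ends with ';'. Writing
 * DECLARE_COUNTER(hits); therefore yields "static int hits;;" at file
 * scope, and the second ';' is an empty declaration: not valid ISO C99
 * syntax, accepted by GCC (diagnosed only with -pedantic).
 */
#define DECLARE_COUNTER(name) static int name;

DECLARE_COUNTER(hits);

int count_hit(void)
{
    ;   /* Empty (null) statement inside a function: valid ISO C, 6.8.3. */
    return ++hits;
}
```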

>> +   * - Ill-formed source detected by the parser
> 
> As we are documenting compiler extensions that we are using, I am a bit
> confused by the name of this category of compiler extensions, and the
> reason why they are bundled together. After all, they are all separate
> compiler extensions? Should each of them have their own row?

+1

>> +
>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>> +     - X86_64
>> +     - \\m:
>> +          non-documented GCC extension.
> 
> Are you saying that we are using \m and \m is not allowed by the C
> standard?

This exists in the __ASSEMBLY__ part of a header, and I had previously
commented on Roberto's diagnosis (possibly derived from Eclair's) here.
As per that I don't think the item should be here, but I'm of course
open to be shown that my understanding of translation phases is wrong.

>> +   * - Non-standard type
> 
> Should we call it "128-bit Integers" ?

Or better more generally "Extended integer types" (or something along
these lines, i.e. as these are called in the spec)?

Jan
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 2 weeks ago
On 16/06/23 08:53, Jan Beulich wrote:
> On 16.06.2023 01:26, Stefano Stabellini wrote:
>> On Thu, 15 Jun 2023, Roberto Bagnara wrote:
>> I have a few comments below, mostly to clarify the description of some
>> of the less documented GCC extensions, for the purpose of having all
>> community members be able to understand what they can and cannot use.
> 
> What do you mean by "can and cannot use"? Is this document intended to
> forbid the use of any extensions we may not currently use, or we use
> but which aren't enumerated here?
> 
> One of the reasons that kept me from replying to this submission is
> that the full purpose of this new doc isn't stated in the description.

My full purpose was to give the community a starting point for the
discussion on the assumptions the project makes on the programming
language and the translation toolchains that are intended to be used
now or in the future.  As far as I know, no documentation is currently
provided on these topics, so I believe the document fills a gap and
I hope it is good enough as a starting point.

> Which in turn leaves open whether certain items actually need to be
> here (see e.g. the libc related remark below).

Because the analyzed build used to include some of the tools, which in turn
relied on libc for program termination.  Once confirmation is given
that the analyzed build is now what is intended, all references to
libc can be removed.

> Another is that it's
> hard to tell how to convince oneself of this being an exhaustive
> enumeration. One extension we use extensively yet iirc is missing here
> is omission of the middle operand of the ternary operator.

Not sure I understand: do you mean something different from the following
entry in the document?

    * - Binary conditional expression
      - ARM64, X86_64
      - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.


>>> +Reference Documentation
>>> +_______________________
>>> +
>>> +The following documents are referred to in the sequel:
>>> +
>>> +GCC_MANUAL:
>>> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
>>> +CPP_MANUAL:
>>> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
> 
> Why 12.1 when meanwhile there's 12.3 and 13.1?

For no special reason: as I said, my purpose is only to provide
a starting point for discussion and customization of the
assumptions.

>>> +ARM64_ABI_MANUAL:
>>> +  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
>>> +X86_64_ABI_MANUAL:
>>> +  https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
>>> +ARM64_LIBC_MANUAL:
>>> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
>>> +X86_64_LIBC_MANUAL:
>>> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
> 
> How is libc relevant to the hypervisor?

See above.

>>> +   * - Empty declaration
>>> +     - ARM64, X86_64
>>> +     - Non-documented GCC extension.
>>
>> For the non-documented GCC extensions, would it be possible to add a
>> very brief example or a couple of words in the "References" sections?
>> Otherwise I think people might not understand what we are talking about.
>>
>> For instance in this case I would say:
>>
>> An empty declaration is a semicolon with nothing before it.
>> Non-documented GCC extension.
> 
> Which then could be confused with empty statements. I think in a document
> like this language needs to be very precise, to avoid ambiguities and
> confusion as much as possible. (Iirc from going over this doc yesterday
> this applies elsewhere as well.)

OK.

>>> +   * - Ill-formed source detected by the parser
>>
>> As we are documenting compiler extensions that we are using, I am a bit
>> confused by the name of this category of compiler extensions, and the
>> reason why they are bundled together. After all, they are all separate
>> compiler extensions? Should each of them have their own row?
> 
> +1

OK.

>>> +
>>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>>> +     - X86_64
>>> +     - \\m:
>>> +          non-documented GCC extension.
>>
>> Are you saying that we are using \m and \m is not allowed by the C
>> standard?
> 
> This exists in the __ASSEMBLY__ part of a header, and I had previously
> commented on Roberto's diagnosis (possibly derived from Eclair's) here.
> As per that I don't think the item should be here, but I'm of course
> open to be shown that my understanding of translation phases is wrong.

I was not convinced by your explanation but, as I think I have said already,
I am not the one to be convinced.  In the specific case, independently
from __ASSEMBLY__ or any other considerations, that thing reaches the C
preprocessor and, to the best of my knowledge, the C preprocessor documentation
does not say how that would be handled.  I have spent a lot of time in the
past 10 years on the study of functional-safety standards, and what I
am providing is an honest opinion on what I believe is compliant
and what is not.  But I may be wrong of course: if you or anyone else feels
like they would not have any problems in arguing a different position
from mine in front of an assessor, then please go for it, but please
do not ask me to go beyond my judgment.

>>> +   * - Non-standard type
>>
>> Should we call it "128-bit Integers" ?
> 
> Or better more generally "Extended integer types" (or something along
> these lines, i.e. as these are called in the spec)?

OK, "Extended integer types" is indeed a good summary of item 1 of
C99 Section "J.3.5 Integers", which is
"Any extended integer types that exist in the implementation (6.2.5)."
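Here is a minimal sketch of what this entry covers in practice with GCC on 64-bit targets such as x86_64 and arm64: the "unsigned __int128" type, used here for a 64x64 to 128-bit multiplication. The helpers "mul_hi64" and "mul_lo64" are hypothetical. As far as we know, GCC's implementation-defined behavior chapter states that GCC does not support any extended integer types in the ISO C sense, and documents __int128 separately as the "128-bit Integers" extension. The entry may therefore need to cite that section instead.

```c
#include <stdint.h>

/* High 64 bits of the full 128-bit product a * b. */
uint64_t mul_hi64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;

    return (uint64_t)(p >> 64);
}

/* Low 64 bits of the same product (truncating conversion). */
uint64_t mul_lo64(uint64_t a, uint64_t b)
{
    return (uint64_t)((unsigned __int128)a * b);
}
```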
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Jan Beulich 10 months, 2 weeks ago
On 16.06.2023 09:45, Roberto Bagnara wrote:
> On 16/06/23 08:53, Jan Beulich wrote:
>> On 16.06.2023 01:26, Stefano Stabellini wrote:
>> Another is that it's
>> hard to tell how to convince oneself of this being an exhaustive
>> enumeration. One extension we use extensively yet iirc is missing here
>> is omission of the middle operand of the ternary operator.
> 
> Not sure I understand: do you mean something different from the following
> entry in the document?
> 
>     * - Binary conditional expression
>       - ARM64, X86_64
>       - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.

Ah, yes, that is it. I find gcc's title misleading (because there are
far more "conditionals" than just the ternary operator), and hence
when going through your doc I didn't spot this (I'm sorry), and when
then searching for "ternary" and "?:" there were no hits.
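For readers searching for "ternary" or "?:", here is a minimal sketch of the GCC extension documented in Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL: "x ?: y" yields x when x is nonzero and y otherwise, and evaluates x only once. The helpers "probe", "first_nonzero" and "probe_count" are hypothetical.

```c
/* Counts how many times the first operand is evaluated. */
static int evaluations;

static int probe(int v)
{
    evaluations++;
    return v;
}

/* GCC extension: "a ?: b" behaves like "a ? a : b", but evaluates a once. */
int first_nonzero(int a, int b)
{
    return probe(a) ?: b;
}

int probe_count(void)
{
    return evaluations;
}
```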

>>>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>>>> +     - X86_64
>>>> +     - \\m:
>>>> +          non-documented GCC extension.
>>>
>>> Are you saying that we are using \m and \m is not allowed by the C
>>> standard?
>>
>> This exists in the __ASSEMBLY__ part of a header, and I had previously
>> commented on Roberto's diagnosis (possibly derived from Eclair's) here.
>> As per that I don't think the item should be here, but I'm of course
>> open to be shown that my understanding of translation phases is wrong.
> 
> I was not convinced by your explanation but, as I think I have said already,
> I am not the one to be convinced.  In the specific case, independently
> from __ASSEMBLY__ or any other considerations, that thing reaches the C
> preprocessor and, to the best of my knowledge, the C preprocessor documentation
> does not say how that would be handled.  I have spent a lot of time in the
> past 10 years on the study of functional-safety standards, and what I
> am providing is an honest opinion on what I believe is compliant
> and what is not.  But I may be wrong of course: if you or anyone else feels
> like they would not have any problems in arguing a different position
> from mine in front of an assessor, then please go for it, but please
> do not ask me to go beyond my judgment.

Well, disagreement on purely a technical matter can usually be resolved,
unless something is truly unspecified. Since you referred to translation
phases, and since I pointed out that preprocessing directives are carried
out before escape sequences are converted to the execution character set
(which is the point where unknown escape sequences would matter afaict),
there must be something you view differently in this process. It would be
helpful if you could point out what this is, possibly leading to me
recognizing a mistake of mine.

Actually, maybe I figured what you're concerned about: Already at the
stage of decomposing into preprocessing-token-s there is an issue, as
e.g. "\mode" doesn't form a valid string-literal. For other, unquoted
\m I would assume though that the final "each non-white-space character
that cannot be one of the above" (in the enumeration of what a
preprocessing-token is) would catch it.
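To make the scenario concrete, here is a hypothetical header fragment. The assembler macro "load_mode", its parameter "mode", "MODE_DEFAULT" and "default_mode" are invented for illustration and are not the actual Xen header. When the file is compiled as C, __ASSEMBLY__ is not defined and the assembler-only group is skipped, but it is still decomposed into preprocessing tokens. In our understanding, GCC lexes "\mode" there as a lone backslash followed by the identifier "mode" and issues no diagnostic, because escape sequences only come into play inside character constants and string literals.

```c
/* Hypothetical header fragment shared between C and assembly sources. */
#ifdef __ASSEMBLY__
/*
 * Assembler-only part. When compiled as C this group is skipped, but it
 * is still split into preprocessing tokens: "\mode" becomes a lone
 * backslash token followed by the identifier "mode".
 */
.macro load_mode mode
    mov $\mode, %eax
.endm
#else
/* C-only part. */
#define MODE_DEFAULT 3

static inline int default_mode(void)
{
    return MODE_DEFAULT;
}
#endif
```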

Furthermore it is entirely unclear to me what it is that you suggest we
do instead. It can't reasonably be "name all your assembler macro
parameters such that they start with a, b, f, n, r, t, or v". Splitting
headers also wouldn't be very nice - we try to keep related things
together, after all. It also doesn't look like __stringify(\mode) would
be okay, as macro expansion shares a translation phase with execution
of preprocessing directives (so in principle the body of "#if 0" could
be macro-expanded before being discarded). (Plus I think this would
result in "\\mode", i.e. also wouldn't work in the first place. But it
would rule out other possible C macro trickery as well.)

Jan
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.
Posted by Roberto Bagnara 10 months, 2 weeks ago
On 16/06/23 12:03, Jan Beulich wrote:
> On 16.06.2023 09:45, Roberto Bagnara wrote:
>> On 16/06/23 08:53, Jan Beulich wrote:
>>> On 16.06.2023 01:26, Stefano Stabellini wrote:
>>>>> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
>>>>> +     - X86_64
>>>>> +     - \\m:
>>>>> +          non-documented GCC extension.
>>>>
>>>> Are you saying that we are using \m and \m is not allowed by the C
>>>> standard?
>>>
>>> This exists in the __ASSEMBLY__ part of a header, and I had previously
>>> commented on Roberto's diagnosis (possibly derived from Eclair's) here.
>>> As per that I don't think the item should be here, but I'm of course
>>> open to be shown that my understanding of translation phases is wrong.
>>
>> I was not convinced by your explanation but, as I think I have said already,
>> I am not the one to be convinced.  In the specific case, independently
>> from __ASSEMBLY__ or any other considerations, that thing reaches the C
>> preprocessor and, to the best of my knowledge, the C preprocessor documentation
>> does not say how that would be handled.  I have spent a lot of time in the
>> past 10 years on the study of functional-safety standards, and what I
>> am providing is an honest opinion on what I believe is compliant
>> and what is not.  But I may be wrong of course: if you or anyone else feels
>> like they would not have any problems in arguing a different position
>> from mine in front of an assessor, then please go for it, but please
>> do not ask me to go beyond my judgment.
> 
> Well, disagreement on purely a technical matter can usually be resolved,
> unless something is truly unspecified. Since you referred to translation
> phases, and since I pointed out that preprocessing directives are carried
> out before escape sequences are converted to the execution character set
> (which is the point where unknown escape sequences would matter afaict),
> there must be something you view differently in this process. It would be
> helpful if you could point out what this is, possibly leading to me
> recognizing a mistake of mine.
> 
> Actually, maybe I figured what you're concerned about: Already at the
> stage of decomposing into preprocessing-token-s there is an issue, as
> e.g. "\mode" doesn't form a valid string-literal. For other, unquoted
> \m I would assume though that the final "each non-white-space character
> that cannot be one of the above" (in the enumeration of what a
> preprocessing-token is) would catch it.

Yes, but more generally, my concern is that the behavior in presence
of unspecified escape sequences is not specified in the C99 standard
and it is not a documented extension according to the documentation
I have examined.  For this reason, I don't think that feature is
usable for safety-related development unless other (potentially
quite expensive) activities are performed (such as prescribing
extra validation activities for the preprocessor).

> Furthermore it is entirely unclear to me what it is that you suggest we
> do instead. It can't reasonably be "name all your assembler macro
> parameters such that they start with a, b, f, n, r, t, or v". Splitting
> headers also wouldn't be very nice - we try to keep related things
> together, after all. It also doesn't look like __stringify(\mode) would
> be okay, as macro expansion shares a translation phase with execution
> of preprocessing directives (so in principle the body of "#if 0" could
> be macro-expanded before being discarded). (Plus I think this would
> result in "\\mode", i.e. also wouldn't work in the first place. But it
> would rule out other possible C macro trickery as well.)

My suggestion is avoiding the use of the C preprocessor
outside its specification.  This includes, among other
possibilities:

a) using a different preprocessor or substitution mechanism;
b) amending the preprocessor specification by, e.g., submitting
    patches with suitable additions to GCC's "The C Preprocessor"
    manual.

In view of that, naming macro parameters so that you never
have an unspecified escape sequence is probably the cheapest
(yet bulletproof) solution.
Kind regards,

    Roberto