[RFC PATCH 0/5] clang-format for Xen

Luca Fancellu posted 5 patches 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20230728081144.4124309-1-luca.fancellu@arm.com
There is a newer version of this series
docs/misra/exclude-list.json                  |  88 +++
xen/.clang-format                             | 693 ++++++++++++++++++
xen/arch/x86/hvm/mtrr.c                       |   2 +
xen/scripts/codestyle.py                      | 261 +++++++
xen/scripts/xen_analysis/cppcheck_analysis.py |   6 +-
.../xen_analysis/exclusion_file_list.py       |  31 +-
6 files changed, 1063 insertions(+), 18 deletions(-)
create mode 100644 xen/.clang-format
create mode 100755 xen/scripts/codestyle.py
[RFC PATCH 0/5] clang-format for Xen
Posted by Luca Fancellu 9 months ago
## Introduction ################################################################

In this serie, I would like to get feedbacks on the output generated by the
configuration of clang-format, unfortunately we can't use only clang-format, but
we need to call it using a wrapper, because we need the information of what
files need to be excluded from the tool.

Another reason is that clang-format has some limitation when formatting asm()
instruction and most of the time it format them in a very ugly way or it breaks
the code for example removing spaces that were there for a reason (I don't think
it's a tool to format asm), so in the wrapper script we protect all asm()
invocation or macros where there are asm() invocation with in-code comments that
stops clang-format to act on that section:

/* clang-format off */section/* clang-format on */

I've read the past threads about the brave people who dared to try to introduce
clang-format for the xen codebase, some of them from 5 years ago, two points
were clear: 1) goto label needs to be indented and 2) do-while loops have the
braket in the same line.
While point 1) was quite a blocker, it seemd to me that point 2) was less
controversial to be changed in the Xen codestyle, so the current wrapper script
handles only the point 1 (which is easy), the point 2 can be more tricky to
handle.

## The clang-format configuration ##############################################

In my clang-format configuration I've taken inspiration from EPAM's work, then
from the configuration in Linux and finally from the clang-format manual, to try
to produce a comprehensive configuration.

Every configuration parameter has on top a comment with the description and
when it was supported, finally I've added also a [not specified] if that
behavior is not clearly specified in the Xen coding style, I've done that so
we could discuss about adding more specification in our CODING_STYLE.
Every comment can be stripped out in the final release of the file, but I think
that now they are useful for the discussion.

The minimum clang-format version for the file is 15, my ubuntu 22.04 comes with
it, we can reason if it's too high, or if we could also use the latest version
maybe shipped inside a docker image.

For every [not specified] behavior, I've tried to guess it from the codebase,
I've seen that also in that case it's not easy as there is (sometimes) low
consistency between modules, so we can discuss on every configurable.

Worth to mention, the public header are all excluded from the format tool,
because formatting them breaks the build on X86, because there are scripts for
auto-generation that don't handle the formatted headers, I didn't investigate
on it, maybe it can be added as technical debt.

So I've tried building arm32, arm64 and x86_64 with the formatted output and
they build, I've used Yocto for that.

## How to try it? ##############################################################

So how to generate everything? Just invoke the codestyle.py script without
parameter and it will format every .c and .h file in the hypervisor codebase.

./xen/scripts/codestyle.py

Optionally you can also pass one or more relative path from the folder you are
invoking the script and it will format only them.

## What I expect from this RFC #################################################

I expect feedback on the output, some agreement on what configuration to use,
and I expect to find possible blocker before working seriously on this serie,
because if there are outstanding blockers on the adoption of the tool and we
can't reach an agreement, I won't spend further time on it.

I understand that the release is coming soon and some people are on holiday in
this period, so worst case I will ping again after the release.

Luca Fancellu (5):
  [WIP]misra: add entries to the excluded list
  [WIP]cppcheck: rework exclusion_file_list.py code
  [WIP]xen/scripts: add codestyle.py script
  x86/HVM: protect mm_type_tbl format from clang-format
  xen: Add clang-format configuration

 docs/misra/exclude-list.json                  |  88 +++
 xen/.clang-format                             | 693 ++++++++++++++++++
 xen/arch/x86/hvm/mtrr.c                       |   2 +
 xen/scripts/codestyle.py                      | 261 +++++++
 xen/scripts/xen_analysis/cppcheck_analysis.py |   6 +-
 .../xen_analysis/exclusion_file_list.py       |  31 +-
 6 files changed, 1063 insertions(+), 18 deletions(-)
 create mode 100644 xen/.clang-format
 create mode 100755 xen/scripts/codestyle.py

-- 
2.34.1
Re: [RFC PATCH 0/5] clang-format for Xen
Posted by Jan Beulich 9 months ago
On 28.07.2023 10:11, Luca Fancellu wrote:
> I've read the past threads about the brave people who dared to try to introduce
> clang-format for the xen codebase, some of them from 5 years ago, two points
> were clear: 1) goto label needs to be indented and 2) do-while loops have the
> braket in the same line.
> While point 1) was quite a blocker, it seemd to me that point 2) was less
> controversial to be changed in the Xen codestyle, so the current wrapper script
> handles only the point 1 (which is easy), the point 2 can be more tricky to
> handle.

I'm afraid I view the do/while part pretty much as a blocker as well.
While placing the opening brace according to our style elsewhere would
be okay-ish (just a little wasteful to have two almost empty lines),
having the closing brace on a separate line is problematic: At least I
consider a block / scope to end at the line where the closing brace is.
So the farther do and while are apart, the more

    do
    {
        ...;
    }
    while ( cond );
    ...;

is going to be misleading. While normally we would write potentially
conflicting constructs this way

    while ( cond )
        ;

the alternative spelling still isn't outright wrong in our style (I
believe):

    while ( cond );

Jan
Re: [RFC PATCH 0/5] clang-format for Xen
Posted by Luca Fancellu 9 months ago

> On 28 Jul 2023, at 11:12, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 28.07.2023 10:11, Luca Fancellu wrote:
>> I've read the past threads about the brave people who dared to try to introduce
>> clang-format for the xen codebase, some of them from 5 years ago, two points
>> were clear: 1) goto label needs to be indented and 2) do-while loops have the
>> braket in the same line.
>> While point 1) was quite a blocker, it seemd to me that point 2) was less
>> controversial to be changed in the Xen codestyle, so the current wrapper script
>> handles only the point 1 (which is easy), the point 2 can be more tricky to
>> handle.
> 
> I'm afraid I view the do/while part pretty much as a blocker as well.
> While placing the opening brace according to our style elsewhere would
> be okay-ish (just a little wasteful to have two almost empty lines),
> having the closing brace on a separate line is problematic: At least I
> consider a block / scope to end at the line where the closing brace is.
> So the farther do and while are apart, the more
> 
>    do
>    {
>        ...;
>    }
>    while ( cond );
>    ...;
> 
> is going to be misleading. While normally we would write potentially
> conflicting constructs this way
> 
>    while ( cond )
>        ;
> 
> the alternative spelling still isn't outright wrong in our style (I
> believe):
> 
>    while ( cond );

Hi Jan,

Thank you for your feedback, I could maybe misunderstood your reply, so please
tell me if I am wrong, the Xen coding style mandates this style for do-while loops:

do {
/* Do stuff. */
} while ( condition );

Currently clang-format is able to do only this:

do
{
/* Do stuff. */
} while ( condition );

So the issue is only in the opening brackets, not the closing one. Is it a blocker too?

Cheers,
Luca

> 
> Jan
Re: [RFC PATCH 0/5] clang-format for Xen
Posted by Jan Beulich 9 months ago
On 28.07.2023 12:50, Luca Fancellu wrote:
> 
> 
>> On 28 Jul 2023, at 11:12, Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.07.2023 10:11, Luca Fancellu wrote:
>>> I've read the past threads about the brave people who dared to try to introduce
>>> clang-format for the xen codebase, some of them from 5 years ago, two points
>>> were clear: 1) goto label needs to be indented and 2) do-while loops have the
>>> braket in the same line.
>>> While point 1) was quite a blocker, it seemd to me that point 2) was less
>>> controversial to be changed in the Xen codestyle, so the current wrapper script
>>> handles only the point 1 (which is easy), the point 2 can be more tricky to
>>> handle.
>>
>> I'm afraid I view the do/while part pretty much as a blocker as well.
>> While placing the opening brace according to our style elsewhere would
>> be okay-ish (just a little wasteful to have two almost empty lines),
>> having the closing brace on a separate line is problematic: At least I
>> consider a block / scope to end at the line where the closing brace is.
>> So the farther do and while are apart, the more
>>
>>    do
>>    {
>>        ...;
>>    }
>>    while ( cond );
>>    ...;
>>
>> is going to be misleading. While normally we would write potentially
>> conflicting constructs this way
>>
>>    while ( cond )
>>        ;
>>
>> the alternative spelling still isn't outright wrong in our style (I
>> believe):
>>
>>    while ( cond );
> 
> Hi Jan,
> 
> Thank you for your feedback, I could maybe misunderstood your reply, so please
> tell me if I am wrong, the Xen coding style mandates this style for do-while loops:
> 
> do {
> /* Do stuff. */
> } while ( condition );
> 
> Currently clang-format is able to do only this:
> 
> do
> {
> /* Do stuff. */
> } while ( condition );

Oh, I hadn't understood your description that way.

> So the issue is only in the opening brackets, not the closing one. Is it a blocker too?

No. I don't like the longer form, but I could live with it.

Jan
Re: [RFC PATCH 0/5] clang-format for Xen
Posted by Stefano Stabellini 9 months ago
On Fri, 28 Jul 2023, Luca Fancellu wrote:
> ## Introduction ################################################################
> 
> In this serie, I would like to get feedbacks on the output generated by the
> configuration of clang-format, unfortunately we can't use only clang-format, but
> we need to call it using a wrapper, because we need the information of what
> files need to be excluded from the tool.
> 
> Another reason is that clang-format has some limitation when formatting asm()
> instruction and most of the time it format them in a very ugly way or it breaks
> the code for example removing spaces that were there for a reason (I don't think
> it's a tool to format asm), so in the wrapper script we protect all asm()
> invocation or macros where there are asm() invocation with in-code comments that
> stops clang-format to act on that section:
> 
> /* clang-format off */section/* clang-format on */
> 
> I've read the past threads about the brave people who dared to try to introduce
> clang-format for the xen codebase, some of them from 5 years ago, two points
> were clear: 1) goto label needs to be indented and 2) do-while loops have the
> braket in the same line.
> While point 1) was quite a blocker, it seemd to me that point 2) was less
> controversial to be changed in the Xen codestyle, so the current wrapper script
> handles only the point 1 (which is easy), the point 2 can be more tricky to
> handle.

Are these the only 2 points to discuss?

I think as a next step it would be worth listing all the code style
decision points we need to make as a group and then go over them during
one of the next calls.


> ## The clang-format configuration ##############################################
> 
> In my clang-format configuration I've taken inspiration from EPAM's work, then
> from the configuration in Linux and finally from the clang-format manual, to try
> to produce a comprehensive configuration.
> 
> Every configuration parameter has on top a comment with the description and
> when it was supported, finally I've added also a [not specified] if that
> behavior is not clearly specified in the Xen coding style, I've done that so
> we could discuss about adding more specification in our CODING_STYLE.
> Every comment can be stripped out in the final release of the file, but I think
> that now they are useful for the discussion.
> 
> The minimum clang-format version for the file is 15, my ubuntu 22.04 comes with
> it, we can reason if it's too high, or if we could also use the latest version
> maybe shipped inside a docker image.
> 
> For every [not specified] behavior, I've tried to guess it from the codebase,
> I've seen that also in that case it's not easy as there is (sometimes) low
> consistency between modules, so we can discuss on every configurable.
> 
> Worth to mention, the public header are all excluded from the format tool,
> because formatting them breaks the build on X86, because there are scripts for
> auto-generation that don't handle the formatted headers, I didn't investigate
> on it, maybe it can be added as technical debt.
> 
> So I've tried building arm32, arm64 and x86_64 with the formatted output and
> they build, I've used Yocto for that.
> 
> ## How to try it? ##############################################################
> 
> So how to generate everything? Just invoke the codestyle.py script without
> parameter and it will format every .c and .h file in the hypervisor codebase.
> 
> ./xen/scripts/codestyle.py
> 
> Optionally you can also pass one or more relative path from the folder you are
> invoking the script and it will format only them.

Thanks for the amazing work Luca. I did as described above and generate
a 9MB diff patch :-)

I scrolled through it and most of it makes sense but a few things look
weird. Copy/pasting them here individually not to add a 9MB diff in
attachment.

> diff --git a/xen/arch/arm/acpi/boot.c b/xen/arch/arm/acpi/boot.c
> index db5085e15d..398dea92e6 100644
> --- a/xen/arch/arm/acpi/boot.c
> +++ b/xen/arch/arm/acpi/boot.c
> @@ -52,7 +52,7 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
>  {
>      int i;
>      int rc;
> -    u64 mpidr = processor->arm_mpidr & MPIDR_HWID_MASK;
> +    u64 mpidr    = processor->arm_mpidr & MPIDR_HWID_MASK;
>      bool enabled = processor->flags & ACPI_MADT_ENABLED;
>  
>      if ( mpidr == MPIDR_INVALID )

Do we need the = alignment?


It is causing other weird looking changes:

>    rsdp = (struct acpi_table_rsdp *)base_ptr;
>     /* Replace xsdt_physical_address */
>     rsdp->xsdt_physical_address = tbl_add[TBL_XSDT].start;
>-    checksum = acpi_tb_checksum(ACPI_CAST_PTR(u8, rsdp), table_size);
>+    checksum       = acpi_tb_checksum(ACPI_CAST_PTR(u8, rsdp), table_size);
>     rsdp->checksum = rsdp->checksum - checksum;
 
[...]

>  #ifdef CONFIG_MULTIBOOT
> -int __init xsm_multiboot_policy_init(
> -    unsigned long *module_map, const multiboot_info_t *mbi,
> -    void **policy_buffer, size_t *policy_size)
> +int __init xsm_multiboot_policy_init(unsigned long *module_map,
> +                                     const multiboot_info_t *mbi,
> +                                     void **policy_buffer, size_t *policy_size)
>  {
>      int i;
>      module_t *mod = (module_t *)__va(mbi->mods_addr);
> -    int rc = 0;
> +    int rc        = 0;
>      u32 *_policy_start;
>      unsigned long _policy_len;
 
Can we take the = alignment out? Without it, most other things look
good, at least at first glance.

Also this looks a bit unnecessary, but not a blocker for me:

> @@ -908,8 +895,7 @@ static int __init efi_check_dt_boot(const EFI_LOADED_IMAGE *loaded_image)
>  }
>  
>  static void __init efi_arch_cpu(void)
> -{   
> -}       
> +{}
>      

I think we should:
- identify "weird" changes like the above
- discuss them as a group, like we do with MISRA, and decide how to
  address them
- once everyone is happy with the new format, come up with a plan on how
  to merge a 9MB patch
Re: [RFC PATCH 0/5] clang-format for Xen
Posted by Luca Fancellu 9 months ago

> On 29 Jul 2023, at 01:30, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Fri, 28 Jul 2023, Luca Fancellu wrote:
>> ## Introduction ################################################################
>> 
>> In this serie, I would like to get feedbacks on the output generated by the
>> configuration of clang-format, unfortunately we can't use only clang-format, but
>> we need to call it using a wrapper, because we need the information of what
>> files need to be excluded from the tool.
>> 
>> Another reason is that clang-format has some limitation when formatting asm()
>> instruction and most of the time it format them in a very ugly way or it breaks
>> the code for example removing spaces that were there for a reason (I don't think
>> it's a tool to format asm), so in the wrapper script we protect all asm()
>> invocation or macros where there are asm() invocation with in-code comments that
>> stops clang-format to act on that section:
>> 
>> /* clang-format off */section/* clang-format on */
>> 
>> I've read the past threads about the brave people who dared to try to introduce
>> clang-format for the xen codebase, some of them from 5 years ago, two points
>> were clear: 1) goto label needs to be indented and 2) do-while loops have the
>> braket in the same line.
>> While point 1) was quite a blocker, it seemd to me that point 2) was less
>> controversial to be changed in the Xen codestyle, so the current wrapper script
>> handles only the point 1 (which is easy), the point 2 can be more tricky to
>> handle.

Hi Stefano,

thank you for taking the time to try the changes.

> 
> Are these the only 2 points to discuss?

Probably not, the idea with this serie is to have the maintainers to look into
the changes and spot things that are not great, so that we can list them and
check if the behaviour is fixable or not, and based on that come up with
an action or a decision.

> 
> I think as a next step it would be worth listing all the code style
> decision points we need to make as a group and then go over them during
> one of the next calls.

Yes, my idea was to have some rounds on the last patch where [not specified]
options are listed, basically all of their value were chosen by me just guessing
from a brief codebase inspection, I expect these values to be agreed and (hopefully)
we can formalise some of them with a requirement in the CODING_STYLE, so that
we can have finally a document and the implementation of that by a tool, which can
be changed in the future if we find some more powerful or reliable tool.

> 
> 
>> ## The clang-format configuration ##############################################
>> 
>> In my clang-format configuration I've taken inspiration from EPAM's work, then
>> from the configuration in Linux and finally from the clang-format manual, to try
>> to produce a comprehensive configuration.
>> 
>> Every configuration parameter has on top a comment with the description and
>> when it was supported, finally I've added also a [not specified] if that
>> behavior is not clearly specified in the Xen coding style, I've done that so
>> we could discuss about adding more specification in our CODING_STYLE.
>> Every comment can be stripped out in the final release of the file, but I think
>> that now they are useful for the discussion.
>> 
>> The minimum clang-format version for the file is 15, my ubuntu 22.04 comes with
>> it, we can reason if it's too high, or if we could also use the latest version
>> maybe shipped inside a docker image.
>> 
>> For every [not specified] behavior, I've tried to guess it from the codebase,
>> I've seen that also in that case it's not easy as there is (sometimes) low
>> consistency between modules, so we can discuss on every configurable.
>> 
>> Worth to mention, the public header are all excluded from the format tool,
>> because formatting them breaks the build on X86, because there are scripts for
>> auto-generation that don't handle the formatted headers, I didn't investigate
>> on it, maybe it can be added as technical debt.
>> 
>> So I've tried building arm32, arm64 and x86_64 with the formatted output and
>> they build, I've used Yocto for that.
>> 
>> ## How to try it? ##############################################################
>> 
>> So how to generate everything? Just invoke the codestyle.py script without
>> parameter and it will format every .c and .h file in the hypervisor codebase.
>> 
>> ./xen/scripts/codestyle.py
>> 
>> Optionally you can also pass one or more relative path from the folder you are
>> invoking the script and it will format only them.
> 
> Thanks for the amazing work Luca. I did as described above and generate
> a 9MB diff patch :-)
> 
> I scrolled through it and most of it makes sense but a few things look
> weird. Copy/pasting them here individually not to add a 9MB diff in
> attachment.

:) that’s great! This is exactly the feedback I’m looking for!

> 
>> diff --git a/xen/arch/arm/acpi/boot.c b/xen/arch/arm/acpi/boot.c
>> index db5085e15d..398dea92e6 100644
>> --- a/xen/arch/arm/acpi/boot.c
>> +++ b/xen/arch/arm/acpi/boot.c
>> @@ -52,7 +52,7 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
>> {
>>     int i;
>>     int rc;
>> -    u64 mpidr = processor->arm_mpidr & MPIDR_HWID_MASK;
>> +    u64 mpidr    = processor->arm_mpidr & MPIDR_HWID_MASK;
>>     bool enabled = processor->flags & ACPI_MADT_ENABLED;
>> 
>>     if ( mpidr == MPIDR_INVALID )
> 
> Do we need the = alignment?
> 
> 
> It is causing other weird looking changes:
> 
>>   rsdp = (struct acpi_table_rsdp *)base_ptr;
>>    /* Replace xsdt_physical_address */
>>    rsdp->xsdt_physical_address = tbl_add[TBL_XSDT].start;
>> -    checksum = acpi_tb_checksum(ACPI_CAST_PTR(u8, rsdp), table_size);
>> +    checksum       = acpi_tb_checksum(ACPI_CAST_PTR(u8, rsdp), table_size);
>>    rsdp->checksum = rsdp->checksum - checksum;
> 
> [...]
> 
>> #ifdef CONFIG_MULTIBOOT
>> -int __init xsm_multiboot_policy_init(
>> -    unsigned long *module_map, const multiboot_info_t *mbi,
>> -    void **policy_buffer, size_t *policy_size)
>> +int __init xsm_multiboot_policy_init(unsigned long *module_map,
>> +                                     const multiboot_info_t *mbi,
>> +                                     void **policy_buffer, size_t *policy_size)
>> {
>>     int i;
>>     module_t *mod = (module_t *)__va(mbi->mods_addr);
>> -    int rc = 0;
>> +    int rc        = 0;
>>     u32 *_policy_start;
>>     unsigned long _policy_len;
> 
> Can we take the = alignment out? Without it, most other things look
> good, at least at first glance.

Alejandro was the first one to point that out, we can disable it by the option
AlignConsecutiveAssignments.

> 
> Also this looks a bit unnecessary, but not a blocker for me:
> 
>> @@ -908,8 +895,7 @@ static int __init efi_check_dt_boot(const EFI_LOADED_IMAGE *loaded_image)
>> }
>> 
>> static void __init efi_arch_cpu(void)
>> -{   
>> -}       
>> +{}
>> 

Also this one can be changed I think by tweaking AllowShortFunctionsOnASingleLine,
so that we can have just: 

static void __init efi_arch_cpu(void) {}

> 
> I think we should:
> - identify "weird" changes like the above
> - discuss them as a group, like we do with MISRA, and decide how to
>  address them
> - once everyone is happy with the new format, come up with a plan on how
>  to merge a 9MB patch

Yes, clearly we will have the release soon, so until that, I can collect all the feedbacks and
organise them so that after the release, when people have more time, we can discuss every
point and identify priorities and outstanding issues.


Cheers,
Luca