[PATCH v2 2/3] scripts: validate SPDX license choices

Daniel P. Berrangé posted 3 patches 1 year, 2 months ago
There is a newer version of this series
[PATCH v2 2/3] scripts: validate SPDX license choices
Posted by Daniel P. Berrangé 1 year, 2 months ago
We expect all new code to be contributed with the "GPL-2.0-or-later"
license tag. Divergance is permitted if the new file is derived from
pre-existing code under a different license, whether from elsewhere
in QEMU codebase, or outside.

Issue a warning if the declared license is not "GPL-2.0-or-later",
and an error if the license is not one of the handful of the
expected licenses to prevent unintended proliferation. The warning
asks users to explain their unusual choice of license in the commit
message.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 scripts/checkpatch.pl | 68 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d946121b8e..b507da8e2b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1353,6 +1353,69 @@ sub checkfilename {
 	}
 }
 
+sub checkspdx {
+    my ($file, $expr) = @_;
+
+    # Imported Linux headers probably have SPDX tags, but if they
+    # don't we're not requiring contributors to fix this, as these
+    # files are not expected to be modified locally in QEMU
+    if ($file =~ m,include/standard-headers, ||
+	$file =~ m,linux-headers,) {
+	return;
+    }
+
+    my $origexpr = $expr;
+
+    # Flatten sub-expressions
+    $expr =~ s/\(|\)/ /g;
+    $expr =~ s/OR|AND/ /g;
+
+    # Merge WITH exceptions to the license
+    $expr =~ s/\s+WITH\s+/-WITH-/g;
+
+    # Cull more leading/trailing whitespace
+    $expr =~ s/^\s*//g;
+    $expr =~ s/\s*$//g;
+
+    my @bits = split / +/, $expr;
+
+    my $prefer = "GPL-2.0-or-later";
+    my @valid = qw(
+	LGPL-2.0-or-later
+	LGPL-2.1-or-later
+	GPL-2.0-only
+	LGPL-2.0-only
+	LGPL-2.0-only
+	BSD-2-Clause
+	BSD-3-Clause
+	MIT
+	);
+
+    my $nonpreferred = 0;
+    my @unknown = ();
+    foreach my $bit (@bits) {
+	if ($bit eq $prefer) {
+	    next;
+	}
+	if (grep /^$bit$/, @valid) {
+	    $nonpreferred = 1;
+	} else {
+	    push @unknown, $bit;
+	}
+    }
+    if (@unknown) {
+	ERROR("Saw unacceptable licenses '" . join(',', @unknown) .
+	      "', valid choices for QEMU are:\n" . join("\n", $prefer, @valid));
+    }
+
+    if ($nonpreferred) {
+	WARN("Saw acceptable license '$origexpr' but note '$prefer' is preferred " .
+	     "for new files unless the code is derived from a source with an " .
+	     "existed declared license that must be followed. Please explain " .
+	     "license choice in the commit message");
+    }
+}
+
 sub process {
 	my $filename = shift;
 
@@ -1641,6 +1704,11 @@ sub process {
 		    }
 		}
 
+# Check SPDX-License-Identifier references a permitted license
+		if ($rawline =~ m,SPDX-License-Identifier: (.*?)(\*/)?\s*$,) {
+		    &checkspdx($realfile, $1);
+		}
+
 # Check for wrappage within a valid hunk of the file
 		if ($realcnt != 0 && $line !~ m{^(?:\+|-| |\\ No newline|$)}) {
 			ERROR("patch seems to be corrupt (line wrapped?)\n" .
-- 
2.46.0


Re: [PATCH v2 2/3] scripts: validate SPDX license choices
Posted by Peter Maydell 1 year, 2 months ago
On Tue, 19 Nov 2024 at 11:29, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> We expect all new code to be contributed with the "GPL-2.0-or-later"
> license tag. Divergance is permitted if the new file is derived from

"divergence"

> pre-existing code under a different license, whether from elsewhere
> in QEMU codebase, or outside.
>
> Issue a warning if the declared license is not "GPL-2.0-or-later",
> and an error if the license is not one of the handful of the
> expected licenses to prevent unintended proliferation. The warning
> asks users to explain their unusual choice of license in the commit
> message.

Should we update LICENSE (or something under docs/devel ?) to
state our policy ?

> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  scripts/checkpatch.pl | 68 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index d946121b8e..b507da8e2b 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1353,6 +1353,69 @@ sub checkfilename {
>         }
>  }
>
> +sub checkspdx {
> +    my ($file, $expr) = @_;
> +
> +    # Imported Linux headers probably have SPDX tags, but if they
> +    # don't we're not requiring contributors to fix this, as these
> +    # files are not expected to be modified locally in QEMU
> +    if ($file =~ m,include/standard-headers, ||
> +       $file =~ m,linux-headers,) {
> +       return;
> +    }
> +
> +    my $origexpr = $expr;
> +
> +    # Flatten sub-expressions
> +    $expr =~ s/\(|\)/ /g;
> +    $expr =~ s/OR|AND/ /g;
> +
> +    # Merge WITH exceptions to the license
> +    $expr =~ s/\s+WITH\s+/-WITH-/g;
> +
> +    # Cull more leading/trailing whitespace
> +    $expr =~ s/^\s*//g;
> +    $expr =~ s/\s*$//g;
> +
> +    my @bits = split / +/, $expr;
> +
> +    my $prefer = "GPL-2.0-or-later";
> +    my @valid = qw(
> +       LGPL-2.0-or-later
> +       LGPL-2.1-or-later
> +       GPL-2.0-only
> +       LGPL-2.0-only
> +       LGPL-2.0-only

Lists LGPL-2.0-only twice ? I'm guessing the second should be 2.1.

I'm not sure we really want to allow more LGPL-2.0-only
code...we don't have a reason like we do with GPL-2.0-only
where the reason is "code from the kernel", and I feel like
LGPL-2.0-only is quite rare anyway, and at least sometimes
a mistake where the author meant LGPL-2.1-only or GPL-2.0-only.
But maybe this list should be generous enough to only warn,
not error, for code copied within QEMU.

AFAICT the only code we have that is LGPL-2.0-only is
util/error.c. But that also refers to our COPYING.LIB,
which is LGPL2.1. In 2011, 12 years after the publication
of LGPL2.1, did Anthony Liguori *really* mean to use
LGPL2.0 only? Answers on a postcard :-)

> +       BSD-2-Clause
> +       BSD-3-Clause
> +       MIT
> +       );
> +
> +    my $nonpreferred = 0;
> +    my @unknown = ();
> +    foreach my $bit (@bits) {
> +       if ($bit eq $prefer) {
> +           next;
> +       }
> +       if (grep /^$bit$/, @valid) {
> +           $nonpreferred = 1;
> +       } else {
> +           push @unknown, $bit;
> +       }
> +    }
> +    if (@unknown) {
> +       ERROR("Saw unacceptable licenses '" . join(',', @unknown) .
> +             "', valid choices for QEMU are:\n" . join("\n", $prefer, @valid));
> +    }
> +
> +    if ($nonpreferred) {
> +       WARN("Saw acceptable license '$origexpr' but note '$prefer' is preferred " .
> +            "for new files unless the code is derived from a source with an " .
> +            "existed declared license that must be followed. Please explain " .
> +            "license choice in the commit message");
> +    }
> +}
> +
>  sub process {
>         my $filename = shift;
>
> @@ -1641,6 +1704,11 @@ sub process {
>                     }
>                 }
>
> +# Check SPDX-License-Identifier references a permitted license
> +               if ($rawline =~ m,SPDX-License-Identifier: (.*?)(\*/)?\s*$,) {
> +                   &checkspdx($realfile, $1);
> +               }
> +

The code changes look OK to me.

thanks
-- PMM
Re: [PATCH v2 2/3] scripts: validate SPDX license choices
Posted by Daniel P. Berrangé 1 year, 2 months ago
On Mon, Dec 02, 2024 at 04:41:48PM +0000, Peter Maydell wrote:
> On Tue, 19 Nov 2024 at 11:29, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > We expect all new code to be contributed with the "GPL-2.0-or-later"
> > license tag. Divergance is permitted if the new file is derived from
> 
> "divergence"
> 
> > pre-existing code under a different license, whether from elsewhere
> > in QEMU codebase, or outside.
> >
> > Issue a warning if the declared license is not "GPL-2.0-or-later",
> > and an error if the license is not one of the handful of the
> > expected licenses to prevent unintended proliferation. The warning
> > asks users to explain their unusual choice of license in the commit
> > message.
> 
> Should we update LICENSE (or something under docs/devel ?) to
> state our policy ?

Yeah, we really ought to, i'll have a look at it.

> 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  scripts/checkpatch.pl | 68 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 68 insertions(+)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index d946121b8e..b507da8e2b 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -1353,6 +1353,69 @@ sub checkfilename {
> >         }
> >  }
> >
> > +sub checkspdx {
> > +    my ($file, $expr) = @_;
> > +
> > +    # Imported Linux headers probably have SPDX tags, but if they
> > +    # don't we're not requiring contributors to fix this, as these
> > +    # files are not expected to be modified locally in QEMU
> > +    if ($file =~ m,include/standard-headers, ||
> > +       $file =~ m,linux-headers,) {
> > +       return;
> > +    }
> > +
> > +    my $origexpr = $expr;
> > +
> > +    # Flatten sub-expressions
> > +    $expr =~ s/\(|\)/ /g;
> > +    $expr =~ s/OR|AND/ /g;
> > +
> > +    # Merge WITH exceptions to the license
> > +    $expr =~ s/\s+WITH\s+/-WITH-/g;
> > +
> > +    # Cull more leading/trailing whitespace
> > +    $expr =~ s/^\s*//g;
> > +    $expr =~ s/\s*$//g;
> > +
> > +    my @bits = split / +/, $expr;
> > +
> > +    my $prefer = "GPL-2.0-or-later";
> > +    my @valid = qw(
> > +       LGPL-2.0-or-later
> > +       LGPL-2.1-or-later
> > +       GPL-2.0-only
> > +       LGPL-2.0-only
> > +       LGPL-2.0-only
> 
> Lists LGPL-2.0-only twice ? I'm guessing the second should be 2.1.

Opps, indeed 2.1

> I'm not sure we really want to allow more LGPL-2.0-only
> code...we don't have a reason like we do with GPL-2.0-only
> where the reason is "code from the kernel", and I feel like
> LGPL-2.0-only is quite rare anyway, and at least sometimes
> a mistake where the author meant LGPL-2.1-only or GPL-2.0-only.
> But maybe this list should be generous enough to only warn,
> not error, for code copied within QEMU.

Reliably identifying that a patch is merely "copying code within
QEMU" is a non-trivial task. I'm not sure its worth the effort,
given that we always have the option of ignoring the script's
advice if a human knows better.

> AFAICT the only code we have that is LGPL-2.0-only is
> util/error.c. But that also refers to our COPYING.LIB,
> which is LGPL2.1. In 2011, 12 years after the publication
> of LGPL2.1, did Anthony Liguori *really* mean to use
> LGPL2.0 only? Answers on a postcard :-)

I'm fine dropping LGPL2.0-or-later and LGPL2.0-only,
for the very reasons you state.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH v2 2/3] scripts: validate SPDX license choices
Posted by Philippe Mathieu-Daudé 1 year, 1 month ago
Hi Daniel, Peter,

On 2/12/24 17:54, Daniel P. Berrangé wrote:
> On Mon, Dec 02, 2024 at 04:41:48PM +0000, Peter Maydell wrote:
>> On Tue, 19 Nov 2024 at 11:29, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>
>>> We expect all new code to be contributed with the "GPL-2.0-or-later"
>>> license tag. Divergance is permitted if the new file is derived from
>>
>> "divergence"
>>
>>> pre-existing code under a different license, whether from elsewhere
>>> in QEMU codebase, or outside.
>>>
>>> Issue a warning if the declared license is not "GPL-2.0-or-later",
>>> and an error if the license is not one of the handful of the
>>> expected licenses to prevent unintended proliferation. The warning
>>> asks users to explain their unusual choice of license in the commit
>>> message.
>>
>> Should we update LICENSE (or something under docs/devel ?) to
>> state our policy ?
> 
> Yeah, we really ought to, i'll have a look at it.

Could we merge the previous (reviewed) patch (1/3) without having to
wait for a v3?

> 
>>
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>> ---
>>>   scripts/checkpatch.pl | 68 +++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 68 insertions(+)
>>>
>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>> index d946121b8e..b507da8e2b 100755
>>> --- a/scripts/checkpatch.pl
>>> +++ b/scripts/checkpatch.pl
>>> @@ -1353,6 +1353,69 @@ sub checkfilename {
>>>          }
>>>   }
>>>
>>> +sub checkspdx {
>>> +    my ($file, $expr) = @_;
>>> +
>>> +    # Imported Linux headers probably have SPDX tags, but if they
>>> +    # don't we're not requiring contributors to fix this, as these
>>> +    # files are not expected to be modified locally in QEMU
>>> +    if ($file =~ m,include/standard-headers, ||
>>> +       $file =~ m,linux-headers,) {
>>> +       return;
>>> +    }
>>> +
>>> +    my $origexpr = $expr;
>>> +
>>> +    # Flatten sub-expressions
>>> +    $expr =~ s/\(|\)/ /g;
>>> +    $expr =~ s/OR|AND/ /g;
>>> +
>>> +    # Merge WITH exceptions to the license
>>> +    $expr =~ s/\s+WITH\s+/-WITH-/g;
>>> +
>>> +    # Cull more leading/trailing whitespace
>>> +    $expr =~ s/^\s*//g;
>>> +    $expr =~ s/\s*$//g;
>>> +
>>> +    my @bits = split / +/, $expr;
>>> +
>>> +    my $prefer = "GPL-2.0-or-later";
>>> +    my @valid = qw(
>>> +       LGPL-2.0-or-later
>>> +       LGPL-2.1-or-later
>>> +       GPL-2.0-only
>>> +       LGPL-2.0-only
>>> +       LGPL-2.0-only
>>
>> Lists LGPL-2.0-only twice ? I'm guessing the second should be 2.1.
> 
> Opps, indeed 2.1
> 
>> I'm not sure we really want to allow more LGPL-2.0-only
>> code...we don't have a reason like we do with GPL-2.0-only
>> where the reason is "code from the kernel", and I feel like
>> LGPL-2.0-only is quite rare anyway, and at least sometimes
>> a mistake where the author meant LGPL-2.1-only or GPL-2.0-only.
>> But maybe this list should be generous enough to only warn,
>> not error, for code copied within QEMU.
> 
> Reliably identifying that a patch is merely "copying code within
> QEMU" is a non-trivial task. I'm not sure its worth the effort,
> given that we always have the option of ignoring the script's
> advice if a human knows better.
> 
>> AFAICT the only code we have that is LGPL-2.0-only is
>> util/error.c. But that also refers to our COPYING.LIB,
>> which is LGPL2.1. In 2011, 12 years after the publication
>> of LGPL2.1, did Anthony Liguori *really* mean to use
>> LGPL2.0 only? Answers on a postcard :-)
> 
> I'm fine dropping LGPL2.0-or-later and LGPL2.0-only,
> for the very reasons you state.
> 
> 
> With regards,
> Daniel