[RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions

[RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions
Posted by Paolo Bonzini 6 days, 4 hours ago
Using phrasing from https://openinfra.org/legal/ai-policy (with just
"commit" replaced by "submission", because we do not submit changes
as commits but rather emails), clarify that the maintainer who bestows
their blessing on the AI-generated contribution is not responsible
for its copyright or license status beyond what is required by the
Developer's Certificate of Origin.
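
[For illustration only, not part of the proposed policy text: the
sign-off in question is the standard DCO trailer, e.g. as added by
"git commit -s":

  Signed-off-by: Full Name <full.name@example.com>

It is this trailer, rather than the maintainer's exception, that
carries the responsibility statement.]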

[This is not my preferred phrasing.  I would prefer something lighter,
such as "the 'Signed-off-by' label in the contribution gives the author
responsibility".  But for the sake of not reinventing the wheel I am
keeping the exact words from the OpenInfra policy.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/devel/code-provenance.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index d435ab145cf..a5838f63649 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -334,6 +334,11 @@ training model and code, to the satisfaction of the project maintainers.
 Maintainers are not allow to grant an exception on their own patch
 submissions.
 
+Even after an exception is granted, the "Signed-off-by" label in the
+contribution is a statement that the author takes responsibility for the
+entire contents of the submission, including any parts that were generated
+or assisted by AI tools or other tools.
+
 Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
 ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
 generation agents which are built on top of such tools.
-- 
2.51.0
Re: [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions
Posted by Alex Bennée 6 days, 2 hours ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> Using phrasing from https://openinfra.org/legal/ai-policy (with just
> "commit" replaced by "submission", because we do not submit changes
> as commits but rather emails), clarify that the maintainer who bestows
> their blessing on the AI-generated contribution is not responsible
> for its copyright or license status beyond what is required by the
> Developer's Certificate of Origin.
>
> [This is not my preferred phrasing.  I would prefer something lighter,
> such as "the 'Signed-off-by' label in the contribution gives the author
> responsibility".  But for the sake of not reinventing the wheel I am
> keeping the exact words from the OpenInfra policy.]
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index d435ab145cf..a5838f63649 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -334,6 +334,11 @@ training model and code, to the satisfaction of the project maintainers.
>  Maintainers are not allow to grant an exception on their own patch
>  submissions.
>  
> +Even after an exception is granted, the "Signed-off-by" label in the
> +contribution is a statement that the author takes responsibility for the
> +entire contents of the submission, including any parts that were generated
> +or assisted by AI tools or other tools.
> +

I quite like the LLVM wording, which makes expectations clear to the
submitter:

  While the LLVM project has a liberal policy on AI tool use, contributors
  are considered responsible for their contributions. We encourage
  contributors to review all generated code before sending it for review
  to verify its correctness and to understand it so that they can answer
  questions during code review. Reviewing and maintaining generated code
  that the original contributor does not understand is not a good use of
  limited project resources.

It could perhaps be even stronger ("must" rather than "encourage"). The
key point to emphasise is that we don't want submissions that the user
of the generative AI doesn't understand.

While we don't see them because our GitHub lockdown policy auto-closes
PRs, we are already seeing a growth in submissions where the authors seem
to have YOLO'd the code generator without really understanding the
changes.

>  Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
>  ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
>  generation agents which are built on top of such tools.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions
Posted by Daniel P. Berrangé 6 days, 2 hours ago
On Mon, Sep 22, 2025 at 02:02:23PM +0100, Alex Bennée wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > Using phrasing from https://openinfra.org/legal/ai-policy (with just
> > "commit" replaced by "submission", because we do not submit changes
> > as commits but rather emails), clarify that the maintainer who bestows
> > their blessing on the AI-generated contribution is not responsible
> > for its copyright or license status beyond what is required by the
> > Developer's Certificate of Origin.
> >
> > [This is not my preferred phrasing.  I would prefer something lighter,
> > such as "the 'Signed-off-by' label in the contribution gives the author
> > responsibility".  But for the sake of not reinventing the wheel I am
> > keeping the exact words from the OpenInfra policy.]
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index d435ab145cf..a5838f63649 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -334,6 +334,11 @@ training model and code, to the satisfaction of the project maintainers.
> >  Maintainers are not allow to grant an exception on their own patch
> >  submissions.
> >  
> > +Even after an exception is granted, the "Signed-off-by" label in the
> > +contribution is a statement that the author takes responsibility for the
> > +entire contents of the submission, including any parts that were generated
> > +or assisted by AI tools or other tools.
> > +
> 
> I quite like the LLVM wording, which makes expectations clear to the
> submitter:
> 
>   While the LLVM project has a liberal policy on AI tool use, contributors
>   are considered responsible for their contributions. We encourage
>   contributors to review all generated code before sending it for review
>   to verify its correctness and to understand it so that they can answer
>   questions during code review. Reviewing and maintaining generated code
>   that the original contributor does not understand is not a good use of
>   limited project resources.
> 
> It could perhaps be even stronger ("must" rather than "encourage"). The
> key point to emphasise is that we don't want submissions that the user
> of the generative AI doesn't understand.
> 
> While we don't see them because our GitHub lockdown policy auto-closes
> PRs, we are already seeing a growth in submissions where the authors seem
> to have YOLO'd the code generator without really understanding the
> changes.

While I understand where the LLVM maintainers are coming from, IMHO
their proposed policy leaves a lot to be desired. 80% of the material
in the policy has nothing to do with AI content; rather, it is stating
the general contribution norms that the project expects to be followed
regardless of what tools may have been used.

I think perhaps a lot of these contribution norms have previously been
informal, learnt on the job as you gradually acclimatize to participating
in a specific project, or to open source in general.

This reliance on informal norms was always something of a problem, but
it is being supercharged by AI. We are now much more likely to see
project interactions from less experienced people, who rely on AI tools
to provide a quick on-ramp to the project, bypassing the more gradual
learning experience.


As an example of why the distinction between AI policy and general
contribution policy matters, consider the great many bug and security
reports we've had based on the output of static analysis tools.

Almost none of these were related to AI, but the people submitting
them often failed on basic expectations such as sanity-checking what
the tool claimed, understanding what they were reporting, or
understanding why they changed the code the way they did.

If we don't already have our "contribution norms" sufficiently
clearly documented, we should improve that independently of any
AI-related policy.  The AI-related section in our docs should
merely refer the reader over to our other contribution policies
for anything that isn't directly related to AI.
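
As a sketch of what that could look like in docs/devel/code-provenance.rst
(the cross-reference target here is hypothetical, not an existing label):

  See :ref:`submitting-a-patch` for the general expectations that apply
  to every contribution, whatever tools were used to produce it.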


We do have a gap with respect to bug reporting, where I think we should
document an expectation that any use of automated tools in a bug report
must be disclosed, whether those tools are AI-based or not. This should
apply to any static analysis tool.
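
As a sketch (hypothetical wording, not a proposed format), such a
disclosure in a bug report could be a single line along the lines of:

  This defect was flagged by Coverity Scan; I have read the code in
  question and manually confirmed that the report is accurate.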

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|