[RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent

Posted by Paolo Bonzini 6 days, 4 hours ago
The exception process is an afterthought in QEMU's policy for AI-generated
content.  It is not really possible to understand how people want to use
these tools without formalizing it a bit more and encouraging people to
request exceptions if they see a good use for AI-generated content.

Note that right now, in my opinion, the exception process remains
infeasible, because there is no agreement on how to "demonstrate
clarity of the license and copyright status for the tool's output".
This will be sorted out separately.

What is missing: do we want a formal way to identify commits for which an
exception to the AI policy was granted?  The common way to do so seems to
be "Generated-by" or "Assisted-by" but I don't want to turn commit message
into an ad space.  I would lean more towards something like

  AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>

but at the same time I don't want to invent something just for QEMU.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/devel/code-provenance.rst | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index dba99a26f64..d435ab145cf 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -292,7 +292,8 @@ TL;DR:
 
   **Current QEMU project policy is to DECLINE any contributions which are
   believed to include or derive from AI generated content. This includes
-  ChatGPT, Claude, Copilot, Llama and similar tools.**
+  ChatGPT, Claude, Copilot, Llama and similar tools.  Exceptions may be
+  requested on a case-by-case basis.**
 
   **This policy does not apply to other uses of AI, such as researching APIs
   or algorithms, static analysis, or debugging, provided their output is not
@@ -322,18 +323,19 @@ How contributors could comply with DCO terms (b) or (c) for the output of AI
 content generators commonly available today is unclear.  The QEMU project is
 not willing or able to accept the legal risks of non-compliance.
 
-The QEMU project thus requires that contributors refrain from using AI content
-generators on patches intended to be submitted to the project, and will
-decline any contribution if use of AI is either known or suspected.
+The QEMU project requires contributors to refrain from using AI content
+generators without going through an exception request process.
+AI-generated code will only be included in the project after the
+exception request has been evaluated by the QEMU project.  To be
+granted an exception, a contributor will need to demonstrate clarity of
+the license and copyright status for the tool's output in relation to its
+training model and code, to the satisfaction of the project maintainers.
 
+Maintainers are not allowed to grant an exception on their own patch
+submissions.
 
 Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
 ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
 generation agents which are built on top of such tools.
-
 This policy may evolve as AI tools mature and the legal situation is
-clarifed. In the meanwhile, requests for exceptions to this policy will be
-evaluated by the QEMU project on a case by case basis. To be granted an
-exception, a contributor will need to demonstrate clarity of the license and
-copyright status for the tool's output in relation to its training model and
-code, to the satisfaction of the project maintainers.
+clarified.
-- 
2.51.0
Re: [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
Posted by Daniel P. Berrangé 6 days, 2 hours ago
On Mon, Sep 22, 2025 at 01:32:17PM +0200, Paolo Bonzini wrote:
> The exception process is an afterthought in QEMU's policy for AI-generated
> content.  It is not really possible to understand how people want to use
> these tools without formalizing it a bit more and encouraging people to
> request exceptions if they see a good use for AI-generated content.
> 
> Note that right now, in my opinion, the exception process remains
> infeasible, because there is no agreement on how to "demonstrate
> clarity of the license and copyright status for the tool's output".
> This will be sorted out separately.

FWIW, I considered that the "exception process" would end up
being something like...

 * someone wants to use a particular tool for something they
   believe is compelling
 * they complain on qemu-devel that our policy blocks their
   valid use
 * we debate it
 * if agreed, we add a note to this code-provenance.rst doc to
   allow it


I would imagine that exceptions might fall into two buckets

 * Descriptions of techniques/scenarios for using tools
   that limit the licensing risk
 * Details of specific tools (or more likely models) that
   are judged to have limited licensing risk

it is hard to predict the future though, so this might be
too simplistic. Time will tell when someone starts the
debate...


IOW, my suggestion would be that the document simply tells
people to raise a thread on qemu-devel if they would like
to discuss the need for a particular exception, and mentions
that any exceptions will be documented in this doc if they
are agreed upon.

> What is missing: do we want a formal way to identify commits for which an
> exception to the AI policy was granted?  The common way to do so seems to
> be "Generated-by" or "Assisted-by" but I don't want to turn commit message
> into an ad space.  I would lean more towards something like
> 
>   AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>

IMHO the code-provenance.rst doc is what grants the exception, not
any individual person, nor any individual commit.

Whether we want to reference that a given commit is relying on an
exception or not is hard to say at this point as we don't know what
any exception would be like.

Ideally the applicability of an exception could be self-evident
from the commit. Reality might be fuzzier. So if not self-evident,
then it likely warrants a sentence or two of English text in the
commit to justify its applicability.
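
For instance (purely hypothetical wording on my part, assuming some
usage-scenario exception had been agreed), such a justification might
read:

  This conversion was produced with an automated refactoring tool,
  per the usage-scenario exception in docs/devel/code-provenance.rst.

so that reviewers can judge the exception's applicability themselves.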

IOW, a tag like AI-exception-granted-by doesn't feel like it is
particularly useful.

> 
> but at the same time I don't want to invent something just for QEMU.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index dba99a26f64..d435ab145cf 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -292,7 +292,8 @@ TL;DR:
>  
>    **Current QEMU project policy is to DECLINE any contributions which are
>    believed to include or derive from AI generated content. This includes
> -  ChatGPT, Claude, Copilot, Llama and similar tools.**
> +  ChatGPT, Claude, Copilot, Llama and similar tools.  Exceptions may be
> +  requested on a case-by-case basis.**

I'm not sure what you mean by 'case-by-case basis'? I certainly don't
think we should entertain debating use of AI in individual patch series,
as that'll be a never-ending burden on reviewer/maintainer resources.

Exceptions should be things that can be applied somewhat generically to
tools, models, or usage scenarios, IMHO.

>  
>    **This policy does not apply to other uses of AI, such as researching APIs
>    or algorithms, static analysis, or debugging, provided their output is not
> @@ -322,18 +323,19 @@ How contributors could comply with DCO terms (b) or (c) for the output of AI
>  content generators commonly available today is unclear.  The QEMU project is
>  not willing or able to accept the legal risks of non-compliance.
>  
> -The QEMU project thus requires that contributors refrain from using AI content
> -generators on patches intended to be submitted to the project, and will
> -decline any contribution if use of AI is either known or suspected.
> +The QEMU project requires contributors to refrain from using AI content
> +generators without going through an exception request process.
> +AI-generated code will only be included in the project after the
> +exception request has been evaluated by the QEMU project.  To be
> +granted an exception, a contributor will need to demonstrate clarity of
> +the license and copyright status for the tool's output in relation to its
> +training model and code, to the satisfaction of the project maintainers.
>  
> +Maintainers are not allowed to grant an exception on their own patch
> +submissions.
>  
>  Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
>  ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
>  generation agents which are built on top of such tools.
> -
>  This policy may evolve as AI tools mature and the legal situation is
> -clarifed. In the meanwhile, requests for exceptions to this policy will be
> -evaluated by the QEMU project on a case by case basis. To be granted an
> -exception, a contributor will need to demonstrate clarity of the license and
> -copyright status for the tool's output in relation to its training model and
> -code, to the satisfaction of the project maintainers.
> +clarified.

I would suggest only this last paragraph be changed


  This policy may evolve as AI tools mature and the legal situation is
  clarified.

  Exceptions
  ----------

  The QEMU project welcomes discussion on any exceptions to this policy,
  or more general revisions. This can be done by contacting the qemu-devel
  mailing list with details of a proposed tool / model / usage scenario /
  etc that is beneficial to QEMU, while still mitigating the legal risks
  to the project.

  After discussion, any exceptions that can be relied upon in contributions
  will be listed below. The listing of an exception does not remove the
  need for contributors to comply with all other pre-existing contribution
  requirements, including DCO signoff.
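
As a purely hypothetical sketch, an exception listed below might end up
reading something like:

  * Mechanical refactorings where the contributor specifies every
    transformation and verifies the tool's output against the
    pre-existing code.

though the actual scope and wording would come out of the qemu-devel
discussion.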


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
Posted by Paolo Bonzini 6 days, 2 hours ago
On 9/22/25 15:24, Daniel P. Berrangé wrote:
> FWIW, I considered that the "exception process" would end up
> being something like...
> 
>   * someone wants to use a particular tool for something they
>     believe is compelling
>   * they complain on qemu-devel that our policy blocks their
>     valid use
>   * we debate it

I guess we're here, except for hiding the complaint behind a patch. :)

>   * if agreed, we add a note to this code-provenance.rst doc to
>     allow it
> 
> 
> I would imagine that exceptions might fall into two buckets
> 
>   * Descriptions of techniques/scenarios for using tools
>     that limit the licensing risk
>   * Details of specific tools (or more likely models) that
>     are judged to have limited licensing risk
>
> it is hard to predict the future though, so this might be
> too simplistic. Time will tell when someone starts the
> debate...

Yeah, I'm afraid it is; allowing specific tools might not be feasible, 
as the scope of "allow Claude Code" or "allow cut and paste for ChatGPT 
chats" is obviously way too large.  Allowing some usage scenarios seems 
more feasible (as done in patch 4).

>> What is missing: do we want a formal way to identify commits for which an
>> exception to the AI policy was granted?  The common way to do so seems to
>> be "Generated-by" or "Assisted-by" but I don't want to turn commit message
>> into an ad space.  I would lean more towards something like
>>
>>    AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>
> 
> IMHO the code-provenance.rst doc is what grants the exception, not
> any individual person, nor any individual commit.
> 
> Whether we want to reference that a given commit is relying on an
> exception or not is hard to say at this point as we don't know what
> any exception would be like.
> 
> Ideally the applicability of an exception could be self-evident
> from the commit. Reality might be fuzzier. So if not self-evident,
> then it likely warrants a sentence or two of English text in the
> commit to justify its applicability.
> IOW, a tag like AI-exception-granted-by doesn't feel like it is
> particularly useful.

I meant it as more of an audit trail, especially for the case where a 
new submaintainer would prefer to ask someone else, or for the case of a 
maintainer contributing AI-generated code.  If we can keep it simple and 
avoid this, that's fine (it's not even in the policy, only in the commit 
message).

What I do *not* want is Generated-by or Assisted-by.  The exact model or 
tool shouldn't matter in deciding whether a contribution fits the 
exception.  Companies tell their employees "you can use this model 
because we have an indemnification contract in place", but I don't think 
we should care about what contracts they have---we have no way to check 
if it's true or if the indemnification extends to QEMU, for example.

>>     **Current QEMU project policy is to DECLINE any contributions which are
>>     believed to include or derive from AI generated content. This includes
>> -  ChatGPT, Claude, Copilot, Llama and similar tools.**
>> +  ChatGPT, Claude, Copilot, Llama and similar tools.  Exceptions may be
>> +  requested on a case-by-case basis.**
> 
> I'm not sure what you mean by 'case-by-case basis'? I certainly don't
> think we should entertain debating use of AI in individual patch series,
> as that'll be a never-ending burden on reviewer/maintainer resources.
> 
> Exceptions should be things that can be applied somewhat generically to
> tools, models, or usage scenarios, IMHO.

I meant that at some point a human will have to agree that it fits the 
exception, but yeah it is not the right place to say that.

> I would suggest only this last paragraph be changed
> 
> 
>    This policy may evolve as AI tools mature and the legal situation is
>    clarified.
> 
>    Exceptions
>    ----------
> 
>    The QEMU project welcomes discussion on any exceptions to this policy,
>    or more general revisions. This can be done by contacting the qemu-devel
>    mailing list with details of a proposed tool / model / usage scenario /
>    etc that is beneficial to QEMU, while still mitigating the legal risks
>    to the project.
> 
>    After discussion, any exceptions that can be relied upon in contributions
>    will be listed below. The listing of an exception does not remove the
>    need for contributors to comply with all other pre-existing contribution
>    requirements, including DCO signoff.

This sounds good (I'd like to keep the requirement that maintainers ask 
for a second opinion when contributing AI-generated code, but that can 
be woven into your proposal).  Another benefit is that this phrasing is 
independent of the existence of any exceptions.

I'll split the first three patches into their own non-RFC series, and we 
can keep discussing the "refactoring scenario" in this thread.

Paolo


Re: [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
Posted by Daniel P. Berrangé 6 days, 1 hour ago
On Mon, Sep 22, 2025 at 03:56:51PM +0200, Paolo Bonzini wrote:
> On 9/22/25 15:24, Daniel P. Berrangé wrote:
> > FWIW, I considered that the "exception process" would end up
> > being something like...
> > 
> >   * someone wants to use a particular tool for something they
> >     believe is compelling
> >   * they complain on qemu-devel that our policy blocks their
> >     valid use
> >   * we debate it
> 
> I guess we're here, except for hiding the complaint behind a patch. :)
> 
> >   * if agreed, we add a note to this code-provenance.rst doc to
> >     allow it
> > 
> > 
> > I would imagine that exceptions might fall into two buckets
> > 
> >   * Descriptions of techniques/scenarios for using tools
> >     that limit the licensing risk
> >   * Details of specific tools (or more likely models) that
> >     are judged to have limited licensing risk
> > 
> > it is hard to predict the future though, so this might be
> > too simplistic. Time will tell when someone starts the
> > debate...
> 
> Yeah, I'm afraid it is; allowing specific tools might not be feasible, as
> the scope of "allow Claude Code" or "allow cut and paste for ChatGPT chats"
> is obviously way too large.  Allowing some usage scenarios seems more
> feasible (as done in patch 4).

Agreed; when I say an exception for a tool, I would find it highly
unlikely we would grant one for a tool as generic as
Claude/ChatGPT. That would effectively remove all policy
limitations.

Rather I was thinking about the possibility that certain very
specialized tools might appear.

The usage scenarios exception seems the much more likely one
in the near future.

> > > What is missing: do we want a formal way to identify commits for which an
> > > exception to the AI policy was granted?  The common way to do so seems to
> > > be "Generated-by" or "Assisted-by" but I don't want to turn commit message
> > > into an ad space.  I would lean more towards something like
> > > 
> > >    AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>
> > 
> > IMHO the code-provenance.rst doc is what grants the exception, not
> > any individual person, nor any individual commit.
> > 
> > Whether we want to reference that a given commit is relying on an
> > exception or not is hard to say at this point as we don't know what
> > any exception would be like.
> > 
> > Ideally the applicability of an exception could be self-evident
> > from the commit. Reality might be fuzzier. So if not self-evident,
> > then it likely warrants a sentence or two of English text in the
> > commit to justify its applicability.
> > IOW, a tag like AI-exception-granted-by doesn't feel like it is
> > particularly useful.
> 
> I meant it as more of an audit trail, especially for the case where a new
> submaintainer would prefer to ask someone else, or for the case of a
> maintainer contributing AI-generated code.  If we can keep it simple and
> avoid this, that's fine (it's not even in the policy, only in the commit
> message).

When a maintainer gives an Acked-by or Signed-off-by tag, they
are stating that the contribution complies with our policies,
and that includes this AI policy.

If a maintainer isn't comfortable with the AI exception
applicability, they should not give Acked-by/Signed-off-by,
and/or ask another maintainer to give their own NNN-by tag
as a second opinion.
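
E.g. (with hypothetical names) the trailers on such a commit might end
up looking like:

  Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>
  Acked-by: Sam Submaintainer <sam.submaintainer@example.test>

with the second tag providing the audit trail you mention.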

> What I do *not* want is Generated-by or Assisted-by.

Yes, I don't want to see us advertising commercial products in
git history.

>                                                      The exact model or
> tool should matter in deciding whether a contribution fits the exception.
> Companies tell their employees "you can use this model because we have an
> indemnification contract in place", but I don't think we should care about
> what contracts they have---we have no way to check if it's true or if the
> indemnification extends to QEMU, for example.

Employees likely don't have any way to check that either. They'll
just be blindly trusting what little information their employer
provides, if any. We don't want to put our contributors into an
impossible situation wrt determining compliance. It needs to be
practical for them to make a judgement call.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|