[v5] docs: define policy forbidding use of "AI" / LLM code generators

[PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators

Posted by Markus Armbruster 7 months, 3 weeks ago

More than a year ago, Daniel posted patches to put an AI policy in
writing.  Reception was mostly positive.  A v2 to address feedback
followed with some delay.  But no pull request.

I asked Daniel why, and he told me he was concerned it might go too
far in its interpretation of the DCO requirements.  After a bit of
discussion, I think Daniel's text is basically fine.  The policy it
describes is simple and strict.  Relaxing policy is easier than
tightening it.  I softened the phrasing slightly, addressed open
review comments, and fixed a few minor things I found myself.

Here's Daniel's cover letter for v2:

This patch kicks the hornet's nest of AI / LLM code generators.

With the increasing interest in code generators in recent times,
it is inevitable that QEMU contributions will include AI generated
code. Thus far we have remained silent on the matter. Given that
everyone knows these tools exist, our current position has to be
considered tacit acceptance of the use of AI generated code in QEMU.

The question for the project is whether that is a good position for
QEMU to take or not?

IANAL, but I like to think I'm reasonably proficient at understanding
open source licensing. I am not inherently against the use of AI tools,
rather I am anti-risk. I also want to see OSS licenses respected and
complied with.

AFAICT at its current state of (im)maturity the question of licensing
of AI code generator output does not have a broadly accepted / settled
legal position. There is an inherent bias/self-interest from the vendors
promoting their usage, who tend to minimize/dismiss the legal questions.
From my POV, this puts such tools in a position of elevated legal risk.

Given the fuzziness over the legal position of generated code from
such tools, I don't consider it credible (today) for a contributor
to assert compliance with the DCO terms (b) or (c) (which is a stated
pre-requisite for QEMU accepting patches) when a patch includes (or is
derived from) AI generated code.

By implication, I think that QEMU must (for now) explicitly decline
to (knowingly) accept AI generated code.

Perhaps a few years down the line the legal uncertainty will have
reduced and we can re-evaluate this policy.

Discuss...

Changes in v5 [Markus Armbruster]:
 * PATCH 2:
   - Drop "follow a deterministic process" clause [Peter]

Changes in v4 [Markus Armbruster]:
 * PATCH 1:
   - Revert v3's "known identity", and instead move existing paragraph
     from submitting-a-patch.rst to code-provenance.rst [Philippe]
   - Add a paragraph on recording maintainer modifications [Alex]
 * PATCH 3:
   - Talk about "AI-assisted software development", "AI content
     generators", and "content", not just "AI code generators" and
     "code" [Stefan, Daniel]
   - Fix spelling of Copilot, and mention Claude [Stefan]
   - Fix link text for reference to the DCO
   - Reiterate the policy does not apply to other uses of AI [Stefan,
     Daniel]
   - Add agents to the examples of tools impacted by the policy
     [Daniel]

Changes in v3 [Markus Armbruster]:

 * PATCH 1:
   - Require "known identity" (phrasing stolen from Linux kernel docs)
     [Peter]
   - Clarify use of multiple addresses [Michael]
   - Improve markup
   - Fix a few misspellings
   - Left for later: explain our use of Message-Id: [Alex]
 * PATCH 2:
   - Minor phrasing tweaks and spelling fixes
 * PATCH 3:
   - Don't claim DCO compliance is currently impossible, do point out
     it's unclear how, and that we consider the legal risk not
     acceptable.
   - Stress that the policy is open to revision some more by adding
     "as AI tools mature".  Also rephrase the commit message.
   - Improve markup

Changes in v2 [Daniel Berrangé]:

 * Fix a huge number of typos in docs
 * Clarify that maintainers should still add R-b where relevant, even
   if they are already adding their own S-oB.
 * Clarify situation when contributor re-starts previously abandoned
   work from another contributor.
 * Add info about Suggested-by tag
 * Add new docs section dealing with the broad topic of "generated
   files" (whether code generators or compilers)
 * Simplify the section related to prohibition of AI generated files
   and give further examples of tools considered covered
 * Remove repeated references to "LLM" as a specific technology, just
   use the broad "AI" term, except for one use of LLM as an example.
 * Add note that the policy may evolve if the legal clarity improves
 * Add note that exceptions can be requested on case-by-case basis
   if contributor thinks they can demonstrate a credible copyright
   and licensing status

Daniel P. Berrangé (3):
  docs: introduce dedicated page about code provenance / sign-off
  docs: define policy limiting the inclusion of generated files
  docs: define policy forbidding use of AI code generators

 docs/devel/code-provenance.rst    | 338 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  23 +-
 3 files changed, 341 insertions(+), 21 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

-- 
2.49.0

Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators

Posted by Stefan Hajnoczi 7 months, 2 weeks ago

On Mon, Jun 16, 2025 at 11:22:38AM +0200, Markus Armbruster wrote:
> [...]

Thanks, applied:
https://gitlab.com/qemu-project/qemu/-/commits/master

Stefan

Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators

Posted by Stefan Hajnoczi 7 months, 2 weeks ago

On Mon, Jun 16, 2025 at 5:27 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> [...]

Any final comments before I merge this?

Stefan


Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators

Posted by Alex Bennée 7 months, 2 weeks ago

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Mon, Jun 16, 2025 at 5:27 AM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> [...]
>
> Any final comments before I merge this?

It's well reviewed, let's get it merged.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators

Posted by Markus Armbruster 7 months, 2 weeks ago

Alex Bennée <alex.bennee@linaro.org> writes:

> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> Any final comments before I merge this?
>
> It's well reviewed, let's get it merged.

Stefan, would you like a PR from me?

Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators

Posted by Stefan Hajnoczi 7 months, 2 weeks ago

On Tue, Jun 24, 2025 at 1:02 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> >
> >> Any final comments before I merge this?
> >
> > It's well reviewed, let's get it merged.
>
> Stefan, would you like a PR from me?

No, that won't be necessary. I will merge the series directly.

Stefan