[v5] docs: define policy forbidding use of "AI" / LLM code generators

[PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Markus Armbruster 7 months, 4 weeks ago

From: Daniel P. Berrangé <berrange@redhat.com>

There has been an explosion of interest in so-called AI code
generators. Thus far, though, this has not been matched by a broadly
accepted legal interpretation of the licensing implications for code
generator outputs. While the vendors may claim there is no problem and
a free choice of license is possible, they have an inherent conflict
of interest in promoting this interpretation. More broadly, there is
as yet no consensus on the licensing implications of code
generators trained on inputs under a wide variety of licenses.

The DCO requires contributors to assert they have the right to
contribute under the designated project license. Given the lack of
consensus on the licensing of AI code generator output, it is not
considered credible to assert compliance with the DCO clause (b) or (c)
where a patch includes such generated code.

This patch thus defines a policy that the QEMU project will currently
not accept contributions where use of AI code generators is either
known or suspected.

These are early days of AI-assisted software development. The legal
questions will be resolved eventually. The tools will mature, and we
can expect some to become safely usable in free software projects.
The policy we set now must be for today, and be open to revision. It's
best to start strict and safe, then relax.

Meanwhile, requests for exceptions can also be considered on a
case-by-case basis.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/devel/code-provenance.rst | 55 +++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index c25afed98d..b5aae2e253 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -282,4 +282,57 @@ boilerplate code template which is then filled in to produce the final patch.
 The output of such a tool would still be considered the "preferred format",
 since it is intended to be a foundation for further human authored changes.
 Such tools are acceptable to use, provided there is clearly defined copyright
-and licensing for their output.
+and licensing for their output. Note in particular the caveats applying to AI
+content generators below.
+
+Use of AI content generators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+  **Current QEMU project policy is to DECLINE any contributions which are
+  believed to include or derive from AI-generated content. This includes
+  ChatGPT, Claude, Copilot, Llama and similar tools.**
+
+The increasing prevalence of AI-assisted software development results in a
+number of difficult legal questions and risks for software projects, including
+QEMU.  Of particular concern is content generated by `Large Language Models
+<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
+
+The QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the `Developer's Certificate of
+Origin (DCO) <dco>`.
+
+To satisfy the DCO, the patch contributor has to fully understand the
+copyright and license status of content they are contributing to QEMU. With AI
+content generators, the copyright and license status of the output is
+ill-defined, with no generally accepted, settled legal foundation.
+
+Where the training material is known, it is common for it to include large
+volumes of material under restrictive licensing/copyright terms. Even where
+the training material is all known to be under open source licenses, it is
+likely to be under a variety of terms, not all of which will be compatible
+with QEMU's licensing requirements.
+
+How contributors could comply with DCO terms (b) or (c) for the output of AI
+content generators commonly available today is unclear.  The QEMU project is
+not willing or able to accept the legal risks of non-compliance.
+
+The QEMU project thus requires that contributors refrain from using AI content
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
+
+This policy does not apply to other uses of AI, such as researching APIs or
+algorithms, static analysis, or debugging, provided their output is not to be
+included in contributions.
+
+Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's
+ChatGPT, Anthropic's Claude, Meta's Code Llama, and code/content
+generation agents which are built on top of such tools.
+
+This policy may evolve as AI tools mature and the legal situation is
+clarified. In the meantime, requests for exceptions to this policy will be
+evaluated by the QEMU project on a case-by-case basis. To be granted an
+exception, a contributor will need to demonstrate clarity of the license and
+copyright status for the tool's output in relation to its training model and
+code, to the satisfaction of the project maintainers.
-- 
2.49.0

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Michael S. Tsirkin 7 months, 2 weeks ago

On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> +The QEMU project thus requires that contributors refrain from using AI content
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.

What is this suspected thing by the way? Suspected by whom? You do not
think this is draconian?

-- 
MST

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Daniel P. Berrangé 7 months, 2 weeks ago

On Thu, Jun 26, 2025 at 02:34:57AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > +The QEMU project thus requires that contributors refrain from using AI content
> > +generators on patches intended to be submitted to the project, and will
> > +decline any contribution if use of AI is either known or suspected.
> 
> What is this suspected thing by the way? Suspected by whom? You do not
> think this is draconian?

Suspected as in: as a reviewer, you see obvious signs of LLM slop and/or
hallucinations in the contribution, while the contributor has not
declared such use.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Michael S. Tsirkin 7 months, 2 weeks ago

On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> [...]

Sorry about only reacting now, was AFK.

So one use case that seems entirely valid to me is refactoring.

For example, change a function prototype, or a structure,
and have an LLM update all callers.

The only part of the patch that is expressive is the
actual change; the rest is a technicality and IMHO has nothing to do with
copyright. LLMs can just do it with no hassle.


Can we soften this to only apply to expressive code?

I feel a lot of cleanups would be enabled by this.



Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Kevin Wolf 7 months, 2 weeks ago

On 25.06.2025 at 21:16, Michael S. Tsirkin wrote:
> On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > [...]
> 
> Sorry about only reacting now, was AFK.
> 
> So one use case that seems entirely valid to me is refactoring.
> 
> For example, change a function prototype, or a structure,
> and have an LLM update all callers.
> 
> The only part of the patch that is expressive is the
> actual change; the rest is a technicality and IMHO has nothing to do with
> copyright. LLMs can just do it with no hassle.
> 
> 
> Can we soften this to only apply to expressive code?
> 
> I feel a lot of cleanups would be enabled by this.

Hasn't refactoring been a (deterministically) solved problem long before
LLMs became capable of doing the same with a good enough probability?

Kevin

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Stefan Hajnoczi 7 months, 2 weeks ago

On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> On 25.06.2025 at 21:16, Michael S. Tsirkin wrote:
> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > [...]
> >
> > Sorry about only reacting now, was AFK.
> >
> > So one use case that seems entirely valid to me is refactoring.
> >
> > For example, change a function prototype, or a structure,
> > and have an LLM update all callers.
> >
> > The only part of the patch that is expressive is the
> > actual change; the rest is a technicality and IMHO has nothing to do with
> > copyright. LLMs can just do it with no hassle.
> >
> >
> > Can we soften this to only apply to expressive code?
> >
> > I feel a lot of cleanups would be enabled by this.
>
> Hasn't refactoring been a (deterministically) solved problem long before
> LLMs became capable of doing the same with a good enough probability?

It's easier to describe a desired refactoring to an LLM in natural
language than to figure out the regexes, semantic patches, etc. needed
for traditional refactoring tools.

Also, LLMs can perform higher level refactorings that might not be
supported by traditional tools. Things like "split this interface into
callbacks that take a Foo * argument and implement the callbacks for
both a.c and b.c".

I think what Daniel mentioned is a good guide: if it's something that
you think is copyrightable, then avoid it.

Stefan

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Michael S. Tsirkin 7 months, 2 weeks ago

On Wed, Jun 25, 2025 at 04:47:06PM -0400, Stefan Hajnoczi wrote:
> On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > On 25.06.2025 at 21:16, Michael S. Tsirkin wrote:
> > > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > > [...]
> > >
> > > Sorry about only reacting now, was AFK.
> > >
> > > So one use case that seems entirely valid to me is refactoring.
> > >
> > > For example, change a function prototype, or a structure,
> > > and have an LLM update all callers.
> > >
> > > The only part of the patch that is expressive is the
> > > actual change; the rest is a technicality and IMHO has nothing to do with
> > > copyright. LLMs can just do it with no hassle.
> > >
> > >
> > > Can we soften this to only apply to expressive code?
> > >
> > > I feel a lot of cleanups would be enabled by this.
> >
> > Hasn't refactoring been a (deterministically) solved problem long before
> > LLMs became capable of doing the same with a good enough probability?
> 
> It's easier to describe a desired refactoring to an LLM in natural
> language than to figure out the regexes, semantic patches, etc. needed
> for traditional refactoring tools.
> 
> Also, LLMs can perform higher level refactorings that might not be
> supported by traditional tools. Things like "split this interface into
> callbacks that take a Foo * argument and implement the callbacks for
> both a.c and b.c".
> 
> I think what Daniel mentioned is a good guide: if it's something that
> you think is copyrightable, then avoid it.
> 
> Stefan

Right. Let's put that in the doc?

-- 
MST

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Daniel P. Berrangé 7 months, 2 weeks ago

On Wed, Jun 25, 2025 at 04:49:17PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jun 25, 2025 at 04:47:06PM -0400, Stefan Hajnoczi wrote:
> > On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > On 25.06.2025 at 21:16, Michael S. Tsirkin wrote:
> > > > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > > > [...]
> > > >
> > > > Sorry about only reacting now, was AFK.
> > > >
> > > > So one use case that seems entirely valid to me is refactoring.
> > > >
> > > > For example, change a function prototype, or a structure,
> > > > and have an LLM update all callers.
> > > >
> > > > The only part of the patch that is expressive is the
> > > > actual change; the rest is a technicality and IMHO has nothing to do with
> > > > copyright. LLMs can just do it with no hassle.
> > > >
> > > >
> > > > Can we soften this to only apply to expressive code?
> > > >
> > > > I feel a lot of cleanups would be enabled by this.
> > >
> > > Hasn't refactoring been a (deterministically) solved problem long before
> > > LLMs became capable of doing the same with a good enough probability?
> > 
> > It's easier to describe a desired refactoring to an LLM in natural
> > language than to figure out the regexes, semantic patches, etc. needed
> > for traditional refactoring tools.
> > 
> > Also, LLMs can perform higher level refactorings that might not be
> > supported by traditional tools. Things like "split this interface into
> > callbacks that take a Foo * argument and implement the callbacks for
> > both a.c and b.c".
> > 
> > I think what Daniel mentioned is a good guide: if it's something that
> > you think is copyrightable, then avoid it.
> 
> Right. Let's put that in the doc?

In terms of mitigating risk, I think it is better to avoid saying that
explicitly, and to avoid being seen to actively encourage acceptance of
AI-generated code. The boundary between copyrightable and
non-copyrightable code is always pretty fuzzy and a matter of differing
opinions.

With regards,
Daniel

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Michael S. Tsirkin 7 months, 2 weeks ago

On Thu, Jun 26, 2025 at 09:18:22AM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 25, 2025 at 04:49:17PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jun 25, 2025 at 04:47:06PM -0400, Stefan Hajnoczi wrote:
> > > On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > > >
> > > > On 25.06.2025 at 21:16, Michael S. Tsirkin wrote:
> > > > > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > > > > [...]
> > > > >
> > > > > Sorry about only reacting now, was AFK.
> > > > >
> > > > > So one use case that seems entirely valid to me is refactoring.
> > > > >
> > > > > For example, change a function prototype, or a structure,
> > > > > and have an LLM update all callers.
> > > > >
> > > > > The only part of the patch that is expressive is the
> > > > > actual change; the rest is a technicality and IMHO has nothing to do with
> > > > > copyright. LLMs can just do it with no hassle.
> > > > >
> > > > >
> > > > > Can we soften this to only apply to expressive code?
> > > > >
> > > > > I feel a lot of cleanups would be enabled by this.
> > > >
> > > > Hasn't refactoring been a (deterministically) solved problem long before
> > > > LLMs became capable of doing the same with a good enough probability?
> > > 
> > > It's easier to describe a desired refactoring to an LLM in natural
> > > language than to figure out the regexes, semantic patches, etc. needed
> > > for traditional refactoring tools.
> > > 
> > > Also, LLMs can perform higher level refactorings that might not be
> > > supported by traditional tools. Things like "split this interface into
> > > callbacks that take a Foo * argument and implement the callbacks for
> > > both a.c and b.c".
> > > 
> > > I think what Daniel mentioned is a good guide: if it's something that
> > > you think is copyrightable, then avoid it.
> > 
> > Right. Let's put that in the doc?
> 
> In terms of mitigating risk, I think it is better to avoid saying that
> explicitly, and to avoid being seen to actively encourage acceptance of
> AI-generated code. The boundary between copyrightable and
> non-copyrightable code is always pretty fuzzy and a matter of differing
> opinions.
> 
> With regards,
> Daniel

Well, fuzzy is not what this doc does...

-- 
MST

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Michael S. Tsirkin 7 months, 2 weeks ago

On Wed, Jun 25, 2025 at 10:38:21PM +0200, Kevin Wolf wrote:
> Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > [...]
> > 
> > Sorry about only reacting now, was AFK.
> > 
> > So one use case that seems entirely valid to me is refactoring.
> > 
> > For example, change a function prototype, or a structure,
> > and have an LLM update all callers.
> > 
> > The only part of the patch that is expressive is the
> > actual change; the rest is a technicality and IMHO has nothing to do
> > with copyright. LLMs can just do it with no hassle.
> > 
> > 
> > Can we soften this to only apply to expressive code?
> > 
> > I feel a lot of cleanups would be enabled by this.
> 
> Hasn't refactoring been a (deterministically) solved problem long before
> LLMs became capable of doing the same with a good enough probability?
> 
> Kevin

Interesting. For example, I recently wanted to refactor a bunch of bool
fields to bit flags. Know of any tool that would do it without major
pain?

-- 
MST

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Daniel P. Berrangé 7 months, 2 weeks ago

On Wed, Jun 25, 2025 at 03:16:52PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > [...]
> 
> Sorry about only reacting now, was AFK.
> 
> So one use case that seems entirely valid to me is refactoring.
> 
> For example, change a function prototype, or a structure,
> and have an LLM update all callers.
> 
> The only part of the patch that is expressive is the
> actual change; the rest is a technicality and IMHO has nothing to do
> with copyright. LLMs can just do it with no hassle.

Well the policy is defined in terms of requirements to comply with
the DCO, and that implicitly indicates that the code in question
is eligible for copyright protection to begin with.

IOW, if a change is such that it is not considered eligible for
copyright protection, then you can take the view that it is trivially
DCO compliant, whether you wrote the code, an arbitrary 3rd party
wrote the code, or whether an AI wrote the code. 
 
> Can we soften this to only apply to expressive code?
> 
> I feel a lot of cleanups would be enabled by this.

Trying to detail every possible scenario is impractical and would
make the document too onerous for people to read, remember & apply.
It is better to leave it up to the contributor to decide whether a
change is non-copyrightable, than to try to draw that line crudely
in text. Even for refactoring, that line will be fuzzy and contextual,
so this is not a scenario where we should say any use of AI for refactoring
is OK, as that will lull contributors into a false sense of
acceptability, rather than making them aware of the need to question it.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Michael S. Tsirkin 7 months, 2 weeks ago

On Wed, Jun 25, 2025 at 08:46:54PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 25, 2025 at 03:16:52PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > [...]
> > 
> > Sorry about only reacting now, was AFK.
> > 
> > So one use case that seems entirely valid to me is refactoring.
> > 
> > For example, change a function prototype, or a structure,
> > and have an LLM update all callers.
> > 
> > The only part of the patch that is expressive is the
> > actual change; the rest is a technicality and IMHO has nothing to do
> > with copyright. LLMs can just do it with no hassle.
> 
> Well the policy is defined in terms of requirements to comply with
> the DCO, and that implicitly indicates that the code in question
> is eligible for copyright protection to begin with.
> 
> IOW, if a change is such that it is not considered eligible for
> copyright protection, then you can take the view that it is trivially
> DCO compliant, whether you wrote the code, an arbitrary 3rd party
> wrote the code, or whether an AI wrote the code. 

Exactly. I agree! However the patch states:

+The QEMU project thus requires that contributors refrain from using AI content
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.

and makes no exception for non-copyrightable parts of the patch.

Or do I misunderstand?


> > Can we soften this to only apply to expressive code?
> > 
> > I feel a lot of cleanups would be enabled by this.
> 
> Trying to detail every possible scenario is impractical and would
> make the document too onerous for people to read, remember & apply.
> It is better to leave it up to the contributor to decide whether a
> change is non-copyrightable, than to try to draw that line crudely
> in text. Even for refactoring, that line will be fuzzy and contextual,
> so this is not a scenario where we should say any use of AI for refactoring
> is OK, as that will lull contributors into a false sense of
> acceptability, rather than making them aware of the need to question it.

Agree again! What worries me is that the patch as posted here does
not make contributors question anything. It just flatly forbids using "AI
content generators".

-- 
MST

Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators

Posted by Markus Armbruster 7 months, 2 weeks ago

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Jun 25, 2025 at 08:46:54PM +0100, Daniel P. Berrangé wrote:
>> On Wed, Jun 25, 2025 at 03:16:52PM -0400, Michael S. Tsirkin wrote:
>> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
>> > > From: Daniel P. Berrangé <berrange@redhat.com>
>> > > [...]
>> > 
>> > Sorry about only reacting now, was AFK.
>> > 
>> > So one use case that seems entirely valid to me is refactoring.
>> > 
>> > For example, change a function prototype, or a structure,
>> > and have an LLM update all callers.
>> > 
>> > The only part of the patch that is expressive is the
>> > actual change; the rest is a technicality and IMHO has nothing to do
>> > with copyright. LLMs can just do it with no hassle.
>> 
>> Well the policy is defined in terms of requirements to comply with
>> the DCO, and that implicitly indicates that the code in question
>> is eligible for copyright protection to begin with.
>> 
>> IOW, if a change is such that it is not considered eligible for
>> copyright protection, then you can take the view that it is trivially
>> DCO compliant, whether you wrote the code, an arbitrary 3rd party
>> wrote the code, or whether an AI wrote the code. 
>
> Exactly. I agree! However the patch states:
>
> +The QEMU project thus requires that contributors refrain from using AI content
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
>
> and makes no exception for non-copyrightable parts of the patch.
>
> Or do I misunderstand?
>
>> > Can we soften this to only apply to expressive code?
>> > 
>> > I feel a lot of cleanups would be enabled by this.
>> 
>> Trying to detail every possible scenario is impractical and would
>> make the document too onerous for people to read, remember & apply.
>> It is better to leave it up to the contributor to decide whether a
>> change is non-copyrightable, than to try to draw that line crudely
>> in text. Even for refactoring, that line will be fuzzy and contextual,
>> so this is not a scenario where we should say any use of AI for refactoring
>> is OK, as that will lull contributors into a false sense of
>> acceptability, rather than making them aware of the need to question it.
>
> Agree again! What worries me is that the patch as posted here does
> not make contributors question anything. It just flatly forbids using "AI
> content generators".

Only if you stop reading before the last paragraph :)

I agree with Daniel that trying to legislate exceptions is not going to
work.  Instead, we put in this:

    This policy may evolve as AI tools mature and the legal situation is
    clarified. In the meanwhile, requests for exceptions to this policy will be
    evaluated by the QEMU project on a case by case basis. To be granted an
    exception, a contributor will need to demonstrate clarity of the license and
    copyright status for the tool's output in relation to its training model and
    code, to the satisfaction of the project maintainers.

Last paragraph, i.e. a fairly prominent spot.

If you can make a convincing case that the tool's output is not
copyrightable, I like your chances of being granted an exception.

As always, if you think doc text is insufficiently clear, let's work on
improving it.
