From: Daniel P. Berrangé <berrange@redhat.com>
There has been an explosion of interest in so-called AI code
generators. Thus far, though, this has not been matched by a broadly
accepted legal interpretation of the licensing implications for code
generator outputs. While the vendors may claim there is no problem and
a free choice of license is possible, they have an inherent conflict
of interest in promoting this interpretation. More broadly, there is,
as yet, no consensus on the licensing implications of code generators
trained on inputs under a wide variety of licenses.
The DCO requires contributors to assert they have the right to
contribute under the designated project license. Given the lack of
consensus on the licensing of AI code generator output, it is not
considered credible to assert compliance with the DCO clause (b) or (c)
where a patch includes such generated code.
This patch thus defines a policy that the QEMU project will currently
not accept contributions where use of AI code generators is either
known or suspected.
These are early days of AI-assisted software development. The legal
questions will be resolved eventually. The tools will mature, and we
can expect some to become safely usable in free software projects.
The policy we set now must be for today, and be open to revision. It's
best to start strict and safe, then relax.
Meanwhile, requests for exceptions can also be considered on a
case-by-case basis.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
1 file changed, 49 insertions(+), 1 deletion(-)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index c27d8fe649..261263cfba 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -270,4 +270,52 @@ boilerplate code template which is then filled in to produce the final patch.
The output of such a tool would still be considered the "preferred format",
since it is intended to be a foundation for further human authored changes.
Such tools are acceptable to use, provided they follow a deterministic process
-and there is clearly defined copyright and licensing for their output.
+and there is clearly defined copyright and licensing for their output. Note
+in particular the caveats applying to AI code generators below.
+
+Use of AI code generators
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+ **Current QEMU project policy is to DECLINE any contributions which are
+ believed to include or derive from AI generated code. This includes ChatGPT,
+ CoPilot, Llama and similar tools**
+
+The increasing prevalence of AI code generators, most notably but not limited
+to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
+(LLMs) results in a number of difficult legal questions and risks for software
+projects, including QEMU.
+
+The QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the dco_ (DCO).
+
+To satisfy the DCO, the patch contributor has to fully understand the
+copyright and license status of code they are contributing to QEMU. With AI
+code generators, the copyright and license status of the output is ill-defined
+with no generally accepted, settled legal foundation.
+
+Where the training material is known, it is common for it to include large
+volumes of material under restrictive licensing/copyright terms. Even where
+the training material is all known to be under open source licenses, it is
+likely to be under a variety of terms, not all of which will be compatible
+with QEMU's licensing requirements.
+
+How contributors could comply with DCO terms (b) or (c) for the output of AI
+code generators commonly available today is unclear. The QEMU project is not
+willing or able to accept the legal risks of non-compliance.
+
+The QEMU project thus requires that contributors refrain from using AI code
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
+
+Examples of tools impacted by this policy includes both GitHub's CoPilot,
+OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
+well known.
+
+This policy may evolve as AI tools mature and the legal situation is
+clarified. In the meanwhile, requests for exceptions to this policy will be
+evaluated by the QEMU project on a case by case basis. To be granted an
+exception, a contributor will need to demonstrate clarity of the license and
+copyright status for the tool's output in relation to its training model and
+code, to the satisfaction of the project maintainers.
--
2.48.1
On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> From: Daniel P. Berrangé <berrange@redhat.com>
[...]
> +TL;DR:
> +
> + **Current QEMU project policy is to DECLINE any contributions which are
> + believed to include or derive from AI generated code. This includes ChatGPT,
> + CoPilot, Llama and similar tools**

GitHub spells it "Copilot".

Claude is very popular for coding at the moment and probably worth
mentioning.

> +
> +The increasing prevalence of AI code generators, most notably but not limited

More detail is needed on what an "AI code generator" is. Coding
assistant tools range from autocompletion to linters to automatic code
generators. In addition there are other AI-related tools like ChatGPT
or Gemini as a chatbot that people can use like Stackoverflow or an
API documentation summarizer.

I think the intent is to say: do not put code that comes from _any_ AI
tool into QEMU.

It would be okay to use AI to research APIs, algorithms, brainstorm
ideas, debug the code, analyze the code, etc. but the actual code
changes must not be generated by AI.

> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
> +(LLMs) results in a number of difficult legal questions and risks for software
> +projects, including QEMU.
[...]
> +Examples of tools impacted by this policy includes both GitHub's CoPilot,

Copilot

> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> +well known.
[...]
On Tue, Jun 03, 2025 at 02:25:42PM -0400, Stefan Hajnoczi wrote:
> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
> >
> > From: Daniel P. Berrangé <berrange@redhat.com>
[...]
> I think the intent is to say: do not put code that comes from _any_ AI
> tool into QEMU.

Right, the intent is that any copyrightable portion of a commit must
not have come directly from an AI/LLM tool, or from an agent which
indirectly/internally uses an AI/LLM tool.

"code generator" is possibly a little overly specific, as this is
really about any type of tool which emits content that will make its
way into qemu.git, whether code or non-code content (docs, images,
etc.).

> It would be okay to use AI to research APIs, algorithms, brainstorm
> ideas, debug the code, analyze the code, etc. but the actual code
> changes must not be generated by AI.

Mostly yes - there's a fuzzy boundary in the debug/analyze use cases,
if the tool is also suggesting code changes to fix issues. If the
scope of the suggested changes meets the threshold for being (likely)
copyrightable code, that would fall under the policy.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> From: Daniel P. Berrangé <berrange@redhat.com>
[...]
>> +TL;DR:
>> +
>> + **Current QEMU project policy is to DECLINE any contributions which are
>> + believed to include or derive from AI generated code. This includes ChatGPT,
>> + CoPilot, Llama and similar tools**
>
> GitHub spells it "Copilot".

I'll fix it.

> Claude is very popular for coding at the moment and probably worth
> mentioning.

Will do.

>> +
>> +The increasing prevalence of AI code generators, most notably but not limited
>
> More detail is needed on what an "AI code generator" is. Coding
> assistant tools range from autocompletion to linters to automatic code
> generators. In addition there are other AI-related tools like ChatGPT
> or Gemini as a chatbot that people can use like Stackoverflow or an
> API documentation summarizer.
>
> I think the intent is to say: do not put code that comes from _any_ AI
> tool into QEMU.
>
> It would be okay to use AI to research APIs, algorithms, brainstorm
> ideas, debug the code, analyze the code, etc. but the actual code
> changes must not be generated by AI.

The existing text is about "AI code generators". However, the "most
notably LLMs" that follows it could lead readers to believe it's about
more than just code generation, because LLMs are in fact used for more.
I figure this is your concern.

We could instead start wide, then narrow the focus to code generation.
Here's my try:

    The increasing prevalence of AI-assisted software development results
    in a number of difficult legal questions and risks for software
    projects, including QEMU. Of particular concern is code generated by
    `Large Language Models
    <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

If we want to mention uses of AI we consider okay, I'd do so further
down, to not distract from the main point here. Perhaps:

    The QEMU project thus requires that contributors refrain from using AI code
    generators on patches intended to be submitted to the project, and will
    decline any contribution if use of AI is either known or suspected.

    This policy does not apply to other uses of AI, such as researching APIs or
    algorithms, static analysis, or debugging.

    Examples of tools impacted by this policy includes both GitHub's CoPilot,
    OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
    well known.

The paragraph in the middle is new, the other two are unchanged.

Thoughts?

>> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
>> +(LLMs) results in a number of difficult legal questions and risks for software
>> +projects, including QEMU.

Thanks!

[...]
On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
> > I think the intent is to say: do not put code that comes from _any_ AI
> > tool into QEMU.
> >
> > It would be okay to use AI to research APIs, algorithms, brainstorm
> > ideas, debug the code, analyze the code, etc. but the actual code
> > changes must not be generated by AI.

The scope of the policy is around contributions we receive as
patches with SoB. Researching / brainstorming / analysis etc.
are not contribution activities, so not covered by the policy
IMHO.

> The existing text is about "AI code generators". However, the "most
> notably LLMs" that follows it could lead readers to believe it's about
> more than just code generation, because LLMs are in fact used for more.
> I figure this is your concern.
>
> We could instead start wide, then narrow the focus to code generation.
> Here's my try:
>
>     The increasing prevalence of AI-assisted software development results
>     in a number of difficult legal questions and risks for software
>     projects, including QEMU. Of particular concern is code generated by
>     `Large Language Models
>     <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

Documentation we maintain has the same concerns as code.
So I'd suggest to substitute 'code' with 'code / content'.

> If we want to mention uses of AI we consider okay, I'd do so further
> down, to not distract from the main point here. Perhaps:
>
>     The QEMU project thus requires that contributors refrain from using AI code
>     generators on patches intended to be submitted to the project, and will
>     decline any contribution if use of AI is either known or suspected.
>
>     This policy does not apply to other uses of AI, such as researching APIs or
>     algorithms, static analysis, or debugging.
>
>     Examples of tools impacted by this policy includes both GitHub's CoPilot,
>     OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>     well known.
>
> The paragraph in the middle is new, the other two are unchanged.
>
> Thoughts?

IMHO it's redundant, as the policy is expressly around contribution of
code/content, and those activities are not contribution related, so
outside the scope already.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
[...]
> The scope of the policy is around contributions we receive as
> patches with SoB. Researching / brainstorming / analysis etc.
> are not contribution activities, so not covered by the policy
> IMHO.

Yes. More below.

[...]
> Documentation we maintain has the same concerns as code.
> So I'd suggest to substitute 'code' with 'code / content'.

Makes sense, thanks!

[...]
> IMHO it's redundant, as the policy is expressly around contribution of
> code/content, and those activities are not contribution related, so
> outside the scope already.

The very first paragraph in this file already set the scope: "provenance
of patch submissions [...] to the project", so you have a point here.
But does repeating the scope here hurt or help?
On Wed, Jun 04, 2025 at 10:58:38AM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
[...]
>> IMHO it's redundant, as the policy is expressly around contribution of
>> code/content, and those activities are not contribution related, so
>> outside the scope already.
>
> The very first paragraph in this file already set the scope: "provenance
> of patch submissions [...] to the project", so you have a point here.
> But does repeating the scope here hurt or help?

I guess it probably doesn't hurt to have it. Perhaps tweak to

    This policy does not apply to other uses of AI, such as researching APIs or
    algorithms, static analysis, or debugging, provided their output is not
    to be included in contributions.

and for the last paragraph remove 'both' and add a trailer

    Examples of tools impacted by this policy include GitHub's CoPilot,
    OpenAI's ChatGPT, and Meta's Code Llama (amongst many others which are less
    well known), and code/content generation agents which are built on top of
    such tools.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jun 04, 2025 at 10:58:38AM +0200, Markus Armbruster wrote:
[...]
> I guess it probably doesn't hurt to have it. Perhaps tweak to
>
>     This policy does not apply to other uses of AI, such as researching APIs or
>     algorithms, static analysis, or debugging, provided their output is not
>     to be included in contributions.
>
> and for the last paragraph remove 'both' and add a trailer
>
>     Examples of tools impacted by this policy include GitHub's CoPilot,
>     OpenAI's ChatGPT, and Meta's Code Llama (amongst many others which are less
>     well known), and code/content generation agents which are built on top of
>     such tools.

Sold!
On 4/6/25 09:15, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
[...]
>> We could instead start wide, then narrow the focus to code generation.
>> Here's my try:
>>
>>     The increasing prevalence of AI-assisted software development results
>>     in a number of difficult legal questions and risks for software
>>     projects, including QEMU. Of particular concern is code generated by
>>     `Large Language Models
>>     <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>
> Documentation we maintain has the same concerns as code.
> So I'd suggest to substitute 'code' with 'code / content'.

Why couldn't we accept documentation patches improved using LLM?

As a non-native English speaker being often stuck trying to describe
function APIs, I'm very tempted to use a LLM to review my sentences
and make them better understandable.

[...]
Philippe Mathieu-Daudé <philmd@linaro.org> writes:
> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>>
>>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>>>>
>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>>
>>>> More detail is needed on what an "AI code generator" is. Coding
>>>> assistant tools range from autocompletion to linters to automatic code
>>>> generators. In addition there are other AI-related tools like ChatGPT
>>>> or Gemini as a chatbot that people can use like Stackoverflow or an
>>>> API documentation summarizer.
>>>>
>>>> I think the intent is to say: do not put code that comes from _any_ AI
>>>> tool into QEMU.
>>>>
>>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>>> ideas, debug the code, analyze the code, etc but the actual code
>>>> changes must not be generated by AI.
>>
>> The scope of the policy is around contributions we receive as
>> patches with SoB. Researching / brainstorming / analysis etc
>> are not contribution activities, so not covered by the policy
>> IMHO.
>>
>>>
>>> The existing text is about "AI code generators". However, the "most
>>> notably LLMs" that follows it could lead readers to believe it's about
>>> more than just code generation, because LLMs are in fact used for more.
>>> I figure this is your concern.
>>>
>>> We could instead start wide, then narrow the focus to code generation.
>>> Here's my try:
>>>
>>> The increasing prevalence of AI-assisted software development results
>>> in a number of difficult legal questions and risks for software
>>> projects, including QEMU. Of particular concern is code generated by
>>> `Large Language Models
>>> <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>>
>> Documentation we maintain has the same concerns as code.
>> So I'd suggest to substitute 'code' with 'code / content'.
>
> Why couldn't we accept documentation patches improved using LLM?
>
> As a non-native English speaker being often stuck trying to describe
> function APIs, I'm very tempted to use a LLM to review my sentences
> and make them better understandable.
I understand the temptation! Unfortunately, the "legal questions and
risks" Daniel described apply to *any* kind of copyrightable material,
not just to code.
Quote:
  To satisfy the DCO, the patch contributor has to fully understand the
  copyright and license status of code they are contributing to QEMU. With AI
  code generators, the copyright and license status of the output is ill-defined
  with no generally accepted, settled legal foundation.

  Where the training material is known, it is common for it to include large
  volumes of material under restrictive licensing/copyright terms. Even where
  the training material is all known to be under open source licenses, it is
  likely to be under a variety of terms, not all of which will be compatible
  with QEMU's licensing requirements.

  How contributors could comply with DCO terms (b) or (c) for the output of AI
  code generators commonly available today is unclear. The QEMU project is not
  willing or able to accept the legal risks of non-compliance.
[...]
On Wed, Jun 04, 2025 at 09:54:33AM +0200, Philippe Mathieu-Daudé wrote:
> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>> Documentation we maintain has the same concerns as code.
>> So I'd suggest to substitute 'code' with 'code / content'.
>
> Why couldn't we accept documentation patches improved using LLM?

I would flip it around and ask why would documentation not be held
to the same standard as code, when it comes to licensing and legal
compliance?

This is all copyright content that we merge & distribute under the
same QEMU licensing terms, and we have the same legal obligations
whether it is "source code" or "documentation" or other content
that is not traditional "source code" (images for example).

> As a non-native English speaker being often stuck trying to describe
> function APIs, I'm very tempted to use a LLM to review my sentences
> and make them better understandable.

I can understand that desire, and it is an admittedly tricky situation
and tradeoff for which I don't have a great answer.

As a starting point we (as reviewers/maintainers) must be broadly
very tolerant & accepting of content that is not perfect English,
because we know many (probably even the majority of) contributors
won't have English as their first language.

As a reviewer I don't mind imperfect language in submissions. Even
if language is not perfect, it is at least a direct expression of
the author's understanding, and thus we can have a level of trust
in the docs based on our community experience with the contributor.

If docs have been altered in any significant manner by an LLM,
even if they are linguistically improved, IMHO knowing about that
use of an LLM would reduce my personal trust in the technical
accuracy of the contribution.

This is straying into the debate around the accuracy of LLMs though,
which is interesting, but tangential to the purpose of this policy,
which aims to focus on the code provenance / legal side.


So, back on track, an important point is that this policy (& the
legal concerns/risks it attempts to address) are implicitly
around contributions that can be considered copyrightable.

Some so-called "trivial" work can be so simplistic as to not meet
the threshold for copyright protection, and it is thus easy for the
DCO requirements to be satisfied.


As a person, when you write the API documentation from scratch,
your output would generally be considered to be a copyrightable
contribution by the author.

When a reviewer then suggests changes to your docs, most of the
time those changes are so trivial that the reviewer wouldn't be
claiming copyright over the resulting work.

If the reviewer completely rewrites entire sentences in the
docs, though, they would be able to claim copyright over part
of the resulting work.


The tipping point between copyrightable/non-copyrightable is
hard to define in a policy. It is inherently fuzzy, and somewhat
of a "you'll know it when you see it" or "let's debate it in court"
situation...


So back to LLMs.


If you ask the LLM (or an agent using an LLM) to entirely write
the API docs from scratch, I think that should be expected to
fall under this proposed contribution policy in general.


If you write the API docs yourself and ask the LLM to review and
suggest improvements, that MAY or MAY NOT fall under this policy.

If the LLM-suggested tweaks were minor enough to be considered
not to meet the threshold to be copyrightable, it would be fine;
this is little different to a human reviewer suggesting tweaks.

If the LLM suggested large-scale rewriting, that would be harder
to draw the line on, but would tend towards falling under this
contribution policy.

So it depends on the scope of what the LLM suggested as a change
to your docs.

IOW, LLM-as-sparkling-auto-correct is probably OK, but
LLM-as-book-editor / LLM-as-ghost-writer is probably NOT OK.

This is a scenario where the QEMU contributor has to use their
personal judgement as to whether their use of an LLM in a docs
context is compliant with this policy, or not. I don't think we
should try to describe this in the policy given how fuzzy the
situation is.

NB, this copyrightable/non-copyrightable situation applies to source
code too, not just docs.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 4/6/25 10:40, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 09:54:33AM +0200, Philippe Mathieu-Daudé wrote:
>> Why couldn't we accept documentation patches improved using LLM?
[...]
> If you write the API docs yourself and ask the LLM to review and
> suggest improvements, that MAY or MAY NOT fall under this policy.
>
> If the LLM-suggested tweaks were minor enough to be considered
> not to meet the threshold to be copyrightable, it would be fine;
> this is little different to a human reviewer suggesting tweaks.

Good.

> If the LLM suggested large-scale rewriting, that would be harder
> to draw the line on, but would tend towards falling under this
> contribution policy.
>
> So it depends on the scope of what the LLM suggested as a change
> to your docs.
>
> IOW, LLM-as-sparkling-auto-correct is probably OK, but
> LLM-as-book-editor / LLM-as-ghost-writer is probably NOT OK.

OK.

> This is a scenario where the QEMU contributor has to use their
> personal judgement as to whether their use of an LLM in a docs
> context is compliant with this policy, or not. I don't think we
> should try to describe this in the policy given how fuzzy the
> situation is.

Thank you very much for this detailed explanation!

> NB, this copyrightable/non-copyrightable situation applies to source
> code too, not just docs.
Am 03.06.2025 um 16:25 hat Markus Armbruster geschrieben:
> +TL;DR:
> +
> + **Current QEMU project policy is to DECLINE any contributions which are
> + believed to include or derive from AI generated code. This includes ChatGPT,
> + CoPilot, Llama and similar tools**

[...]

> +Examples of tools impacted by this policy includes both GitHub's CoPilot,
> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> +well known.

I wonder if the best list of examples is still the same now, a year
after the original version of the document was written. In particular,
maybe including an example of popular vibe coding IDEs like Cursor would
make sense?

But it's only examples anyway, so either way is fine.

Kevin
Kevin Wolf <kwolf@redhat.com> writes:

> Am 03.06.2025 um 16:25 hat Markus Armbruster geschrieben:
>> +TL;DR:
>> +
>> + **Current QEMU project policy is to DECLINE any contributions which are
>> + believed to include or derive from AI generated code. This includes ChatGPT,
>> + CoPilot, Llama and similar tools**
>
> [...]
>
>> +Examples of tools impacted by this policy includes both GitHub's CoPilot,
>> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>> +well known.
>
> I wonder if the best list of examples is still the same now, a year
> after the original version of the document was written. In particular,
> maybe including an example of popular vibe coding IDEs like Cursor would
> make sense?
>
> But it's only examples anyway, so either way is fine.

Stefan suggested a few more, and I'll add them.

Thanks!