Until now QEMU's code provenance policy declined any contribution
believed to include or derive from AI-generated content. A blanket ban
was easy to maintain while LLM output was rarely usable on its own, but
as the tools improved an absolute prohibition has become harder to
justify.

The concern that motivated the policy is unchanged, and it is worth stating
precisely: the DCO is about whether the submitter has the legal right to
contribute the code, not about "creative expression". While the status of
LLM output seems to be converging towards non-copyrightability, questions
around unintentional reproduction of copyrighted code are still open.
What has shifted is the balance of risk:

- projects accepting AI-assisted content have not run into serious
  legal trouble so far, which suggests the probability of the risk
  materializing is not high;

- other organizations, such as Red Hat[1], have assessed the risk as
  acceptable -- though a community of individual developers does not
  have the legal backing of a company, and even an unfounded dispute
  would be a long-lasting distraction from work on QEMU.

Nevertheless, even Red Hat mentions that "the possibility of occasional
replication cannot be ignored". In QEMU's view, attentiveness and
oversight are not a practical way to address this; yet as a copyleft
project, copyright and code provenance are of utmost importance to us.
Therefore, it remains prudent to only permit AI assistance where the
ramifications of copyright violations are at least easy to revert and
unlikely to spread: tests, documentation, mechanical changes, and small
bug fixes. Core code that other things depend on, and that cannot
simply be thrown away once a problem is noticed long after the fact,
stays off-limits without prior agreement from a maintainer.

Related to this, and already visible in the incredible uptick in
security reports, is the question of maintainer burnout and the shift in
effort from the author to the reviewer of the code. AI lowers the cost of
producing a patch but does nothing to lower the cost of understanding and
reviewing one; if anything it raises it, since a reviewer can no longer
assume that the submitter has reasoned through every line. The limits
above work just as much to keep the volume of review work sustainable.

Revise the policy according to the above considerations, and introduce the
"AI-used-for:" trailer as a record of where AI was used. The standard is
slightly different from the more usual "Assisted-by"; the intention is for
the metadata to provide more information for reviewers to judge the result.

In any case, use of AI does not relax any other contribution requirement:
authors still comply with the DCO and take responsibility for the whole
patch via Signed-off-by.

[Commit message largely based on
https://lore.kernel.org/qemu-devel/ahXbxzB4C_lr6b0N@redhat.com/, by
Kevin Wolf. - Paolo]

[1] https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20260524083329-mutt-send-email-mst@kernel.org/T/
Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
---
docs/devel/code-provenance.rst | 142 ++++++++++++++++++++++-----------
1 file changed, 94 insertions(+), 48 deletions(-)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index 65b8f232a08..857588c43ba 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -1,7 +1,7 @@
.. _code-provenance:
-Code provenance
-===============
+Code provenance and AI usage
+============================
Certifying patch submissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -288,62 +288,108 @@ content generators below.
Use of AI-generated content
~~~~~~~~~~~~~~~~~~~~~~~~~~~
-TL;DR:
+.. warning::
- **Current QEMU project policy is to DECLINE any contributions which are
- believed to include or derive from AI generated content. This includes
- ChatGPT, Claude, Copilot, Llama and similar tools.**
+ Please read the policy below before using AI to contribute code or
+ documentation to QEMU. This applies to ChatGPT, Claude, Copilot,
+ Llama, and similar tools.
- **This policy does not apply to other uses of AI, such as researching APIs
- or algorithms, static analysis, or debugging, provided their output is not
- included in contributions.**
+The increasing prevalence of AI-assisted software development,
+and especially the use of content generated by `Large Language Models
+<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs),
+poses a number of difficult questions.
-The increasing prevalence of AI-assisted software development results in a
-number of difficult legal questions and risks for software projects, including
-QEMU. Of particular concern is content generated by `Large Language Models
-<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
+Risks to open source projects include maintainer burnout from an
+increased number of contributions, as well as the risk to the project
+from unintentional inclusion of copyrighted material in the LLM's output.
+In order to mitigate these risks, the QEMU project currently allows
+using AI/LLM tools to produce patches in a limited set of scenarios:
-The QEMU community requires that contributors certify their patch submissions
-are made in accordance with the rules of the `Developer's Certificate of
-Origin (DCO) <dco>`.
+**Mechanical changes**
+ If you can use a deterministic tool, it is preferred that you use it
+ and not replace it with AI. If you don't know how to do the change
+ deterministically, you can ask the AI for help.
-To satisfy the DCO, the patch contributor has to fully understand the
-copyright and license status of content they are contributing to QEMU. With AI
-content generators, the copyright and license status of the output is
-ill-defined with no generally accepted, settled legal foundation.
+**Small bug fixes**
+ These should be limited to 20 lines of code or less, not including
+ tests. You are still expected to :ref:`understand and explain your changes
+ <write_a_meaningful_commit_message>` and the rationale behind them.
-Where the training material is known, it is common for it to include large
-volumes of material under restrictive licensing/copyright terms. Even where
-the training material is all known to be under open source licenses, it is
-likely to be under a variety of terms, not all of which will be compatible
-with QEMU's licensing requirements.
+**Documentation and code comments**
+ While AI can help draft text, it still requires significant human
+ oversight. Pay attention to the organization and flow of the generated
+ text, and strictly fact-check all technical details as LLMs are prone
+ to being confidently wrong.
-How contributors could comply with DCO terms (b) or (c) for the output of AI
-content generators commonly available today is unclear. The QEMU project is
-not willing or able to accept the legal risks of non-compliance.
+**Tests**
+ Note that you must still confirm that each test actually exercises
+ the intended behavior including, for regression tests, that it
+ fails without the code under test and passes for the right reason.
-The QEMU project thus requires that contributors refrain from using AI content
-generators on patches intended to be submitted to the project, and will
-decline any contribution if use of AI is either known or suspected.
+These boundaries do not apply to other uses of AI, such as researching
+APIs or algorithms, static analysis, or debugging, provided the model's
+output is not included in contributions.
-Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
-ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
-generation agents which are built on top of such tools.
+If you wish to send large amounts of AI-generated changes, or any other
+contribution not in the above categories, please get in touch with the
+maintainer beforehand. These can be treated as experiments, at the
+discretion of the maintainer and the community, with no obligation
+to accept them.
-This policy may evolve as AI tools mature and the legal situation is
-clarified.
+**Use of AI does not remove the need for authors to comply with all
+other requirements for contribution.** In particular, the
+``Signed-off-by`` label in a patch submission is a statement that
+the author takes responsibility for the entire contents of the patch,
+certifying that their patch submission is made in accordance with the
+rules of the `Developer's Certificate of Origin (DCO) <dco>`.
-Exceptions
-^^^^^^^^^^
+Commit messages for AI-assisted changes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The QEMU project welcomes discussion on any exceptions to this policy,
-or more general revisions. This can be done by contacting the qemu-devel
-mailing list with details of a proposed tool, model, usage scenario, etc.
-that is beneficial to QEMU, while still mitigating issues around compliance
-with the DCO. After discussion, any exception will be listed below.
+When AI/LLM tools produce or substantively shape your patch, add an
+``AI-used-for:`` line before ``Signed-off-by``, as a reminder of your
+DCO obligations and a guide to reviewers. The text is one or more of
+``code``, ``tests``, ``docs``, ``research``, possibly followed by an
+explanation in parentheses:
-Exceptions do not remove the need for authors to comply with all other
-requirements for contribution. In particular, the "Signed-off-by"
-label in a patch submission is a statement that the author takes
-responsibility for the entire contents of the patch, including any parts
-that were generated or assisted by AI tools or other tools.
+.. code-block:: none
+
+ AI-used-for: tests, docs
+ AI-used-for: code
+ AI-used-for: code (refactoring)
+ AI-used-for: code (prototype)
+ AI-used-for: research
+
+``AI-used-for`` should not be included for "background" usage such as
+autocomplete or obtaining a pre-review of the patch.
+
+There is no requirement to include your prompts or summarize the
+conversation in the commit message or cover letter, but you may do so
+if you think it helps a reviewer judge the result. For example:
+
+**Helpful prompts**
+ These describe concrete constraints or instructions, making it easy for a
+ reviewer to see how the tool's output was guided:
+
+ * "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
+ function already has a local variable or parameter of type ``struct
+ bb``, use it instead of accessing ``aa.bb``"
+
+ * "add an implementation of the trait for ``Mutex<T: MyTrait>``; it
+ takes the lock around the calls and forwards to ``T``"
+
+**Unhelpful prompts**
+ These are too generic to provide meaningful context. You can of course
+ use them in the context of a complex interaction with the LLM, but they
+ should not be included in the commit message:
+
+ * "write user-facing documentation for the new tool"
+
+ * "write testcases for the new functions"
+
+QEMU does *not* use ``Assisted-by``, ``Co-authored-by`` or ``Generated-by``
+trailers to indicate AI usage. In particular, it is not necessary to
+specify the exact AI model or tool used to create the commit.
+
+Deterministic tooling (sed, coccinelle, formatters) is out of scope for
+the trailer, but should be mentioned in the commit message.
--
2.54.0
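
For illustration only, the trailer format described in the patch above
could be checked mechanically. The following is a minimal sketch, not part
of the patch or of any QEMU tooling: the function names, the regular
expression, and the reading that a single optional parenthesized
explanation ends the line are all assumptions made for this example.

```python
import re

# Categories listed in the proposed policy text.
ALLOWED_CATEGORIES = {"code", "tests", "docs", "research"}

# Assumed shape (hypothetical): comma-separated categories, then at most
# one parenthesized explanation at the end of the line.
TRAILER_RE = re.compile(
    r"^AI-used-for:\s*(?P<cats>[a-z]+(?:\s*,\s*[a-z]+)*)"
    r"(?:\s*\((?P<note>[^()]*)\))?\s*$"
)


def parse_ai_used_for(line):
    """Return (categories, note) for a well-formed trailer, else None."""
    match = TRAILER_RE.match(line)
    if match is None:
        return None
    cats = [c.strip() for c in match.group("cats").split(",")]
    if any(c not in ALLOWED_CATEGORIES for c in cats):
        return None
    return cats, match.group("note")


def check_commit_message(message):
    """Parse every AI-used-for line found before the first Signed-off-by.

    Malformed trailers show up as None entries in the returned list.
    """
    found = []
    for line in message.splitlines():
        if line.startswith("Signed-off-by:"):
            break
        if line.startswith("AI-used-for:"):
            found.append(parse_ai_used_for(line))
    return found
```

Calling parse_ai_used_for() on each of the five example trailers from the
patch returns a tuple rather than None. Stopping at the first Signed-off-by
mirrors the requirement that the trailer comes before it; a real checker
would probably also want to handle trailers added later by maintainers.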

On Fri, 29 May 2026 at 10:46, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Nevertheless, even Red Hat mentions that "the possibility of occasional
> replication cannot be ignored". In QEMU's view, attentiveness and
> oversight are not a practical way to address this; yet as a copyleft
> project, copyright and code provenance are of utmost importance to us.
> Therefore, it remains prudent to only permit AI assistance where the
> ramifications of copyright violations are at least easy to revert and
> unlikely to spread: tests, documentation, mechanical changes, and small
> bug fixes. Core code that other things depend on, and that cannot
> simply be thrown away once a problem is noticed long after the fact,
> stays off-limits without prior agreement from a maintainer.

This all makes sense to me, except for the part where we allow
a maintainer to say "actually it's OK". Where our justification
for not wanting AI contributions rests on "it's too much burden
on maintainers to have to deal with and review it", allowing an
individual maintainer to say "I'm OK with that burden in this case
or for this particular contribution" logically follows as a
possible relaxation. But if as a project we want to limit the
blast-radius if we find we have to rip out a hypothetical tainted
contribution, shouldn't that mean that we hold that as a project-wide
line, rather than leaving it up to the opinion of the individual
maintainer?

> +**Documentation and code comments**
> + While AI can help draft text, it still requires significant human
> + oversight. Pay attention to the organization and flow of the generated
> + text, and strictly fact-check all technical details as LLMs are prone
> + to being confidently wrong.

I think the application to documentation and comments is the part
I'm least enthusiastic about here. For changes to code, we have at
least some guardrails on the AI output, in the fact that it has to
compile and to pass tests. For changes to documentation, the
only guardrails are human eyeballs.

Also both comments and documentation ideally are a record of
what we intended the behaviour to be. If an LLM is effectively
autogenerating something documentation-shaped from the code we
lose that.

-- PMM

On Fri, May 29, 2026 at 04:34:45PM +0100, Peter Maydell wrote:
> This all makes sense to me, except for the part where we allow
> a maintainer to say "actually it's OK". Where our justification
> for not wanting AI contributions rests on "it's too much burden
> on maintainers to have to deal with and review it", allowing an
> individual maintainer to say "I'm OK with that burden in this case
> or for this particular contribution" logically follows as a
> possible relaxation. But if as a project we want to limit the
> blast-radius if we find we have to rip out a hypothetical tainted
> contribution, shouldn't that mean that we hold that as a project-wide
> line, rather than leaving it up to the opinion of the individual
> maintainer?

I guess the maintainer can judge that the code is unique and
QEMU-specific enough, and follows from what it is doing automatically
enough, that the chances it is accidentally copying something are nil?

> > +**Documentation and code comments**
> > + While AI can help draft text, it still requires significant human
> > + oversight. Pay attention to the organization and flow of the generated
> > + text, and strictly fact-check all technical details as LLMs are prone
> > + to being confidently wrong.
>
> I think the application to documentation and comments is the part
> I'm least enthusiastic about here.

But I am very enthusiastic about less ungrammatical English in both.
AI is super helpful for non-native speakers.

On Fri, 29 May 2026 at 17:46, Michael S. Tsirkin <mst@redhat.com> wrote:
> > If as a project we want to limit the
> > blast-radius if we find we have to rip out a hypothetical tainted
> > contribution, shouldn't that mean that we hold that as a project-wide
> > line, rather than leaving it up to the opinion of the individual
> > maintainer?
>
> I guess the maintainer can judge that the code is unique and
> QEMU-specific enough, and follows from what it is doing automatically
> enough, that the chances it is accidentally copying something are nil?

One thing that I had in mind was using AI to adjust QEMU code as the
kernel side goes through review and APIs change. The changes at that
point may not be entirely mechanical and, more importantly for
traceability, it probably will not make sense to separate them from the
original code; but the code still has fundamentally a shape and design
that was provided by the human.

Another, which is Rust-specific, is procedural macro code, which is
often boring, or very much tied to the shape of the generated code and
human-written traits, or both. See
https://github.com/qemu/qemu/blob/master/rust/qemu-macros/src/migration_state.rs
for an example, contrasting the block starting with
"self.conversion = match" with the rest.

I don't think it makes sense to have a wholesale permission for
procedural macros because that is not *always* true, or true for a whole
file. But say a contributor wrote the overall specification/documentation
first, and mostly one-shotted a skeleton with a prompt like "based on the
documentation, generate basic attribute parsing code for the
MigrationState derive macro, together with a code generator that provides
empty methods for an implementation of the trait
::migration::MigrationState from rust/hw/migration/". Then I would
absolutely not reject it. This is also the intention of the suggestion
around prompts: to favor quick generation of boilerplate code over full
"agentic" (blargh) implementation.

> > > +**Documentation and code comments**
> > > + While AI can help draft text, it still requires significant human
> > > + oversight. Pay attention to the organization and flow of the generated
> > > + text, and strictly fact-check all technical details as LLMs are prone
> > > + to being confidently wrong.
> >
> > I think the application to documentation and comments is the part
> > I'm least enthusiastic about here.
>
> But I am very enthusiastic about less ungrammatical English in both.
> AI is super helpful for non-native speakers.

I am also not enthusiastic about documentation; the review I gave for
Philippe's unedited experiment was rather scathing. The main challenge
for documentation is the structure of the work, which is really
complicated to establish because the LLM doesn't have a clue about the
underlying design.

But there can be interesting uses nevertheless, such as integrating
knowledge from functional tests into documentation, that are worth
exploring. Also for Rust I am really trying to have *all* functions
commented (and tested through so tests) and AI can produce good results
more often than not, especially when the model has access to a
human-written file-level blurb.

> > For changes to code, we have at
> > least some guardrails on the AI output, in the fact that it has to
> > compile and to pass tests. For changes to documentation, the
> > only guardrails are human eyeballs.
> >
> > Also both comments and documentation ideally are a record of
> > what we intended the behaviour to be. If an LLM is effectively
> > autogenerating something documentation-shaped from the code we
> > lose that.

I agree with both of these observations, for what it's worth.

Paolo

On Fri, May 29, 2026 at 06:17:29PM +0200, Paolo Bonzini wrote:
> I don't think it makes sense to have a wholesale permission for procedural
> macros because that is not *always* true, or true for a whole file. But say a
> contributor wrote the overall specification/documentation first, and mostly
> one-shotted a skeleton with a prompt like "based on the documentation, generate
> basic attribute parsing code for the MigrationState derive macro, together with
> a code generator that provides empty methods for an implementation of the trait
> ::migration::MigrationState from rust/hw/migration/". Then I would absolutely
> not reject it. This is also the intention of the suggestion around prompts: to
> favor quick generation of boilerplate code over full "agentic" (blargh)
> implementation.

Agreed.

On Fri, 29 May 2026 at 16:46, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, May 29, 2026 at 04:34:45PM +0100, Peter Maydell wrote:
> > On Fri, 29 May 2026 at 10:46, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > +**Documentation and code comments**
> > > + While AI can help draft text, it still requires significant human
> > > + oversight. Pay attention to the organization and flow of the generated
> > > + text, and strictly fact-check all technical details as LLMs are prone
> > > + to being confidently wrong.
> >
> > I think the application to documentation and comments is the part
> > I'm least enthusiastic about here.
>
> But I am very enthusiastic about less ungrammatical English in both.
> AI is super helpful for non-native speakers.

There's a difference between "I wrote the comments / documentation
and then asked the AI to check it for grammatical mistakes" or
"I wrote the documentation in my own language and asked it to
translate" and "I asked the AI to write or draft the documentation
starting from nothing". I think the first two are OK, but not
so much the third.

-- PMM

On Fri, 29 May 2026, Paolo Bonzini wrote:
> Until now QEMU's code provenance policy declined any contribution
> believed to include or derive from AI-generated content. A blanket ban
> was easy to maintain while LLM output was rarely usable on its own, but
> as the tools improved an absolute prohibition has become harder to
> justify.
>
> The concern that motivated the policy is unchanged, and it is worth stating
> precisely: the DCO is about whether the submitter has the legal right to
> contribute the code, not about "creative expression". While the status of
> LLM output seems to be converging towards non-copyrightability, questions
> around unintentional reproduction of copyrighted code are still open.
> What has shifted is the balance of risk:
>
> - projects accepting AI-assisted content have not run into serious
>   legal trouble so far, which suggests the probability of the risk
>   materializing is not high;
>
> - other organizations, such as Red Hat[1], have assessed the risk as
>   acceptable -- though a community of individual developers does not
>   have the legal backing of a company, and even an unfounded dispute
>   would be a long-lasting distraction from work on QEMU.
>
> Nevertheless, even Red Hat mentions that "the possibility of occasional
> replication cannot be ignored". In QEMU's view, attentiveness and
> oversight are not a practical way to address this; yet as a copyleft
> project, copyright and code provenance are of utmost importance to us.
> Therefore, it remains prudent to only permit AI assistance where the
> ramifications of copyright violations are at least easy to revert and
> unlikely to spread: tests, documentation, mechanical changes, and small
> bug fixes. Core code that other things depend on, and that cannot
> simply be thrown away once a problem is noticed long after the fact,
> stays off-limits without prior agreement from a maintainer.
>
> Related to this, and already visible in the incredible uptick in
> security reports, is the question of maintainer burnout and the shift in
> effort from the author to the reviewer of the code. AI lowers the cost of
> producing a patch but does nothing to lower the cost of understanding and
> reviewing one; if anything it raises it, since a reviewer can no longer
> assume that the submitter has reasoned through every line. The limits
> above work just as much to keep the volume of review work sustainable.
>
> Revise the policy according to the above considerations, and introduce the
> "AI-used-for:" trailer as a record of where AI was used. The standard is
> slightly different from the more usual "Assisted-by"; the intention is for
> the metadata to provide more information for reviewers to judge the result.
>
> In any case, use of AI does not relax any other contribution requirement:
> authors still comply with the DCO and take responsibility for the whole
> patch via Signed-off-by.
>
> [Commit message largely based on
> https://lore.kernel.org/qemu-devel/ahXbxzB4C_lr6b0N@redhat.com/, by
> Kevin Wolf. - Paolo]
>
> [1] https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: Alex Bennée <alex.bennee@linaro.org>
> Cc: Alistair Francis <alistair.francis@wdc.com>
> Cc: BALATON Zoltan <balaton@eik.bme.hu>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Warner Losh <imp@bsdimp.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Link: https://lore.kernel.org/qemu-devel/20260524083329-mutt-send-email-mst@kernel.org/T/
> Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
> ---
> docs/devel/code-provenance.rst | 142 ++++++++++++++++++++++----------
> 1 file changed, 94 insertions(+), 48 deletions(-)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index 65b8f232a08..857588c43ba 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -1,7 +1,7 @@
> .. _code-provenance:
>
> -Code provenance
> -===============
> +Code provenance and AI usage
> +============================
>
> Certifying patch submissions
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> @@ -288,62 +288,108 @@ content generators below.
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> -TL;DR:
> +.. warning::
>
> -  **Current QEMU project policy is to DECLINE any contributions which are
> -  believed to include or derive from AI generated content. This includes
> -  ChatGPT, Claude, Copilot, Llama and similar tools.**
> +  Please read the below policy before using AI to contribute code or
> +  documentation to QEMU.  This applies to ChatGPT, Claude, Copilot,
> +  Llama, and similar tools.**
>
> -  **This policy does not apply to other uses of AI, such as researching APIs
> -  or algorithms, static analysis, or debugging, provided their output is not
> -  included in contributions.**
> +The increasing prevalence of AI-assisted software development,
> +and especially the use of content generated by `Large Language Models
> +<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs),
> +poses a number of difficult questions.
>
> -The increasing prevalence of AI-assisted software development results in a
> -number of difficult legal questions and risks for software projects, including
> -QEMU. Of particular concern is content generated by `Large Language Models
> -<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> +Risks to open source projects include maintainer burnout from an
> +increased number of contributions, as well as the risk to the project
> +from unintentional inclusion of copyrighted material in the LLM's output.
> +In order to mitigate these risks, the QEMU project currently allows
> +using AI/LLM tools to produce patches in a limited set of scenarios:
>
> -The QEMU community requires that contributors certify their patch submissions
> -are made in accordance with the rules of the `Developer's Certificate of
> -Origin (DCO) <dco>`.
> +**Mechanical changes**
> +  If you can use a deterministic tool, it is preferred that you use it
> +  and not replace it with AI. If you don't know how to do the change
> +  deterministically, you can ask the AI for help.
>
> -To satisfy the DCO, the patch contributor has to fully understand the
> -copyright and license status of content they are contributing to QEMU. With AI
> -content generators, the copyright and license status of the output is
> -ill-defined with no generally accepted, settled legal foundation.
> +**Small bug fixes**
> +  These should be limited to 20 lines of code or less, not including
> +  tests. You are still expected to :ref:`understand and explain your changes
> +  <write_a_meaningful_commit_message>` and the rationale behind them.
>
> -Where the training material is known, it is common for it to include large
> -volumes of material under restrictive licensing/copyright terms. Even where
> -the training material is all known to be under open source licenses, it is
> -likely to be under a variety of terms, not all of which will be compatible
> -with QEMU's licensing requirements.
> +**Documentation and code comments**
> +  While AI can help draft text, it still requires significant human
> +  oversight. Pay attention to the organization and flow of the generated
> +  text, and strictly fact-check all technical details as LLMs are prone
> +  to being confidently wrong.
>
> -How contributors could comply with DCO terms (b) or (c) for the output of AI
> -content generators commonly available today is unclear. The QEMU project is
> -not willing or able to accept the legal risks of non-compliance.
> +**Tests**
> +  Note that you must still confirm that each test actually exercises
> +  the intended behavior including, for regression tests, that it
> +  fails without the code under test and passes for the right reason.
>
> -The QEMU project thus requires that contributors refrain from using AI content
> -generators on patches intended to be submitted to the project, and will
> -decline any contribution if use of AI is either known or suspected.
> +These boundaries do not apply to other uses of AI, such as researching
> +APIs or algorithms, static analysis, or debugging, provided the model's
> +output is not included in contributions.
>
> -Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
> -ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> -generation agents which are built on top of such tools.
> +If you wish to send large amounts of AI-generated changes, or any other
> +contribution not in the above categories, please get in touch with the
> +maintainer beforehand. These can be treated as experiments, at the
> +discretion of the maintainer and the community, with no obligation
> +to accept them.
>
> -This policy may evolve as AI tools mature and the legal situation is
> -clarified.
> +**Use of AI does not remove the need for authors to comply with all
> +other requirements for contribution.** In particular, the
> +``Signed-off-by`` label in a patch submission is a statement that
> +the author takes responsibility for the entire contents of the patch,
> +certifying that their patch submission is made in accordance with the
> +rules of the `Developer's Certificate of Origin (DCO) <dco>`.
>
> -Exceptions
> -^^^^^^^^^^
> +Commit messages for AI-assisted changes
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> -The QEMU project welcomes discussion on any exceptions to this policy,
> -or more general revisions. This can be done by contacting the qemu-devel
> -mailing list with details of a proposed tool, model, usage scenario, etc.
> -that is beneficial to QEMU, while still mitigating issues around compliance
> -with the DCO. After discussion, any exception will be listed below.
> +When AI/LLM tools produce or substantively shape your patch, add an
> +``AI-used-for:`` line before ``Signed-off-by``, as a reminder of your
> +DCO obligations and a guide to reviewers. The text is one or more of
> +``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> +explanation in parentheses:
>
> -Exceptions do not remove the need for authors to comply with all other
> -requirements for contribution. In particular, the "Signed-off-by"
> -label in a patch submission is a statement that the author takes
> -responsibility for the entire contents of the patch, including any parts
> -that were generated or assisted by AI tools or other tools.
> +.. code-block:: none
> +
> +  AI-used-for: tests, docs
> +  AI-used-for: code
> +  AI-used-for: code (refactoring)
> +  AI-used-for: code (prototype)
> +  AI-used-for: research
> +
> +``AI-used-for`` should not be included for "background" usage such as
> +autocomplete or obtaining a pre-review of the patch.
> +
> +There is no requirement to include your prompts or summarize the
> +conversation in the commit message or cover letter, but you may do so
> +if you think it helps a reviewer judge the result. For example:
> +
> +**Helpful prompts**
> +  These describe concrete constraints or instructions, making it easy for a
> +  reviewer to see how the tool's output was guided:
> +
> +  * "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
> +    function already has a local variable or parameter of type ``struct
> +    bb``, use it instead of accessing ``aa.bb``"
> +
> +  * "add an implementation of the trait for ``Mutex<T: MyTrait>``; it
> +    takes the lock around the calls and forwards to ``T``"
> +
> +**Unhelpful prompts**
> +  These are too generic to provide meaningful context. You can of course
> +  use them in the context of a complex interaction with the LLM, but they
> +  should not be included in the commit message:
> +
> +  * "write user-facing documentation for the new tool"
> +
> +  * "write testcases for the new functions"
> +
> +QEMU does *not* use ``Assisted-by``, ``Co-authored-by`` or ``Generated-by``
> +trailers to indicate AI usage. In particular, it is not necessary to

I think these are commonly referred to as tags and that's how elsewhere in
this docs these appear so that's why I was confused by the term trailers.
Otherwise this is now clearer, thanks.

Regards,
BALATON Zoltan

> +specify the exact AI model or tool used to create the commit.
> +
> +Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> +the trailer, but should be mentioned in the commit message.
>

Paolo Bonzini <pbonzini@redhat.com> writes:

> Until now QEMU's code provenance policy declined any contribution
> believed to include or derive from AI-generated content. A blanket ban
> was easy to maintain while LLM output was rarely usable on its own, but
> as the tools improved an absolute prohibition has become harder to
> justify.
>
<snip>
>
> -TL;DR:
> +.. warning::
>
> -  **Current QEMU project policy is to DECLINE any contributions which are
> -  believed to include or derive from AI generated content. This includes
> -  ChatGPT, Claude, Copilot, Llama and similar tools.**
> +  Please read the below policy before using AI to contribute code or
> +  documentation to QEMU.  This applies to ChatGPT, Claude, Copilot,
> +  Llama, and similar tools.**
>

Stray **, also extra space after QEMU.

> -  **This policy does not apply to other uses of AI, such as researching APIs
> -  or algorithms, static analysis, or debugging, provided their output is not
> -  included in contributions.**
> +The increasing prevalence of AI-assisted software development,
> +and especially the use of content generated by `Large Language Models
> +<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs),
> +poses a number of difficult questions.
>
> -The increasing prevalence of AI-assisted software development results in a
> -number of difficult legal questions and risks for software projects, including
> -QEMU. Of particular concern is content generated by `Large Language Models
> -<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> +Risks to open source projects include maintainer burnout from an
> +increased number of contributions, as well as the risk to the project
> +from unintentional inclusion of copyrighted material in the LLM's output.
> +In order to mitigate these risks, the QEMU project currently allows
> +using AI/LLM tools to produce patches in a limited set of scenarios:
>
> -The QEMU community requires that contributors certify their patch submissions
> -are made in accordance with the rules of the `Developer's Certificate of
> -Origin (DCO) <dco>`.
> +**Mechanical changes**
> +  If you can use a deterministic tool, it is preferred that you use
> it

deterministic tool or script,?

> +  and not replace it with AI. If you don't know how to do the change
> +  deterministically, you can ask the AI for help.
>
> -To satisfy the DCO, the patch contributor has to fully understand the
> -copyright and license status of content they are contributing to QEMU. With AI
> -content generators, the copyright and license status of the output is
> -ill-defined with no generally accepted, settled legal foundation.
> +**Small bug fixes**
> +  These should be limited to 20 lines of code or less, not including
> +  tests. You are still expected to :ref:`understand and explain your changes
> +  <write_a_meaningful_commit_message>` and the rationale behind them.
>
> -Where the training material is known, it is common for it to include large
> -volumes of material under restrictive licensing/copyright terms. Even where
> -the training material is all known to be under open source licenses, it is
> -likely to be under a variety of terms, not all of which will be compatible
> -with QEMU's licensing requirements.
> +**Documentation and code comments**
> +  While AI can help draft text, it still requires significant human
> +  oversight. Pay attention to the organization and flow of the generated
> +  text, and strictly fact-check all technical details as LLMs are prone
> +  to being confidently wrong.
>
> -How contributors could comply with DCO terms (b) or (c) for the output of AI
> -content generators commonly available today is unclear. The QEMU project is
> -not willing or able to accept the legal risks of non-compliance.
> +**Tests**
> +  Note that you must still confirm that each test actually exercises
> +  the intended behavior including, for regression tests, that it
> +  fails without the code under test and passes for the right reason.
>
> -The QEMU project thus requires that contributors refrain from using AI content
> -generators on patches intended to be submitted to the project, and will
> -decline any contribution if use of AI is either known or suspected.
> +These boundaries do not apply to other uses of AI, such as researching
> +APIs or algorithms, static analysis, or debugging, provided the model's
> +output is not included in contributions.
>
> -Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
> -ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> -generation agents which are built on top of such tools.
> +If you wish to send large amounts of AI-generated changes, or any other
> +contribution not in the above categories, please get in touch with the
> +maintainer beforehand. These can be treated as experiments, at the
> +discretion of the maintainer and the community, with no obligation
> +to accept them.
>
> -This policy may evolve as AI tools mature and the legal situation is
> -clarified.
> +**Use of AI does not remove the need for authors to comply with all
> +other requirements for contribution.** In particular, the
> +``Signed-off-by`` label in a patch submission is a statement that
> +the author takes responsibility for the entire contents of the patch,
> +certifying that their patch submission is made in accordance with the
> +rules of the `Developer's Certificate of Origin (DCO) <dco>`.
>
> -Exceptions
> -^^^^^^^^^^
> +Commit messages for AI-assisted changes
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>

In my v2 I added:

  AI tools **should not be used to write commit messages**. The act of
  summarising and explaining the reasoning for the changes is an
  important demonstration of the human authors understanding of the
  commit.

> -The QEMU project welcomes discussion on any exceptions to this policy,
> -or more general revisions. This can be done by contacting the qemu-devel
> -mailing list with details of a proposed tool, model, usage scenario, etc.
> -that is beneficial to QEMU, while still mitigating issues around compliance
> -with the DCO. After discussion, any exception will be listed below.
> +When AI/LLM tools produce or substantively shape your patch, add an
> +``AI-used-for:`` line before ``Signed-off-by``, as a reminder of your
> +DCO obligations and a guide to reviewers. The text is one or more of
> +``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> +explanation in parentheses:
>
> -Exceptions do not remove the need for authors to comply with all other
> -requirements for contribution. In particular, the "Signed-off-by"
> -label in a patch submission is a statement that the author takes
> -responsibility for the entire contents of the patch, including any parts
> -that were generated or assisted by AI tools or other tools.
> +.. code-block:: none
> +
> +  AI-used-for: tests, docs
> +  AI-used-for: code
> +  AI-used-for: code (refactoring)
> +  AI-used-for: code (prototype)
> +  AI-used-for: research
> +
> +``AI-used-for`` should not be included for "background" usage such as
> +autocomplete or obtaining a pre-review of the patch.
> +
> +There is no requirement to include your prompts or summarize the
> +conversation in the commit message or cover letter, but you may do so
> +if you think it helps a reviewer judge the result. For example:
> +
> +**Helpful prompts**
> +  These describe concrete constraints or instructions, making it easy for a
> +  reviewer to see how the tool's output was guided:
> +
> +  * "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
> +    function already has a local variable or parameter of type ``struct
> +    bb``, use it instead of accessing ``aa.bb``"
> +
> +  * "add an implementation of the trait for ``Mutex<T: MyTrait>``; it
> +    takes the lock around the calls and forwards to ``T``"
> +
> +**Unhelpful prompts**
> +  These are too generic to provide meaningful context. You can of course
> +  use them in the context of a complex interaction with the LLM, but they
> +  should not be included in the commit message:
> +
> +  * "write user-facing documentation for the new tool"
> +
> +  * "write testcases for the new functions"
> +
> +QEMU does *not* use ``Assisted-by``, ``Co-authored-by`` or ``Generated-by``
> +trailers to indicate AI usage. In particular, it is not necessary to
> +specify the exact AI model or tool used to create the commit.
> +
> +Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> +the trailer, but should be mentioned in the commit message.

The other changes in my v2 where just different wordings for the same
concept. With those have a:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée
Virtualisation Tech Lead @ Linaro

On 5/29/26 13:52, Alex Bennée wrote:
>> -  **Current QEMU project policy is to DECLINE any contributions which are
>> -  believed to include or derive from AI generated content. This includes
>> -  ChatGPT, Claude, Copilot, Llama and similar tools.**
>> +  Please read the below policy before using AI to contribute code or
>> +  documentation to QEMU.  This applies to ChatGPT, Claude, Copilot,
>> +  Llama, and similar tools.**
>
> Stray **, also extra space after QEMU.

Will fix the stars (extra space is intentional, though it shows my age. I
still find that it reads better on monospace fonts to have two spaces at the
end of the sentence).

>> +**Mechanical changes**
>> +  If you can use a deterministic tool, it is preferred that you use
>
> deterministic tool or script,?

Sure.

> In my v2 I added:
>
>   AI tools **should not be used to write commit messages**. The act of
>   summarising and explaining the reasoning for the changes is an
>   important demonstration of the human authors understanding of the
>   commit.

While I didn't include this, v2 links to the "how to write a commit message"
paragraph elsewhere in the documentation. I don't want it to look like
people can't even ask for copy-editing of commit messages.

Paolo

On Fri, May 29, 2026 at 03:06:54PM +0200, Paolo Bonzini wrote:
> On 5/29/26 13:52, Alex Bennée wrote:
> > > -  **Current QEMU project policy is to DECLINE any contributions which are
> > > -  believed to include or derive from AI generated content. This includes
> > > -  ChatGPT, Claude, Copilot, Llama and similar tools.**
> > > +  Please read the below policy before using AI to contribute code or
> > > +  documentation to QEMU.  This applies to ChatGPT, Claude, Copilot,
> > > +  Llama, and similar tools.**
> >
> > Stray **, also extra space after QEMU.
>
> Will fix the stars (extra space is intentional, though it shows my age. I
> still find that it reads better on monospace fonts to have two spaces at the
> end of the sentence).
>
> > > +**Mechanical changes**
> > > +  If you can use a deterministic tool, it is preferred that you use
> >
> > deterministic tool or script,?
>
> Sure.
>
> > In my v2 I added:
> >
> >   AI tools **should not be used to write commit messages**. The act of
> >   summarising and explaining the reasoning for the changes is an
> >   important demonstration of the human authors understanding of the
> >   commit.
>
> While I didn't include this, v2 links to the "how to write a commit message"
> paragraph elsewhere in the documentation. I don't want it to look like
> people can't even ask for copy-editing of commit messages.
> Paolo

And maybe "It is ok to ask an AI tool to correct grammar and spelling in
your text, as long as you are not asking it to write it".
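
For illustration only, the ``AI-used-for:`` format described in the quoted
patch (one or more of ``code``, ``tests``, ``docs``, ``research``, each
possibly followed by an explanation in parentheses) could be checked with a
small script. The sketch below is not part of the patch or of any existing
QEMU tooling. The function name ``parse_ai_used_for`` is hypothetical, and it
assumes that an explanation attaches to a single category and contains no
commas or nested parentheses, which the patch text does not specify.

```python
import re

# Categories listed in the quoted patch text.
ALLOWED = {"code", "tests", "docs", "research"}

# One entry: a category word, optionally followed by "(explanation)".
# Assumption: explanations contain no commas or nested parentheses.
ENTRY_RE = re.compile(r"^(?P<kind>[a-z]+)(?:\s*\((?P<note>[^()]+)\))?$")


def parse_ai_used_for(line):
    """Return a list of (category, explanation-or-None) pairs.

    Raises ValueError if the line is not a well-formed trailer under
    the assumptions stated above.
    """
    prefix = "AI-used-for:"
    if not line.startswith(prefix):
        raise ValueError("not an AI-used-for line")
    body = line[len(prefix):].strip()
    if not body:
        raise ValueError("no category given")
    entries = []
    for part in body.split(","):
        match = ENTRY_RE.match(part.strip())
        if match is None or match.group("kind") not in ALLOWED:
            raise ValueError("unrecognized entry: %r" % part.strip())
        entries.append((match.group("kind"), match.group("note")))
    return entries
```

Under these assumptions, the example lines from the patch parse into
category/explanation pairs, and anything outside the four listed categories
is rejected.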