This patchset creates an arm64-specific dirty-bit cleaning
accelerator based on HACDBS. Doing so requires adding a few
snippets to arch-generic KVM code; to do that properly, those
snippets are compiled out if the arch does not implement them.
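The compile-out behaviour described above usually follows the kernel's
Kconfig stub pattern. A minimal sketch of that pattern, assuming a
hypothetical hook name arch_hw_clean_dirty() and using
CONFIG_KVM_HW_DIRTY_BIT as a stand-in for the option enabled by the last
patch (the actual interface in this series may look different):

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))

/*
 * Hypothetical hook: returns true if hardware cleaned the whole range.
 * Only the Kconfig-style guard is the point of this sketch.
 */
#ifdef CONFIG_KVM_HW_DIRTY_BIT
bool arch_hw_clean_dirty(unsigned long *bitmap, size_t nbits);
#else
/* Stub: folds to a constant, so generic callers drop the HW path. */
static inline bool arch_hw_clean_dirty(unsigned long *bitmap, size_t nbits)
{
	(void)bitmap;
	(void)nbits;
	return false;
}
#endif

/*
 * Generic caller: try the accelerator first, then fall back to software.
 * Returns the number of bits cleared by the software path (0 if the
 * hardware handled the range).
 */
static inline size_t clean_dirty(unsigned long *bitmap, size_t nbits)
{
	size_t cleaned = 0;

	if (arch_hw_clean_dirty(bitmap, nbits))
		return 0;

	for (size_t i = 0; i < nbits; i++) {
		size_t word = i / SKETCH_BITS_PER_LONG;
		unsigned long mask = 1UL << (i % SKETCH_BITS_PER_LONG);

		if (bitmap[word] & mask) {
			bitmap[word] &= ~mask;
			cleaned++;
		}
	}
	return cleaned;
}
```

With the option disabled, the stub lets the compiler discard the hardware
branch entirely, so architectures without an implementation pay nothing.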
Patches 1 & 2 are here just to make this testable, as this patchset
depends on bits from HDBSS that are not upstream yet.
Patch 1 should be included in the HDBSS patchset, and patch 2
is a bunch of bits that I collected across other patches so this can
work. So feel free to ignore them.
To properly use HACDBS, a PPI IRQ is required that triggers either on
error or when processing is complete. It's called HACDBSIRQ, and there
is currently no upstream way of announcing it in ACPI tables, so this
series uses the suggested table/index in [1].
HACDBS can accelerate cleaning for both the dirty-bitmap and
dirty-ring tracking mechanisms in KVM, which requires different
functions to be made accessible outside KVM code.
On top of finding issues, there are a few questions I would like help with:
a - Is the distribution between patches ok? Should I merge or split
any of them?
b - checkpatch.pl keeps bothering me to add an entry in MAINTAINERS file,
and I like the idea of maintaining this. Is there any rule or
common sense on this? Should I add this entry, or should I leave it
in the arch/arm64/kvm/ general rule?
c - There are some trace_printk() I have left in the code, as they could
be helpful to check when HACDBS is not performing as well as it
should. Should I introduce a tracepoint instead? or just ignore it?
(it's triggered on HACDBS error, but as it falls back to software in
that case, it should not impact correctness, only performance).
d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
it's cleaned and reused. Should I let users configure that over a
parameter, or is it overthinking?
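To make the buffer handling in (d) concrete, here is a rough sketch of a
single-page buffer that is flushed and reused whenever it fills up. All
names (struct clean_buf, clean_buf_push(), the flush callback) are
hypothetical, and the 8-byte entry size and 4 KiB page size are
assumptions made for illustration, not taken from the patches:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_NR_ENTRIES (SKETCH_PAGE_SIZE / sizeof(uint64_t))

/* Hypothetical callback: hands a batch of entries to the cleaner. */
typedef void (*flush_fn)(const uint64_t *entries, size_t n, void *ctx);

struct clean_buf {
	uint64_t entries[SKETCH_NR_ENTRIES];	/* exactly one page */
	size_t used;
	flush_fn flush;
	void *ctx;
};

/* Hand over whatever is queued, then make the buffer reusable. */
static void clean_buf_flush(struct clean_buf *b)
{
	if (b->used) {
		b->flush(b->entries, b->used, b->ctx);
		b->used = 0;
	}
}

/* Queue one entry; a full buffer is flushed before it is reused. */
static void clean_buf_push(struct clean_buf *b, uint64_t entry)
{
	if (b->used == SKETCH_NR_ENTRIES)
		clean_buf_flush(b);
	b->entries[b->used++] = entry;
}

/* Example callback: counts batches and entries instead of cleaning. */
struct flush_stats {
	size_t batches;
	size_t entries;
};

static void count_flush(const uint64_t *entries, size_t n, void *ctx)
{
	struct flush_stats *s = ctx;

	(void)entries;
	s->batches++;
	s->entries += n;
}
```

Under these assumptions one page holds 512 entries, so queuing 1000
entries costs two flushes (512 + 488). A larger or configurable buffer
would trade memory for fewer flushes.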
Kernel v7.0.0 + this patchset builds properly, passing both kvm selftests
for dirty-bit tracking[2], on HW HACDBS enabled or disabled.
Please let me know of any question :)
Thanks for reviewing!
Leo
[1]: https://github.com/tianocore/edk2/issues/12409
[2]: dirty_log_test && dirty_log_perf_test
Leonardo Bras (12):
  KVM: arm64: Enable eager hugepage splitting if HDBSS is available
  KVM: arm64: HDBSS bits
  arm64/cpufeature: Add system-wide FEAT_HACDBS detection
  arm64/sysreg: Add HACDBS consumer and base registers
  KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ
  KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine
  kvm: Add arch-generic interface for hw-accelerated dirty-bitmap
    cleaning
  KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine
  kvm/dirty_ring: Introduce get_memslot and move helpers to header
  kvm/dirty_ring: Add arch-generic interface for hw-accelerated
    dirty-ring cleaning
  KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine
  KVM: arm64: Enable KVM_HW_DIRTY_BIT
MAINTAINERS | 7 +
arch/arm64/include/asm/acpi.h | 3 +
arch/arm64/include/asm/cpufeature.h | 10 +
arch/arm64/include/asm/kvm_dirty_bit.h | 67 ++++
arch/arm64/include/asm/kvm_pgtable.h | 3 +
include/acpi/actbl2.h | 1 +
include/linux/kvm_dirty_bit.h | 34 ++
include/linux/kvm_dirty_ring.h | 12 +
include/linux/kvm_host.h | 3 +
arch/arm64/kernel/cpufeature.c | 20 ++
arch/arm64/kvm/arm.c | 5 +
arch/arm64/kvm/dirty_bit.c | 418 +++++++++++++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 15 +-
arch/arm64/kvm/mmu.c | 12 +-
virt/kvm/dirty_ring.c | 34 +-
virt/kvm/kvm_main.c | 13 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/tools/cpucaps | 2 +
arch/arm64/tools/sysreg | 30 ++
virt/kvm/Kconfig | 3 +
21 files changed, 673 insertions(+), 22 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_dirty_bit.h
create mode 100644 include/linux/kvm_dirty_bit.h
create mode 100644 arch/arm64/kvm/dirty_bit.c
--
2.54.0
On Thu, 30 Apr 2026 12:14:04 +0100,
Leonardo Bras <leo.bras@arm.com> wrote:

[...]

I haven't had a chance to look at any of this yet, but just on these
points:

> b - checkpatch.pl keeps bothering me to add an entry in MAINTAINERS file,
> and I like the idea of maintaining this. Is there any rule or
> common sense on this? Should I add this entry, or should I leave it
> in the arch/arm64/kvm/ general rule?

No specific entry in MAINTAINERS required (or wanted). This falls into
the normal KVM/arm64 maintenance. And don't worry, we know where to
find you when it will come to fixing this stuff.

> c - There are some trace_printk() I have left in the code, as they could
> be helpful to check when HACDBS is not performing as well as it
> should. Should I introduce a tracepoint instead? or just ignore it?
> (it's triggered on HACDBS error, but as it falls back to software in
> that case, it should not impact correctness, only performance).

Debug infrastructure should be preferably *removed* altogether.
trace_printk() is definitely a big no-no.

> d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> it's cleaned and reused. Should I let users configure that over a
> parameter, or is it overthinking?

How long is a piece of string? We can't know that. A single page feels
very small in the 4kB case, and letting userspace define the size of
that buffer seems a likely requirement.

>
> Kernel v7.0.0 + this patchset builds properly, passing both kvm selftests
> for dirty-bit tracking[2], on HW HACDBS enabled or disabled.

I have absolutely no trust in these tests.

Have you enabled a VMM to make use of these APIs, and actively
migrated running guests? That's the level of testing I'd like to see,
as the selftests are not what people run in production...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> On Thu, 30 Apr 2026 12:14:04 +0100,
> Leonardo Bras <leo.bras@arm.com> wrote:
>
> [...]
>
> I haven't had a chance to look at any of this yet, but just on these
> points:
>
> > b - checkpatch.pl keeps bothering me to add an entry in MAINTAINERS file,
> > and I like the idea of maintaining this. Is there any rule or
> > common sense on this? Should I add this entry, or should I leave it
> > in the arch/arm64/kvm/ general rule?
>
> No specific entry in MAINTAINERS required (or wanted). This falls into
> the normal KVM/arm64 maintenance. And don't worry, we know where to
> find you when it will come to fixing this stuff.
>

Got it

> > c - There are some trace_printk() I have left in the code, as they could
> > be helpful to check when HACDBS is not performing as well as it
> > should. Should I introduce a tracepoint instead? or just ignore it?
> > (it's triggered on HACDBS error, but as it falls back to software in
> > that case, it should not impact correctness, only performance).
>
> Debug infrastructure should be preferably *removed* altogether.
> trace_printk() is definitely a big no-no.
>

Will remove it then

> > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> > the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> > it's cleaned and reused. Should I let users configure that over a
> > parameter, or is it overthinking?
>
> How long is a piece of string? We can't know that. A single page feels
> very small in the 4kB case, and letting userspace define the size of
> that buffer seems a likely requirement.
>

Ok, as a KVM parameter, or as a compile-time option?

> >
> > Kernel v7.0.0 + this patchset builds properly, passing both kvm selftests
> > for dirty-bit tracking[2], on HW HACDBS enabled or disabled.
>
> I have absolutely no trust in these tests.
>
> Have you enabled a VMM to make use of these APIs, and actively
> migrated running guests? That's the level of testing I'd like to see,
> as the selftests are not what people run in production...
>

There is no enablement needed on VMM side.
Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
on the same host. (Inside a model)

That was the first test I used, but then I found out that kvm selftests
stress up multiple scenarios in an easier way.

Do you prefer me to test on any specific scenario, or does whatever qemu
uses as a default parameter work well enough?

Thanks!
Leo
On Thu, 30 Apr 2026 14:29:37 +0100,
Leonardo Bras <leo.bras@arm.com> wrote:
>
> On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> > On Thu, 30 Apr 2026 12:14:04 +0100,
> > Leonardo Bras <leo.bras@arm.com> wrote:
> >
> > > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> > > the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> > > it's cleaned and reused. Should I let users configure that over a
> > > parameter, or is it overthinking?
> >
> > How long is a piece of string? We can't know that. A single page feels
> > very small in the 4kB case, and letting userspace define the size of
> > that buffer seems a likely requirement.
> >
>
> Ok, as a KVM parameter, or as a compile-time option?

Noticed the "userspace" word in there? It *has* to be controlled by
userspace one way or another. So not as a kernel parameter, and
*never* as a compile option.

> > > Kernel v7.0.0 + this patchset builds properly, passing both kvm selftests
> > > for dirty-bit tracking[2], on HW HACDBS enabled or disabled.
> >
> > I have absolutely no trust in these tests.
> >
> > Have you enabled a VMM to make use of these APIs, and actively
> > migrated running guests? That's the level of testing I'd like to see,
> > as the selftests are not what people run in production...
> >
>
> There is no enablement needed on VMM side.
> Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> on the same host. (Inside a model)
>
> That was the first test I used, but then I found out that kvm selftests
> stress up multiple scenarios in an easier way.

Except when they don't. In my experience, the selftests are only there
to give the CI people the fuzzy feeling that they are doing something
useful. I have a collection of examples indicating that what these
things test is not representative of the bugs we have in KVM.

> Do you prefer me to test on any specific scenario, or does whatever qemu
> uses as a default parameter work well enough?

I want to hear about testing at a scale that make sense for production
VMs, including live migrating between hosts while under memory
pressure (swapping out).

I'm also interested in efficiency: how much better is HACDBS compared
to the current page faulting? Just having patches for a feature is not
enough to decide adoption of that feature. Show me the benefits in a
quantitative way (within the limits of the model, of course).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
On Thu, Apr 30, 2026 at 03:51:20PM +0100, Marc Zyngier wrote:
> Leonardo Bras <leo.bras@arm.com> wrote:

> > There is no enablement needed on VMM side.
> > Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> > on the same host. (Inside a model)

> > That was the first test I used, but then I found out that kvm selftests
> > stress up multiple scenarios in an easier way.

> Except when they don't. In my experience, the selftests are only there
> to give the CI people the fuzzy feeling that they are doing something
> useful. I have a collection of examples indicating that what these
> things test is not representative of the bugs we have in KVM.

There's a bit of a circular thing there - if things are working well
then the selftests really shouldn't be representative of the problems
that turn up since people ought to have seen any issues they show
during development. If you could share your list that'd be helpful.

TBH the various stress style tests like the dirty bit ones in the KVM
selftests are actually a bit of a pain for CI IME. Since they all
control their runtimes by iteration counts rather than time there are
scaling issues: the performance is such that on lower end systems
they're often on the edge of generating timeouts, leading to unstable
results. They also interact so poorly with the fast models that the
run time blows up spectacularly, which is its own problem when trying
to run them; that's partly on the model side of course.

These performance and/or stress tests really aren't idiomatic for
kselftest so there's always going to be some tension with trying to
run them in that framework.
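
As an illustration of the "time rather than iteration count" point, a
stress loop can be bounded by a wall-clock budget so its runtime stays
roughly constant on slow systems and models. This is only a sketch with
hypothetical helper names (now_ns(), run_for()), not how the KVM
selftests are structured:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Call step() repeatedly until budget_ns has elapsed.
 * Always runs at least once; returns the completed iteration count.
 */
static uint64_t run_for(uint64_t budget_ns, void (*step)(void *), void *arg)
{
	uint64_t deadline = now_ns() + budget_ns;
	uint64_t iterations = 0;

	do {
		step(arg);
		iterations++;
	} while (now_ns() < deadline);

	return iterations;
}

/* Example step: stands in for one pass of a dirty-log stress test. */
static void count_step(void *arg)
{
	uint64_t *counter = arg;

	(*counter)++;
}
```

The iteration count then becomes a result to report (throughput) instead
of an input that has to be tuned per platform.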
On Thu, Apr 30, 2026 at 03:51:20PM +0100, Marc Zyngier wrote:
> On Thu, 30 Apr 2026 14:29:37 +0100,
> Leonardo Bras <leo.bras@arm.com> wrote:
> >
> > On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> > > On Thu, 30 Apr 2026 12:14:04 +0100,
> > > Leonardo Bras <leo.bras@arm.com> wrote:
> > >
> > > > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> > > > the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> > > > it's cleaned and reused. Should I let users configure that over a
> > > > parameter, or is it overthinking?
> > >
> > > How long is a piece of string? We can't know that. A single page feels
> > > very small in the 4kB case, and letting userspace define the size of
> > > that buffer seems a likely requirement.
> > >
> >
> > Ok, as a KVM parameter, or as a compile-time option?
>
> Noticed the "userspace" word in there? It *has* to be controlled by
> userspace one way or another. So not as a kernel parameter, and
> *never* as a compile option.

Okay, I would suggest that a module parameter could be set by userspace,
but I remember now that it is usually built in the kernel instead.
Also, it could be bad having this set for the whole system, instead of
per-VM.

How do you suggest letting userspace control that?
(All I could think was using an ioctl / API of any sorts, which would
require changing the VMMs as well.)

>
> > > > Kernel v7.0.0 + this patchset builds properly, passing both kvm selftests
> > > > for dirty-bit tracking[2], on HW HACDBS enabled or disabled.
> > >
> > > I have absolutely no trust in these tests.
> > >
> > > Have you enabled a VMM to make use of these APIs, and actively
> > > migrated running guests? That's the level of testing I'd like to see,
> > > as the selftests are not what people run in production...
> > >
> >
> > There is no enablement needed on VMM side.
> > Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> > on the same host. (Inside a model)
> >
> > That was the first test I used, but then I found out that kvm selftests
> > stress up multiple scenarios in an easier way.
>
> Except when they don't. In my experience, the selftests are only there
> to give the CI people the fuzzy feeling that they are doing something
> useful.

LOL

> I have a collection of examples indicating that what these
> things test is not representative of the bugs we have in KVM.
>

Fair enough... it was tested in qemu live migration, and it works properly
(migrated between 2 instances of qemu on the same host, emulated by model).

> > Do you prefer me to test on any specific scenario, or does whatever qemu
> > uses as a default parameter work well enough?
>
> I want to hear about testing at a scale that make sense for production
> VMs, including live migrating between hosts while under memory
> pressure (swapping out).

I agree that's a more interesting test.

>
> I'm also interested in efficiency: how much better is HACDBS compared
> to the current page faulting?

The terms are indeed confusing, but HACDBS is just the cleaning accelerator
for dirty-bit. It means it will only affect how long it takes to traverse
the page table, making pages in the array writable-dirty -> writable-clean.

That being said, with regard to efficiency:
Well, as I only have the model to test that, I am limited to those results,
which may not reflect reality. As an example, on dirty_log_perf_test, the
cleaning process took much longer (8x) compared to software cleaning, even
when faced with no error, and entries that fit the array (4k page above).

If it took that long even in this ideal scenario, it means the HACDBS
mechanism implemented in the model takes much longer than software, which
is counter-intuitive.

> Just having patches for a feature is not
> enough to decide adoption of that feature. Show me the benefits in a
> quantitative way (within the limits of the model, of course).

Sure, I will try measuring migration between 2 instances of the model, and
see how qemu live migration time is affected, then post the results in this
thread for us to compare.

Thanks!
Leo
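
For readers less familiar with the writable-dirty -> writable-clean
transition mentioned above, here is a toy software model of the cleaning
step. The bit positions are invented for this sketch and are not the
architectural stage-2 layout; the assumption is simply that a
"dirty-trackable" bit stays set while the write permission is dropped,
so the next write marks the page dirty again:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy encoding for this sketch only, not the architectural layout. */
#define TOY_PTE_DBM	(1ull << 0)	/* dirty state is tracked via write permission */
#define TOY_PTE_WRITE	(1ull << 1)	/* write permission currently granted */

/* Writable-dirty: tracking enabled and write permission set. */
static int toy_pte_writable_dirty(uint64_t pte)
{
	return (pte & TOY_PTE_DBM) && (pte & TOY_PTE_WRITE);
}

/*
 * Software cleaning: visit every entry named in idx[] and turn
 * writable-dirty entries back into writable-clean ones.
 * Returns the number of entries that were cleaned.
 */
static size_t toy_clean(uint64_t *ptes, const size_t *idx, size_t n)
{
	size_t cleaned = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t *pte = &ptes[idx[i]];

		if (toy_pte_writable_dirty(*pte)) {
			/* Keep DBM so the next write re-dirties the page. */
			*pte &= ~TOY_PTE_WRITE;
			cleaned++;
		}
	}
	return cleaned;
}
```

The accelerator discussed in this thread would take over the loop in
toy_clean(); everything else about dirty tracking stays the same.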