KVM Dirty-bit cleaning accelerator (HACDBS)

[PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Posted by Leonardo Bras 1 month, 2 weeks ago

This patchset intends to create an arm64-specific dirty-bit cleaning
accelerator based on HACDBS. To do so, a few snippets need to be added
to arch-generic KVM code, and to do it properly, they are compiled out
if the arch does not implement them.
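For readers less familiar with the "compile out" idiom, here is a minimal userspace C sketch of the usual kernel pattern: a Kconfig-guarded declaration with a `static inline` stub, plus a generic caller that falls back to software whenever the hook fails. `CONFIG_KVM_HW_DIRTY_BIT` only echoes the KVM_HW_DIRTY_BIT name in the last patch title, and `kvm_arch_hw_dirty_clear()`, `sw_dirty_clear()` and `dirty_clear()` are hypothetical names; none of this is taken from the actual patches.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical: CONFIG_KVM_HW_DIRTY_BIT stands in for the Kconfig symbol an
 * arch would select when it implements the hook. Left undefined here.
 */
#ifdef CONFIG_KVM_HW_DIRTY_BIT
int kvm_arch_hw_dirty_clear(unsigned long *bitmap, size_t nbits);
#else
/* Arch does not implement the hook: the stub just reports "not supported". */
static inline int kvm_arch_hw_dirty_clear(unsigned long *bitmap, size_t nbits)
{
	(void)bitmap;
	(void)nbits;
	return -EOPNOTSUPP;
}
#endif

/* Software path: clear every set bit (stands in for write-protecting pages). */
static size_t sw_dirty_clear(unsigned long *bitmap, size_t nbits)
{
	const size_t bits_per_long = 8 * sizeof(unsigned long);
	size_t cleared = 0;

	for (size_t i = 0; i < nbits; i++) {
		unsigned long mask = 1UL << (i % bits_per_long);

		if (bitmap[i / bits_per_long] & mask) {
			bitmap[i / bits_per_long] &= ~mask;
			cleared++;
		}
	}
	return cleared;
}

/*
 * Generic caller: try the hardware hook first; on any error (including
 * -EOPNOTSUPP from the stub) fall back to software, so correctness never
 * depends on the accelerator. Returns the number of bits cleared in software.
 */
static size_t dirty_clear(unsigned long *bitmap, size_t nbits)
{
	if (kvm_arch_hw_dirty_clear(bitmap, nbits) == 0)
		return 0;
	return sw_dirty_clear(bitmap, nbits);
}
```

With the symbol undefined, `dirty_clear()` always takes the software path, which is what an arch without the feature should get; an arch that defines the symbol must also provide the real function, or the link fails.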

Patches 1 & 2 are here just to make this testable, as this patchset
depends on bits from HDBSS that are not upstream yet.

Patch 1 should be included in the HDBSS patchset, and patch 2
is a bunch of bits that I collected across other patches so this can
work. So feel free to ignore them.

Using HACDBS properly requires a PPI IRQ that triggers either on
error or when processing is complete. It's called HACDBSIRQ, and
there is currently no upstream way of announcing it in ACPI tables,
so it uses the suggested table/index in [1].

It can accelerate cleaning for both the dirty-bitmap and dirty-ring
tracking mechanisms in KVM, which require different functions to be
made accessible outside KVM code.
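As a rough illustration of why the two mechanisms need different helpers: a dirty-bitmap bit maps directly to a guest frame relative to its memslot, while a dirty-ring entry only names a (slot, offset) pair, so a memslot lookup (the job a get_memslot()-style helper would do) has to come first. `struct dirty_gfn` below mirrors the layout of the KVM UAPI `struct kvm_dirty_gfn` (`flags`, `slot` encoded as `(as_id << 16) | slot_id`, `offset`); `struct memslot` and both functions are simplified, hypothetical stand-ins, not KVM internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the KVM UAPI struct kvm_dirty_gfn. */
struct dirty_gfn {
	uint32_t flags;
	uint32_t slot;   /* (address-space id << 16) | memslot id */
	uint64_t offset; /* page offset inside the memslot */
};

/* Hypothetical, simplified memslot: only what the conversion needs. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

/* Dirty-bitmap source: bit i of a slot's bitmap is guest frame base_gfn + i. */
static uint64_t bitmap_bit_to_gfn(const struct memslot *ms, uint64_t bit)
{
	return ms->base_gfn + bit;
}

/*
 * Dirty-ring source: resolve the memslot first, then add the offset.
 * This sketch models a single address space and ignores the as_id bits.
 * Returns 0 and stores the gfn on success, -1 if the entry is out of range.
 */
static int ring_entry_to_gfn(const struct memslot *slots, size_t nslots,
			     const struct dirty_gfn *e, uint64_t *gfn)
{
	uint32_t id = e->slot & 0xffff;

	if (id >= nslots || e->offset >= slots[id].npages)
		return -1;
	*gfn = slots[id].base_gfn + e->offset;
	return 0;
}
```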

On top of finding issues, there are a few questions I would like help with:
a - Is the distribution between patches ok? Should I merge or split
    any of them?
b - checkpatch.pl keeps bothering me to add an entry in MAINTAINERS file,
    and I like the idea of maintaining this. Is there any rule or
    common sense on this? Should I add this entry, or should I leave it
    in the arch/arm64/kvm/ general rule?
c - There are some trace_printk() calls I have left in the code, as they
    could help check when HACDBS is not performing as well as it should.
    Should I introduce a tracepoint instead, or just ignore it?
    (It's triggered on HACDBS error, but as it falls back to software in
    that case, it should not impact correctness, only performance.)
d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
    the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
    it's cleaned and reused. Should I let users configure that via a
    parameter, or is that overthinking it?
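The "clean when full, then reuse" scheme from point d can be sketched as follows. This is a simplified model, assuming one 64-bit entry per page and a 4 KiB buffer (so 512 entries); `flush()` stands in for handing the buffer to the hardware and waiting for HACDBSIRQ, the entry holds a bare gfn rather than the real entry encoding, and none of the names come from the patches.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
/* Assumption: one 64-bit entry per page to clean. */
#define BUF_ENTRIES (SKETCH_PAGE_SIZE / sizeof(uint64_t)) /* 512 */

struct clean_buf {
	uint64_t entry[BUF_ENTRIES];
	size_t used;      /* entries currently queued */
	size_t flushes;   /* times the buffer was handed off */
	size_t processed; /* total entries handed off */
};

/* Stand-in for "submit the buffer to HACDBS and wait for completion". */
static void flush(struct clean_buf *b)
{
	if (b->used == 0)
		return;
	b->processed += b->used;
	b->flushes++;
	b->used = 0; /* the same buffer is reused from the start */
}

static void queue_gfn(struct clean_buf *b, uint64_t gfn)
{
	b->entry[b->used++] = gfn; /* simplified: just the gfn */
	if (b->used == BUF_ENTRIES)
		flush(b);
}

/* Walk a dirty bitmap, queue every set bit, then flush the partial tail. */
static void clean_bitmap(struct clean_buf *b, const uint8_t *bitmap,
			 size_t nbits, uint64_t base_gfn)
{
	for (size_t i = 0; i < nbits; i++)
		if (bitmap[i / 8] & (1u << (i % 8)))
			queue_gfn(b, base_gfn + i);
	flush(b);
}
```

For example, 1300 dirty pages go out as two full buffers of 512 entries plus a final partial one of 276, so the number of hand-offs grows linearly with the dirty set for a fixed buffer size.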

Kernel v7.0.0 + this patchset builds properly and passes both kvm selftests
for dirty-bit tracking [2], with HW HACDBS enabled or disabled.

Please let me know if you have any questions :)

Thanks for reviewing!
Leo


[1]: https://github.com/tianocore/edk2/issues/12409
[2]: dirty_log_test && dirty_log_perf_test

Leonardo Bras (12):
  KVM: arm64: Enable eager hugepage splitting if HDBSS is available
  KVM: arm64: HDBSS bits
  arm64/cpufeature: Add system-wide FEAT_HACDBS detection
  arm64/sysreg: Add HACDBS consumer and base registers
  KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ
  KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine
  kvm: Add arch-generic interface for hw-accelerated dirty-bitmap
    cleaning
  KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine
  kvm/dirty_ring: Introduce get_memslot and move helpers to header
  kvm/dirty_ring: Add arch-generic interface for hw-accelerated
    dirty-ring cleaning
  KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine
  KVM: arm64: Enable KVM_HW_DIRTY_BIT

 MAINTAINERS                            |   7 +
 arch/arm64/include/asm/acpi.h          |   3 +
 arch/arm64/include/asm/cpufeature.h    |  10 +
 arch/arm64/include/asm/kvm_dirty_bit.h |  67 ++++
 arch/arm64/include/asm/kvm_pgtable.h   |   3 +
 include/acpi/actbl2.h                  |   1 +
 include/linux/kvm_dirty_bit.h          |  34 ++
 include/linux/kvm_dirty_ring.h         |  12 +
 include/linux/kvm_host.h               |   3 +
 arch/arm64/kernel/cpufeature.c         |  20 ++
 arch/arm64/kvm/arm.c                   |   5 +
 arch/arm64/kvm/dirty_bit.c             | 418 +++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c           |  15 +-
 arch/arm64/kvm/mmu.c                   |  12 +-
 virt/kvm/dirty_ring.c                  |  34 +-
 virt/kvm/kvm_main.c                    |  13 +-
 arch/arm64/kvm/Kconfig                 |   1 +
 arch/arm64/kvm/Makefile                |   2 +-
 arch/arm64/tools/cpucaps               |   2 +
 arch/arm64/tools/sysreg                |  30 ++
 virt/kvm/Kconfig                       |   3 +
 21 files changed, 673 insertions(+), 22 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_dirty_bit.h
 create mode 100644 include/linux/kvm_dirty_bit.h
 create mode 100644 arch/arm64/kvm/dirty_bit.c

-- 
2.54.0

Re: [PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Posted by Marc Zyngier 1 month, 2 weeks ago

On Thu, 30 Apr 2026 12:14:04 +0100,
Leonardo Bras <leo.bras@arm.com> wrote:

[...]

I haven't had a chance to look at any of this yet, but just on these
points:

> b - checkpatch.pl keeps bothering me to add an entry in MAINTAINERS file,
>     and I like the idea of maintaining this. Is there any rule or
>     common sense on this? Should I add this entry, or should I leave it
>     in the arch/arm64/kvm/ general rule?

No specific entry in MAINTAINERS required (or wanted). This falls into
the normal KVM/arm64 maintenance. And don't worry, we know where to
find you when it will come to fixing this stuff.

> c - There are some trace_printk() calls I have left in the code, as they
>     could help check when HACDBS is not performing as well as it should.
>     Should I introduce a tracepoint instead, or just ignore it?
>     (It's triggered on HACDBS error, but as it falls back to software in
>     that case, it should not impact correctness, only performance.)

Debug infrastructure should be preferably *removed* altogether.
trace_printk() is definitely a big no-no.

> d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
>     the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
>     it's cleaned and reused. Should I let users configure that via a
>     parameter, or is that overthinking it?

How long is a piece of string? We can't know that. A single page feels
very small in the 4kB case, and letting userspace define the size of
that buffer seems a likely requirement.
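To put numbers on "a single page feels very small": assuming 8-byte entries that each describe one page of the same size as the buffer page, the helpers below compute how much guest memory one fill of a one-page buffer covers, and how many fill/flush rounds a fully dirty guest would need. The entry size is an assumption, not taken from the patches or the spec, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: each buffer entry is 8 bytes and describes one guest page. */
#define ENTRY_BYTES 8u

/* Entries that fit in a buffer of buf_bytes. */
static uint64_t entries_per_buf(uint64_t buf_bytes)
{
	return buf_bytes / ENTRY_BYTES;
}

/*
 * Guest memory cleaned per fill of a one-page buffer, where page_size is
 * both the buffer size and the granule each entry describes.
 */
static uint64_t bytes_per_fill(uint64_t page_size)
{
	return entries_per_buf(page_size) * page_size;
}

/* Fills needed to clean dirty_bytes of guest memory, rounded up. */
static uint64_t fills_needed(uint64_t dirty_bytes, uint64_t page_size)
{
	uint64_t per_fill = bytes_per_fill(page_size);

	return (dirty_bytes + per_fill - 1) / per_fill;
}
```

Under these assumptions one fill covers 2 MiB with 4 KiB pages, 32 MiB with 16 KiB pages and 512 MiB with 64 KiB pages, so cleaning a fully dirty 16 GiB guest takes 8192 rounds with 4 KiB pages against 32 with 64 KiB pages.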

> 
> Kernel v7.0.0 + this patchset builds properly and passes both kvm selftests
> for dirty-bit tracking [2], with HW HACDBS enabled or disabled.

I have absolutely no trust in these tests.

Have you enabled a VMM to make use of these APIs, and actively
migrated running guests? That's the level of testing I'd like to see,
as the selftests are not what people run in production...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

Re: [PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Posted by Leonardo Bras 1 month, 2 weeks ago

On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> On Thu, 30 Apr 2026 12:14:04 +0100,
> Leonardo Bras <leo.bras@arm.com> wrote:
> 
> [...]
> 
> I haven't had a chance to look at any of this yet, but just on these
> points:
> 
> > b - checkpatch.pl keeps bothering me to add an entry in MAINTAINERS file,
> >     and I like the idea of maintaining this. Is there any rule or
> >     common sense on this? Should I add this entry, or should I leave it
> >     in the arch/arm64/kvm/ general rule?
> 
> No specific entry in MAINTAINERS required (or wanted). This falls into
> the normal KVM/arm64 maintenance. And don't worry, we know where to
> find you when it will come to fixing this stuff.
> 

Got it

> > c - There are some trace_printk() calls I have left in the code, as they
> >     could help check when HACDBS is not performing as well as it should.
> >     Should I introduce a tracepoint instead, or just ignore it?
> >     (It's triggered on HACDBS error, but as it falls back to software in
> >     that case, it should not impact correctness, only performance.)
> 
> Debug infrastructure should be preferably *removed* altogether.
> trace_printk() is definitely a big no-no.
> 

Will remove it then

> > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> >     the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> >     it's cleaned and reused. Should I let users configure that via a
> >     parameter, or is that overthinking it?
> 
> How long is a piece of string? We can't know that. A single page feels
> very small in the 4kB case, and letting userspace define the size of
> that buffer seems a likely requirement.
> 

Ok, as a KVM parameter, or as a compile-time option?

> > 
> > Kernel v7.0.0 + this patchset builds properly and passes both kvm selftests
> > for dirty-bit tracking [2], with HW HACDBS enabled or disabled.
> 
> I have absolutely no trust in these tests.
> 
> Have you enabled a VMM to make use of these APIs, and actively
> migrated running guests? That's the level of testing I'd like to see,
> as the selftests are not what people run in production...
> 

No enablement is needed on the VMM side.
Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
on the same host (inside a model).

That was the first test I used, but then I found out that kvm selftests
stress multiple scenarios more easily.

Would you prefer me to test any specific scenario, or is whatever qemu
uses by default good enough?

Thanks!
Leo

Re: [PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Posted by Marc Zyngier 1 month, 2 weeks ago

On Thu, 30 Apr 2026 14:29:37 +0100,
Leonardo Bras <leo.bras@arm.com> wrote:
> 
> On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> > On Thu, 30 Apr 2026 12:14:04 +0100,
> > Leonardo Bras <leo.bras@arm.com> wrote:
> > 
> > > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> > >     the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> > >     it's cleaned and reused. Should I let users configure that via a
> > >     parameter, or is that overthinking it?
> > 
> > How long is a piece of string? We can't know that. A single page feels
> > very small in the 4kB case, and letting userspace define the size of
> > that buffer seems a likely requirement.
> > 
> 
> Ok, as a KVM parameter, or as a compile-time option?

Noticed the "userspace" word in there? It *has* to be controlled by
userspace one way or another. So not as a kernel parameter, and
*never* as a compile option.

> > > Kernel v7.0.0 + this patchset builds properly and passes both kvm selftests
> > > for dirty-bit tracking [2], with HW HACDBS enabled or disabled.
> > 
> > I have absolutely no trust in these tests.
> > 
> > Have you enabled a VMM to make use of these APIs, and actively
> > migrated running guests? That's the level of testing I'd like to see,
> > as the selftests are not what people run in production...
> > 
> 
> No enablement is needed on the VMM side.
> Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> on the same host (inside a model).
> 
> That was the first test I used, but then I found out that kvm selftests
> stress multiple scenarios more easily.

Except when they don't. In my experience, the selftests are only there
to give the CI people the fuzzy feeling that they are doing something
useful. I have a collection of examples indicating that what these
things test is not representative of the bugs we have in KVM.

> Would you prefer me to test any specific scenario, or is whatever qemu
> uses by default good enough?

I want to hear about testing at a scale that makes sense for production
VMs, including live migrating between hosts while under memory
pressure (swapping out).

I'm also interested in efficiency: how much better is HACDBS compared
to the current page faulting? Just having patches for a feature is not
enough to decide adoption of that feature. Show me the benefits in a
quantitative way (within the limits of the model, of course).

Thanks,

	M.


Re: [PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Posted by Mark Brown 1 month, 2 weeks ago

On Thu, Apr 30, 2026 at 03:51:20PM +0100, Marc Zyngier wrote:
> Leonardo Bras <leo.bras@arm.com> wrote:

> > No enablement is needed on the VMM side.
> > Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> > on the same host (inside a model).

> > That was the first test I used, but then I found out that kvm selftests
> > stress multiple scenarios more easily.

> Except when they don't. In my experience, the selftests are only there
> to give the CI people the fuzzy feeling that they are doing something
> useful. I have a collection of examples indicating that what these
> things test is not representative of the bugs we have in KVM.

There's a bit of a circular thing there: if things are working well
then the selftests really shouldn't be representative of the problems
that turn up, since people ought to have seen any issues they show during
development.  If you could share your list, that'd be helpful.

TBH the various stress-style tests like the dirty bit ones in the KVM
selftests are actually a bit of a pain for CI IME.  Since they all
control their runtimes by iteration counts rather than time, there are
scaling issues: on lower-end systems their performance is often on the
edge of triggering timeouts, leading to unstable results.  They also
interact so poorly with the fast models that the run time blows up
spectacularly, which is its own problem when trying to run them (that's
partly on the model side, of course).  These performance and/or stress
tests really aren't idiomatic for kselftest, so there's always going to
be some tension with trying to run them in that framework.
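One common way around the scaling problem described above is to bound a stress loop by both an iteration count and a wall-clock budget, stopping at whichever is hit first. A minimal sketch in plain C using `clock_gettime(CLOCK_MONOTONIC)`; `run_bounded()` and `now_ns()` are hypothetical helpers, not part of the kselftest harness or the KVM selftests library.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Run iteration(arg) until max_iters iterations have completed or budget_ns
 * has elapsed, whichever comes first. Returns the iterations completed, so
 * the caller can report throughput instead of failing on a slow machine.
 */
static uint64_t run_bounded(void (*iteration)(void *), void *arg,
			    uint64_t max_iters, uint64_t budget_ns)
{
	uint64_t start = now_ns();
	uint64_t done = 0;

	while (done < max_iters && now_ns() - start < budget_ns) {
		iteration(arg);
		done++;
	}
	return done;
}
```

On a fast host the iteration cap is reached first; on a slow host or a model, the time budget ends the run, so the runtime stays predictable at the cost of less coverage per run.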

Re: [PATCH v1 00/12] KVM Dirty-bit cleaning accelerator (HACDBS)

Posted by Leonardo Bras 1 month, 2 weeks ago

On Thu, Apr 30, 2026 at 03:51:20PM +0100, Marc Zyngier wrote:
> On Thu, 30 Apr 2026 14:29:37 +0100,
> Leonardo Bras <leo.bras@arm.com> wrote:
> > 
> > On Thu, Apr 30, 2026 at 02:14:22PM +0100, Marc Zyngier wrote:
> > > On Thu, 30 Apr 2026 12:14:04 +0100,
> > > Leonardo Bras <leo.bras@arm.com> wrote:
> > > 
> > > > d - In __kvm_arch_dirty_log_clear() there is no way to predict how long
> > > >     the buffer should be, so I used 1x PAGE_SIZE, and when it gets full
> > > >     it's cleaned and reused. Should I let users configure that via a
> > > >     parameter, or is that overthinking it?
> > > 
> > > How long is a piece of string? We can't know that. A single page feels
> > > very small in the 4kB case, and letting userspace define the size of
> > > that buffer seems a likely requirement.
> > > 
> > 
> > Ok, as a KVM parameter, or as a compile-time option?
> 
> Noticed the "userspace" word in there? It *has* to be controlled by
> userspace one way or another. So not as a kernel parameter, and
> *never* as a compile option.

Okay. I was going to suggest a module parameter, since userspace can set
it, but I now remember that KVM is usually built into the kernel instead.
Also, it could be bad to have this set for the whole system instead of
per-VM.

How do you suggest letting userspace control that?
(All I could think of was an ioctl or some other API, which would
require changing the VMMs as well.)
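One conventional shape for a per-VM, userspace-controlled knob in KVM is a VM capability enabled with the `KVM_ENABLE_CAP` ioctl on the VM file descriptor, with the value passed in `args[0]` of `struct kvm_enable_cap`. The sketch below models only the kernel-side validation and per-VM storage; the capability itself, the bounds (one page up to 512 pages) and every name here are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull
/* Hypothetical bounds: at least one page, at most 512 pages per VM. */
#define DIRTY_BUF_MIN SKETCH_PAGE_SIZE
#define DIRTY_BUF_MAX (512 * SKETCH_PAGE_SIZE)

/* Hypothetical per-VM state: each VM carries its own buffer size. */
struct vm_dirty_cfg {
	uint64_t buf_bytes;
};

/* Default used when userspace never sets the size (today's single page). */
static void vm_dirty_cfg_init(struct vm_dirty_cfg *cfg)
{
	cfg->buf_bytes = DIRTY_BUF_MIN;
}

/*
 * Hypothetical handler for the value userspace passes in args[0]:
 * reject sizes outside the bounds or not a multiple of the page size,
 * leaving the previous setting untouched on error.
 */
static int vm_set_dirty_buf_size(struct vm_dirty_cfg *cfg, uint64_t bytes)
{
	if (bytes < DIRTY_BUF_MIN || bytes > DIRTY_BUF_MAX)
		return -EINVAL;
	if (bytes % SKETCH_PAGE_SIZE)
		return -EINVAL;
	cfg->buf_bytes = bytes;
	return 0;
}
```

Keeping the size in per-VM state avoids the system-wide problem mentioned above, but it does mean the VMM has to opt in, which matches the concern about changing VMMs.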

> 
> > > > Kernel v7.0.0 + this patchset builds properly and passes both kvm selftests
> > > > for dirty-bit tracking [2], with HW HACDBS enabled or disabled.
> > > 
> > > I have absolutely no trust in these tests.
> > > 
> > > Have you enabled a VMM to make use of these APIs, and actively
> > > migrated running guests? That's the level of testing I'd like to see,
> > > as the selftests are not what people run in production...
> > > 
> > 
> > No enablement is needed on the VMM side.
> > Yes, I have created a VM on upstream qemu with --enable-kvm and migrated it
> > on the same host (inside a model).
> > 
> > That was the first test I used, but then I found out that kvm selftests
> > stress multiple scenarios more easily.
> 
> Except when they don't. In my experience, the selftests are only there
> to give the CI people the fuzzy feeling that they are doing something
> useful.

LOL

> I have a collection of examples indicating that what these
> things test is not representative of the bugs we have in KVM.
> 

Fair enough... it was tested with qemu live migration, and it works properly
(migrated between 2 instances of qemu on the same host, emulated by the model).

> > Would you prefer me to test any specific scenario, or is whatever qemu
> > uses by default good enough?
> 
> I want to hear about testing at a scale that makes sense for production
> VMs, including live migrating between hosts while under memory
> pressure (swapping out).

I agree that's a more interesting test.

> 
> I'm also interested in efficiency: how much better is HACDBS compared
> to the current page faulting? 

The terms are indeed confusing, but HACDBS is just the cleaning accelerator
for dirty bits. It only affects how long it takes to traverse the page
tables moving the pages in the array from writable-dirty to writable-clean.
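To make the writable-dirty to writable-clean transition concrete, here is a simplified software model of a stage-2 descriptor with hardware dirty-state management. Bit positions are assumed from the Arm ARM stage-2 descriptor format (S2AP[1], the write permission, at bit 7; DBM at bit 51); the model ignores TLB maintenance, the HACDBS buffer format and everything else the hardware does, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define S2AP_W (1ull << 7)  /* S2AP[1]: stage-2 write permission */
#define S2_DBM (1ull << 51) /* Dirty Bit Modifier */

/* DBM set, write permission clear: writable, not written since cleaning. */
static bool writable_clean(uint64_t pte)
{
	return (pte & S2_DBM) && !(pte & S2AP_W);
}

/* DBM set, write permission set: the page has been written. */
static bool writable_dirty(uint64_t pte)
{
	return (pte & S2_DBM) && (pte & S2AP_W);
}

/*
 * What hardware does on a guest write to a writable-clean page. In reality
 * a write to a page without DBM faults; the model just leaves it unchanged.
 */
static uint64_t s2_hw_write(uint64_t pte)
{
	return writable_clean(pte) ? (pte | S2AP_W) : pte;
}

/* Cleaning: writable-dirty -> writable-clean, keeping DBM set. */
static uint64_t s2_clean(uint64_t pte)
{
	return writable_dirty(pte) ? (pte & ~S2AP_W) : pte;
}
```

Software cleaning performs the `s2_clean()` step per page while walking the tables; as described above, HACDBS only offloads that step for the pages listed in the buffer.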

That being said, regarding efficiency:
As I only have the model to test on, I am limited to its results, which
may not reflect reality.

As an example, on dirty_log_perf_test the cleaning process took much
longer (8x) than software cleaning, even with no errors and with all
entries fitting in the array (the 4k page mentioned above). Since it took
that long even in this ideal scenario, the HACDBS mechanism implemented
in the model takes much longer than software, which is counter-intuitive.

> Just having patches for a feature is not
> enough to decide adoption of that feature. Show me the benefits in a
> quantitative way (within the limits of the model, of course).

Sure, I will try measuring migration between 2 instances of the model, and 
see how qemu live migration time is affected, then post the results in this 
thread for us to compare.

Thanks!
Leo