[PATCH v2 0/4] qemu: Add support for free-page-reporting

Nico Pache posted 4 patches 3 years, 6 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/cover.1602545532.git.npache@redhat.com
docs/formatdomain.rst                         |  6 ++++
docs/schemas/domaincommon.rng                 |  5 +++
src/conf/domain_conf.c                        | 21 +++++++++++
src/conf/domain_conf.h                        |  1 +
src/qemu/qemu_capabilities.c                  |  2 ++
src/qemu/qemu_capabilities.h                  |  1 +
src/qemu/qemu_command.c                       |  5 +++
src/qemu/qemu_validate.c                      |  7 ++++
.../caps_5.1.0.x86_64.xml                     |  1 +
.../caps_5.2.0.x86_64.xml                     |  1 +
...-options-memballoon-freepage-reporting.err |  1 +
...loon-freepage-reporting.x86_64-latest.args | 36 +++++++++++++++++++
...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
tests/qemuxml2argvtest.c                      |  2 ++
...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
15 files changed, 133 insertions(+)
create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err
create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args
create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml
create mode 100644 tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml
[PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Nico Pache 3 years, 6 months ago
gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79

The virtio-balloon device now has the ability to report free pages
back to the hypervisor for reuse by other programs.

This kernel feature allows for more stable and resource friendly systems.

This feature is available in QEMU and is enabled with free-page-reporting=on
default virt setting should be off but the user should be able to enable it.

Nico Pache (4):
  Document and parser support for the Virtio free page reporting
    feature.
  QEMU: declare qemu capabilities for the Virtio Free page reporting
    feature
  QEMU: introduce Virtio free page reporting feature
  provide testing for free-page-reporting feature in QEMU

 docs/formatdomain.rst                         |  6 ++++
 docs/schemas/domaincommon.rng                 |  5 +++
 src/conf/domain_conf.c                        | 21 +++++++++++
 src/conf/domain_conf.h                        |  1 +
 src/qemu/qemu_capabilities.c                  |  2 ++
 src/qemu/qemu_capabilities.h                  |  1 +
 src/qemu/qemu_command.c                       |  5 +++
 src/qemu/qemu_validate.c                      |  7 ++++
 .../caps_5.1.0.x86_64.xml                     |  1 +
 .../caps_5.2.0.x86_64.xml                     |  1 +
 ...-options-memballoon-freepage-reporting.err |  1 +
 ...loon-freepage-reporting.x86_64-latest.args | 36 +++++++++++++++++++
 ...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
 tests/qemuxml2argvtest.c                      |  2 ++
 ...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
 15 files changed, 133 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err
 create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml
 create mode 100644 tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml

-- 
2.18.4

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Daniel Henrique Barboza 3 years, 6 months ago

On 10/12/20 8:35 PM, Nico Pache wrote:
> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> 
> The virtio-balloon device now has the ability to report free pages
> back to the hypervisor for reuse by other programs.
> 
> This kernel feature allows for more stable and resource friendly systems.
> 
> This feature is available in QEMU and is enabled with free-page-reporting=on
> default virt setting should be off but the user should be able to enable it.
> 
> Nico Pache (4):
>    Document and parser support for the Virtio free page reporting
>      feature.
>    QEMU: declare qemu capabilities for the Virtio Free page reporting
>      feature
>    QEMU: introduce Virtio free page reporting feature
>    provide testing for free-page-reporting feature in QEMU
> 

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>


>   docs/formatdomain.rst                         |  6 ++++
>   docs/schemas/domaincommon.rng                 |  5 +++
>   src/conf/domain_conf.c                        | 21 +++++++++++
>   src/conf/domain_conf.h                        |  1 +
>   src/qemu/qemu_capabilities.c                  |  2 ++
>   src/qemu/qemu_capabilities.h                  |  1 +
>   src/qemu/qemu_command.c                       |  5 +++
>   src/qemu/qemu_validate.c                      |  7 ++++
>   .../caps_5.1.0.x86_64.xml                     |  1 +
>   .../caps_5.2.0.x86_64.xml                     |  1 +
>   ...-options-memballoon-freepage-reporting.err |  1 +
>   ...loon-freepage-reporting.x86_64-latest.args | 36 +++++++++++++++++++
>   ...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
>   tests/qemuxml2argvtest.c                      |  2 ++
>   ...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
>   15 files changed, 133 insertions(+)
>   create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err
>   create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args
>   create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml
>   create mode 100644 tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml
> 

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/13/20 1:35 AM, Nico Pache wrote:
> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> 
> The virtio-balloon device now has the ability to report free pages
> back to the hypervisor for reuse by other programs.
> 
> This kernel feature allows for more stable and resource friendly systems.
> 
> This feature is available in QEMU and is enabled with free-page-reporting=on
> default virt setting should be off but the user should be able to enable it.
> 
> Nico Pache (4):
>    Document and parser support for the Virtio free page reporting
>      feature.
>    QEMU: declare qemu capabilities for the Virtio Free page reporting
>      feature
>    QEMU: introduce Virtio free page reporting feature
>    provide testing for free-page-reporting feature in QEMU
> 
>   docs/formatdomain.rst                         |  6 ++++
>   docs/schemas/domaincommon.rng                 |  5 +++
>   src/conf/domain_conf.c                        | 21 +++++++++++
>   src/conf/domain_conf.h                        |  1 +
>   src/qemu/qemu_capabilities.c                  |  2 ++
>   src/qemu/qemu_capabilities.h                  |  1 +
>   src/qemu/qemu_command.c                       |  5 +++
>   src/qemu/qemu_validate.c                      |  7 ++++
>   .../caps_5.1.0.x86_64.xml                     |  1 +
>   .../caps_5.2.0.x86_64.xml                     |  1 +
>   ...-options-memballoon-freepage-reporting.err |  1 +
>   ...loon-freepage-reporting.x86_64-latest.args | 36 +++++++++++++++++++
>   ...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
>   tests/qemuxml2argvtest.c                      |  2 ++
>   ...-options-memballoon-freepage-reporting.xml | 22 ++++++++++++
>   15 files changed, 133 insertions(+)
>   create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err
>   create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args
>   create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml
>   create mode 100644 tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml
> 

Couple of white space problems, but nothing I couldn't fix before pushing.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Congratulations on your first libvirt contribution!

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/13/20 5:10 PM, Michal Privoznik wrote:
> On 10/13/20 1:35 AM, Nico Pache wrote:
>> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
>>
>> The virtio-balloon device now has the ability to report free pages
>> back to the hypervisor for reuse by other programs.

Is this something that we might want to report? I mean, we have 'virsh 
dommemstat $dom' which under the hood calls:

{"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Nico Pache 3 years, 6 months ago
I don't believe so, but David might have some more insight into that
question.

Once the feature is enabled the guest will report pages on its freelist to
the hypervisor, and the host/hypervisor will then release them.

I'm not exactly sure what information we can or would want to 'stat' out of
this process... A counter for number of pages returned?

Cheers,
Nico

On Tue, Oct 13, 2020, 12:47 PM Michal Privoznik <mprivozn@redhat.com> wrote:

> On 10/13/20 5:10 PM, Michal Privoznik wrote:
> > On 10/13/20 1:35 AM, Nico Pache wrote:
> >> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> >>
> >> The virtio-balloon device now has the ability to report free pages
> >> back to the hypervisor for reuse by other programs.
>
> Is this something that we might want to report? I mean, we have 'virsh
> dommemstat $dom' which under the hood calls:
>
>
> {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}
>
> Michal
>
>
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by David Hildenbrand 3 years, 6 months ago
On 13.10.20 19:00, Nico Pache wrote:
> I don't believe so, but David might have some more insight into that
> question. 
> 
> Once the feature is enabled the guest will report pages on its freelist
> to the hypervisor, and the host/hypervisor will then release them. 

We report (some, not all) free memory of our guest to the hypervisor. So
whatever the guest reported shows up in "stat-free-memory" in the stats
already.

The hypervisor will "discard" backing storage of reported free memory,
resulting in your hypervisor having more free memory available (and the
QEMU process consuming less memory).

> 
> I'm not exactly sure what information we can or would want to 'stat' out
> of this process... A counter for number of pages returned?

As the guest can reuse memory any time without coordination with the
hypervisor that wouldn't be of too much help. A counter would only tell
you if free page reporting was once performed successfully.

You would really have to compare the QEMU process memory footprint with
the guest stats to figure out if free page reporting is doing what it's
supposed to do.

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Peter Krempa 3 years, 6 months ago
On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote:
> On 10/13/20 5:10 PM, Michal Privoznik wrote:
> > On 10/13/20 1:35 AM, Nico Pache wrote:
> > > gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> > > 
> > > The virtio-balloon device now has the ability to report free pages
> > > back to the hypervisor for reuse by other programs.
> 
> Is this something that we might want to report? I mean, we have 'virsh
> dommemstat $dom' which under the hood calls:
> 
> {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}

As I've pointed out in earlier review, I think that the feature name is
a bit misleading. It sounds like a statistic, not something that
actually returns memory to the host. The docs are now better but still
leave a lot of room for imagination.

IMO, if the feature is mainly for returning memory to the host, the
'reporting' word should not have been used.

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/13/20 7:57 PM, Peter Krempa wrote:
> On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote:
>> On 10/13/20 5:10 PM, Michal Privoznik wrote:
>>> On 10/13/20 1:35 AM, Nico Pache wrote:
>>>> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
>>>>
>>>> The virtio-balloon device now has the ability to report free pages
>>>> back to the hypervisor for reuse by other programs.
>>
>> Is this something that we might want to report? I mean, we have 'virsh
>> dommemstat $dom' which under the hood calls:
>>
>> {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}
> 
> As I've pointed out in earlier review, I think that the feature name is
> a bit misleading. It sounds like a statistic, not something that
> actually returns memory to the host. The docs are now better but still
> leave a lot of room for imagination.
> 
> IMO, if the feature is mainly for returning memory to the host, the
> 'reporting' word should not have been used.
> 

Oh sorry I missed that. I a penance I will post a cleanup patch. How 
does "free-pages" or "return-pages" or even "discard-pages" sound?

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Nico Pache 3 years, 6 months ago
IMO "return-pages" sounds the best out of those and stays relatively
consistent with the kernel and qemu terminology for this feature.

I personally don't see a huge problem with the current name, but I've also
been staring at the words "free page reporting" for too long.

Cheers!
-- Nico

On Wed, Oct 14, 2020 at 1:56 AM Michal Privoznik <mprivozn@redhat.com>
wrote:

> On 10/13/20 7:57 PM, Peter Krempa wrote:
> > On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote:
> >> On 10/13/20 5:10 PM, Michal Privoznik wrote:
> >>> On 10/13/20 1:35 AM, Nico Pache wrote:
> >>>> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> >>>>
> >>>> The virtio-balloon device now has the ability to report free pages
> >>>> back to the hypervisor for reuse by other programs.
> >>
> >> Is this something that we might want to report? I mean, we have 'virsh
> >> dommemstat $dom' which under the hood calls:
> >>
> >>
> {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}
> >
> > As I've pointed out in earlier review, I think that the feature name is
> > a bit misleading. It sounds like a statistic, not something that
> > actually returns memory to the host. The docs are now better but still
> > leave a lot of room for imagination.
> >
> > IMO, if the feature is mainly for returning memory to the host, the
> > 'reporting' word should not have been used.
> >
>
> Oh sorry I missed that. I a penance I will post a cleanup patch. How
> does "free-pages" or "return-pages" or even "discard-pages" sound?
>
> Michal
>
>
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/14/20 8:07 AM, Nico Pache wrote:
> IMO "return-pages" sounds the best out of those and stays relatively 
> consistent with the kernel and qemu terminology for this feature.
> 
> I personally don't see a huge problem with the current name, but I've 
> also been staring at the words "free page reporting" for too long.

Right, I can see both reasons. But now that Peter raised it (again, 
sorry) at libvirt level reporting usually means "to report something to 
user". QEMU <-> KVM communication is too low level and since this is not 
being reported to user directly (even though it affects 
"stat-free-memory" attribute of the balloon) reporting does sound a bit 
weird. Let's wait for Peter's opinion.

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by David Hildenbrand 3 years, 6 months ago
On 14.10.20 08:30, Michal Privoznik wrote:
> On 10/14/20 8:07 AM, Nico Pache wrote:
>> IMO "return-pages" sounds the best out of those and stays relatively 
>> consistent with the kernel and qemu terminology for this feature.
>>
>> I personally don't see a huge problem with the current name, but I've 
>> also been staring at the words "free page reporting" for too long.
> 
> Right, I can see both reasons. But now that Peter raised it (again, 
> sorry) at libvirt level reporting usually means "to report something to 
> user". QEMU <-> KVM communication is too low level and since this is not 
> being reported to user directly (even though it affects 
> "stat-free-memory" attribute of the balloon) reporting does sound a bit 
> weird. Let's wait for Peter's opinion.

We originally wanted to use "free page hinting", but as that name is
already taken for another virtio-balloon feature to improve live
migration speed (one time hint of free pages just before migration), we
used "reporting" instead.

On a QEMU level, this feature name makes perfect sense. The guest
(continuously) reports free pages to the hypervisor. The hypervisor
tries to use the reports to free up some memory (which might not always
be possible).


Now, all I can say is that

1. "free page reporting" is the official name of the feature. Using a
different name in libvirt will crate more confusion than it might
actually help. Just saying.

2. The proposed alternatives ""free-pages" or "return-pages" or even
"discard-pages" don't match what's actually going on.

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/14/20 10:26 AM, David Hildenbrand wrote:
> On 14.10.20 08:30, Michal Privoznik wrote:

Sorry for hijacking this thread, but I need to report it somewhere (what 
is the best place?)

When I want to start a guest with this feature turned on + memfd + 
hugepages, the host kernel prints this warning into dmesg and hugepages 
stop working from then on (meaning, even if the pool of allocated HPs is 
large enough I can't start any guest with HPs):




[  139.434748] ------------[ cut here ]------------
[  139.434754] WARNING: CPU: 2 PID: 6280 at mm/page_counter.c:57 
page_counter_uncharge+0x33/0x40
[  139.434754] Modules linked in: kvm_amd amdgpu kvm btusb btrtl btbcm 
btintel sp5100_tco watchdog k10temp mfd_core gpu_sched ttm
[  139.434759] CPU: 2 PID: 6280 Comm: CPU 1/KVM Not tainted 
5.8.13-gentoo-x86_64 #2
[  139.434759] Hardware name: System manufacturer System Product 
Name/PRIME X570-PRO, BIOS 1005 08/01/2019
[  139.434760] RIP: 0010:page_counter_uncharge+0x33/0x40
[  139.434762] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 
89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc 
c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
[  139.434762] RSP: 0018:ffffc9000355fb38 EFLAGS: 00010286
[  139.434763] RAX: fffffffffffb4000 RBX: ffff888fc267e900 RCX: 
fffffffffffb4000
[  139.434763] RDX: 0000000000000402 RSI: fffffffffffb4000 RDI: 
ffff888fd8411dd0
[  139.434764] RBP: ffff888fcba983c0 R08: 0000000000080400 R09: 
fffffffffff7fc00
[  139.434764] R10: ffffc9000355fb40 R11: 000000000000000a R12: 
0000000000000001
[  139.434765] R13: ffff888fc3d89140 R14: 00000000000001b2 R15: 
00000000000001b1
[  139.434765] FS:  00007fc9d4c35700(0000) GS:ffff888fde880000(0000) 
knlGS:0000000000000000
[  139.434766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  139.434766] CR2: 00007f09a4003000 CR3: 0000000fc06fe000 CR4: 
0000000000340ee0
[  139.434767] Call Trace:
[  139.434769]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
[  139.434772]  region_del+0x1a0/0x260
[  139.434773]  hugetlb_unreserve_pages+0x32/0xa0
[  139.434775]  remove_inode_hugepages+0x19d/0x3a0
[  139.434776]  hugetlbfs_fallocate+0x3f2/0x4a0
[  139.434778]  ? __seccomp_filter+0x75/0x6a0
[  139.434779]  vfs_fallocate+0x124/0x260
[  139.434780]  __x64_sys_fallocate+0x39/0x60
[  139.434783]  do_syscall_64+0x38/0x60
[  139.434784]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  139.434785] RIP: 0033:0x7fc9e0994de7
[  139.434786] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 
4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 
05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
[  139.434787] RSP: 002b:00007fc9d4c337a0 EFLAGS: 00000293 ORIG_RAX: 
000000000000011d
[  139.434787] RAX: ffffffffffffffda RBX: 0000000036400000 RCX: 
00007fc9e0994de7
[  139.434788] RDX: 0000000036200000 RSI: 0000000000000003 RDI: 
000000000000001d
[  139.434788] RBP: 00007fc9d4c33800 R08: 0000000000000000 R09: 
0000000000000000
[  139.434789] R10: 0000000000200000 R11: 0000000000000293 R12: 
00007fff9a75c3fe
[  139.434789] R13: 00007fff9a75c3ff R14: 00007fc9d4c35700 R15: 
00007fc9d4c33dc0
[  139.434790] ---[ end trace fb9808303959fc01 ]---


Is this known problem?

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by David Hildenbrand 3 years, 6 months ago
On 14.10.20 13:53, Michal Privoznik wrote:
> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>> On 14.10.20 08:30, Michal Privoznik wrote:
> 
> Sorry for hijacking this thread, but I need to report it somewhere (what 
> is the best place?)
> 
> When I want to start a guest with this feature turned on + memfd + 
> hugepages, the host kernel prints this warning into dmesg and hugepages 
> stop working from then on (meaning, even if the pool of allocated HPs is 
> large enough I can't start any guest with HPs):
> 
> 
> 
> 
> [  139.434748] ------------[ cut here ]------------
> [  139.434754] WARNING: CPU: 2 PID: 6280 at mm/page_counter.c:57 
> page_counter_uncharge+0x33/0x40
> [  139.434754] Modules linked in: kvm_amd amdgpu kvm btusb btrtl btbcm 
> btintel sp5100_tco watchdog k10temp mfd_core gpu_sched ttm
> [  139.434759] CPU: 2 PID: 6280 Comm: CPU 1/KVM Not tainted 
> 5.8.13-gentoo-x86_64 #2
> [  139.434759] Hardware name: System manufacturer System Product 
> Name/PRIME X570-PRO, BIOS 1005 08/01/2019
> [  139.434760] RIP: 0010:page_counter_uncharge+0x33/0x40
> [  139.434762] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 
> 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc 
> c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
> [  139.434762] RSP: 0018:ffffc9000355fb38 EFLAGS: 00010286
> [  139.434763] RAX: fffffffffffb4000 RBX: ffff888fc267e900 RCX: 
> fffffffffffb4000
> [  139.434763] RDX: 0000000000000402 RSI: fffffffffffb4000 RDI: 
> ffff888fd8411dd0
> [  139.434764] RBP: ffff888fcba983c0 R08: 0000000000080400 R09: 
> fffffffffff7fc00
> [  139.434764] R10: ffffc9000355fb40 R11: 000000000000000a R12: 
> 0000000000000001
> [  139.434765] R13: ffff888fc3d89140 R14: 00000000000001b2 R15: 
> 00000000000001b1
> [  139.434765] FS:  00007fc9d4c35700(0000) GS:ffff888fde880000(0000) 
> knlGS:0000000000000000
> [  139.434766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  139.434766] CR2: 00007f09a4003000 CR3: 0000000fc06fe000 CR4: 
> 0000000000340ee0
> [  139.434767] Call Trace:
> [  139.434769]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
> [  139.434772]  region_del+0x1a0/0x260
> [  139.434773]  hugetlb_unreserve_pages+0x32/0xa0
> [  139.434775]  remove_inode_hugepages+0x19d/0x3a0
> [  139.434776]  hugetlbfs_fallocate+0x3f2/0x4a0
> [  139.434778]  ? __seccomp_filter+0x75/0x6a0
> [  139.434779]  vfs_fallocate+0x124/0x260
> [  139.434780]  __x64_sys_fallocate+0x39/0x60
> [  139.434783]  do_syscall_64+0x38/0x60
> [  139.434784]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  139.434785] RIP: 0033:0x7fc9e0994de7
> [  139.434786] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 
> 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 
> 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
> [  139.434787] RSP: 002b:00007fc9d4c337a0 EFLAGS: 00000293 ORIG_RAX: 
> 000000000000011d
> [  139.434787] RAX: ffffffffffffffda RBX: 0000000036400000 RCX: 
> 00007fc9e0994de7
> [  139.434788] RDX: 0000000036200000 RSI: 0000000000000003 RDI: 
> 000000000000001d
> [  139.434788] RBP: 00007fc9d4c33800 R08: 0000000000000000 R09: 
> 0000000000000000
> [  139.434789] R10: 0000000000200000 R11: 0000000000000293 R12: 
> 00007fff9a75c3fe
> [  139.434789] R13: 00007fff9a75c3ff R14: 00007fc9d4c35700 R15: 
> 00007fc9d4c33dc0
> [  139.434790] ---[ end trace fb9808303959fc01 ]---
> 
> 
> Is this known problem?

No, not at all. Thanks for reporting!

And the "bad" thing is, that QEMU doesn't do anything too fancy. All it
does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
zap reported pages. The same mechanism is also used for postcopy live
migration and virtio-mem with hugetlbfs.

Which kernel are you running?

1. Is it an upstream kernel, lkml + -mm lists are the right place
(please cc me, or I can try to reproduce and report it).

2. Is it a distro kernel? Then create a BUG there.

I was just recently testing virtio-mem with hugetlbfs and it worked on
decent upstream Fedora. But maybe I was not able to trigger it.

> 
> Michal
> 


-- 
Thanks,

David / dhildenb

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/14/20 2:06 PM, David Hildenbrand wrote:
> On 14.10.20 13:53, Michal Privoznik wrote:
>> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>>> On 14.10.20 08:30, Michal Privoznik wrote:
>>
> <snip/>
> 
> No, not at all. Thanks for reporting!
> 
> And the "bad" thing is, that QEMU doesn't do anything too fancy. All it
> does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
> zap reported pages. The same mechanism is also used for postcopy live
> migration and virtio-mem with hugetlbfs.
> 
> Which kernel are you running?
> 
> 1. Is it an upstream kernel, lkml + -mm lists are the right place
> (please cc me, or I can try to reproduce and report it).
> 
> 2. Is it a distro kernel? Then create a BUG there.
> 
> I was just recently testing virtio-mem with hugetlbfs and it worked on
> decent upstream Fedora. But maybe I was not able to trigger it.

Okay, I've upgraded to 5.9.0-gentoo, but the problem persists. Gentoo 
puts only a very few patches on top of vanilla kernel neither of which 
touches that area of the code:

https://dev.gentoo.org/~mpagano/genpatches/trunk/5.9/

So I think this is reproducible on vanilla too.

BTW: Have you tried placing the qemu inside v1 cgroups? Libvirt does 
that so maybe that's the problem. Anyway, here's the cmd line:

/home/zippy/work/qemu/qemu.git/build/qemu-system-x86_64 \
-name guest=fedora,debug-threads=on \
-S \
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes 
\
-machine 
pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu host,migratable=on \
-m 4096 \
-object 
memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=4294967296,host-nodes=0,policy=bind 
\
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-object iothread,id=iothread4 \
-uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
-no-user-config \
-nodefaults \
-device sga \
-chardev socket,id=charmonitor,fd=33,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-boot menu=on,strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' 
\
-blockdev 
'{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' 
\
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 
\
-netdev tap,fd=35,id=hostnet0 \
-device 
virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 
\
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=36,server,nowait \
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
\
-spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
-device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on \
-sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on


Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by David Hildenbrand 3 years, 6 months ago
On 14.10.20 15:46, Michal Privoznik wrote:
> On 10/14/20 2:06 PM, David Hildenbrand wrote:
>> On 14.10.20 13:53, Michal Privoznik wrote:
>>> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>>>> On 14.10.20 08:30, Michal Privoznik wrote:
>>>
>> <snip/>
>>
>> No, not at all. Thanks for reporting!
>>
>> And the "bad" thing is, that QEMU doesn't do anything too fancy. All it
>> does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
>> zap reported pages. The same mechanism is also used for postcopy live
>> migration and virtio-mem with hugetlbfs.
>>
>> Which kernel are you running?
>>
>> 1. Is it an upstream kernel, lkml + -mm lists are the right place
>> (please cc me, or I can try to reproduce and report it).
>>
>> 2. Is it a distro kernel? Then create a BUG there.
>>
>> I was just recently testing virtio-mem with hugetlbfs and it worked on
>> decent upstream Fedora. But maybe I was not able to trigger it.
> 
> Okay, I've upgraded to 5.9.0-gentoo, but the problem persists. Gentoo 
> puts only a very few patches on top of vanilla kernel neither of which 
> touches that area of the code:
> 
> https://dev.gentoo.org/~mpagano/genpatches/trunk/5.9/
> 
> So I think this is reproducible on vanilla too.
> 
> BTW: Have you tried placing the qemu inside v1 cgroups? Libvirt does 
> that so maybe that's the problem. Anyway, here's the cmd line:
> 
> /home/zippy/work/qemu/qemu.git/build/qemu-system-x86_64 \
> -name guest=fedora,debug-threads=on \
> -S \
> -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes 
> \
> -machine 
> pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
> -cpu host,migratable=on \
> -m 4096 \
> -object 
> memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=4294967296,host-nodes=0,policy=bind 
> \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -object iothread,id=iothread1 \
> -object iothread,id=iothread2 \
> -object iothread,id=iothread3 \
> -object iothread,id=iothread4 \
> -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
> -no-user-config \
> -nodefaults \
> -device sga \
> -chardev socket,id=charmonitor,fd=33,server,nowait \
> -mon chardev=charmonitor,id=monitor,mode=control \
> -rtc base=utc \
> -no-shutdown \
> -global PIIX4_PM.disable_s3=0 \
> -global PIIX4_PM.disable_s4=0 \
> -boot menu=on,strict=on \
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev 
> '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' 
> \
> -blockdev 
> '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' 
> \
> -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 
> \
> -netdev tap,fd=35,id=hostnet0 \
> -device 
> virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 
> \
> -chardev pty,id=charserial0 \
> -device isa-serial,chardev=charserial0,id=serial0 \
> -chardev socket,id=charchannel0,fd=36,server,nowait \
> -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
> \
> -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
> -device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
> -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on \
> -sandbox 
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -msg timestamp=on
> 

Thanks!

Reproduced easily on Fedora 32 (5.7.16-200.fc32.x86_64).

[   70.641802] CPU: 3 PID: 2178 Comm: qemu-system-x86 Not tainted 5.7.16-200.fc32.x86_64 #1
[   70.641802] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[   70.641803] RIP: 0010:page_counter_uncharge+0x4b/0x50
[   70.641804] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 20 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[   70.641804] RSP: 0018:ffffb4044139bb18 EFLAGS: 00010286
[   70.641805] RAX: fffffffffff94600 RBX: fffffffffff94600 RCX: ffff8da63d007000
[   70.641805] RDX: 000000000000046e RSI: fffffffffff94600 RDI: ffff8da678412e28
[   70.641806] RBP: ffff8da678412e28 R08: ffff8da678412e28 R09: ffff8da63d0078c0
[   70.641806] R10: ffff8da634173000 R11: 0000000000000007 R12: 000000000008dc00
[   70.641806] R13: fffffffffff72400 R14: ffff8da63d007000 R15: 0000000000000391
[   70.641807] FS:  00007fe7ab5fe700(0000) GS:ffff8da67eac0000(0000) knlGS:0000000000000000
[   70.641808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.641808] CR2: 000055568796a468 CR3: 0000000fe9860000 CR4: 0000000000340ee0
[   70.641808] Call Trace:
[   70.641813]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[   70.641815]  region_del+0x1d3/0x300
[   70.641816]  hugetlb_unreserve_pages+0x39/0xb0
[   70.641818]  remove_inode_hugepages+0x1a8/0x3d0
[   70.641831]  ? kvm_mmu_notifier_invalidate_range+0x38/0x60 [kvm]
[   70.641832]  ? tlb_finish_mmu+0x7a/0x1d0
[   70.641833]  hugetlbfs_fallocate+0x3ac/0x5e0
[   70.641835]  ? avc_has_perm+0x3b/0x160
[   70.641836]  ? file_has_perm+0xa2/0xb0
[   70.641837]  ? selinux_inode_follow_link+0x4c/0xb0
[   70.641838]  ? selinux_file_permission+0x4e/0x120
[   70.641839]  ? security_file_permission+0x2e/0x160
[   70.641840]  vfs_fallocate+0x146/0x280
[   70.641841]  __x64_sys_fallocate+0x3e/0x70
[   70.641843]  do_syscall_64+0x5b/0xf0
[   70.641846]  entry_SYSCALL_64_after_hwframe+0x44/0xa9


Note: prealloc=yes is a bad choice in this environment. It
contradicts memory overcommit - what we want to optimize with
free page reporting. You allocate all VM memory to throw it away
once the guest is up. Is there an option to turn this of with
hugetlbfs? I hope so.

I'll try reproducing upstream and send a BUG report upstream, ccing you. Thanks!

> 
> Michal
> 


-- 
Thanks,

David / dhildenb

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by Michal Privoznik 3 years, 6 months ago
On 10/14/20 4:07 PM, David Hildenbrand wrote:

> 
> 
> Note: prealloc=yes is a bad choice in this environment. It
> contradicts memory overcommit - what we want to optimize with
> free page reporting. You allocate all VM memory to throw it away
> once the guest is up. Is there an option to turn this of with
> hugetlbfs? I hope so.

Is this something that should be addressed on qemu side? Since I'm 
writing a follow up patches for libvirt I can forbid this combination in 
libvirt.

> 
> I'll try reproducing upstream and send a BUG report upstream, ccing you. Thanks!

Thank you very much!

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
Posted by David Hildenbrand 3 years, 6 months ago
On 14.10.20 16:15, Michal Privoznik wrote:
> On 10/14/20 4:07 PM, David Hildenbrand wrote:
> 
>>
>>
>> Note: prealloc=yes is a bad choice in this environment. It
>> contradicts memory overcommit - what we want to optimize with
>> free page reporting. You allocate all VM memory to throw it away
>> once the guest is up. Is there an option to turn this of with
>> hugetlbfs? I hope so.
> 
> Is this something that should be addressed on qemu side? Since I'm 
> writing a follow up patches for libvirt I can forbid this combination in 
> libvirt.

You mean, bailing out in QEMU if "prealloc=yes" is used along free page
reporting? Hmm, not sure if that's the right approach ... we would also
have to check for any DIMMs we hotplug and things like that. It's
definitely a setup issue to me (where we could try to warn the user, but
QEMU warnings tend to get ignored by everybody ...).

It's also going to be "not what you want" once we have virtio-mem
support in libvirt. (https://virtio-mem.gitlab.io/) But there, it's
easier to check+warn. You only want "prealloc=no" for the memory device
assigned to a virtio-mem device, not for all system RAM.

Thanks!

-- 
Thanks,

David / dhildenb