On 04.12.23 08:53, Xiaoyao Li wrote:
> On 12/4/2023 3:35 PM, Xiaoyao Li wrote:
>> On 11/20/2023 5:56 PM, David Hildenbrand wrote:
>>> On 16.11.23 03:56, Xiaoyao Li wrote:
>>>> On 11/16/2023 2:20 AM, David Hildenbrand wrote:
>>>>> On 15.11.23 08:14, Xiaoyao Li wrote:
>>>>>> Commit d3a5038c461 ("exec: ram_block_discard_range") introduced
>>>>>> ram_block_discard_range(), which grabs some code from
>>>>>> ram_discard_range(). However, during the code movement, it changed
>>>>>> the alignment check of host_startaddr from qemu_host_page_size to
>>>>>> rb->page_size.
>>>>>>
>>>>>> When the ramblock is backed by hugepages, it requires the startaddr
>>>>>> to be huge page size aligned, which is overkill. E.g., TDX's
>>>>>> private-shared page conversion is done at 4KB granularity. A shared
>>>>>> page is discarded when it gets converted to private, and when the
>>>>>> shared page is backed by hugepages, the discard fails this check.
>>>>>>
>>>>>> So change the alignment check back to qemu_host_page_size.
>>>>>>
>>>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>> ---
>>>>>> Changes in v3:
>>>>>> - Newly added in v3;
>>>>>> ---
>>>>>> system/physmem.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/system/physmem.c b/system/physmem.c
>>>>>> index c56b17e44df6..8a4e42c7cf60 100644
>>>>>> --- a/system/physmem.c
>>>>>> +++ b/system/physmem.c
>>>>>> @@ -3532,7 +3532,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>>>>>>      uint8_t *host_startaddr = rb->host + start;
>>>>>> -    if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
>>>>>> +    if (!QEMU_PTR_IS_ALIGNED(host_startaddr, qemu_host_page_size)) {
>>>>>
>>>>> For your use cases, rb->page_size should always match
>>>>> qemu_host_page_size.
>>>>>
>>>>> IIRC, we only set rb->page_size to different values for hugetlb. And
>>>>> guest_memfd does not support hugetlb.
>>>>>
>>>>> Even if QEMU is using THP, rb->page_size should be 4k.
>>>>>
>>>>> Please elaborate how you can actually trigger that. From what I recall,
>>>>> guest_memfd is not compatible with hugetlb.
>>>>
>>>> It's the shared memory that can be backed by hugetlb.
>>>
>>> Serious question: does that configuration make any sense to support at
>>> this point? I claim: no.
>>>
>>>>
>>>> Later patch 9 introduces ram_block_convert_page(), which will discard
>>>> shared memory when it gets converted to private. A TD guest can request
>>>> converting a 4K page to private while the page was previously backed by
>>>> hugetlb as a 2M shared page.
>>>
>>> So you can call ram_block_discard_guest_memfd_range() on a subpage
>>> basis, but not ram_block_discard_range().
>>>
>>> ram_block_convert_range() would have to be taught that that
>>> (questionable) combination of hugetlb for shmem and ordinary pages for
>>> guest_memfd cannot discard shared memory.
>>>
>>> And it probably shouldn't either way. There are other problems when
>>> using hugetlb without preallocation.
>>
>> If I understand correctly, preallocation needs to be enabled for
>> hugetlb. And in the preallocation case, there is no need to discard
>> memory. Is that correct?
Yes. The downside is that we'll end up with double memory consumption.
But if/how to optimize that in this case can be left for future work.
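
For completeness, this is roughly the configuration we're talking about.
Just an illustration from my side; the tdx-guest object and how guest_memfd
gets wired up come from your series, so treat those details as assumptions:

  qemu-system-x86_64 -m 4G \
    -object memory-backend-file,id=shm0,mem-path=/dev/hugepages,size=4G,share=on,prealloc=on \
    -object tdx-guest,id=tdx0 \
    -machine q35,memory-backend=shm0,confidential-guest-support=tdx0 ...

With prealloc=on the hugetlb pages are all allocated upfront and simply
stay allocated when pages get converted to private, which is where the
double consumption comes from.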
>>
>>> The check in ram_block_discard_range() is correct; whoever ends up
>>> calling it has to stop calling it.
>>>
>> > So, I need to add logic to ram_block_discard_page() that if the size of
>
> Sorry, I made a typo.
>
> Correcting myself: s/ram_block_discard_page()/ram_block_convert_range()
Yes, just leave any shared memory backend that uses hugetlb alone.
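
Something like the following (in system/physmem.c) is what I have in mind.
Completely untested sketch; ram_block_convert_range() and the to_private
flag are names from your series rather than existing code, so adjust as
needed:

int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
                            bool to_private)
{
    if (to_private) {
        /*
         * Shared memory backed by hugetlb (rb->page_size > host page
         * size) cannot be discarded at 4K granularity: leave it
         * populated instead of relaxing ram_block_discard_range().
         */
        if (rb->page_size != qemu_host_page_size) {
            return 0;
        }
        return ram_block_discard_range(rb, start, length);
    }

    /* Private -> shared: discard the now-stale guest_memfd pages. */
    return ram_block_discard_guest_memfd_range(rb, start, length);
}

That keeps the strict alignment check in ram_block_discard_range() and
simply leaves hugetlb-backed shared memory alone, as discussed above.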
--
Cheers,
David / dhildenb