[PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size

Wei Chen posted 1 patch 11 months, 3 weeks ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20241126080213.248-1-weichenforschung@gmail.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, David Hildenbrand <david@redhat.com>
hw/virtio/virtio-mem.c | 4 ++++
1 file changed, 4 insertions(+)
[PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by Wei Chen 11 months, 3 weeks ago
A malicious guest can exploit virtio-mem to release memory back to the
hypervisor and attempt Rowhammer attacks. The only case reasonable for
unplugging is when the size > requested_size.

Signed-off-by: Wei Chen <weichenforschung@gmail.com>
Signed-off-by: Zhi Zhang <zzhangphd@gmail.com>
---
 hw/virtio/virtio-mem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 80ada89551..4ef67082a2 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -671,6 +671,10 @@ static int virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa,
         return VIRTIO_MEM_RESP_NACK;
     }
 
+    if (!plug && vmem->size <= vmem->requested_size) {
+        return VIRTIO_MEM_RESP_NACK;
+    }
+
     /* test if really all blocks are in the opposite state */
     if ((plug && !virtio_mem_is_range_unplugged(vmem, gpa, size)) ||
         (!plug && !virtio_mem_is_range_plugged(vmem, gpa, size))) {
-- 
2.47.1
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 09:02, Wei Chen wrote:
> A malicious guest can exploit virtio-mem to release memory back to the
> hypervisor and attempt Rowhammer attacks.

Please provide more information on how this is supposed to work, whether 
this is a purely theoretical case, and how relevant this is in practice.

Because I am not sure how relevant and accurate this statement is, and 
if any action is needed at all.

Further, what about virtio-balloon, which does not even support 
rejecting requests?

> The only case reasonable for
> unplugging is when the size > requested_size.

I recall that that behavior was desired once the driver would support 
de-fragmenting unplugged memory blocks. I don't think drivers do that 
today (would have to double-check the Windows one). The spec does not 
document what is to happen in that case.

Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed, so 
this change would not cover all cases. VIRTIO_MEM_REQ_UNPLUG_ALL could 
be ratelimited -- if there is a real issue here.


> 
> Signed-off-by: Wei Chen <weichenforschung@gmail.com>
> Signed-off-by: Zhi Zhang <zzhangphd@gmail.com>
> ---
>   hw/virtio/virtio-mem.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 80ada89551..4ef67082a2 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -671,6 +671,10 @@ static int virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa,
>           return VIRTIO_MEM_RESP_NACK;
>       }
>   
> +    if (!plug && vmem->size <= vmem->requested_size) {
> +        return VIRTIO_MEM_RESP_NACK;
> +    }
> +
>       /* test if really all blocks are in the opposite state */
>       if ((plug && !virtio_mem_is_range_unplugged(vmem, gpa, size)) ||
>           (!plug && !virtio_mem_is_range_plugged(vmem, gpa, size))) {


-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by Wei Chen 11 months, 3 weeks ago
 > Please provide more information how this is supposed to work

We initially discovered that virtio-mem could be used by a malicious
agent to trigger the Rowhammer vulnerability and further achieve a VM
escape.

Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
to a memory location might cause voltage leakage to adjacent locations,
effectively flipping bits in these locations. In other words, with
Rowhammer, an adversary can modify the data stored in the memory.

For a complete attack, an adversary needs to: a) determine which parts
of the memory are prone to bit flips, b) trick the system into storing
important data on those parts of memory and c) trigger bit flips to
tamper with that important data.

Now, for an attacker who only has access to their VM but not to the
hypervisor, one important challenge among the three is b), i.e., to give
back the memory they determine as vulnerable to the hypervisor. This is
where the pitfall for virtio-mem lies: the attacker can modify the
virtio-mem driver in the VM's kernel and unplug memory proactively.

The current implementation of virtio-mem in QEMU does not check whether
it is valid for the VM to unplug memory. Therefore, as our experiments
prove, this method works in practice.

 > whether this is a purely theoretical case, and how relevant this is in
 > practice.

In our design, on a host machine equipped with certain Intel processors
and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
and c) has a virtio-mem device, an attacker can force the EPT to use
pages that are prone to Rowhammer bit flips and thus modify the EPT to
gain read and write privileges to an arbitrary memory location.

Our efforts involved conducting end-to-end attacks on two separate
machines with the Core i3-10100 and the Xeon E2124 processors
respectively, and have achieved successful VM escapes.

 > Further, what about virtio-balloon, which does not even support
 > rejecting requests?

virtio-balloon does not work with device passthrough currently, so we
have yet to produce a feasible attack with it.

 > I recall that that behavior was desired once the driver would support
 > de-fragmenting unplugged memory blocks.

By "that behavior" do you mean to unplug memory when size <=
requested_size? I am not sure how that is to be implemented.

 > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed

That is true, but the attacker will want the capability to release a
specific sub-block.

In fact, a sub-block is still somewhat coarse, because most likely there
is only one page in a sub-block that contains potential bit flips. When
the attacker spawns EPTEs, they have to spawn enough to make sure the
target page is used to store the EPTEs.

A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
least 1GB of memory. In other words, the attack program exhausts 1GB of
memory just for the possibility that KVM uses the target page to store
EPTEs.
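
Spelling the arithmetic out (an EPTE is 8 bytes, so one 4KB page holds
512 of them, and each EPTE maps one 4KB guest page):

\[
\frac{2\,\mathrm{MB}}{4\,\mathrm{KB}} \times 512 = 512 \times 512
= 262{,}144\ \mathrm{EPTEs},
\qquad
262{,}144 \times 4\,\mathrm{KB} = 1\,\mathrm{GB}.
\]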


Best regards,
Wei Chen

Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 15:20, Wei Chen wrote:
>   > Please provide more information how this is supposed to work
> 

Thanks for the information. A lot of what you wrote belongs in the 
patch description, especially that this might currently only be 
relevant with device passthrough + viommu.

> We initially discovered that virtio-mem could be used by a malicious
> agent to trigger the Rowhammer vulnerability and further achieve a VM
> escape.
> 
> Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
> to a memory location might cause voltage leakage to adjacent locations,
> effectively flipping bits in these locations. In other words, with
> Rowhammer, an adversary can modify the data stored in the memory.
> 
> For a complete attack, an adversary needs to: a) determine which parts
> of the memory are prone to bit flips, b) trick the system to store
> important data on those parts of memory and c) trigger bit flips to
> tamper important data.
> 
> Now, for an attacker who only has access to their VM but not to the
> hypervisor, one important challenge among the three is b), i.e., to give
> back the memory they determine as vulnerable to the hypervisor. This is
> where the pitfall for virtio-mem lies: the attacker can modify the
> virtio-mem driver in the VM's kernel and unplug memory proactively.

But b), as you write, is not only about giving back that memory to the 
hypervisor. How can you be sure (IOW trigger) that the system will store 
"important data" like EPTs?

> 
> The current impl of virtio-mem in qemu does not check if it is valid for
> the VM to unplug memory. Therefore, as is proved by our experiments,
> this method works in practice.
> 
>   > whether this is a purely theoretical case, and how relevant this is in
>   > practice.
> 
> In our design, on a host machine equipped with certain Intel processors
> and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
> and c) has a virtio-mem device, an attacker can force the EPT to use
> pages that are prone to Rowhammer bit flips and thus modify the EPT to
> gain read and write privileges to an arbitrary memory location.
> 
> Our efforts involved conducting end-to-end attacks on two separate
> machines with the Core i3-10100 and the Xeon E2124 processors
> respectively, and has achieved successful VM escapes.

Out of curiosity, are newer CPUs no longer affected?

> 
>   > Further, what about virtio-balloon, which does not even support
>   > rejecting requests?
> 
> virtio-balloon does not work with device passthrough currently, so we
> have yet to produce a feasible attack with it.

So is one magic bit really that, for your experiments, one needs a viommu?

The only mention of Rowhammer + memory ballooning I found is: 
https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html

> 
>   > I recall that that behavior was desired once the driver would support
>   > de-fragmenting unplugged memory blocks.
> 
> By "that behavior" do you mean to unplug memory when size <=
> requested_size? I am not sure how that is to be implemented.

To defragment, the idea was to unplug one additional block, so we can 
plug another block.

> 
>   > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed
> 
> That is true, but the attacker will want the capability to release a
> specific sub-block.

So it won't be sufficient to have a single sub-block plugged and then 
trigger VIRTIO_MEM_REQ_UNPLUG_ALL?

> 
> In fact, a sub-block is still somewhat coarse, because most likely there
> is only one page in a sub-block that contains potential bit flips. When
> the attacker spawns EPTEs, they have to spawn enough to make sure the
> target page is used to store the EPTEs.
> 
> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
> least 1GB of memory. In other words, the attack program exhausts 1GB of
> memory just for the possibility that KVM uses the target page to store
> EPTEs.

Ah, that makes sense.

Can you compress what you wrote into the patch description? Further, I 
assume we want to add a Fixes: tag and Cc: QEMU Stable 
<qemu-stable@nongnu.org>

Thanks!

-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by Wei Chen 11 months, 3 weeks ago
 > How can you be sure (IOW trigger) that the system will store
 > "important data" like EPTs?

We cannot, but we have designed the attack (see below) to improve the
odds.

 > So is one magic bit really that for your experiments, one needs a
 > viommu?

Admittedly the way we accomplish a VM escape is a bit arcane.

We require device passthrough because it pins the VM's memory down and
converts them to MIGRATE_UNMOVABLE. Hotplugged memory will also be
converted to MIGRATE_UNMOVABLE. That way when we give memory back to the
hypervisor, they stay UNMOVABLE. Otherwise we will have to convert the
pages to UNMOVABLE or exhaust ALL MIGRATE_MOVABLE pages, both of which
cannot be easily accomplished.

Then we require vIOMMU because vIOMMU mappings, much like EPTEs, use
MIGRATE_UNMOVABLE pages as well. By spawning lots of meaningless vIOMMU
entries, we exhaust UNMOVABLE page blocks of lower orders (<9). Next
time KVM tries to allocate pages to store EPTEs, the kernel has to split
an order-9 page block, which is exactly the size of a 2MB sub-block.
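
Concretely, an order-9 page block is \(2^9\) base pages:

\[
2^{9} \times 4\,\mathrm{KB} = 512 \times 4\,\mathrm{KB} = 2\,\mathrm{MB}.
\]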

 > Out of curiosity, are newer CPUs no longer affected?

When qemu pins down the VM's memory, it also establishes every possible
mapping to the VM's memory in the EPT.

To spawn new EPTEs, we exploit KVM's fix to the iTLB multihit bug.
Basically, we execute a bunch of no-op functions, and KVM will have to
split hugepages into 4KB pages. This process creates a large number of
EPTEs.

The iTLB multihit bug roughly speaking is only present on non-Atom Intel
CPUs manufactured before 2020.

 > So it won't be sufficient to have a single sub-block plugged and then
 > trigger VIRTIO_MEM_REQ_UNPLUG_ALL?

Could work in theory, but if the newly plugged sub-block does not
contain vulnerable pages, there is no promise that the attacker would
get a sub-block containing a different set of pages next time.

It also depends heavily on the configuration of the virtio-mem device.
If there is not much non-virtio-mem memory for the VM, the attacker
could easily run out of memory.


Best regards,
Wei Chen

Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 16:31, Wei Chen wrote:
>   > How can you be sure (IOW trigger) that the system will store
>   > "important data" like EPTs?
> 
> We cannot, but we have designed the attack (see below) to improve the
> possibility.
> 
>   > So is one magic bit really that for your experiments, one needs a
>   > viommu?
> 
> Admittedly the way we accomplish a VM escape is a bit arcane.

That's what I imagined :)

> 
> We require device passthrough because it pins the VM's memory down and
> converts them to MIGRATE_UNMOVABLE. 

Interesting, that's news to me. Can you share where GUP in the kernel 
would do that?

> Hotplugged memory will also be
> converted to MIGRATE_UNMOVABLE. 

But that's in the VM? Because we don't hotplug memory in the hypervisor.

> That way when we give memory back to the
> hypervisor, they stay UNMOVABLE. Otherwise we will have to convert the
> pages to UNMOVABLE or exhaust ALL MIGRATE_MOVABLE pages, both of which
> cannot be easily accomplished.
> 
> Then we require vIOMMU because vIOMMU mappings, much like EPTEs, use
> MIGRATE_UNMOVABLE pages as well. By spawning lots of meaningless vIOMMU
> entries, we exhaust UNMOVABLE page blocks of lower orders (<9). Next
> time KVM tries to allocate pages to store EPTEs, the kernel has to split
> an order-9 page block, which is exactly the size of a 2MB sub-block.
> 

Ah, so you also need a THP in the hypervisor I assume.

>   > Out of curiosity, are newer CPUs no longer affected?
> 
> When qemu pins down the VM's memory, it also establishes every possible
> mapping to the VM's memory in the EPT.
> 
> To spawn new EPTEs, we exploit KVM's fix to the iTLB multihit bug.
> Basically, we execute a bunch of no-op functions, and KVM will have to
> split hugepages into 4KB pages. This process creates a large number of
> EPTEs.
> 
> The iTLB multihit bug roughly speaking is only present on non-Atom Intel
> CPUs manufactured before 2020.

Interesting, thanks!

> 
>   > So it won't be sufficient to have a single sub-block plugged and then
>   > trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
> 
> Could work in theory, but if the newly plugged sub-block does not
> contain vulnerable pages, there is no promise that the attacker would
> get a sub-block containing a different set of pages next time.

Right.

-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by zhi zhang 11 months, 3 weeks ago
On Tue, Nov 26, 2024 at 11:52 PM David Hildenbrand <david@redhat.com> wrote:

> On 26.11.24 16:31, Wei Chen wrote:
> >   > How can you be sure (IOW trigger) that the system will store
> >   > "important data" like EPTs?
> >
> > We cannot, but we have designed the attack (see below) to improve the
> > possibility.
> >
> >   > So is one magic bit really that for your experiments, one needs a
> >   > viommu?
> >
> > Admittedly the way we accomplish a VM escape is a bit arcane.
>
> That's what I imagined :)
>
> >
> > We require device passthrough because it pins the VM's memory down and
> > converts them to MIGRATE_UNMOVABLE.
>
> Interesting, that's news to me. Can you share where GUP in the kernel
> would do that?
>

In /drivers/vfio/vfio_iommu_type1.c, there is a function called
vfio_iommu_type1_pin_pages() where the VM's memory is pinned down.


>
> > Hotplugged memory will also be
> > converted to MIGRATE_UNMOVABLE.
>
> But that's in the VM? Because we don't hotplug memory in the hypervisor.
>

Yes, the virtio-mem driver in the VM is modified to actively release memory
vulnerable to Rowhammer.

For more details, would you be interested in reading our paper? It was
recently submitted to ASPLOS for publication and we are happy to share it
with you.

Regards,
Zhi Zhang
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 27.11.24 03:00, zhi zhang wrote:
> 
> 
> On Tue, Nov 26, 2024 at 11:52 PM David Hildenbrand <david@redhat.com 
> <mailto:david@redhat.com>> wrote:
> 
>     On 26.11.24 16:31, Wei Chen wrote:
>      >   > How can you be sure (IOW trigger) that the system will store
>      >   > "important data" like EPTs?
>      >
>      > We cannot, but we have designed the attack (see below) to improve the
>      > possibility.
>      >
>      >   > So is one magic bit really that for your experiments, one needs a
>      >   > viommu?
>      >
>      > Admittedly the way we accomplish a VM escape is a bit arcane.
> 
>     That's what I imagined :)
> 
>      >
>      > We require device passthrough because it pins the VM's memory
>     down and
>      > converts them to MIGRATE_UNMOVABLE.
> 
>     Interesting, that's news to me. Can you share where GUP in the kernel
>     would do that?
> 
> 
> In /drivers/vfio/vfio_iommu_type1.c, there is a function called 
> vfio_iommu_type1_pin_pages where VM's memory is pinned down.

That doesn't explain the full story about MIGRATE_UNMOVABLE. I assume 
one precondition is missing in your explanation.

VFIO will call pin_user_pages_remote(FOLL_LONGTERM). Two cases:

a) Memory is already allocated (which would mostly be MIGRATE_MOVABLE, 
because it's ordinary user memory). We'll simply longterm pin the memory 
without changing the migratetype.

b) Memory is not allocated yet. We'll call 
faultin_page()->handle_mm_fault(). There is no FOLL_LONGTERM 
special-casing, so you'll mostly get MIGRATE_MOVABLE.


Now, there is one corner case: we disallow longterm pinning on 
ZONE_MOVABLE and MIGRATE_CMA. In case our user space allocation ended up 
on there, check_and_migrate_movable_pages() would detect that the memory 
resides on ZONE_MOVABLE or MIGRATE_CMA, and allocate a destination page 
in migrate_longterm_unpinnable_folios() using "GFP_USER | __GFP_NOWARN".

So I assume one precondition is that your hypervisor has at least some 
ZONE_MOVABLE or CMA memory? Otherwise I don't see how you would reliably 
get MIGRATE_UNMOVABLE.

> 
> 
>      > Hotplugged memory will also be
>      > converted to MIGRATE_UNMOVABLE.
> 
>     But that's in the VM? Because we don't hotplug memory in the hypervisor.
> 
> 
> Yes, the virtio-mem driver in the VM is modified to actively release 
> memory vulnerable to Rowhammer.

I think I now understand that statement: Memory to-be-hotplugged to the 
VM will be migrated to MIGRATE_UNMOVABLE during longterm pinning, if it 
resides on ZONE_MOVABLE or MIGRATE_CMA.

> For more details, would you be interested in reading our paper? It was 
> recently submitted to ASPLOS for publication and we are happy to share 
> it with you.

Yes, absolutely! Please send a private mail :)

-- 
Cheers,

David / dhildenb


Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by Wei Chen 11 months, 3 weeks ago
> That doesn't explain the full story about MIGRATE_UNMOVABLE. I assume
> one precondition is missing in your explanation.

I have double-checked the source code. My initial description of the
process was somewhat imprecise: vIOMMU does not convert pages to
UNMOVABLE during pinning; rather, pinning causes page faults, and the
fault handler allocates UNMOVABLE pages. (vaddr_get_pfns() calls
__gup_longterm_locked(), which then calls memalloc_pin_save(), and that
implicitly removes the __GFP_MOVABLE flag.)

Therefore, there is no requirement of ZONE_MOVABLE and MIGRATE_CMA.
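
For reference, the effect boils down to roughly the following (a
simplified paraphrase of include/linux/sched/mm.h, not the verbatim
kernel code):

/*
 * memalloc_pin_save() sets PF_MEMALLOC_PIN on the task doing the
 * longterm pin; current_gfp_context() then strips __GFP_MOVABLE from
 * every allocation that task performs, so the pages faulted in under
 * the pin land on unmovable pageblocks.
 */
static inline gfp_t current_gfp_context(gfp_t flags)
{
        if (current->flags & PF_MEMALLOC_PIN)
                flags &= ~__GFP_MOVABLE;
        return flags;
}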


Best regards,
Wei Chen

Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 30.11.24 13:48, Wei Chen wrote:
>  > That doesn't explain the full story about MIGRATE_UNMOVABLE. I assume
>  > one precondition is missing in your explanation.
> 
> I have double-checked the source code. My initial description of the
> process seems somewhat imprecise. vIOMMU does not convert pages to
> UNMOVABLE during pinning, it is that pinning causes page faults, and the
> fault handler allocates UNMOVABLE pages. (vaddr_get_pfns() calls
> __gup_longterm_locked(), who then calls memalloc_pin_save(), and it
> implicitly removes the __GFP_MOVABLE flag.)

Ah, that makes sense! I forgot about memalloc_pin_save(), which we 
primarily added to avoid allocation+immediate migration during longterm 
pinning IIRC.

> 
> Therefore, there is no requirement of ZONE_MOVABLE and MIGRATE_CMA.

Indeed. On systems without that, one workaround would be driving 
virtio-mem in "prealloc" mode (prealloc=on in QEMU on the device), 
whereby we first preallocate the memory using MADV_POPULATE_WRITE, to 
then longterm pin it.
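
From memory (please double-check the exact property names against your 
QEMU version), that would be something like:

  -object memory-backend-ram,id=mem0,size=16G \
  -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=0,prealloc=on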

-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 15:46, David Hildenbrand wrote:
> Can you compress what you wrote into the patch description? Further, I
> assume we want to add a Fixes: tag and Cc: QEMU Stable
> <qemu-stable@nongnu.org>

I just recalled another scenario where we unplug memory: see 
virtio_mem_cleanup_pending_mb() in the Linux driver as one example.

We first plug memory, then add the memory to Linux. If that adding 
fails, we unplug the memory again.

So this change can turn the virtio_mem driver in Linux non-functional, 
unfortunately.

-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 16:08, David Hildenbrand wrote:
> I just recalled another scenario where we unplug memory: see
> virtio_mem_cleanup_pending_mb() in the Linux driver as one example.
> 
> We first plug memory, then add the memory to Linux. If that adding
> fails, we unplug the memory again.
> 
> So this change can turn the virtio_mem driver in Linux non-functional,
> unfortunately.

Further, the Linux driver does not expect a NACK on unplug requests, see 
virtio_mem_send_unplug_request().

So this change won't work.

We could return VIRTIO_MEM_RESP_BUSY, but to handle what I raised above, 
we would still have to make it work every now and then (ratelimit), to 
not break the driver.
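
Roughly (sketch only, on top of the hunk from this patch):

    if (!plug && vmem->size <= vmem->requested_size) {
        /* BUSY instead of NACK: the driver can retry, but as noted above
         * we'd still have to let it succeed every now and then. */
        return VIRTIO_MEM_RESP_BUSY;
    }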

The alternative is to delay freeing of the memory in case we run into 
this condition. Hm ...

-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by Wei Chen 11 months, 3 weeks ago
Thanks for the information! I will try to come up with V2 that does not
impact virtio-mem's functionality.


Best regards,
Wei Chen

Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 16:41, Wei Chen wrote:
> Thanks for the information! I will try to come up with V2 that does not
> impact virtio-mem's functionality.

So, thinking about this ... both UNPLUG_ALL and "over-UNPLUG" (exceeding 
the request) will currently happen only very rarely in sane environments. In 
many setups never at all.

We could likely limit them to "once every 60s" without causing real 
harm. Would that be sufficient to mitigate the problem? How often would you 
usually have to retry in order to make it fly?
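
Something along these lines is what I have in mind (rough sketch only;
last_unplug_ns would be a new field in VirtIOMEM):

static bool virtio_mem_unplug_ratelimited(VirtIOMEM *vmem)
{
    const int64_t interval_ns = 60 * NANOSECONDS_PER_SECOND;
    const int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

    if (now - vmem->last_unplug_ns < interval_ns) {
        return true;    /* too soon -- reply VIRTIO_MEM_RESP_BUSY */
    }
    vmem->last_unplug_ns = now;
    return false;
}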

-- 
Cheers,

David / dhildenb
Re: [PATCH] hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size
Posted by David Hildenbrand 11 months, 3 weeks ago
On 26.11.24 16:41, Wei Chen wrote:
> Thanks for the information! I will try to come up with V2 that does not
> impact virtio-mem's functionality.

Thanks. In case we want to go down this path in this patch, we'd have to 
glue the new behavior to a new feature flag, and implement support for 
that in the Linux (+Windows) drivers.

So if we can find a way to avoid that, it would be beneficial.

-- 
Cheers,

David / dhildenb