-----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Friday, December 16, 2022 3:39 AM
>
> Hi Stefano,
>
> On 16/12/2022 01:46, Stefano Stabellini wrote:
> > On Thu, 15 Dec 2022, Julien Grall wrote:
> >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> >>> Yes, we are familiar with the "secret-free hypervisor" work. As you
> >>> point out, both our work and the secret-free hypervisor remove the
> >>> directmap region to mitigate the risk of leaking sensitive guest
> >>> secrets. However, our work is slightly different because it
> >>> additionally prevents attackers from tricking Xen into remapping a guest.
> >>
> >> I understand your goal, but I don't think this is achieved (see
> >> above). You would need an entity to prevent write to TTBR0_EL2 in
> >> order to fully protect it.
> >
> > Without a way to stop Xen from reading/writing TTBR0_EL2, we cannot
> > claim that the guest's secrets are 100% safe.
> >
> > But the attacker would have to follow the sequence you outlines above
> > to change Xen's pagetables and remap guest memory before accessing it.
> > It is an additional obstacle for attackers that want to steal other guests'
> > secrets. The size of the code that the attacker would need to inject
> > in Xen would need to be bigger and more complex.
>
> Right, that's why I wrote with a bit more work. However, the nuance
> you mention doesn't seem to be present in the cover letter:
>
> "This creates what we call "Software Enclaves", ensuring that an
> adversary with arbitrary code execution in the hypervisor STILL cannot
> read/write guest memory."
>
> So if the end goal if really to protect against *all* sort of arbitrary code,
> then I think we should have a rough idea how this will look like in Xen.
>
> From a brief look, it doesn't look like it would be possible to prevent
> modification to TTBR0_EL2 (even from EL3). We would need to
> investigate if there are other bits in the architecture to help us.
>
> >
> > Every little helps :-)
>
> I can see how making the life of the attacker more difficult is
> appealing.
> Yet, the goal needs to be clarified and the risk with the approach
> acknowledged (see above).
>

You're right, we should have mentioned this weakness in our first email.
Sorry about the oversight! This is definitely still a limitation that we
have not yet overcome. However, we do think that the increase in
attacker workload that you and Stefano are discussing could still be
valuable to security-conscious Xen users.

It would be nice to find additional architecture features that we can use
to close this hole on Arm, but there aren't any that stand out to me
either.

With this limitation in mind, what are the next steps we should take to
support this feature for the Xen community? Is this increase in attacker
workload meaningful enough to justify the inclusion of VMF in Xen?

Thanks,
Jackson
On Tue, 20 Dec 2022, Smith, Jackson wrote: > > Hi Stefano, > > > > On 16/12/2022 01:46, Stefano Stabellini wrote: > > > On Thu, 15 Dec 2022, Julien Grall wrote: > > >>>> On 13/12/2022 19:48, Smith, Jackson wrote: > > >>> Yes, we are familiar with the "secret-free hypervisor" work. As > you > > >>> point out, both our work and the secret-free hypervisor remove the > > >>> directmap region to mitigate the risk of leaking sensitive guest > > >>> secrets. However, our work is slightly different because it > > >>> additionally prevents attackers from tricking Xen into remapping a > > guest. > > >> > > >> I understand your goal, but I don't think this is achieved (see > > >> above). You would need an entity to prevent write to TTBR0_EL2 in > > >> order to fully protect it. > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we > > cannot > > > claim that the guest's secrets are 100% safe. > > > > > > But the attacker would have to follow the sequence you outlines > > above > > > to change Xen's pagetables and remap guest memory before > > accessing it. > > > It is an additional obstacle for attackers that want to steal other > > guests' > > > secrets. The size of the code that the attacker would need to inject > > > in Xen would need to be bigger and more complex. > > > > Right, that's why I wrote with a bit more work. However, the nuance > > you mention doesn't seem to be present in the cover letter: > > > > "This creates what we call "Software Enclaves", ensuring that an > > adversary with arbitrary code execution in the hypervisor STILL cannot > > read/write guest memory." > > > > So if the end goal if really to protect against *all* sort of > arbitrary > > code, > > then I think we should have a rough idea how this will look like in > Xen. > > > > From a brief look, it doesn't look like it would be possible to > prevent > > modification to TTBR0_EL2 (even from EL3). 
We would need to
> > investigate if there are other bits in the architecture to help us.
> >
> > >
> > > Every little helps :-)
> >
> > I can see how making the life of the attacker more difficult is
> > appealing.
> > Yet, the goal needs to be clarified and the risk with the approach
> > acknowledged (see above).
> >
>
> You're right, we should have mentioned this weakness in our first email.
> Sorry about the oversight! This is definitely still a limitation that we
> have not yet overcome. However, we do think that the increase in
> attacker workload that you and Stefano are discussing could still be
> valuable to security conscious Xen users.
>
> It would nice to find additional architecture features that we can use
> to close this hole on arm, but there aren't any that stand out to me
> either.
>
> With this limitation in mind, what are the next steps we should take to
> support this feature for the xen community? Is this increase in attacker
> workload meaningful enough to justify the inclusion of VMF in Xen?

I think it could be valuable as an additional obstacle for the attacker
to overcome. The next step would be to port your series on top of
Julien's "Remove the directmap" patch series:
https://marc.info/?l=xen-devel&m=167119090721116

Julien, what do you think?
Hi Stefano,
On 22/12/2022 00:38, Stefano Stabellini wrote:
> On Tue, 20 Dec 2022, Smith, Jackson wrote:
>>> Hi Stefano,
>>>
>>> On 16/12/2022 01:46, Stefano Stabellini wrote:
>>>> On Thu, 15 Dec 2022, Julien Grall wrote:
>>>>>>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>>>>> Yes, we are familiar with the "secret-free hypervisor" work. As
>> you
>>>>>> point out, both our work and the secret-free hypervisor remove the
>>>>>> directmap region to mitigate the risk of leaking sensitive guest
>>>>>> secrets. However, our work is slightly different because it
>>>>>> additionally prevents attackers from tricking Xen into remapping a
>>> guest.
>>>>>
>>>>> I understand your goal, but I don't think this is achieved (see
>>>>> above). You would need an entity to prevent write to TTBR0_EL2 in
>>>>> order to fully protect it.
>>>>
>>>> Without a way to stop Xen from reading/writing TTBR0_EL2, we
>>> cannot
>>>> claim that the guest's secrets are 100% safe.
>>>>
>>>> But the attacker would have to follow the sequence you outlines
>>> above
>>>> to change Xen's pagetables and remap guest memory before
>>> accessing it.
>>>> It is an additional obstacle for attackers that want to steal other
>>> guests'
>>>> secrets. The size of the code that the attacker would need to inject
>>>> in Xen would need to be bigger and more complex.
>>>
>>> Right, that's why I wrote with a bit more work. However, the nuance
>>> you mention doesn't seem to be present in the cover letter:
>>>
>>> "This creates what we call "Software Enclaves", ensuring that an
>>> adversary with arbitrary code execution in the hypervisor STILL cannot
>>> read/write guest memory."
>>>
>>> So if the end goal if really to protect against *all* sort of
>> arbitrary
>>> code,
>>> then I think we should have a rough idea how this will look like in
>> Xen.
>>>
>>> From a brief look, it doesn't look like it would be possible to
>> prevent
>>> modification to TTBR0_EL2 (even from EL3). We would need to
>>> investigate if there are other bits in the architecture to help us.
>>>
>>>>
>>>> Every little helps :-)
>>>
>>> I can see how making the life of the attacker more difficult is
>>> appealing.
>>> Yet, the goal needs to be clarified and the risk with the approach
>>> acknowledged (see above).
>>>
>>
>> You're right, we should have mentioned this weakness in our first email.
>> Sorry about the oversight! This is definitely still a limitation that we
>> have not yet overcome. However, we do think that the increase in
>> attacker workload that you and Stefano are discussing could still be
>> valuable to security conscious Xen users.
>>
>> It would nice to find additional architecture features that we can use
>> to close this hole on arm, but there aren't any that stand out to me
>> either.
>>
>> With this limitation in mind, what are the next steps we should take to
>> support this feature for the xen community? Is this increase in attacker
>> workload meaningful enough to justify the inclusion of VMF in Xen?
>
> I think it could be valuable as an additional obstacle for the attacker
> to overcome. The next step would be to port your series on top of
> Julien's "Remove the directmap" patch series
> https://marc.info/?l=xen-devel&m=167119090721116
>
> Julien, what do you think?
If we want Xen to be used in confidential compute, then we need a
compelling story, and we need to prove that we are at least as secure
as other hypervisors.
So I think we need to investigate a few areas:
* Can we protect the TTBR? I don't think this can be done with the
HW, but maybe I have overlooked something.
* Can VMF be extended to more use-cases? For instance, for
hypercalls, we could have a bounce buffer.
* If we can't fully secure VMF, can the attack surface be reduced
(e.g. disable hypercalls at runtime/compile time)? Could we use a
different architecture (I am thinking of something like pKVM [1])?
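A minimal sketch of the bounce-buffer idea from the second bullet, written as plain user-space C so it can be compiled anywhere. Everything in it is an assumption made for illustration: `struct bounce_buffer`, `guest_stage_args()` and `hyp_fetch_args()` are hypothetical names, not existing Xen interfaces. The only point is that the hypervisor reads hypercall arguments from a page the guest has explicitly shared, and never dereferences a pointer into guest-private memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 256u

/* Hypothetical page that the guest explicitly shares with the hypervisor. */
struct bounce_buffer {
    uint32_t len;                /* number of valid bytes in data[] */
    uint8_t  data[BOUNCE_SIZE];  /* staged hypercall arguments */
};

/*
 * Guest side: copy the arguments into the shared buffer before issuing
 * the hypercall. Returns 0 on success, -1 if the arguments do not fit.
 */
static int guest_stage_args(struct bounce_buffer *bb,
                            const void *args, size_t len)
{
    if (len > BOUNCE_SIZE)
        return -1;
    memcpy(bb->data, args, len);
    bb->len = (uint32_t)len;
    return 0;
}

/*
 * Hypervisor side: copy the arguments out of the shared buffer only.
 * Returns the number of bytes copied, or -1 if the length is invalid.
 */
static int hyp_fetch_args(const struct bounce_buffer *bb,
                          void *out, size_t out_size)
{
    /* Take the length once so the check and the copy use the same value. */
    uint32_t len = bb->len;

    if (len > BOUNCE_SIZE || len > out_size)
        return -1;
    memcpy(out, bb->data, len);
    return (int)len;
}
```

In a real implementation the shared page would have to be mapped on both sides, and the length would have to be read with an access the compiler cannot repeat, because the guest can modify the page concurrently. The sketch leaves both out.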
Cheers,
[1] https://lwn.net/Articles/836693/
--
Julien Grall
On Thu, Dec 22, 2022 at 09:52:11AM +0000, Julien Grall wrote: > Hi Stefano, > > On 22/12/2022 00:38, Stefano Stabellini wrote: > > On Tue, 20 Dec 2022, Smith, Jackson wrote: > > > > Hi Stefano, > > > > > > > > On 16/12/2022 01:46, Stefano Stabellini wrote: > > > > > On Thu, 15 Dec 2022, Julien Grall wrote: > > > > > > > > On 13/12/2022 19:48, Smith, Jackson wrote: > > > > > > > Yes, we are familiar with the "secret-free hypervisor" work. As > > > you > > > > > > > point out, both our work and the secret-free hypervisor remove the > > > > > > > directmap region to mitigate the risk of leaking sensitive guest > > > > > > > secrets. However, our work is slightly different because it > > > > > > > additionally prevents attackers from tricking Xen into remapping a > > > > guest. > > > > > > > > > > > > I understand your goal, but I don't think this is achieved (see > > > > > > above). You would need an entity to prevent write to TTBR0_EL2 in > > > > > > order to fully protect it. > > > > > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we > > > > cannot > > > > > claim that the guest's secrets are 100% safe. > > > > > > > > > > But the attacker would have to follow the sequence you outlines > > > > above > > > > > to change Xen's pagetables and remap guest memory before > > > > accessing it. > > > > > It is an additional obstacle for attackers that want to steal other > > > > guests' > > > > > secrets. The size of the code that the attacker would need to inject > > > > > in Xen would need to be bigger and more complex. > > > > > > > > Right, that's why I wrote with a bit more work. However, the nuance > > > > you mention doesn't seem to be present in the cover letter: > > > > > > > > "This creates what we call "Software Enclaves", ensuring that an > > > > adversary with arbitrary code execution in the hypervisor STILL cannot > > > > read/write guest memory." 
> > > > > > > > So if the end goal if really to protect against *all* sort of > > > arbitrary > > > > code, > > > > then I think we should have a rough idea how this will look like in > > > Xen. > > > > > > > > From a brief look, it doesn't look like it would be possible to > > > prevent > > > > modification to TTBR0_EL2 (even from EL3). We would need to > > > > investigate if there are other bits in the architecture to help us. > > > > > > > > > > > > > > Every little helps :-) > > > > > > > > I can see how making the life of the attacker more difficult is > > > > appealing. > > > > Yet, the goal needs to be clarified and the risk with the approach > > > > acknowledged (see above). > > > > > > > > > > You're right, we should have mentioned this weakness in our first email. > > > Sorry about the oversight! This is definitely still a limitation that we > > > have not yet overcome. However, we do think that the increase in > > > attacker workload that you and Stefano are discussing could still be > > > valuable to security conscious Xen users. > > > > > > It would nice to find additional architecture features that we can use > > > to close this hole on arm, but there aren't any that stand out to me > > > either. > > > > > > With this limitation in mind, what are the next steps we should take to > > > support this feature for the xen community? Is this increase in attacker > > > workload meaningful enough to justify the inclusion of VMF in Xen? > > > > I think it could be valuable as an additional obstacle for the attacker > > to overcome. The next step would be to port your series on top of > > Julien's "Remove the directmap" patch series > > https://marc.info/?l=xen-devel&m=167119090721116 > > > > Julien, what do you think? > > If we want Xen to be used in confidential compute, then we need a compelling > story and prove that we are at least as secure as other hypervisors. > > So I think we need to investigate a few areas: > * Can we protect the TTBR? 
I don't think this can be done with the HW.
> But maybe I overlook it.

This can be done by running most of Xen at a lower EL, and having only a
small trusted (and hopefully formally verified) kernel run at EL2.

> * Can VMF be extended to more use-cases? For instances, for hypercalls,
> we could have bounce buffer.
> * If we can't fully secure VMF, can the attack surface be reduced (e.g.
> disable hypercalls at runtime/compile time)? Could we use a different
> architecture (I am thinking something like pKVM [1])?
>
> Cheers,
>
> [1] https://lwn.net/Articles/836693/

pKVM has been formally verified already, in the form of seKVM. So there
very much is precedent for this.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
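As a rough illustration of the split described in the message above (most of the hypervisor deprivileged, a small monitor at EL2 owning the page tables), the following toy model in user-space C shows the kind of check such a monitor would make. All names (`owner_table`, `hyp_pt`, `monitor_map()`, `monitor_donate_to_guest()`) are hypothetical and are not taken from Xen, pKVM or seKVM. The assumption is that the deprivileged code cannot write its own page tables and must ask the monitor, which refuses to map any frame owned by a guest.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_FRAMES 8u

/* Who currently owns each physical frame in this toy model. */
enum frame_owner { OWNER_FREE = 0, OWNER_HYP, OWNER_GUEST };

static enum frame_owner owner_table[NR_FRAMES];

/* Toy hypervisor page table: slot -> frame number + 1, 0 means unmapped. */
static uint64_t hyp_pt[NR_FRAMES];

/*
 * Monitor entry point: the deprivileged hypervisor asks for a mapping.
 * Returns 0 on success, -1 if the request is out of range or would
 * expose guest-owned memory.
 */
static int monitor_map(size_t slot, size_t frame)
{
    if (slot >= NR_FRAMES || frame >= NR_FRAMES)
        return -1;
    if (owner_table[frame] == OWNER_GUEST)
        return -1;
    owner_table[frame] = OWNER_HYP;
    hyp_pt[slot] = frame + 1;
    return 0;
}

/*
 * Monitor entry point: hand a frame over to a guest. Any existing
 * hypervisor mapping of that frame is removed first.
 */
static int monitor_donate_to_guest(size_t frame)
{
    if (frame >= NR_FRAMES)
        return -1;
    for (size_t i = 0; i < NR_FRAMES; i++) {
        if (hyp_pt[i] == frame + 1)
            hyp_pt[i] = 0;
    }
    owner_table[frame] = OWNER_GUEST;
    return 0;
}
```

The model ignores TLB maintenance, stage-2 tables and multi-level page tables. It only shows where the ownership check would sit.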
On 22/12/2022 10:14, Demi Marie Obenour wrote: > On Thu, Dec 22, 2022 at 09:52:11AM +0000, Julien Grall wrote: >> Hi Stefano, >> >> On 22/12/2022 00:38, Stefano Stabellini wrote: >>> On Tue, 20 Dec 2022, Smith, Jackson wrote: >>>>> Hi Stefano, >>>>> >>>>> On 16/12/2022 01:46, Stefano Stabellini wrote: >>>>>> On Thu, 15 Dec 2022, Julien Grall wrote: >>>>>>>>> On 13/12/2022 19:48, Smith, Jackson wrote: >>>>>>>> Yes, we are familiar with the "secret-free hypervisor" work. As >>>> you >>>>>>>> point out, both our work and the secret-free hypervisor remove the >>>>>>>> directmap region to mitigate the risk of leaking sensitive guest >>>>>>>> secrets. However, our work is slightly different because it >>>>>>>> additionally prevents attackers from tricking Xen into remapping a >>>>> guest. >>>>>>> >>>>>>> I understand your goal, but I don't think this is achieved (see >>>>>>> above). You would need an entity to prevent write to TTBR0_EL2 in >>>>>>> order to fully protect it. >>>>>> >>>>>> Without a way to stop Xen from reading/writing TTBR0_EL2, we >>>>> cannot >>>>>> claim that the guest's secrets are 100% safe. >>>>>> >>>>>> But the attacker would have to follow the sequence you outlines >>>>> above >>>>>> to change Xen's pagetables and remap guest memory before >>>>> accessing it. >>>>>> It is an additional obstacle for attackers that want to steal other >>>>> guests' >>>>>> secrets. The size of the code that the attacker would need to inject >>>>>> in Xen would need to be bigger and more complex. >>>>> >>>>> Right, that's why I wrote with a bit more work. However, the nuance >>>>> you mention doesn't seem to be present in the cover letter: >>>>> >>>>> "This creates what we call "Software Enclaves", ensuring that an >>>>> adversary with arbitrary code execution in the hypervisor STILL cannot >>>>> read/write guest memory." 
>>>>> >>>>> So if the end goal if really to protect against *all* sort of >>>> arbitrary >>>>> code, >>>>> then I think we should have a rough idea how this will look like in >>>> Xen. >>>>> >>>>> From a brief look, it doesn't look like it would be possible to >>>> prevent >>>>> modification to TTBR0_EL2 (even from EL3). We would need to >>>>> investigate if there are other bits in the architecture to help us. >>>>> >>>>>> >>>>>> Every little helps :-) >>>>> >>>>> I can see how making the life of the attacker more difficult is >>>>> appealing. >>>>> Yet, the goal needs to be clarified and the risk with the approach >>>>> acknowledged (see above). >>>>> >>>> >>>> You're right, we should have mentioned this weakness in our first email. >>>> Sorry about the oversight! This is definitely still a limitation that we >>>> have not yet overcome. However, we do think that the increase in >>>> attacker workload that you and Stefano are discussing could still be >>>> valuable to security conscious Xen users. >>>> >>>> It would nice to find additional architecture features that we can use >>>> to close this hole on arm, but there aren't any that stand out to me >>>> either. >>>> >>>> With this limitation in mind, what are the next steps we should take to >>>> support this feature for the xen community? Is this increase in attacker >>>> workload meaningful enough to justify the inclusion of VMF in Xen? >>> >>> I think it could be valuable as an additional obstacle for the attacker >>> to overcome. The next step would be to port your series on top of >>> Julien's "Remove the directmap" patch series >>> https://marc.info/?l=xen-devel&m=167119090721116 >>> >>> Julien, what do you think? >> >> If we want Xen to be used in confidential compute, then we need a compelling >> story and prove that we are at least as secure as other hypervisors. >> >> So I think we need to investigate a few areas: >> * Can we protect the TTBR? I don't think this can be done with the HW. 
>> But maybe I overlook it.
>
> This can be done by running most of Xen at a lower EL, and having only a
> small trusted (and hopefully formally verified) kernel run at EL2.

This is what I hinted at in my 3rd bullet. :) I didn't consider this for
the first bullet because the goal of that question is to figure out
whether we can leave all of Xen running at EL2 and still have the same
guarantee.

Cheers,

--
Julien Grall
On Thu, Dec 22, 2022 at 10:21:57AM +0000, Julien Grall wrote: > > > On 22/12/2022 10:14, Demi Marie Obenour wrote: > > On Thu, Dec 22, 2022 at 09:52:11AM +0000, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 22/12/2022 00:38, Stefano Stabellini wrote: > > > > On Tue, 20 Dec 2022, Smith, Jackson wrote: > > > > > > Hi Stefano, > > > > > > > > > > > > On 16/12/2022 01:46, Stefano Stabellini wrote: > > > > > > > On Thu, 15 Dec 2022, Julien Grall wrote: > > > > > > > > > > On 13/12/2022 19:48, Smith, Jackson wrote: > > > > > > > > > Yes, we are familiar with the "secret-free hypervisor" work. As > > > > > you > > > > > > > > > point out, both our work and the secret-free hypervisor remove the > > > > > > > > > directmap region to mitigate the risk of leaking sensitive guest > > > > > > > > > secrets. However, our work is slightly different because it > > > > > > > > > additionally prevents attackers from tricking Xen into remapping a > > > > > > guest. > > > > > > > > > > > > > > > > I understand your goal, but I don't think this is achieved (see > > > > > > > > above). You would need an entity to prevent write to TTBR0_EL2 in > > > > > > > > order to fully protect it. > > > > > > > > > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we > > > > > > cannot > > > > > > > claim that the guest's secrets are 100% safe. > > > > > > > > > > > > > > But the attacker would have to follow the sequence you outlines > > > > > > above > > > > > > > to change Xen's pagetables and remap guest memory before > > > > > > accessing it. > > > > > > > It is an additional obstacle for attackers that want to steal other > > > > > > guests' > > > > > > > secrets. The size of the code that the attacker would need to inject > > > > > > > in Xen would need to be bigger and more complex. > > > > > > > > > > > > Right, that's why I wrote with a bit more work. 
However, the nuance > > > > > > you mention doesn't seem to be present in the cover letter: > > > > > > > > > > > > "This creates what we call "Software Enclaves", ensuring that an > > > > > > adversary with arbitrary code execution in the hypervisor STILL cannot > > > > > > read/write guest memory." > > > > > > > > > > > > So if the end goal if really to protect against *all* sort of > > > > > arbitrary > > > > > > code, > > > > > > then I think we should have a rough idea how this will look like in > > > > > Xen. > > > > > > > > > > > > From a brief look, it doesn't look like it would be possible to > > > > > prevent > > > > > > modification to TTBR0_EL2 (even from EL3). We would need to > > > > > > investigate if there are other bits in the architecture to help us. > > > > > > > > > > > > > > > > > > > > Every little helps :-) > > > > > > > > > > > > I can see how making the life of the attacker more difficult is > > > > > > appealing. > > > > > > Yet, the goal needs to be clarified and the risk with the approach > > > > > > acknowledged (see above). > > > > > > > > > > > > > > > > You're right, we should have mentioned this weakness in our first email. > > > > > Sorry about the oversight! This is definitely still a limitation that we > > > > > have not yet overcome. However, we do think that the increase in > > > > > attacker workload that you and Stefano are discussing could still be > > > > > valuable to security conscious Xen users. > > > > > > > > > > It would nice to find additional architecture features that we can use > > > > > to close this hole on arm, but there aren't any that stand out to me > > > > > either. > > > > > > > > > > With this limitation in mind, what are the next steps we should take to > > > > > support this feature for the xen community? Is this increase in attacker > > > > > workload meaningful enough to justify the inclusion of VMF in Xen? 
> > > >
> > > > I think it could be valuable as an additional obstacle for the attacker
> > > > to overcome. The next step would be to port your series on top of
> > > > Julien's "Remove the directmap" patch series
> > > > https://marc.info/?l=xen-devel&m=167119090721116
> > > >
> > > > Julien, what do you think?
> > >
> > > If we want Xen to be used in confidential compute, then we need a compelling
> > > story and prove that we are at least as secure as other hypervisors.
> > >
> > > So I think we need to investigate a few areas:
> > > * Can we protect the TTBR? I don't think this can be done with the HW.
> > > But maybe I overlook it.
> >
> > This can be done by running most of Xen at a lower EL, and having only a
> > small trusted (and hopefully formally verified) kernel run at EL2.
>
> This is what I hinted in my 3rd bullet. :) I didn't consider this for the
> first bullet because the goal of this question is to figure out whether we
> can leave all Xen running in EL2 and still have the same guarantee.

It should be possible via software fault isolation (see Google Native
Client), but whether it is useful is questionable. I expect the
complexity of the needed compiler patches and binary-level static
analysis to be greater than that of running most of Xen at a lower
exception level.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote: > -----Original Message----- > > From: Julien Grall <julien@xen.org> > > Sent: Friday, December 16, 2022 3:39 AM > > > > Hi Stefano, > > > > On 16/12/2022 01:46, Stefano Stabellini wrote: > > > On Thu, 15 Dec 2022, Julien Grall wrote: > > >>>> On 13/12/2022 19:48, Smith, Jackson wrote: > > >>> Yes, we are familiar with the "secret-free hypervisor" work. As > you > > >>> point out, both our work and the secret-free hypervisor remove the > > >>> directmap region to mitigate the risk of leaking sensitive guest > > >>> secrets. However, our work is slightly different because it > > >>> additionally prevents attackers from tricking Xen into remapping a > > guest. > > >> > > >> I understand your goal, but I don't think this is achieved (see > > >> above). You would need an entity to prevent write to TTBR0_EL2 in > > >> order to fully protect it. > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we > > cannot > > > claim that the guest's secrets are 100% safe. > > > > > > But the attacker would have to follow the sequence you outlines > > above > > > to change Xen's pagetables and remap guest memory before > > accessing it. > > > It is an additional obstacle for attackers that want to steal other > > guests' > > > secrets. The size of the code that the attacker would need to inject > > > in Xen would need to be bigger and more complex. > > > > Right, that's why I wrote with a bit more work. However, the nuance > > you mention doesn't seem to be present in the cover letter: > > > > "This creates what we call "Software Enclaves", ensuring that an > > adversary with arbitrary code execution in the hypervisor STILL cannot > > read/write guest memory." > > > > So if the end goal if really to protect against *all* sort of > arbitrary > > code, > > then I think we should have a rough idea how this will look like in > Xen. 
> >
> > From a brief look, it doesn't look like it would be possible to prevent
> > modification to TTBR0_EL2 (even from EL3). We would need to
> > investigate if there are other bits in the architecture to help us.
> >
> > >
> > > Every little helps :-)
> >
> > I can see how making the life of the attacker more difficult is
> > appealing.
> > Yet, the goal needs to be clarified and the risk with the approach
> > acknowledged (see above).
> >
>
> You're right, we should have mentioned this weakness in our first email.
> Sorry about the oversight! This is definitely still a limitation that we
> have not yet overcome. However, we do think that the increase in
> attacker workload that you and Stefano are discussing could still be
> valuable to security conscious Xen users.
>
> It would nice to find additional architecture features that we can use
> to close this hole on arm, but there aren't any that stand out to me
> either.
>
> With this limitation in mind, what are the next steps we should take to
> support this feature for the xen community? Is this increase in attacker
> workload meaningful enough to justify the inclusion of VMF in Xen?

Personally, I don’t think so. The kinds of workloads VMF is usable
for (no hypercalls) are likely easily portable to other hypervisors,
including formally verified microkernels such as seL4 that provide a
significantly higher level of assurance. seL4’s proofs do need to be
ported to each particular board, but this is fairly simple. Conversely,
workloads that need Xen’s features cannot use VMF, so VMF again is not
suitable.

Have you considered other approaches to improving security, such as
fuzzing Xen’s hypercall interface or even using formal methods? Those
would benefit all users of Xen, not merely a small subset who already
have alternatives available.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Tue, 20 Dec 2022, Demi Marie Obenour wrote: > On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote: > > > Hi Stefano, > > > > > > On 16/12/2022 01:46, Stefano Stabellini wrote: > > > > On Thu, 15 Dec 2022, Julien Grall wrote: > > > >>>> On 13/12/2022 19:48, Smith, Jackson wrote: > > > >>> Yes, we are familiar with the "secret-free hypervisor" work. As > > you > > > >>> point out, both our work and the secret-free hypervisor remove the > > > >>> directmap region to mitigate the risk of leaking sensitive guest > > > >>> secrets. However, our work is slightly different because it > > > >>> additionally prevents attackers from tricking Xen into remapping a > > > guest. > > > >> > > > >> I understand your goal, but I don't think this is achieved (see > > > >> above). You would need an entity to prevent write to TTBR0_EL2 in > > > >> order to fully protect it. > > > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we > > > cannot > > > > claim that the guest's secrets are 100% safe. > > > > > > > > But the attacker would have to follow the sequence you outlines > > > above > > > > to change Xen's pagetables and remap guest memory before > > > accessing it. > > > > It is an additional obstacle for attackers that want to steal other > > > guests' > > > > secrets. The size of the code that the attacker would need to inject > > > > in Xen would need to be bigger and more complex. > > > > > > Right, that's why I wrote with a bit more work. However, the nuance > > > you mention doesn't seem to be present in the cover letter: > > > > > > "This creates what we call "Software Enclaves", ensuring that an > > > adversary with arbitrary code execution in the hypervisor STILL cannot > > > read/write guest memory." > > > > > > So if the end goal if really to protect against *all* sort of > > arbitrary > > > code, > > > then I think we should have a rough idea how this will look like in > > Xen. 
> > >
> > > From a brief look, it doesn't look like it would be possible to prevent
> > > modification to TTBR0_EL2 (even from EL3). We would need to
> > > investigate if there are other bits in the architecture to help us.
> > >
> > > >
> > > > Every little helps :-)
> > >
> > > I can see how making the life of the attacker more difficult is
> > > appealing.
> > > Yet, the goal needs to be clarified and the risk with the approach
> > > acknowledged (see above).
> > >
> >
> > You're right, we should have mentioned this weakness in our first email.
> > Sorry about the oversight! This is definitely still a limitation that we
> > have not yet overcome. However, we do think that the increase in
> > attacker workload that you and Stefano are discussing could still be
> > valuable to security conscious Xen users.
> >
> > It would nice to find additional architecture features that we can use
> > to close this hole on arm, but there aren't any that stand out to me
> > either.
> >
> > With this limitation in mind, what are the next steps we should take to
> > support this feature for the xen community? Is this increase in attacker
> > workload meaningful enough to justify the inclusion of VMF in Xen?
>
> Personally, I don’t think so. The kinds of workloads VMF is usable
> for (no hypercalls) are likely easily portable to other hypervisors,
> including formally verified microkernels such as seL4 that provide...

What other hypervisors might or might not do should not be a factor in
this discussion and it would be best to leave it aside.

From an AMD/Xilinx point of view, most of our customers using Xen in
production today don't use any hypercalls in one or more of their VMs.
Xen is great for these use-cases, and they are rather common in embedded.
It is certainly a different configuration from what most have come to
expect from Xen on the server/desktop x86 side. There is no question
that guests without hypercalls are important for Xen on ARM.

As a Xen community we have a long history and strong interest in making
Xen more secure and also, more recently, safer (in the ISO 26262
safety-certification sense). The VMF work is very well aligned with both
of these efforts, and any additional burden on attackers is certainly
good for Xen.

Now the question is what changes are necessary and how to make them to
the codebase. And if it turns out that some of the changes are not
applicable or too complex to accept, the decision will be made purely
from a code maintenance point of view and will have nothing to do with
VMs making no hypercalls being unimportant (i.e. if we don't accept one
or more patches, it is not going to have anything to do with the use-case
being unimportant or what other hypervisors might or might not do).
Hi Stefano, On 22/12/2022 00:53, Stefano Stabellini wrote: > On Tue, 20 Dec 2022, Demi Marie Obenour wrote: >> On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote: >>>> Hi Stefano, >>>> >>>> On 16/12/2022 01:46, Stefano Stabellini wrote: >>>>> On Thu, 15 Dec 2022, Julien Grall wrote: >>>>>>>> On 13/12/2022 19:48, Smith, Jackson wrote: >>>>>>> Yes, we are familiar with the "secret-free hypervisor" work. As >>> you >>>>>>> point out, both our work and the secret-free hypervisor remove the >>>>>>> directmap region to mitigate the risk of leaking sensitive guest >>>>>>> secrets. However, our work is slightly different because it >>>>>>> additionally prevents attackers from tricking Xen into remapping a >>>> guest. >>>>>> >>>>>> I understand your goal, but I don't think this is achieved (see >>>>>> above). You would need an entity to prevent write to TTBR0_EL2 in >>>>>> order to fully protect it. >>>>> >>>>> Without a way to stop Xen from reading/writing TTBR0_EL2, we >>>> cannot >>>>> claim that the guest's secrets are 100% safe. >>>>> >>>>> But the attacker would have to follow the sequence you outlines >>>> above >>>>> to change Xen's pagetables and remap guest memory before >>>> accessing it. >>>>> It is an additional obstacle for attackers that want to steal other >>>> guests' >>>>> secrets. The size of the code that the attacker would need to inject >>>>> in Xen would need to be bigger and more complex. >>>> >>>> Right, that's why I wrote with a bit more work. However, the nuance >>>> you mention doesn't seem to be present in the cover letter: >>>> >>>> "This creates what we call "Software Enclaves", ensuring that an >>>> adversary with arbitrary code execution in the hypervisor STILL cannot >>>> read/write guest memory." >>>> >>>> So if the end goal if really to protect against *all* sort of >>> arbitrary >>>> code, >>>> then I think we should have a rough idea how this will look like in >>> Xen. 
>>>> >>>> From a brief look, it doesn't look like it would be possible to >>> prevent >>>> modification to TTBR0_EL2 (even from EL3). We would need to >>>> investigate if there are other bits in the architecture to help us. >>>> >>>>> >>>>> Every little helps :-) >>>> >>>> I can see how making the life of the attacker more difficult is >>>> appealing. >>>> Yet, the goal needs to be clarified and the risk with the approach >>>> acknowledged (see above). >>>> >>> >>> You're right, we should have mentioned this weakness in our first email. >>> Sorry about the oversight! This is definitely still a limitation that we >>> have not yet overcome. However, we do think that the increase in >>> attacker workload that you and Stefano are discussing could still be >>> valuable to security conscious Xen users. >>> >>> It would nice to find additional architecture features that we can use >>> to close this hole on arm, but there aren't any that stand out to me >>> either. >>> >>> With this limitation in mind, what are the next steps we should take to >>> support this feature for the xen community? Is this increase in attacker >>> workload meaningful enough to justify the inclusion of VMF in Xen? >> >> Personally, I don’t think so. The kinds of workloads VMF is usable >> for (no hypercalls) are likely easily portable to other hypervisors, >> including formally verified microkernels such as seL4 that provide... > > What other hypervisors might or might not do should not be a factor in > this discussion and it would be best to leave it aside. To be honest, Demi has a point. At the moment, VMF is a very niche use-case (see more below). So you would end up to use less than 10% of the normal Xen on Arm code. A lot of people will likely wonder why using Xen in this case? > > From an AMD/Xilinx point of view, most of our customers using Xen in > productions today don't use any hypercalls in one or more of their VMs. 
This suggests a mix of guests are running (some using hypercalls and others not). That would not be possible if you were using VMF. > Xen is great for these use-cases and it is rather common in embedded. > It is certainly a different configuration from what most are come to > expect from Xen on the server/desktop x86 side. There is no question > that guests without hypercalls are important for Xen on ARM. > > As a Xen community we have a long history and strong interest in making > Xen more secure and also, more recently, safer (in the ISO 26262 > safety-certification sense). The VMF work is very well aligned with both > of these efforts and any additional burder to attackers is certainly > good for Xen. I agree that we have a strong focus on making Xen more secure. However, we also need to look at the use cases for it. As it stands, there will be no: - IOREQ use (don't think about emulating TPM) - GICv3 ITS - stage-1 SMMUv3 - decoding of instructions when there is no syndrome - hypercalls (including event channels) - dom0 That's a lot of Xen features that can't be used. Effectively you will make Xen more "secure" for very few users. > > Now the question is what changes are necessary and how to make them to > the codebase. And if it turns out that some of the changes are not > applicable or too complex to accept, the decision will be made purely > from a code maintenance point of view and will have nothing to do with > VMs making no hypercalls being unimportant (i.e. if we don't accept one > or more patches is not going to have anything to do with the use-case > being unimportant or what other hypervisors might or might not do). I disagree, I think this is also about use cases. On paper, VMF looks great, but so far it still has a big flaw (the TTBR can be changed) and it would greatly restrict what you can do. To me, if you can't secure the TTBR, then there are other ways to improve the security of Xen for the same setup and more.
The biggest attack surface of Xen on Arm today is the hypercalls. So if you remove the guest's access to hypercalls (or even compile them out), then there is a lot less chance for an attacker to compromise Xen. This is not exactly the same guarantee as VMF. But as I wrote before, if the attacker has access to Xen, then you are already doomed because you have to assume they can switch the TTBR. Cheers, -- Julien Grall
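As an illustration of the alternative Julien describes (denying a guest access to hypercalls, or compiling them out), here is a minimal sketch in C. It is not Xen code: `demo_guest`, `demo_do_hypercall` and the `DEMO_CONFIG_HYPERCALLS` switch are hypothetical names invented for this example, and the real hypercall dispatch path in Xen is considerably more involved.

```c
/* Hypothetical sketch only: none of these names exist in Xen. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed build-time switch; set to 0 to model "compiling out" hypercalls. */
#ifndef DEMO_CONFIG_HYPERCALLS
#define DEMO_CONFIG_HYPERCALLS 1
#endif

/* Stand-in for per-guest state: whether this guest may issue hypercalls. */
struct demo_guest {
    bool hypercalls_allowed;
};

/*
 * Toy dispatcher. A guest without permission (or any guest when the
 * feature is compiled out) gets -ENOSYS before any handler code runs,
 * so the handlers are unreachable from that guest.
 */
long demo_do_hypercall(const struct demo_guest *g, unsigned int nr)
{
#if DEMO_CONFIG_HYPERCALLS
    if (!g->hypercalls_allowed)
        return -ENOSYS;
    return (long)nr; /* placeholder for dispatching to a real handler */
#else
    (void)g;
    (void)nr;
    return -ENOSYS;
#endif
}
```

With the default setting, a guest created with `hypercalls_allowed = false` is rejected on every call, while other guests on the same host can keep using hypercalls. That mixed setup is the one the thread says VMF cannot support today.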
On Thu, 22 Dec 2022, Julien Grall wrote: > > What other hypervisors might or might not do should not be a factor in > > this discussion and it would be best to leave it aside. > > To be honest, Demi has a point. At the moment, VMF is a very niche use-case > (see more below). So you would end up to use less than 10% of the normal Xen > on Arm code. A lot of people will likely wonder why using Xen in this case? [...] > > From an AMD/Xilinx point of view, most of our customers using Xen in > > productions today don't use any hypercalls in one or more of their VMs. > This suggests a mix of guests are running (some using hypercalls and other > not). It would not be possible if you were using VMF. It is true that the current limitations are very restrictive. In embedded, we have a few pure static partitioning deployments where no hypercalls are required (Linux is using hypercalls today but it could do without), so maybe VMF could be enabled, but admittedly in those cases the main focus today is safety and fault tolerance, rather than confidential computing. > > Xen is great for these use-cases and it is rather common in embedded. > > It is certainly a different configuration from what most are come to > > expect from Xen on the server/desktop x86 side. There is no question > > that guests without hypercalls are important for Xen on ARM. > > > As a Xen community we have a long history and strong interest in making > > Xen more secure and also, more recently, safer (in the ISO 26262 > > safety-certification sense). The VMF work is very well aligned with both > > of these efforts and any additional burder to attackers is certainly > > good for Xen. > > I agree that we have a strong focus on making Xen more secure. However, we > also need to look at the use cases for it. 
As it stands, there will no: > - IOREQ use (don't think about emulating TPM) > - GICv3 ITS > - stage-1 SMMUv3 > - decoding of instructions when there is no syndrome > - hypercalls (including event channels) > - dom0 > > That's a lot of Xen features that can't be used. Effectively you will make Xen > more "secure" for a very few users. Among these, the main problems affecting AMD/Xilinx users today would be: - decoding of instructions - hypercalls, especially event channels Decoding of instructions would affect all our deployments. For hypercalls, even in static partitioning deployments, sometimes event channels are used for VM-to-VM notifications. > > Now the question is what changes are necessary and how to make them to > > the codebase. And if it turns out that some of the changes are not > > applicable or too complex to accept, the decision will be made purely > > from a code maintenance point of view and will have nothing to do with > > VMs making no hypercalls being unimportant (i.e. if we don't accept one > > or more patches is not going to have anything to do with the use-case > > being unimportant or what other hypervisors might or might not do). > I disagree, I think this is also about use cases. On the paper VMF look very > great, but so far it still has a big flaw (the TTBR can be changed) and it > would restrict a lot what you can do. We would need to be very clear in the commit messages and documentation that with the current version of VMF we do *not* achieve confidential computing and we do *not* offer protections comparable to AMD SEV. It is still possible for Xen to access guest data, it is just a bit harder. From an implementation perspective, if we can find a way to implement it that would be easy to maintain, then it might still be worth it. It would probably take only a small amount of changes on top of the "Remove the directmap" series to make it so "map_domain_page" doesn't work anymore after boot. 
That might be worth exploring if you and Jackson agree? One thing that would make it much more widely applicable is your idea of hypercall bounce buffers. VMF might work with hypercalls if the guest always uses the same buffer to pass hypercall parameters to Xen. That one buffer could remain mapped in Xen for the lifetime of the VM and the VM would know to use it only to pass parameters to Xen.
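The bounce-buffer idea above could look roughly like the following sketch. It is an illustration under assumptions, not a proposal for the actual interface: `demo_domain`, `demo_evtchn_send`, `demo_copy_from_bounce` and the one-page buffer size are all hypothetical. The point it shows is that the hypervisor only ever reads hypercall arguments from one pre-agreed region and rejects anything outside it.

```c
/* Hypothetical sketch only: none of these names exist in Xen. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 4096u /* assumed: one page per guest */

/*
 * Stand-in for a domain. 'bounce' models the single guest page that
 * stays mapped in the hypervisor for the lifetime of the VM.
 */
struct demo_domain {
    uint8_t bounce[BOUNCE_SIZE];
};

/* Hypothetical argument layout the guest writes into the buffer. */
struct demo_evtchn_send {
    uint32_t port;
};

/*
 * Copy 'len' bytes of arguments starting at offset 'off' in the bounce
 * buffer. Returns 0 on success, -1 if the range leaves the buffer.
 * No other guest memory is ever touched.
 */
int demo_copy_from_bounce(const struct demo_domain *d, size_t off,
                          void *dst, size_t len)
{
    assert(d != NULL && dst != NULL);
    if (off > BOUNCE_SIZE || len > BOUNCE_SIZE - off)
        return -1;
    memcpy(dst, d->bounce + off, len);
    return 0;
}

/* Toy handler: the guest passes an offset, never a guest pointer. */
int demo_hypercall_evtchn_send(const struct demo_domain *d, size_t arg_off,
                               uint32_t *port_out)
{
    struct demo_evtchn_send args;

    if (demo_copy_from_bounce(d, arg_off, &args, sizeof(args)) != 0)
        return -1;
    *port_out = args.port;
    return 0;
}
```

Passing an offset rather than a guest address keeps the bounds check trivial, and it avoids any need to map further guest pages at runtime.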
Hi Stefano, On 22/12/2022 21:28, Stefano Stabellini wrote: > On Thu, 22 Dec 2022, Julien Grall wrote: >>> What other hypervisors might or might not do should not be a factor in >>> this discussion and it would be best to leave it aside. >> >> To be honest, Demi has a point. At the moment, VMF is a very niche use-case >> (see more below). So you would end up to use less than 10% of the normal Xen >> on Arm code. A lot of people will likely wonder why using Xen in this case? > > [...] > >>> From an AMD/Xilinx point of view, most of our customers using Xen in >>> productions today don't use any hypercalls in one or more of their VMs. >> This suggests a mix of guests are running (some using hypercalls and other >> not). It would not be possible if you were using VMF. > > It is true that the current limitations are very restrictive. > > In embedded, we have a few pure static partitioning deployments where no > hypercalls are required (Linux is using hypercalls today but it could do > without), so maybe VMF could be enabled, but admittedly in those cases > the main focus today is safety and fault tolerance, rather than > confidential computing. > > >>> Xen is great for these use-cases and it is rather common in embedded. >>> It is certainly a different configuration from what most are come to >>> expect from Xen on the server/desktop x86 side. There is no question >>> that guests without hypercalls are important for Xen on ARM. > >>> As a Xen community we have a long history and strong interest in making >>> Xen more secure and also, more recently, safer (in the ISO 26262 >>> safety-certification sense). The VMF work is very well aligned with both >>> of these efforts and any additional burder to attackers is certainly >>> good for Xen. >> >> I agree that we have a strong focus on making Xen more secure. However, we >> also need to look at the use cases for it. 
As it stands, there will no: >> - IOREQ use (don't think about emulating TPM) >> - GICv3 ITS >> - stage-1 SMMUv3 >> - decoding of instructions when there is no syndrome >> - hypercalls (including event channels) >> - dom0 >> >> That's a lot of Xen features that can't be used. Effectively you will make Xen >> more "secure" for a very few users. > > Among these, the main problems affecting AMD/Xilinx users today would be: > - decoding of instructions > - hypercalls, especially event channels > > Decoding of instructions would affect all our deployments. For > hypercalls, even in static partitioning deployments, sometimes event > channels are used for VM-to-VM notifications. > > >>> Now the question is what changes are necessary and how to make them to >>> the codebase. And if it turns out that some of the changes are not >>> applicable or too complex to accept, the decision will be made purely >>> from a code maintenance point of view and will have nothing to do with >>> VMs making no hypercalls being unimportant (i.e. if we don't accept one >>> or more patches is not going to have anything to do with the use-case >>> being unimportant or what other hypervisors might or might not do). >> I disagree, I think this is also about use cases. On the paper VMF look very >> great, but so far it still has a big flaw (the TTBR can be changed) and it >> would restrict a lot what you can do. > > We would need to be very clear in the commit messages and documentation > that with the current version of VMF we do *not* achieve confidential > computing and we do *not* offer protections comparable to AMD SEV. It is > still possible for Xen to access guest data, it is just a bit harder. > > From an implementation perspective, if we can find a way to implement it > that would be easy to maintain, then it might still be worth it. 
It > would probably take only a small amount of changes on top of the "Remove > the directmap" series to make it so "map_domain_page" doesn't work > anymore after boot. None of the callers of map_domain_page() expect the function to fail. So some treewide changes will be needed in order to deal with map_domain_page() not working. This is not something I am willing to accept if the only user is VMF (at the moment I can't think of any other). So instead, we would need to come up with a way where map_domain_page() will never be called at runtime when VMF is in use (maybe by compiling out some code?). I haven't really looked in enough detail to say whether that's feasible. > > That might be worth exploring if you and Jackson agree? I am OK to continue exploring it because I think some bits will still be useful for general use. As for the full solution, I will wait and see the results before deciding whether this is something that I would be happy to merge/maintain. Cheers, -- Julien Grall
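To make the maintenance concern above concrete, here is a small self-contained model in C of a page-mapping helper that stops working after boot. It deliberately does not reproduce the real map_domain_page() signature or behaviour. `demo_map_page`, `demo_seal_mappings` and `demo_read_byte` are hypothetical names, and the "RAM" is just a static array.

```c
/* Hypothetical model only: this is not Xen's map_domain_page(). */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_PAGES  4
#define DEMO_PAGE_SIZE 64

static uint8_t demo_ram[DEMO_NR_PAGES][DEMO_PAGE_SIZE];
static bool demo_boot_done;

/* Called once at the end of boot; afterwards no page can be mapped. */
void demo_seal_mappings(void)
{
    demo_boot_done = true;
}

/*
 * Returns a pointer to the page, or NULL once mappings are sealed.
 * The NULL case is the new failure mode that existing callers would
 * not be prepared for.
 */
void *demo_map_page(unsigned int pfn)
{
    if (demo_boot_done || pfn >= DEMO_NR_PAGES)
        return NULL;
    return demo_ram[pfn];
}

/*
 * A caller adapted to the new behaviour. Every call site would need a
 * check like this one, which is the treewide change discussed above.
 */
int demo_read_byte(unsigned int pfn, size_t off, uint8_t *out)
{
    uint8_t *p;

    if (off >= DEMO_PAGE_SIZE)
        return -1;
    p = demo_map_page(pfn);
    if (p == NULL)
        return -1;
    *out = p[off];
    return 0;
}
```

The alternative Julien suggests avoids these checks entirely by making sure no runtime path reaches the mapping helper when VMF is enabled, for example by compiling such paths out.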
On Wed, Dec 21, 2022 at 04:53:46PM -0800, Stefano Stabellini wrote: > On Tue, 20 Dec 2022, Demi Marie Obenour wrote: > > On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote: > > > > Hi Stefano, > > > > > > > > On 16/12/2022 01:46, Stefano Stabellini wrote: > > > > > On Thu, 15 Dec 2022, Julien Grall wrote: > > > > >>>> On 13/12/2022 19:48, Smith, Jackson wrote: > > > > >>> Yes, we are familiar with the "secret-free hypervisor" work. As > > > you > > > > >>> point out, both our work and the secret-free hypervisor remove the > > > > >>> directmap region to mitigate the risk of leaking sensitive guest > > > > >>> secrets. However, our work is slightly different because it > > > > >>> additionally prevents attackers from tricking Xen into remapping a > > > > guest. > > > > >> > > > > >> I understand your goal, but I don't think this is achieved (see > > > > >> above). You would need an entity to prevent write to TTBR0_EL2 in > > > > >> order to fully protect it. > > > > > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we > > > > cannot > > > > > claim that the guest's secrets are 100% safe. > > > > > > > > > > But the attacker would have to follow the sequence you outlines > > > > above > > > > > to change Xen's pagetables and remap guest memory before > > > > accessing it. > > > > > It is an additional obstacle for attackers that want to steal other > > > > guests' > > > > > secrets. The size of the code that the attacker would need to inject > > > > > in Xen would need to be bigger and more complex. > > > > > > > > Right, that's why I wrote with a bit more work. However, the nuance > > > > you mention doesn't seem to be present in the cover letter: > > > > > > > > "This creates what we call "Software Enclaves", ensuring that an > > > > adversary with arbitrary code execution in the hypervisor STILL cannot > > > > read/write guest memory." 
> > > > > > > > So if the end goal if really to protect against *all* sort of > > > arbitrary > > > > code, > > > > then I think we should have a rough idea how this will look like in > > > Xen. > > > > > > > > From a brief look, it doesn't look like it would be possible to > > > prevent > > > > modification to TTBR0_EL2 (even from EL3). We would need to > > > > investigate if there are other bits in the architecture to help us. > > > > > > > > > > > > > > Every little helps :-) > > > > > > > > I can see how making the life of the attacker more difficult is > > > > appealing. > > > > Yet, the goal needs to be clarified and the risk with the approach > > > > acknowledged (see above). > > > > > > > > > > You're right, we should have mentioned this weakness in our first email. > > > Sorry about the oversight! This is definitely still a limitation that we > > > have not yet overcome. However, we do think that the increase in > > > attacker workload that you and Stefano are discussing could still be > > > valuable to security conscious Xen users. > > > > > > It would nice to find additional architecture features that we can use > > > to close this hole on arm, but there aren't any that stand out to me > > > either. > > > > > > With this limitation in mind, what are the next steps we should take to > > > support this feature for the xen community? Is this increase in attacker > > > workload meaningful enough to justify the inclusion of VMF in Xen? > > > > Personally, I don’t think so. The kinds of workloads VMF is usable > > for (no hypercalls) are likely easily portable to other hypervisors, > > including formally verified microkernels such as seL4 that provide... > > What other hypervisors might or might not do should not be a factor in > this discussion and it would be best to leave it aside. Indeed so, sorry. > From an AMD/Xilinx point of view, most of our customers using Xen in > productions today don't use any hypercalls in one or more of their VMs. 
> Xen is great for these use-cases and it is rather common in embedded. > It is certainly a different configuration from what most are come to > expect from Xen on the server/desktop x86 side. There is no question > that guests without hypercalls are important for Xen on ARM. I was completely unaware of this. > As a Xen community we have a long history and strong interest in making > Xen more secure and also, more recently, safer (in the ISO 26262 > safety-certification sense). The VMF work is very well aligned with both > of these efforts and any additional burder to attackers is certainly > good for Xen. That it is. > Now the question is what changes are necessary and how to make them to > the codebase. And if it turns out that some of the changes are not > applicable or too complex to accept, the decision will be made purely > from a code maintenance point of view and will have nothing to do with > VMs making no hypercalls being unimportant (i.e. if we don't accept one > or more patches is not going to have anything to do with the use-case > being unimportant or what other hypervisors might or might not do). -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab