[PATCH 2/4] xsm/silo: Support hwdom/control domains

Jason Andryuk posted 4 patches 4 months, 3 weeks ago
There is a newer version of this series
[PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jason Andryuk 4 months, 3 weeks ago
In a disaggregated environment, dom0 is split into Control, Hardware,
and Xenstore domains, along with domUs.  The is_control_domain() check
is not sufficient to handle all these cases.  Add is_priv_domain() to
support allowing for the various domains.

The purpose of SILO mode is to prevent domUs from interacting with each
other.  But dom0 was allowed to communicate with domUs to provide
services.  As the disaggregation of dom0, Control, Hardware and Xenstore
are all service domains that need to communicate with other domains.

To provide xenstore connections, the Xenstore domain must be allowed to
connect via grants and event channels.  Xenstore domain must also be
allowed to connect to Control and Hardware to provide xenstore to them.

Hardware domain will provide PV devices to domains, so it must be
allowed to connect to domains.

That leaves Control.  Xenstore and Hardware would already allow access
to Control, so it can obtain services that way.  Control should be
"privileged", which would mean it can make the connections.  But with
Xenstore and Hardware providing their services to domUs, there may not
be a reason to allow Control to use grants or event channels with domUs.
Still, Control is privileged, so it should be allowed to do something if
it chooses.  Establishing a grant, or event channel requires action on
both sides, so allow for the possibility.  This does open up an argo
wildcard ring from domUs, FWIW.

This silo check is for grants, event channels and argo.  The dummy
policy handles other calls, so Hardware is prevented from foreign
mapping Control's memory with that.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
v2:
Add xenstore domain
Expand commit message
Remove always_inline
---
 xen/xsm/silo.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/xsm/silo.c b/xen/xsm/silo.c
index b89b364287..db48705674 100644
--- a/xen/xsm/silo.c
+++ b/xen/xsm/silo.c
@@ -20,6 +20,12 @@
 #define XSM_NO_WRAPPERS
 #include <xsm/dummy.h>
 
+static bool is_priv_domain(const struct domain *d)
+{
+    return is_xenstore_domain(d) || is_hardware_domain(d) ||
+           is_control_domain(d);
+}
+
 /*
  * Check if inter-domain communication is allowed.
  * Return true when pass check.
@@ -29,8 +35,8 @@ static bool silo_mode_dom_check(const struct domain *ldom,
 {
     const struct domain *currd = current->domain;
 
-    return (is_control_domain(currd) || is_control_domain(ldom) ||
-            is_control_domain(rdom) || ldom == rdom);
+    return (is_priv_domain(currd) || is_priv_domain(ldom) ||
+            is_priv_domain(rdom) || ldom == rdom);
 }
 
 static int cf_check silo_evtchn_unbound(
-- 
2.49.0
Re: [PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jan Beulich 4 months, 3 weeks ago
On 11.06.2025 00:57, Jason Andryuk wrote:
> In a disaggregated environment, dom0 is split into Control, Hardware,
> and Xenstore domains, along with domUs.  The is_control_domain() check
> is not sufficient to handle all these cases.  Add is_priv_domain() to
> support allowing for the various domains.
> 
> The purpose of SILO mode is to prevent domUs from interacting with each
> other.  But dom0 was allowed to communicate with domUs to provide
> services.  As the disaggregation of dom0, Control, Hardware and Xenstore
> are all service domains that need to communicate with other domains.
> 
> To provide xenstore connections, the Xenstore domain must be allowed to
> connect via grants and event channels.  Xenstore domain must also be
> allowed to connect to Control and Hardware to provide xenstore to them.

Are you suggesting that SILO at present is incompatible with a Xenstore
domain? silo_mode_dom_check() in its original form has no special
precautions, after all.

> Hardware domain will provide PV devices to domains, so it must be
> allowed to connect to domains.

As a built-in policy, isn't this already going too far? There could
conceivably be configurations with only pass-through devices in use, in
which case neither grants nor the event channels operations intercepted
by SILO would be required.

> That leaves Control.  Xenstore and Hardware would already allow access
> to Control, so it can obtain services that way.  Control should be
> "privileged", which would mean it can make the connections.  But with
> Xenstore and Hardware providing their services to domUs, there may not
> be a reason to allow Control to use grants or event channels with domUs.
> Still, Control is privileged, so it should be allowed to do something if
> it chooses.  Establishing a grant, or event channel requires action on
> both sides, so allow for the possibility.  This does open up an argo
> wildcard ring from domUs, FWIW.

Along the lines of my reply to patch 1, I think Hardware and Control
need to have a pretty strong boundary between them. It's hard to see,
for example, whether grant map/copy/transfer would indeed make sense
between the two.

Similarly I'm not convinced a strong boundary isn't also needed
between Xenstore and Hardware.

> --- a/xen/xsm/silo.c
> +++ b/xen/xsm/silo.c
> @@ -20,6 +20,12 @@
>  #define XSM_NO_WRAPPERS
>  #include <xsm/dummy.h>
>  
> +static bool is_priv_domain(const struct domain *d)
> +{
> +    return is_xenstore_domain(d) || is_hardware_domain(d) ||
> +           is_control_domain(d);
> +}

This construct expands to two evaluate_nospec(), which likely isn't
wanted. Some open-coding may be pretty much unavoidable here. (I'm
surprised it's not three, i.e. I find it odd that is_xenstore_domain()
doesn't also use that guard.)

Jan
Re: [PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jason Andryuk 4 months, 3 weeks ago
On 2025-06-11 09:17, Jan Beulich wrote:
> On 11.06.2025 00:57, Jason Andryuk wrote:
>> In a disaggregated environment, dom0 is split into Control, Hardware,
>> and Xenstore domains, along with domUs.  The is_control_domain() check
>> is not sufficient to handle all these cases.  Add is_priv_domain() to
>> support allowing for the various domains.
>>
>> The purpose of SILO mode is to prevent domUs from interacting with each
>> other.  But dom0 was allowed to communicate with domUs to provide
>> services.  As the disaggregation of dom0, Control, Hardware and Xenstore
>> are all service domains that need to communicate with other domains.
>>
>> To provide xenstore connections, the Xenstore domain must be allowed to
>> connect via grants and event channels.  Xenstore domain must also be
>> allowed to connect to Control and Hardware to provide xenstore to them.
> 
> Are you suggesting that SILO at present is incompatible with a Xenstore
> domain? silo_mode_dom_check() in its original form has no special
> precautions, after all.

Yes, it is incompatible with the current silo_mode_dom_check().  Only 
Control domain is allowed to use grants and event channels with a domU. 
A Xenstore domain would be denied.

Xenstore stubdom only exists for x86 today.  My limited attempts to run 
xenstored in an dedicated Xenstore ARM Linux domain have failed.

>> Hardware domain will provide PV devices to domains, so it must be
>> allowed to connect to domains.
> 
> As a built-in policy, isn't this already going too far? There could
> conceivably be configurations with only pass-through devices in use, in
> which case neither grants nor the event channels operations intercepted
> by SILO would be required.

Such a domain wouldn't have any PV devices configured?  I don't think 
this changes anything compared to today.

Both sides need to be configured and opt-in.  Hardware is a system 
domain, so it should be possible to allow grants and event channels. 
But they won't be used unless configured.

>> That leaves Control.  Xenstore and Hardware would already allow access
>> to Control, so it can obtain services that way.  Control should be
>> "privileged", which would mean it can make the connections.  But with
>> Xenstore and Hardware providing their services to domUs, there may not
>> be a reason to allow Control to use grants or event channels with domUs.
>> Still, Control is privileged, so it should be allowed to do something if
>> it chooses.  Establishing a grant, or event channel requires action on
>> both sides, so allow for the possibility.  This does open up an argo
>> wildcard ring from domUs, FWIW.
> 
> Along the lines of my reply to patch 1, I think Hardware and Control
> need to have a pretty strong boundary between them. It's hard to see,
> for example, whether grant map/copy/transfer would indeed make sense
> between the two.

The Hardware domain might provide a PV device to Control?

I've tested removing control:
static bool is_priv_domain(const struct domain *d)
{
     return is_xenstore_domain(d) || is_hardware_domain(d);
}

And that works in my limited ARM dom0less testing.  The toolstack isn't 
really exercised in that case.  It seems strange that the privileged 
control domain is *not* allowed though.

> Similarly I'm not convinced a strong boundary isn't also needed
> between Xenstore and Hardware.

If hardware is providing PV devices to domains, it will need access to 
Xenstore.  I don't see how you can get around it.

I tried to explain this in the first paragraph.  SILO's purpose was to 
isolate domUs from each other, but allow it to access dom0.  dom0 
embodies the control, hardware, and xenstore capabilities.  So as a 
first approximation, each of Control, Hardware, and Xenstore should be 
allowed to communicate with domUs.

domUs needs to communicate with Xenstore and Hardware for PV devices.

Xenstore provides Xenstore access to Hardware.

Control would want Xenstore access.

I don't know if this helps, but here's a table:

     | CTL | HW  | XS  | domU
----------------------------
CTL |     |  ?  |  y  |  ?
HW  |  ?  |     |  y  |  y
XS  |  y  |  y  |     |  y
domU|  ?  |  y  |  y  |  x

Control and Hardware would be y if we allow PV devices

Control and domUs - I don't have an immediate rational for them.  Except 
that Control is privileged.  I've been running xenconsoled in Hardware. 
If xenconsoled is in Control, then access would be required.

>> --- a/xen/xsm/silo.c
>> +++ b/xen/xsm/silo.c
>> @@ -20,6 +20,12 @@
>>   #define XSM_NO_WRAPPERS
>>   #include <xsm/dummy.h>
>>   
>> +static bool is_priv_domain(const struct domain *d)
>> +{
>> +    return is_xenstore_domain(d) || is_hardware_domain(d) ||
>> +           is_control_domain(d);
>> +}
> 
> This construct expands to two evaluate_nospec(), which likely isn't
> wanted. Some open-coding may be pretty much unavoidable here.

Thanks, yes, good point.

> (I'm
> surprised it's not three, i.e. I find it odd that is_xenstore_domain()
> doesn't also use that guard.)

It looks okay to me.  There were only 2 uses until I added a 3rd in the 
dom0less code.  The XSM check has evaluate_nospec() and the other 2 uses 
aren't security critical - Setting a domain info flag, and __init code 
for dom0less.  Maybe moving the evaluate_nospec() would be safer in case 
use grows in the future, but it looks okay to me today.

Regards,
Jason
Re: [PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jan Beulich 4 months, 2 weeks ago
On 11.06.2025 06:20, Jason Andryuk wrote:
> On 2025-06-11 09:17, Jan Beulich wrote:
>> On 11.06.2025 00:57, Jason Andryuk wrote:
>>> In a disaggregated environment, dom0 is split into Control, Hardware,
>>> and Xenstore domains, along with domUs.  The is_control_domain() check
>>> is not sufficient to handle all these cases.  Add is_priv_domain() to
>>> support allowing for the various domains.
>>>
>>> The purpose of SILO mode is to prevent domUs from interacting with each
>>> other.  But dom0 was allowed to communicate with domUs to provide
>>> services.  As the disaggregation of dom0, Control, Hardware and Xenstore
>>> are all service domains that need to communicate with other domains.
>>>
>>> To provide xenstore connections, the Xenstore domain must be allowed to
>>> connect via grants and event channels.  Xenstore domain must also be
>>> allowed to connect to Control and Hardware to provide xenstore to them.
>>
>> Are you suggesting that SILO at present is incompatible with a Xenstore
>> domain? silo_mode_dom_check() in its original form has no special
>> precautions, after all.
> 
> Yes, it is incompatible with the current silo_mode_dom_check().  Only 
> Control domain is allowed to use grants and event channels with a domU. 
> A Xenstore domain would be denied.
> 
> Xenstore stubdom only exists for x86 today.  My limited attempts to run 
> xenstored in an dedicated Xenstore ARM Linux domain have failed.

This may want sorting independently first. Once sorted, the requirements
here may become more clear.

>>> Hardware domain will provide PV devices to domains, so it must be
>>> allowed to connect to domains.
>>
>> As a built-in policy, isn't this already going too far? There could
>> conceivably be configurations with only pass-through devices in use, in
>> which case neither grants nor the event channels operations intercepted
>> by SILO would be required.
> 
> Such a domain wouldn't have any PV devices configured?

Indeed, that's my point: Why would Hardware then have a need to be
allowed to connect to domains.

>  I don't think this changes anything compared to today.

I don't think I see what you mean to tell me with this. What we're
discussing here is the effect of the separation you're suggesting,
which necessarily is different from what we have today.

> Both sides need to be configured and opt-in.  Hardware is a system 
> domain, so it should be possible to allow grants and event channels. 
> But they won't be used unless configured.

"Won't be used" isn't enough, imo. Isn't disaggregation about proper
isolation, i.e. to guarantee that unwanted interactions can't occur?

>>> That leaves Control.  Xenstore and Hardware would already allow access
>>> to Control, so it can obtain services that way.  Control should be
>>> "privileged", which would mean it can make the connections.  But with
>>> Xenstore and Hardware providing their services to domUs, there may not
>>> be a reason to allow Control to use grants or event channels with domUs.
>>> Still, Control is privileged, so it should be allowed to do something if
>>> it chooses.  Establishing a grant, or event channel requires action on
>>> both sides, so allow for the possibility.  This does open up an argo
>>> wildcard ring from domUs, FWIW.
>>
>> Along the lines of my reply to patch 1, I think Hardware and Control
>> need to have a pretty strong boundary between them. It's hard to see,
>> for example, whether grant map/copy/transfer would indeed make sense
>> between the two.
> 
> The Hardware domain might provide a PV device to Control?
> 
> I've tested removing control:
> static bool is_priv_domain(const struct domain *d)
> {
>      return is_xenstore_domain(d) || is_hardware_domain(d);
> }
> 
> And that works in my limited ARM dom0less testing.  The toolstack isn't 
> really exercised in that case.  It seems strange that the privileged 
> control domain is *not* allowed though.

With the intended separation, there's (imo) not going to be any
all-mighty domain anymore. Neither Hardware nor Control.

>> Similarly I'm not convinced a strong boundary isn't also needed
>> between Xenstore and Hardware.
> 
> If hardware is providing PV devices to domains, it will need access to 
> Xenstore.  I don't see how you can get around it.
> 
> I tried to explain this in the first paragraph.  SILO's purpose was to 
> isolate domUs from each other, but allow it to access dom0.  dom0 
> embodies the control, hardware, and xenstore capabilities.  So as a 
> first approximation, each of Control, Hardware, and Xenstore should be 
> allowed to communicate with domUs.

Yes. Yet what to permit between the three special entities is far less
clear. Hence why I'm unconvinced this can be expressed by SILO, and
would rather require Flask.

> domUs needs to communicate with Xenstore and Hardware for PV devices.
> 
> Xenstore provides Xenstore access to Hardware.
> 
> Control would want Xenstore access.
> 
> I don't know if this helps, but here's a table:
> 
>      | CTL | HW  | XS  | domU
> ----------------------------
> CTL |     |  ?  |  y  |  ?
> HW  |  ?  |     |  y  |  y
> XS  |  y  |  y  |     |  y
> domU|  ?  |  y  |  y  |  x
> 
> Control and Hardware would be y if we allow PV devices
> 
> Control and domUs - I don't have an immediate rational for them.  Except 
> that Control is privileged.  I've been running xenconsoled in Hardware. 
> If xenconsoled is in Control, then access would be required.

Perhaps some clarification is first need about what Control really is
(and is not). It is sole the domain to create other domains. But beyond
that things become unclear. E.g. xenconsoled may not belong into either
Hardware or Control.

>>> --- a/xen/xsm/silo.c
>>> +++ b/xen/xsm/silo.c
>>> @@ -20,6 +20,12 @@
>>>   #define XSM_NO_WRAPPERS
>>>   #include <xsm/dummy.h>
>>>   
>>> +static bool is_priv_domain(const struct domain *d)
>>> +{
>>> +    return is_xenstore_domain(d) || is_hardware_domain(d) ||
>>> +           is_control_domain(d);
>>> +}
>>
>> This construct expands to two evaluate_nospec(), which likely isn't
>> wanted. Some open-coding may be pretty much unavoidable here.
> 
> Thanks, yes, good point.
> 
>> (I'm
>> surprised it's not three, i.e. I find it odd that is_xenstore_domain()
>> doesn't also use that guard.)
> 
> It looks okay to me.  There were only 2 uses until I added a 3rd in the 
> dom0less code.  The XSM check has evaluate_nospec() and the other 2 uses 
> aren't security critical - Setting a domain info flag, and __init code 
> for dom0less.  Maybe moving the evaluate_nospec() would be safer in case 
> use grows in the future, but it looks okay to me today.

When some of the hardening was first introduced, actual use sites were
indeed taken into account. That wasn't quite right though, I think. Any
such construct ought to be safe to use anywhere. For uses with clearly
no concerns towards speculative abuse, a 2nd lightweight form of such
constructs should then exist, imo. As to your use of "security critical":
I'm not convinced you what mean is covering the potential of speculative
abuse of involved code paths.

Jan
Re: [PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jason Andryuk 4 months, 2 weeks ago
On 2025-06-12 03:52, Jan Beulich wrote:
> On 11.06.2025 06:20, Jason Andryuk wrote:
>> On 2025-06-11 09:17, Jan Beulich wrote:
>>> On 11.06.2025 00:57, Jason Andryuk wrote:
>>>> In a disaggregated environment, dom0 is split into Control, Hardware,
>>>> and Xenstore domains, along with domUs.  The is_control_domain() check
>>>> is not sufficient to handle all these cases.  Add is_priv_domain() to
>>>> support allowing for the various domains.
>>>>
>>>> The purpose of SILO mode is to prevent domUs from interacting with each
>>>> other.  But dom0 was allowed to communicate with domUs to provide
>>>> services.  As the disaggregation of dom0, Control, Hardware and Xenstore
>>>> are all service domains that need to communicate with other domains.
>>>>
>>>> To provide xenstore connections, the Xenstore domain must be allowed to
>>>> connect via grants and event channels.  Xenstore domain must also be
>>>> allowed to connect to Control and Hardware to provide xenstore to them.
>>>
>>> Are you suggesting that SILO at present is incompatible with a Xenstore
>>> domain? silo_mode_dom_check() in its original form has no special
>>> precautions, after all.
>>
>> Yes, it is incompatible with the current silo_mode_dom_check().  Only
>> Control domain is allowed to use grants and event channels with a domU.
>> A Xenstore domain would be denied.
>>
>> Xenstore stubdom only exists for x86 today.  My limited attempts to run
>> xenstored in an dedicated Xenstore ARM Linux domain have failed.
> 
> This may want sorting independently first. Once sorted, the requirements
> here may become more clear.

HW+XS-> xenstore works
CTL+XS or XS -> the domain's console just stops.  vCPUs are in Linux cpu 
idle.  I haven't figured out more.  This required some Linux changes to 
query the capabilities since XS isn't exposed and ARM assumes initial 
domain implies HW + CTL.  It's orthogonal to my goals, so I haven't 
looked too hard.

>>>> Hardware domain will provide PV devices to domains, so it must be
>>>> allowed to connect to domains.
>>>
>>> As a built-in policy, isn't this already going too far? There could
>>> conceivably be configurations with only pass-through devices in use, in
>>> which case neither grants nor the event channels operations intercepted
>>> by SILO would be required.
>>
>> Such a domain wouldn't have any PV devices configured?
> 
> Indeed, that's my point: Why would Hardware then have a need to be
> allowed to connect to domains.
> 
>>   I don't think this changes anything compared to today.
> 
> I don't think I see what you mean to tell me with this. What we're
> discussing here is the effect of the separation you're suggesting,
> which necessarily is different from what we have today.
>
>> Both sides need to be configured and opt-in.  Hardware is a system
>> domain, so it should be possible to allow grants and event channels.
>> But they won't be used unless configured.
> 
> "Won't be used" isn't enough, imo. Isn't disaggregation about proper
> isolation, i.e. to guarantee that unwanted interactions can't occur?

Disaggregation is the separation of components.  The security policy 
applied is related but distinct.

"Won't be used" is how dummy and SILO (with respect to dom0) devices 
work today

dummy -> cooperating domUs can communicate
SILO  -> domUs cannot communicate
Flask -> configurable, but typically strict limits to explicit 
communication channels

SILO today doesn't deny communication between a domU without PV devices 
and dom0 - they just aren't configured.  I'm saying that would be the 
same with a split hardware domain.  PV devices just aren't configured, 
but there is no mandatory denial.  SILO only enforces mandatory denial 
between domUs today.

If you want mandatory enforcement, Flask is the correct choice. 
Everything would be explicitly configured.  Some domains could 
communicate with hwdom and others could not.

For SILO, a split hardware domain would be allowed communication with a 
domU if configured by the administrator.  I see this as comparable to 
configuring a domU to access dom0 PV backends today.

>>>> That leaves Control.  Xenstore and Hardware would already allow access
>>>> to Control, so it can obtain services that way.  Control should be
>>>> "privileged", which would mean it can make the connections.  But with
>>>> Xenstore and Hardware providing their services to domUs, there may not
>>>> be a reason to allow Control to use grants or event channels with domUs.
>>>> Still, Control is privileged, so it should be allowed to do something if
>>>> it chooses.  Establishing a grant, or event channel requires action on
>>>> both sides, so allow for the possibility.  This does open up an argo
>>>> wildcard ring from domUs, FWIW.
>>>
>>> Along the lines of my reply to patch 1, I think Hardware and Control
>>> need to have a pretty strong boundary between them. It's hard to see,
>>> for example, whether grant map/copy/transfer would indeed make sense
>>> between the two.
>>
>> The Hardware domain might provide a PV device to Control?
>>
>> I've tested removing control:
>> static bool is_priv_domain(const struct domain *d)
>> {
>>       return is_xenstore_domain(d) || is_hardware_domain(d);
>> }
>>
>> And that works in my limited ARM dom0less testing.  The toolstack isn't
>> really exercised in that case.  It seems strange that the privileged
>> control domain is *not* allowed though.
> 
> With the intended separation, there's (imo) not going to be any
> all-mighty domain anymore. Neither Hardware nor Control.
> 
>>> Similarly I'm not convinced a strong boundary isn't also needed
>>> between Xenstore and Hardware.
>>
>> If hardware is providing PV devices to domains, it will need access to
>> Xenstore.  I don't see how you can get around it.
>>
>> I tried to explain this in the first paragraph.  SILO's purpose was to
>> isolate domUs from each other, but allow it to access dom0.  dom0
>> embodies the control, hardware, and xenstore capabilities.  So as a
>> first approximation, each of Control, Hardware, and Xenstore should be
>> allowed to communicate with domUs.
> 
> Yes. Yet what to permit between the three special entities is far less
> clear. Hence why I'm unconvinced this can be expressed by SILO, and
> would rather require Flask.
> 
>> domUs needs to communicate with Xenstore and Hardware for PV devices.
>>
>> Xenstore provides Xenstore access to Hardware.
>>
>> Control would want Xenstore access.
>>
>> I don't know if this helps, but here's a table:
>>
>>       | CTL | HW  | XS  | domU
>> ----------------------------
>> CTL |     |  ?  |  y  |  ?
>> HW  |  ?  |     |  y  |  y
>> XS  |  y  |  y  |     |  y
>> domU|  ?  |  y  |  y  |  x
>>
>> Control and Hardware would be y if we allow PV devices
>>
>> Control and domUs - I don't have an immediate rational for them.  Except
>> that Control is privileged.  I've been running xenconsoled in Hardware.
>> If xenconsoled is in Control, then access would be required.
> 
> Perhaps some clarification is first need about what Control really is
> (and is not). It is sole the domain to create other domains. But beyond
> that things become unclear. E.g. xenconsoled may not belong into either
> Hardware or Control.
> 
>>>> --- a/xen/xsm/silo.c
>>>> +++ b/xen/xsm/silo.c
>>>> @@ -20,6 +20,12 @@
>>>>    #define XSM_NO_WRAPPERS
>>>>    #include <xsm/dummy.h>
>>>>    
>>>> +static bool is_priv_domain(const struct domain *d)
>>>> +{
>>>> +    return is_xenstore_domain(d) || is_hardware_domain(d) ||
>>>> +           is_control_domain(d);
>>>> +}
>>>
>>> This construct expands to two evaluate_nospec(), which likely isn't
>>> wanted. Some open-coding may be pretty much unavoidable here.
>>
>> Thanks, yes, good point.
>>
>>> (I'm
>>> surprised it's not three, i.e. I find it odd that is_xenstore_domain()
>>> doesn't also use that guard.)
>>
>> It looks okay to me.  There were only 2 uses until I added a 3rd in the
>> dom0less code.  The XSM check has evaluate_nospec() and the other 2 uses
>> aren't security critical - Setting a domain info flag, and __init code
>> for dom0less.  Maybe moving the evaluate_nospec() would be safer in case
>> use grows in the future, but it looks okay to me today.
> 
> When some of the hardening was first introduced, actual use sites were
> indeed taken into account. That wasn't quite right though, I think. Any
> such construct ought to be safe to use anywhere. For uses with clearly
> no concerns towards speculative abuse, a 2nd lightweight form of such
> constructs should then exist, imo. As to your use of "security critical":
> I'm not convinced you what mean is covering the potential of speculative
> abuse of involved code paths.

I can't parse this last sentence, and I think it's your main point.

XSM -> don't speculate around permission checks.  That's what I meant by 
"security critical".

The __init code is inaccessible to users, so it doesn't matter.

         if ( is_xenstore_domain(d) )
             continue;

getdomaininfo sets a flag, so I don't see this making a security 
difference.  It's not controlling loads or code paths.

(is_xenstore_domain(d)          ? XEN_DOMINF_xs_domain : 0) |

Regards,
Jason
Re: [PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jan Beulich 4 months, 2 weeks ago
On 12.06.2025 18:56, Jason Andryuk wrote:
> On 2025-06-12 03:52, Jan Beulich wrote:
>> On 11.06.2025 06:20, Jason Andryuk wrote:
>>> On 2025-06-11 09:17, Jan Beulich wrote:
>>>> On 11.06.2025 00:57, Jason Andryuk wrote:
>>>>> --- a/xen/xsm/silo.c
>>>>> +++ b/xen/xsm/silo.c
>>>>> @@ -20,6 +20,12 @@
>>>>>    #define XSM_NO_WRAPPERS
>>>>>    #include <xsm/dummy.h>
>>>>>    
>>>>> +static bool is_priv_domain(const struct domain *d)
>>>>> +{
>>>>> +    return is_xenstore_domain(d) || is_hardware_domain(d) ||
>>>>> +           is_control_domain(d);
>>>>> +}
>>>>
>>>> This construct expands to two evaluate_nospec(), which likely isn't
>>>> wanted. Some open-coding may be pretty much unavoidable here.
>>>
>>> Thanks, yes, good point.
>>>
>>>> (I'm
>>>> surprised it's not three, i.e. I find it odd that is_xenstore_domain()
>>>> doesn't also use that guard.)
>>>
>>> It looks okay to me.  There were only 2 uses until I added a 3rd in the
>>> dom0less code.  The XSM check has evaluate_nospec() and the other 2 uses
>>> aren't security critical - Setting a domain info flag, and __init code
>>> for dom0less.  Maybe moving the evaluate_nospec() would be safer in case
>>> use grows in the future, but it looks okay to me today.
>>
>> When some of the hardening was first introduced, actual use sites were
>> indeed taken into account. That wasn't quite right though, I think. Any
>> such construct ought to be safe to use anywhere. For uses with clearly
>> no concerns towards speculative abuse, a 2nd lightweight form of such
>> constructs should then exist, imo. As to your use of "security critical":
>> I'm not convinced you what mean is covering the potential of speculative
>> abuse of involved code paths.
> 
> I can't parse this last sentence, and I think it's your main point.

Oh, sorry - the "you" and "what" ought to have swapped places.

> XSM -> don't speculate around permission checks.  That's what I meant by 
> "security critical".
> 
> The __init code is inaccessible to users, so it doesn't matter.
> 
>          if ( is_xenstore_domain(d) )
>              continue;
> 
> getdomaininfo sets a flag, so I don't see this making a security 
> difference.  It's not controlling loads or code paths.

Right, but this is what I said should imo not have been done: Make a
predicate speculation-safe (or not) based on its present uses. It's
imo more likely than not that a new use being added won't result in
the predicate being looked at, re-considering its safety for the new
use.

And indeed there's a 3rd use, in xsm_default_action():

    case XSM_XS_PRIV:
        if ( action == XSM_XS_PRIV &&
             evaluate_nospec(is_xenstore_domain(src)) )
            return 0;
        fallthrough;

It should not have been necessary to open-code the speculation safety
here, just like such isn't required a few lines later:

    case XSM_PRIV:
        if ( is_control_domain(src) )
            return 0;
        return -EPERM;

I am, btw, also not convinced the uses of evaluate_nospec() are fully
correct here, in that they apply to only part of the if() conditions.
For "action == XSM_XS_PRIV" it's okay as long as
- the function is indeed inlined, and
- the function argument is compile-time constant.
For "target" the same applies, but there is more room there for the
latter of the constraints to not be met. The argument in favor of
the present arrangements likely was that our main concern here is
with the "success" paths. Yet such argumentation would again be
dependent upon all call sites fitting the assumption that on the
"failure" paths there would be nothing critical that follows.

Jan
Re: [PATCH 2/4] xsm/silo: Support hwdom/control domains
Posted by Jason Andryuk 4 months, 2 weeks ago

On 2025-06-12 12:56, Jason Andryuk wrote:
> On 2025-06-12 03:52, Jan Beulich wrote:
>> On 11.06.2025 06:20, Jason Andryuk wrote:
>>> On 2025-06-11 09:17, Jan Beulich wrote:
>>>> On 11.06.2025 00:57, Jason Andryuk wrote:
>>>>> In a disaggregated environment, dom0 is split into Control, Hardware,
>>>>> and Xenstore domains, along with domUs.  The is_control_domain() check
>>>>> is not sufficient to handle all these cases.  Add is_priv_domain() to
>>>>> support allowing for the various domains.
>>>>>
>>>>> The purpose of SILO mode is to prevent domUs from interacting with 
>>>>> each
>>>>> other.  But dom0 was allowed to communicate with domUs to provide
>>>>> services.  As the disaggregation of dom0, Control, Hardware and 
>>>>> Xenstore
>>>>> are all service domains that need to communicate with other domains.
>>>>>
>>>>> To provide xenstore connections, the Xenstore domain must be 
>>>>> allowed to
>>>>> connect via grants and event channels.  Xenstore domain must also be
>>>>> allowed to connect to Control and Hardware to provide xenstore to 
>>>>> them.
>>>>
>>>> Are you suggesting that SILO at present is incompatible with a Xenstore
>>>> domain? silo_mode_dom_check() in its original form has no special
>>>> precautions, after all.
>>>
>>> Yes, it is incompatible with the current silo_mode_dom_check().  Only
>>> Control domain is allowed to use grants and event channels with a domU.
>>> A Xenstore domain would be denied.
>>>
>>> Xenstore stubdom only exists for x86 today.  My limited attempts to run
>>> xenstored in an dedicated Xenstore ARM Linux domain have failed.
>>
>> This may want sorting independently first. Once sorted, the requirements
>> here may become more clear.
> 
> HW+XS-> xenstore works
> CTL+XS or XS -> the domain's console just stops.  vCPUs are in Linux cpu 
> idle.  I haven't figured out more.  This required some Linux changes to 
> query the capabilities since XS isn't exposed and ARM assumes initial 
> domain implies HW + CTL.  It's orthogonal to my goals, so I haven't 
> looked too hard.

I got standalone Xenstore working on ARM.  Linux was blocking in 
xs_reset_watches() - the Xenstore domain needs to skip that function 
like xen_initial_domain().

This is with SILO's check as:
static bool is_priv_domain(const struct domain *d)
{
     return evaluate_nospec((d->options & XEN_DOMCTL_CDF_xs_domain) ||
                            d == hardware_domain);
}

Regards,
Jason