Xen includes distinct concepts of a control domain (privileged) and a
hardware domain, but there is only a single XSM_PRIV check.  For dom0
this is not an issue as they are one and the same.

With hyperlaunch and its build capabilities, a non-privileged hwdom and a
privileged control domain should be possible.  Today the hwdom fails the
XSM_PRIV checks for hardware-related hooks which it should be allowed
access to.

Introduce XSM_HW_PRIV, and use it to mark many of the physdev_op and
platform_op.  The hwdom is allowed access for XSM_HW_PRIV.

Make XSM_HW_PRIV a new privilege level that is given to the hardware
domain, but is not exclusive.  The control domain can still execute
XSM_HW_PRIV commands.  This is a little questionable since it's unclear
how the control domain can meaningfully execute them.  But this approach
is chosen to maintain the increasing privileges and keep control domain
fully privileged.

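
As a rough illustration of the non-exclusive level described above, the
following standalone model mirrors the fallthrough order that the dummy.h
hunk in this patch adds to xsm_default_action().  It is a sketch, not the
Xen code: struct domain is reduced to two flags and a target pointer, the
enum is cut down to three levels, and model_default_action() is a
hypothetical name used only here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down model of enum xsm_default; only the levels needed here. */
enum xsm_default { XSM_TARGET, XSM_HW_PRIV, XSM_PRIV };

/* Assumption: a domain is modelled as two role flags plus a target. */
struct domain {
    bool is_privileged;            /* control domain */
    bool is_hardware;              /* hardware domain */
    const struct domain *target;   /* domain this one may act on */
};

/* Hypothetical stand-in for the tail of xsm_default_action(). */
static int model_default_action(enum xsm_default action,
                                const struct domain *src,
                                const struct domain *target)
{
    switch ( action )
    {
    case XSM_TARGET:
        if ( target && src->target == target )
            return 0;
        /* fall through */
    case XSM_HW_PRIV:
        /*
         * The action check keeps the hwdom from passing XSM_TARGET
         * checks that merely fell through to this case.
         */
        if ( action == XSM_HW_PRIV && src->is_hardware )
            return 0;
        /* fall through */
    case XSM_PRIV:
        if ( src->is_privileged )
            return 0;
        return -EPERM;
    }

    return -EPERM;
}
```

In this model a hwdom-only domain passes XSM_HW_PRIV but not XSM_PRIV,
while a control domain passes both, which is the "increasing privileges"
ordering the paragraph above refers to.
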
Testing was performed with hardware+xenstore capabilities for dom0 and a
control dom3 booted from hyperlaunch.  The additional xenstore
permissions allowed hwdom+xenstore XSM_XS_PRIV which are necessary for
xenstore.

A traditional dom0 will be both privileged and hardware domain, so it
continues to have all accesses.

Why not XSM:Flask?  XSM:Flask is fine-grained, and this aims to allow
coarse-grained control.  domUs are still domUs.  If capabilities are meant
to be a first class citizen, they should be usable by the default XSM
policy.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/arch/arm/platform_hypercall.c |  2 +-
 xen/arch/x86/msi.c                |  2 +-
 xen/arch/x86/physdev.c            | 12 ++++++------
 xen/arch/x86/platform_hypercall.c |  2 +-
 xen/drivers/passthrough/pci.c     |  5 +++--
 xen/drivers/pci/physdev.c         |  2 +-
 xen/include/xsm/dummy.h           | 20 ++++++++++++--------
 xen/include/xsm/xsm.h             |  1 +
 8 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c
index ac55622426..a84596ae3a 100644
--- a/xen/arch/arm/platform_hypercall.c
+++ b/xen/arch/arm/platform_hypercall.c
@@ -35,7 +35,7 @@ long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     if ( d == NULL )
         return -ESRCH;
 
-    ret = xsm_platform_op(XSM_PRIV, op->cmd);
+    ret = xsm_platform_op(XSM_HW_PRIV, op->cmd);
     if ( ret )
         return ret;
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 5389bc0867..30801d980c 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1360,7 +1360,7 @@ int pci_restore_msi_state(struct pci_dev *pdev)
     if ( !use_msi )
         return -EOPNOTSUPP;
 
-    ret = xsm_resource_setup_pci(XSM_PRIV,
+    ret = xsm_resource_setup_pci(XSM_HW_PRIV,
                                 (pdev->seg << 16) | (pdev->bus << 8) |
                                 pdev->devfn);
     if ( ret )
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 4dfa1c0191..ce1ba41fa3 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -358,7 +358,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         ret = -EFAULT;
         if ( copy_from_guest(&apic, arg, 1) != 0 )
             break;
-        ret = xsm_apic(XSM_PRIV, currd, cmd);
+        ret = xsm_apic(XSM_HW_PRIV, currd, cmd);
         if ( ret )
             break;
         ret = ioapic_guest_read(apic.apic_physbase, apic.reg, &apic.value);
@@ -372,7 +372,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         ret = -EFAULT;
         if ( copy_from_guest(&apic, arg, 1) != 0 )
             break;
-        ret = xsm_apic(XSM_PRIV, currd, cmd);
+        ret = xsm_apic(XSM_HW_PRIV, currd, cmd);
         if ( ret )
             break;
         ret = ioapic_guest_write(apic.apic_physbase, apic.reg, apic.value);
@@ -388,7 +388,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         /* Use the APIC check since this dummy hypercall should still only
          * be called by the domain with access to program the ioapic */
-        ret = xsm_apic(XSM_PRIV, currd, cmd);
+        ret = xsm_apic(XSM_HW_PRIV, currd, cmd);
         if ( ret )
             break;
 
@@ -490,7 +490,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&dev, arg, 1) )
             ret = -EFAULT;
         else
-            ret = xsm_resource_setup_pci(XSM_PRIV,
+            ret = xsm_resource_setup_pci(XSM_HW_PRIV,
                                          (dev.seg << 16) | (dev.bus << 8) |
                                          dev.devfn) ?:
                   pci_prepare_msix(dev.seg, dev.bus, dev.devfn,
@@ -501,7 +501,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved: {
         struct physdev_pci_mmcfg_reserved info;
 
-        ret = xsm_resource_setup_misc(XSM_PRIV);
+        ret = xsm_resource_setup_misc(XSM_HW_PRIV);
         if ( ret )
             break;
 
@@ -567,7 +567,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( setup_gsi.gsi < 0 || setup_gsi.gsi >= nr_irqs_gsi )
             break;
 
-        ret = xsm_resource_setup_gsi(XSM_PRIV, setup_gsi.gsi);
+        ret = xsm_resource_setup_gsi(XSM_HW_PRIV, setup_gsi.gsi);
         if ( ret )
             break;
 
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 90abd3197f..8efb4ad05f 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -228,7 +228,7 @@ ret_t do_platform_op(
     if ( op->interface_version != XENPF_INTERFACE_VERSION )
         return -EACCES;
 
-    ret = xsm_platform_op(XSM_PRIV, op->cmd);
+    ret = xsm_platform_op(XSM_HW_PRIV, op->cmd);
     if ( ret )
         return ret;
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 3edcfa8a04..9de7f0d358 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -672,7 +672,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
     else
         type = "device";
 
-    ret = xsm_resource_plug_pci(XSM_PRIV, (seg << 16) | (bus << 8) | devfn);
+    ret = xsm_resource_plug_pci(XSM_HW_PRIV, (seg << 16) | (bus << 8) | devfn);
     if ( ret )
         return ret;
 
@@ -824,7 +824,8 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     struct pci_dev *pdev;
     int ret;
 
-    ret = xsm_resource_unplug_pci(XSM_PRIV, (seg << 16) | (bus << 8) | devfn);
+    ret = xsm_resource_unplug_pci(XSM_HW_PRIV,
+                                  (seg << 16) | (bus << 8) | devfn);
     if ( ret )
         return ret;
 
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 0161a85e1e..c223611dfb 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -86,7 +86,7 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                         dev_reset.dev.bus,
                         dev_reset.dev.devfn);
 
-        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        ret = xsm_resource_setup_pci(XSM_HW_PRIV, sbdf.sbdf);
         if ( ret )
             break;
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 9227205fcd..d8df3f66c4 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -94,6 +94,10 @@ static always_inline int xsm_default_action(
         if ( target && evaluate_nospec(src->target == target) )
             return 0;
         fallthrough;
+    case XSM_HW_PRIV:
+        if ( action == XSM_HW_PRIV && is_hardware_domain(src) )
+            return 0;
+        fallthrough;
     case XSM_PRIV:
         if ( is_control_domain(src) )
             return 0;
@@ -275,7 +279,7 @@ static XSM_INLINE int cf_check xsm_console_io(
     if ( cmd == CONSOLEIO_write )
         return xsm_default_action(XSM_HOOK, d, NULL);
 #endif
-    return xsm_default_action(XSM_PRIV, d, NULL);
+    return xsm_default_action(XSM_HW_PRIV, d, NULL);
 }
 
 static XSM_INLINE int cf_check xsm_profile(
@@ -455,33 +459,33 @@ static XSM_INLINE int cf_check xsm_resource_unplug_core(XSM_DEFAULT_VOID)
 static XSM_INLINE int cf_check xsm_resource_plug_pci(
     XSM_DEFAULT_ARG uint32_t machine_bdf)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, current->domain, NULL);
 }
 
 static XSM_INLINE int cf_check xsm_resource_unplug_pci(
     XSM_DEFAULT_ARG uint32_t machine_bdf)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, current->domain, NULL);
 }
 
 static XSM_INLINE int cf_check xsm_resource_setup_pci(
     XSM_DEFAULT_ARG uint32_t machine_bdf)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, current->domain, NULL);
 }
 
 static XSM_INLINE int cf_check xsm_resource_setup_gsi(XSM_DEFAULT_ARG int gsi)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, current->domain, NULL);
 }
 
 static XSM_INLINE int cf_check xsm_resource_setup_misc(XSM_DEFAULT_VOID)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, current->domain, NULL);
 }
 
@@ -673,7 +677,7 @@ static XSM_INLINE int cf_check xsm_mem_sharing(XSM_DEFAULT_ARG struct domain *d)
 
 static XSM_INLINE int cf_check xsm_platform_op(XSM_DEFAULT_ARG uint32_t op)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, current->domain, NULL);
 }
 
@@ -701,7 +705,7 @@ static XSM_INLINE int cf_check xsm_mem_sharing_op(
 static XSM_INLINE int cf_check xsm_apic(
     XSM_DEFAULT_ARG struct domain *d, int cmd)
 {
-    XSM_ASSERT_ACTION(XSM_PRIV);
+    XSM_ASSERT_ACTION(XSM_HW_PRIV);
     return xsm_default_action(action, d, NULL);
 }
 
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 24acc16125..264db4d8ee 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -36,6 +36,7 @@ enum xsm_default {
     XSM_DM_PRIV,  /* Device model can perform on its target domain */
     XSM_TARGET,   /* Can perform on self or your target domain */
     XSM_PRIV,     /* Privileged - normally restricted to dom0 */
+    XSM_HW_PRIV,  /* Hardware Privileged - normally restricted to dom0/hwdom */
     XSM_XS_PRIV,  /* Xenstore domain - can do some privileged operations */
     XSM_OTHER     /* Something more complex */
 };
-- 
2.49.0

On 11.06.2025 00:57, Jason Andryuk wrote:
> Xen includes distinct concepts of a control domain (privileged) and a
> hardware domain, but there is only a single XSM_PRIV check.  For dom0
> this is not an issue as they are one and the same.
>
> With hyperlaunch and its build capabilities, a non-privileged hwdom and a
> privileged control domain should be possible.  Today the hwdom fails the
> XSM_PRIV checks for hardware-related hooks which it should be allowed
> access to.
>
> Introduce XSM_HW_PRIV, and use it to mark many of the physdev_op and
> platform_op.  The hwdom is allowed access for XSM_HW_PRIV.
>
> Make XSM_HW_PRIV a new privilege level that is given to the hardware
> domain, but is not exclusive.  The control domain can still execute
> XSM_HW_PRIV commands.  This is a little questionable since it's unclear
> how the control domain can meaningfully execute them.  But this approach
> is chosen to maintain the increasing privileges and keep control domain
> fully privileged.

I consider this conceptually wrong. "Control" aiui refers to software
(e.g. VMs or system-wide settings), but there ought to be a (pretty?)
clear boundary between control and hardware domains, imo. As to
"pretty" - should any overlap be necessary (xsm_machine_memory_map()
comes to mind), such would need handling specially then, I think. At
the same time: The more of an overlap there is, the less clear it is
why the two want/need separating in the first place.

Jan

On 2025-06-11 09:02, Jan Beulich wrote:
> On 11.06.2025 00:57, Jason Andryuk wrote:
>> Xen includes distinct concepts of a control domain (privileged) and a
>> hardware domain, but there is only a single XSM_PRIV check.  For dom0
>> this is not an issue as they are one and the same.
>>
>> With hyperlaunch and its build capabilities, a non-privileged hwdom and a
>> privileged control domain should be possible.  Today the hwdom fails the
>> XSM_PRIV checks for hardware-related hooks which it should be allowed
>> access to.
>>
>> Introduce XSM_HW_PRIV, and use it to mark many of the physdev_op and
>> platform_op.  The hwdom is allowed access for XSM_HW_PRIV.
>>
>> Make XSM_HW_PRIV a new privilege level that is given to the hardware
>> domain, but is not exclusive.  The control domain can still execute
>> XSM_HW_PRIV commands.  This is a little questionable since it's unclear
>> how the control domain can meaningfully execute them.  But this approach
>> is chosen to maintain the increasing privileges and keep control domain
>> fully privileged.
>
> I consider this conceptually wrong. "Control" aiui refers to software
> (e.g. VMs or system-wide settings), but there ought to be a (pretty?)
> clear boundary between control and hardware domains, imo. As to
> "pretty" - should any overlap be necessary (xsm_machine_memory_map()
> comes to mind), such would need handling specially then, I think. At
> the same time: The more of an overlap there is, the less clear it is
> why the two want/need separating in the first place.

So you are in favor of splitting control and hardware into distinct
sets?  I am okay with this.  I implemented that originally, but I
started doubting it.  Mainly, should control be denied any permission?

We aren't using the toolstack to build domains - dom0less or Hyperlaunch
handles that.  This avoids issues that might arise from running the
toolstack.

Thanks for your feedback.

-Jason

On 11.06.2025 05:13, Jason Andryuk wrote:
> On 2025-06-11 09:02, Jan Beulich wrote:
>> On 11.06.2025 00:57, Jason Andryuk wrote:
>>> Xen includes distinct concepts of a control domain (privileged) and a
>>> hardware domain, but there is only a single XSM_PRIV check.  For dom0
>>> this is not an issue as they are one and the same.
>>>
>>> With hyperlaunch and its build capabilities, a non-privileged hwdom and a
>>> privileged control domain should be possible.  Today the hwdom fails the
>>> XSM_PRIV checks for hardware-related hooks which it should be allowed
>>> access to.
>>>
>>> Introduce XSM_HW_PRIV, and use it to mark many of the physdev_op and
>>> platform_op.  The hwdom is allowed access for XSM_HW_PRIV.
>>>
>>> Make XSM_HW_PRIV a new privilege level that is given to the hardware
>>> domain, but is not exclusive.  The control domain can still execute
>>> XSM_HW_PRIV commands.  This is a little questionable since it's unclear
>>> how the control domain can meaningfully execute them.  But this approach
>>> is chosen to maintain the increasing privileges and keep control domain
>>> fully privileged.
>>
>> I consider this conceptually wrong. "Control" aiui refers to software
>> (e.g. VMs or system-wide settings), but there ought to be a (pretty?)
>> clear boundary between control and hardware domains, imo. As to
>> "pretty" - should any overlap be necessary (xsm_machine_memory_map()
>> comes to mind), such would need handling specially then, I think. At
>> the same time: The more of an overlap there is, the less clear it is
>> why the two want/need separating in the first place.
>
> So you are in favor of splitting control and hardware into distinct
> sets?  I am okay with this.  I implemented that originally, but I
> started doubting it.  Mainly, should control be denied any permission?

Yes, imo: Fundamentally for anything the hardware domain is supposed to
be doing. Yet as indicated in other replies to this series - boundaries
aren't always as clear as they ought to be for a clean separation.

> We aren't using the toolstack to build domains - dom0less or Hyperlaunch
> handles that.  This avoids issues that might arise from running the
> toolstack.

IOW you don't have a control domain there in the first place?

Jan

On 2025-06-12 03:36, Jan Beulich wrote:
> On 11.06.2025 05:13, Jason Andryuk wrote:
>> On 2025-06-11 09:02, Jan Beulich wrote:
>>> On 11.06.2025 00:57, Jason Andryuk wrote:
>>>> Xen includes distinct concepts of a control domain (privileged) and a
>>>> hardware domain, but there is only a single XSM_PRIV check.  For dom0
>>>> this is not an issue as they are one and the same.
>>>>
>>>> With hyperlaunch and its build capabilities, a non-privileged hwdom and a
>>>> privileged control domain should be possible.  Today the hwdom fails the
>>>> XSM_PRIV checks for hardware-related hooks which it should be allowed
>>>> access to.
>>>>
>>>> Introduce XSM_HW_PRIV, and use it to mark many of the physdev_op and
>>>> platform_op.  The hwdom is allowed access for XSM_HW_PRIV.
>>>>
>>>> Make XSM_HW_PRIV a new privilege level that is given to the hardware
>>>> domain, but is not exclusive.  The control domain can still execute
>>>> XSM_HW_PRIV commands.  This is a little questionable since it's unclear
>>>> how the control domain can meaningfully execute them.  But this approach
>>>> is chosen to maintain the increasing privileges and keep control domain
>>>> fully privileged.
>>>
>>> I consider this conceptually wrong. "Control" aiui refers to software
>>> (e.g. VMs or system-wide settings), but there ought to be a (pretty?)
>>> clear boundary between control and hardware domains, imo. As to
>>> "pretty" - should any overlap be necessary (xsm_machine_memory_map()
>>> comes to mind), such would need handling specially then, I think. At
>>> the same time: The more of an overlap there is, the less clear it is
>>> why the two want/need separating in the first place.
>>
>> So you are in favor of splitting control and hardware into distinct
>> sets?  I am okay with this.  I implemented that originally, but I
>> started doubting it.  Mainly, should control be denied any permission?
>
> Yes, imo: Fundamentally for anything the hardware domain is supposed to
> be doing.

Ok.

> Yet as indicated in other replies to this series - boundaries
> aren't always as clear as they ought to be for a clean separation.

Agreed.

>> We aren't using the toolstack to build domains - dom0less or Hyperlaunch
>> handles that.  This avoids issues that might arise from running the
>> toolstack.
>
> IOW you don't have a control domain there in the first place?

I have a domain with d->is_privileged == true.  We don't create more
domains with it though, which was your other email's definition of the
control domain.  But it can pause and unpause domains.

Regards,
Jason
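
For comparison, the "distinct sets" alternative discussed in this thread
could be modelled roughly as below.  This is only a hypothetical sketch of
one possible reading of that suggestion and is not code from the series:
struct domain is reduced to two flags, split_default_action() is an
invented name, and hooks that may need to overlap (xsm_machine_memory_map()
was mentioned as a candidate) are not modelled at all.

```c
#include <errno.h>
#include <stdbool.h>

/* Cut-down model of enum xsm_default; only the two levels discussed. */
enum xsm_default { XSM_PRIV, XSM_HW_PRIV };

/* Assumption: a domain is modelled as two independent role flags. */
struct domain {
    bool is_privileged;   /* control domain */
    bool is_hardware;     /* hardware domain */
};

/*
 * Hypothetical disjoint check: each level is granted to exactly one
 * role, with no fallthrough from XSM_HW_PRIV to XSM_PRIV.
 */
static int split_default_action(enum xsm_default action,
                                const struct domain *src)
{
    switch ( action )
    {
    case XSM_HW_PRIV:
        return src->is_hardware ? 0 : -EPERM;
    case XSM_PRIV:
        return src->is_privileged ? 0 : -EPERM;
    }

    return -EPERM;
}
```

Under this model a control-only domain is denied XSM_HW_PRIV, while a
traditional dom0 with both roles still passes both checks.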