Xen includes disctinct concepts of a control domain (privileged) and a
hardware domain, but there is only a single XSM_PRIV check. For dom0
this is not an issue as they are one and the same.
With hyperlaunch and its build capabiliies, a non-privileged hwdom and a
privileged control domain should be possible. Today the hwdom fails the
XSM_PRIV checks for hardware-related hooks which it should be allowed
access to.
Introduce XSM_HW_PRIV, and use it to mark many of the physdev_op and
platform_op.
Previously, xsm_default_action() was almost linearly increasing in
permissions with its fallthroughs. When it gets to XSM_PRIV, all
permissions were allowed for the control domain. That needs to change
so the control domain cannot access XSM_HW_PRIV.
The hwdom is allowed access for XSM_HW_PRIV and XSM_DM_PRIV. The
hardware domain providing a device model for a domU is an expected use
case, so those permission are needed as well.
Testing was performed with hardware+xenstore capabilities for dom0 and a
control dom3 booted from hyperlaunch. The additional xenstore
permissions allowed hwdom+xenstore XSM_XS_PRIV which are necesary for
xenstore.
A traditional dom0 will be both privileged and hardware domain, so it
continues to have all accesses.
Why not XSM:Flask? XSM:Flask is fine grain, and this aims to allow
coarse grain. domUs are still domUs. If capabilities are meant to be a
first class citizen, they should be usable by the default XSM policy.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
xen/arch/arm/platform_hypercall.c | 2 +-
xen/arch/x86/msi.c | 2 +-
xen/arch/x86/physdev.c | 12 ++++++------
xen/arch/x86/platform_hypercall.c | 2 +-
xen/drivers/passthrough/pci.c | 5 +++--
xen/drivers/pci/physdev.c | 2 +-
xen/include/xsm/dummy.h | 22 +++++++++++++---------
xen/include/xsm/xsm.h | 1 +
8 files changed, 27 insertions(+), 21 deletions(-)
diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c
index ac55622426..a84596ae3a 100644
--- a/xen/arch/arm/platform_hypercall.c
+++ b/xen/arch/arm/platform_hypercall.c
@@ -35,7 +35,7 @@ long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
if ( d == NULL )
return -ESRCH;
- ret = xsm_platform_op(XSM_PRIV, op->cmd);
+ ret = xsm_platform_op(XSM_HW_PRIV, op->cmd);
if ( ret )
return ret;
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index bf5b71822e..6b4bc712c5 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1355,7 +1355,7 @@ int pci_restore_msi_state(struct pci_dev *pdev)
if ( !use_msi )
return -EOPNOTSUPP;
- ret = xsm_resource_setup_pci(XSM_PRIV,
+ ret = xsm_resource_setup_pci(XSM_HW_PRIV,
(pdev->seg << 16) | (pdev->bus << 8) |
pdev->devfn);
if ( ret )
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 69fd42667c..b0bb2b846b 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -358,7 +358,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
ret = -EFAULT;
if ( copy_from_guest(&apic, arg, 1) != 0 )
break;
- ret = xsm_apic(XSM_PRIV, currd, cmd);
+ ret = xsm_apic(XSM_HW_PRIV, currd, cmd);
if ( ret )
break;
ret = ioapic_guest_read(apic.apic_physbase, apic.reg, &apic.value);
@@ -372,7 +372,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
ret = -EFAULT;
if ( copy_from_guest(&apic, arg, 1) != 0 )
break;
- ret = xsm_apic(XSM_PRIV, currd, cmd);
+ ret = xsm_apic(XSM_HW_PRIV, currd, cmd);
if ( ret )
break;
ret = ioapic_guest_write(apic.apic_physbase, apic.reg, apic.value);
@@ -388,7 +388,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
/* Use the APIC check since this dummy hypercall should still only
* be called by the domain with access to program the ioapic */
- ret = xsm_apic(XSM_PRIV, currd, cmd);
+ ret = xsm_apic(XSM_HW_PRIV, currd, cmd);
if ( ret )
break;
@@ -490,7 +490,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
if ( copy_from_guest(&dev, arg, 1) )
ret = -EFAULT;
else
- ret = xsm_resource_setup_pci(XSM_PRIV,
+ ret = xsm_resource_setup_pci(XSM_HW_PRIV,
(dev.seg << 16) | (dev.bus << 8) |
dev.devfn) ?:
pci_prepare_msix(dev.seg, dev.bus, dev.devfn,
@@ -501,7 +501,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
case PHYSDEVOP_pci_mmcfg_reserved: {
struct physdev_pci_mmcfg_reserved info;
- ret = xsm_resource_setup_misc(XSM_PRIV);
+ ret = xsm_resource_setup_misc(XSM_HW_PRIV);
if ( ret )
break;
@@ -567,7 +567,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
if ( setup_gsi.gsi < 0 || setup_gsi.gsi >= nr_irqs_gsi )
break;
- ret = xsm_resource_setup_gsi(XSM_PRIV, setup_gsi.gsi);
+ ret = xsm_resource_setup_gsi(XSM_HW_PRIV, setup_gsi.gsi);
if ( ret )
break;
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 90abd3197f..8efb4ad05f 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -228,7 +228,7 @@ ret_t do_platform_op(
if ( op->interface_version != XENPF_INTERFACE_VERSION )
return -EACCES;
- ret = xsm_platform_op(XSM_PRIV, op->cmd);
+ ret = xsm_platform_op(XSM_HW_PRIV, op->cmd);
if ( ret )
return ret;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ab25840e20..f25d00f7c4 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -678,7 +678,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
else
type = "device";
- ret = xsm_resource_plug_pci(XSM_PRIV, (seg << 16) | (bus << 8) | devfn);
+ ret = xsm_resource_plug_pci(XSM_HW_PRIV, (seg << 16) | (bus << 8) | devfn);
if ( ret )
return ret;
@@ -830,7 +830,8 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
struct pci_dev *pdev;
int ret;
- ret = xsm_resource_unplug_pci(XSM_PRIV, (seg << 16) | (bus << 8) | devfn);
+ ret = xsm_resource_unplug_pci(XSM_HW_PRIV,
+ (seg << 16) | (bus << 8) | devfn);
if ( ret )
return ret;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 0161a85e1e..c223611dfb 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -86,7 +86,7 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
dev_reset.dev.bus,
dev_reset.dev.devfn);
- ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+ ret = xsm_resource_setup_pci(XSM_HW_PRIV, sbdf.sbdf);
if ( ret )
break;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 06f4eccf5f..4536ee5dad 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -95,7 +95,11 @@ static always_inline int xsm_default_action(
return 0;
fallthrough;
case XSM_PRIV:
- if ( is_control_domain(src) )
+ case XSM_HW_PRIV:
+ if ( is_control_domain(src) && action != XSM_HW_PRIV )
+ return 0;
+ if ( is_hardware_domain(src) &&
+ (action == XSM_HW_PRIV || action == XSM_DM_PRIV) )
return 0;
return -EPERM;
default:
@@ -280,7 +284,7 @@ static XSM_INLINE int cf_check xsm_console_io(
if ( cmd == CONSOLEIO_write )
return xsm_default_action(XSM_HOOK, d, NULL);
#endif
- return xsm_default_action(XSM_PRIV, d, NULL);
+ return xsm_default_action(XSM_HW_PRIV, d, NULL);
}
static XSM_INLINE int cf_check xsm_profile(
@@ -460,33 +464,33 @@ static XSM_INLINE int cf_check xsm_resource_unplug_core(XSM_DEFAULT_VOID)
static XSM_INLINE int cf_check xsm_resource_plug_pci(
XSM_DEFAULT_ARG uint32_t machine_bdf)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, current->domain, NULL);
}
static XSM_INLINE int cf_check xsm_resource_unplug_pci(
XSM_DEFAULT_ARG uint32_t machine_bdf)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, current->domain, NULL);
}
static XSM_INLINE int cf_check xsm_resource_setup_pci(
XSM_DEFAULT_ARG uint32_t machine_bdf)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, current->domain, NULL);
}
static XSM_INLINE int cf_check xsm_resource_setup_gsi(XSM_DEFAULT_ARG int gsi)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, current->domain, NULL);
}
static XSM_INLINE int cf_check xsm_resource_setup_misc(XSM_DEFAULT_VOID)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, current->domain, NULL);
}
@@ -688,7 +692,7 @@ static XSM_INLINE int cf_check xsm_mem_sharing(XSM_DEFAULT_ARG struct domain *d)
static XSM_INLINE int cf_check xsm_platform_op(XSM_DEFAULT_ARG uint32_t op)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, current->domain, NULL);
}
@@ -716,7 +720,7 @@ static XSM_INLINE int cf_check xsm_mem_sharing_op(
static XSM_INLINE int cf_check xsm_apic(
XSM_DEFAULT_ARG struct domain *d, int cmd)
{
- XSM_ASSERT_ACTION(XSM_PRIV);
+ XSM_ASSERT_ACTION(XSM_HW_PRIV);
return xsm_default_action(action, d, NULL);
}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 4dbff9d866..404491ef62 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -36,6 +36,7 @@ enum xsm_default {
XSM_DM_PRIV, /* Device model can perform on its target domain */
XSM_TARGET, /* Can perform on self or your target domain */
XSM_PRIV, /* Privileged - normally restricted to dom0 */
+ XSM_HW_PRIV, /* Hardware Privileged - normally restricted to dom0/hwdom */
XSM_XS_PRIV, /* Xenstore domain - can do some privileged operations */
XSM_OTHER /* Something more complex */
};
--
2.48.1
On 06.03.2025 23:03, Jason Andryuk wrote: > --- a/xen/include/xsm/dummy.h > +++ b/xen/include/xsm/dummy.h > @@ -95,7 +95,11 @@ static always_inline int xsm_default_action( > return 0; > fallthrough; > case XSM_PRIV: > - if ( is_control_domain(src) ) > + case XSM_HW_PRIV: > + if ( is_control_domain(src) && action != XSM_HW_PRIV ) > + return 0; > + if ( is_hardware_domain(src) && > + (action == XSM_HW_PRIV || action == XSM_DM_PRIV) ) > return 0; > return -EPERM; Hmm. Isn't DM_PRIV a property applying to the control domain (besides any stub domains), but not the hardware one? Jan
On 2025-03-17 10:22, Jan Beulich wrote: > On 06.03.2025 23:03, Jason Andryuk wrote: >> --- a/xen/include/xsm/dummy.h >> +++ b/xen/include/xsm/dummy.h >> @@ -95,7 +95,11 @@ static always_inline int xsm_default_action( >> return 0; >> fallthrough; >> case XSM_PRIV: >> - if ( is_control_domain(src) ) >> + case XSM_HW_PRIV: >> + if ( is_control_domain(src) && action != XSM_HW_PRIV ) >> + return 0; >> + if ( is_hardware_domain(src) && >> + (action == XSM_HW_PRIV || action == XSM_DM_PRIV) ) >> return 0; >> return -EPERM; > > Hmm. Isn't DM_PRIV a property applying to the control domain (besides > any stub domains), but not the hardware one? I ran QEMU in hardware domain to provide devices to a domU. I thought QEMU would better run in hardware domain than control domain. Regards, Jason
On Mon, 17 Mar 2025, Jason Andryuk wrote: > On 2025-03-17 10:22, Jan Beulich wrote: > > On 06.03.2025 23:03, Jason Andryuk wrote: > > > --- a/xen/include/xsm/dummy.h > > > +++ b/xen/include/xsm/dummy.h > > > @@ -95,7 +95,11 @@ static always_inline int xsm_default_action( > > > return 0; > > > fallthrough; > > > case XSM_PRIV: > > > - if ( is_control_domain(src) ) > > > + case XSM_HW_PRIV: > > > + if ( is_control_domain(src) && action != XSM_HW_PRIV ) > > > + return 0; > > > + if ( is_hardware_domain(src) && > > > + (action == XSM_HW_PRIV || action == XSM_DM_PRIV) ) > > > return 0; > > > return -EPERM; > > > > Hmm. Isn't DM_PRIV a property applying to the control domain (besides > > any stub domains), but not the hardware one? > > I ran QEMU in hardware domain to provide devices to a domU. I thought QEMU > would better run in hardware domain than control domain. Leaving stubdoms aside, QEMU has to run in the hardware domain because the hardware domain is less privileged. QEMU can be attacked or affected by the guest so it is undesirable to run QEMU in the control domain which is highly privileged, and considered highly secure / safe.
On 18.03.2025 00:55, Stefano Stabellini wrote: > On Mon, 17 Mar 2025, Jason Andryuk wrote: >> On 2025-03-17 10:22, Jan Beulich wrote: >>> On 06.03.2025 23:03, Jason Andryuk wrote: >>>> --- a/xen/include/xsm/dummy.h >>>> +++ b/xen/include/xsm/dummy.h >>>> @@ -95,7 +95,11 @@ static always_inline int xsm_default_action( >>>> return 0; >>>> fallthrough; >>>> case XSM_PRIV: >>>> - if ( is_control_domain(src) ) >>>> + case XSM_HW_PRIV: >>>> + if ( is_control_domain(src) && action != XSM_HW_PRIV ) >>>> + return 0; >>>> + if ( is_hardware_domain(src) && >>>> + (action == XSM_HW_PRIV || action == XSM_DM_PRIV) ) >>>> return 0; >>>> return -EPERM; >>> >>> Hmm. Isn't DM_PRIV a property applying to the control domain (besides >>> any stub domains), but not the hardware one? >> >> I ran QEMU in hardware domain to provide devices to a domU. I thought QEMU >> would better run in hardware domain than control domain. > > Leaving stubdoms aside, QEMU has to run in the hardware domain because > the hardware domain is less privileged. QEMU can be attacked or affected > by the guest so it is undesirable to run QEMU in the control domain > which is highly privileged, and considered highly secure / safe. Yet having access to the hardware, hwdom can likely also do about anything to the system. IOW I'd consider this "highly privileged" too, just not from a pure software perspective. If you want a secure / safe environment, I fear you won't get around using further (stub?) domain(s) to run qemu in. Jan
On Tue, 18 Mar 2025, Jan Beulich wrote: > On 18.03.2025 00:55, Stefano Stabellini wrote: > > On Mon, 17 Mar 2025, Jason Andryuk wrote: > >> On 2025-03-17 10:22, Jan Beulich wrote: > >>> On 06.03.2025 23:03, Jason Andryuk wrote: > >>>> --- a/xen/include/xsm/dummy.h > >>>> +++ b/xen/include/xsm/dummy.h > >>>> @@ -95,7 +95,11 @@ static always_inline int xsm_default_action( > >>>> return 0; > >>>> fallthrough; > >>>> case XSM_PRIV: > >>>> - if ( is_control_domain(src) ) > >>>> + case XSM_HW_PRIV: > >>>> + if ( is_control_domain(src) && action != XSM_HW_PRIV ) > >>>> + return 0; > >>>> + if ( is_hardware_domain(src) && > >>>> + (action == XSM_HW_PRIV || action == XSM_DM_PRIV) ) > >>>> return 0; > >>>> return -EPERM; > >>> > >>> Hmm. Isn't DM_PRIV a property applying to the control domain (besides > >>> any stub domains), but not the hardware one? > >> > >> I ran QEMU in hardware domain to provide devices to a domU. I thought QEMU > >> would better run in hardware domain than control domain. > > > > Leaving stubdoms aside, QEMU has to run in the hardware domain because > > the hardware domain is less privileged. QEMU can be attacked or affected > > by the guest so it is undesirable to run QEMU in the control domain > > which is highly privileged, and considered highly secure / safe. > > Yet having access to the hardware, hwdom can likely also do about anything > to the system. IOW I'd consider this "highly privileged" too, just not from > a pure software perspective. If you want a secure / safe environment, I > fear you won't get around using further (stub?) domain(s) to run qemu in. Traditionally, from an upstream Xen Project perspective, it is important to support as many hardware platforms as possible. The goal is to provide the best possible out-of-the-box experience, including support for devices like the Raspberry Pi 4 (RPi4), which has certain quirks and limited IOMMU support. However, when considering safety (similar concepts also apply to security) it is necessary to write down assumptions. One key assumption is that the system includes an IOMMU and that all DMA-capable devices are protected by it. There are boards, such as the RPi4, that do not meet these assumptions. While Xen can and should run on such hardware, the level of flexibility in configuring safe environments on these boards will be limited. For example, on the RPi4, I would not recommend to run QEMU in an unsafe hardware domaini because the hardware domain will end up having control over unprotected DMA-capable devices that can be used to harm the system. However, it will be possible to run QEMU in an unsafe hardware domains on platforms were all DMA-capable devices that can harm the system are IOMMU protected, such as Xilinx MPSoC when configured correctly.
© 2016 - 2026 Red Hat, Inc.