From nobody Tue May 7 14:56:14 2024
From: Christopher Clark
To: xen-devel@lists.xenproject.org
Cc: "Daniel P. Smith", andrew.cooper3@citrix.com,
 stefano.stabellini@xilinx.com, jgrall@amazon.com,
 Julien.grall.oss@gmail.com, iwj@xenproject.org, wl@xen.org,
 george.dunlap@citrix.com, jbeulich@suse.com, persaur@gmail.com,
 Bertrand.Marquis@arm.com, roger.pau@citrix.com, luca.fancellu@arm.com,
 paul@xen.org, adam.schwalm@starlab.io, scott.davis@starlab.io,
 Christopher Clark
Subject: [PATCH v4 1/2] docs/designs/launch: Hyperlaunch design document
Date: Thu, 13 May 2021 20:41:00 -0700
Message-Id: <20210514034101.3683-2-christopher.w.clark@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210514034101.3683-1-christopher.w.clark@gmail.com>
References: <20210514034101.3683-1-christopher.w.clark@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Daniel P. Smith"

Adds a design document for Hyperlaunch, formerly DomB mode of dom0less.

Signed-off-by: Christopher Clark
Signed-off-by: Daniel P. Smith
Reviewed-by: Rich Persaud
---
Changes since v3:
* Rename the Landscape table
* Changed Crash Domain to Recovery Domain
* amended text to indicate that this will be new rather than existing Xen
  functionality
* including update to the configuration, permission, function table
* Add definitions for “recovery domain” and “crash environment”, describing
  the different functionalities
* some design issues deferred
* Added section to explain the motivations for the separation between VM
  creation (by the hypervisor) and VM configuration (by the boot domain)
* Adjusted the description of the current process for creating a domain
* Added recommendation for UEFI boot to use GRUB.efi to load via multiboot2
  method.
* Added Document Structure section
* Added section on Communication of Domain Configuration

 docs/designs/launch/hyperlaunch.rst | 1004 +++++++++++++++++++++++++++
 1 file changed, 1004 insertions(+)
 create mode 100644 docs/designs/launch/hyperlaunch.rst

diff --git a/docs/designs/launch/hyperlaunch.rst b/docs/designs/launch/hyperlaunch.rst
new file mode 100644
index 0000000000..30fce8c9c3
--- /dev/null
+++ b/docs/designs/launch/hyperlaunch.rst
@@ -0,0 +1,1004 @@
+###########################
+Hyperlaunch Design Document
+###########################
+
+.. sectnum:: :depth: 4
+
+This post is a Request for Comment on the included v4 of a design document that
+describes Hyperlaunch: a new method of launching the Xen hypervisor, relating
+to dom0less and work from the Hyperlaunch project. We invite discussion of this
+on this list, at the monthly Xen Community Calls, and at dedicated meetings on
+this topic in the Xen Working Group which will be announced in advance on the
+Xen Development mailing list.
+
+
+.. contents:: :depth: 3
+
+
+Introduction
+============
+
+This document describes the design and motivation for the funded development of
+a new, flexible system for launching the Xen hypervisor and virtual machines
+named: "Hyperlaunch".
+
+The design enables seamless transition for existing systems that require a
+dom0, and provides a new general capability to build and launch alternative
+configurations of virtual machines, including support for static partitioning
+and accelerated start of VMs during host boot, while adhering to the principles
+of least privilege. It incorporates the existing dom0less functionality,
+extended to fold in the new developments from the Hyperlaunch project, with
+support for both x86 and Arm platform architectures, building upon and
+replacing the earlier 'late hardware domain' feature for disaggregation of
+dom0.
+
+Hyperlaunch is designed to be flexible and reusable across multiple use cases,
+and our aim is to ensure that it is capable, widely exercised, comprehensively
+tested, and well understood by the Xen community.
+
+Document Structure
+==================
+
+This is the primary design document for Hyperlaunch, to provide an overview of
+the feature. Separate additional documents will cover specific aspects of
+Hyperlaunch in further detail, including:
+
+  - The Device Tree specification for Hyperlaunch metadata
+  - New Domain Roles for Xen and the Xen Security Modules (XSM) policy
+  - Passthrough of PCI devices with Hyperlaunch
+
+Approach
+========
+
+Born out of improving support for Dynamic Root of Trust for Measurement (DRTM),
+the Hyperlaunch project is focused on restructuring the system launch of Xen.
+The Hyperlaunch design provides a security architecture that builds on the
+principles of Least Privilege and Strong Isolation, achieving this through the
+disaggregation of system functions.
+It enables this with the introduction of a
+boot domain that works in conjunction with the hypervisor to provide the
+ability to launch multiple domains as part of host boot while maintaining a
+least privilege implementation.
+
+While the Hyperlaunch project inception was and continues to be driven by a
+focus on security through disaggregation, there are multiple use cases with a
+non-security focus that require or benefit from the ability to launch multiple
+domains at host boot. This was proven by the need that drove the implementation
+of the dom0less capability in the Arm branch of Xen.
+
+Hyperlaunch is designed to be flexible and reusable across multiple use cases,
+and our aim is to ensure that it is capable, widely exercised, comprehensively
+tested, and provides a robust foundation for current and emerging system launch
+requirements of the Xen community.
+
+
+Objectives
+----------
+
+* In general strive to maintain compatibility with existing Xen behavior
+* A default build of the hypervisor should be capable of booting both legacy-compatible and new styles of launch:
+
+  * classic Xen boot: starting a single, privileged Dom0
+  * classic Xen boot with late hardware domain: starting a Dom0 that transitions hardware access/control to another domain
+  * a dom0less boot: starting multiple domains without privilege assignment controls
+  * Hyperlaunch: starting one or more VMs, with flexible configuration
+
+* Preferred that it be managed via KCONFIG options to govern inclusion of support for each style
+* The selection between classic boot and Hyperlaunch boot should be automatic
+
+  * Preferred that it not require a kernel command line parameter for selection
+
+* It should not require modification to boot loaders
+* It should provide a user-friendly interface for its configuration and management
+* It must provide a method for building systems that fall back to console access in the event of misconfiguration
+* It should be able to boot an x86 Xen environment without the need for a Dom0 domain
+
+
+Requirements and Design
+=======================
+
+Hyperlaunch is defined as the ability of a hypervisor to construct and start
+one or more virtual machines at system launch in a specific way. A hypervisor
+can support one or both modes of configuration, Hyperlaunch Static and
+Hyperlaunch Dynamic. The Hyperlaunch Static mode functions as a static
+partitioning hypervisor ensuring only the virtual machines started at system
+launch are running on the system. The Hyperlaunch Dynamic mode functions as a
+dynamic hypervisor allowing for additional virtual machines to be started after
+the initial virtual machines have started. The Xen hypervisor is capable of
+both modes of configuration from the same binary and, when paired with its XSM
+flask, provides strong controls that enable fine grained system partitioning.
+
+Hypervisor Launch Landscape
+---------------------------
+
+This comparison table presents the distinctive capabilities of Hyperlaunch with
+reference to existing launch configurations currently available in Xen and
+other hypervisors.
+
+::
+
+ +---------------+-----------+------------+-----------+-------------+---------------------+
+ | **Xen Dom0**  | **Linux** | **Late**   | **Jail**  | **Xen**     | **Xen Hyperlaunch** |
+ | **(Classic)** | **KVM**   | **HW Dom** | **house** | **dom0less**+---------+-----------+
+ |               |           |            |           |             | Static  | Dynamic   |
+ +===============+===========+============+===========+=============+=========+===========+
+ | Hypervisor able to launch multiple VMs during host boot                                |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |     Y     |      Y      |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Hypervisor supports Static Partitioning                                                |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |     Y     |      Y      |    Y    |           |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Able to launch VMs dynamically after host boot                                         |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |       Y       |     Y     |     Y*     |     Y     |     Y*      |         |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Supports strong isolation between all VMs started at host boot                         |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |     Y     |      Y      |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Enables flexible sequencing of VM start during host boot                               |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Prevent all-powerful static root domain being launched at boot                         |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |     Y*      |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Operates without a Highly-privileged management VM (eg. Dom0)                          |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |     Y*     |           |     Y*      |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Operates without a privileged toolstack VM (Control Domain)                            |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |     Y*      |    Y    |           |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Extensible VM configuration applied before launch of VMs at host boot                  |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Flexible granular assignment of permissions and functions to VMs                       |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Supports extensible VM measurement architecture for DRTM and attestation               |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | PCI passthrough configured at host boot                                                |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+
+
+Domain Construction
+-------------------
+
+An important aspect of the Hyperlaunch architecture is that the hypervisor
+performs domain construction for all the Initial Domains, i.e. it builds each
+domain that is described in the Launch Control Module (LCM). More specifically,
+the hypervisor will perform the function of *domain creation* for each Initial
+Domain: it allocates the unique domain identifier assigned to the virtual
+machine and records essential metadata about it in the internal data structure
+that enables scheduling the domain to run. It will also perform *basic domain
+construction*: build the initial page tables with data from the kernel and
+initial ramdisk supplied, and as appropriate for the domain type, populate the
+p2m table and ACPI tables.
+
+Subsequent to this, the boot domain can apply additional configuration to the
+initial domains from the data in the LCM, in *extended domain construction*.
+
+The benefits of this structure include:
+
+* Security: Constrains the permissions required by the boot domain: it does not
+  require the capability to create domains in this structure. This aligns with
+  the principles of least privilege.
+* Flexibility: Enables policy-based dynamic assignment of hardware by the boot
+  domain, customizable according to use-case and able to adapt to hardware
+  discovery
+* Compatibility: Supports reuse of familiar tools with use-case customized boot
+  domains.
+* Commonality: Reuses the same logic for initial basic domain building across
+  diverse Xen deployments.
+
+  * It aligns the x86 initial domain construction with the existing Arm
+    dom0less feature for construction of multiple domains at boot.
+  * The boot domain implementation may vary significantly with different
+    deployment use cases, whereas the hypervisor implementation is
+    common.
+
+* Correctness: Increases confidence in the implementation of domain
+  construction, since it is performed by the hypervisor in well maintained and
+  centrally tested logic.
+* Performance: Enables launch for configurations where a fast start of
+  multiple domains at boot is a requirement.
+* Capability: Supports launch of advanced configurations where a sequenced
+  start of multiple domains is required, or multiple domains are involved in
+  startup of the running system configuration
+
+  * e.g. for PCI passthrough on systems where the toolstack runs in a
+    separate domain to the hardware management.
+
+Please see the ‘Hyperlaunch Device Tree’ design document, which describes the
+configuration module that is provided to the hypervisor by the bootloader.
+
+The hypervisor determines how these domains are started as host boot completes:
+in some systems the Boot Domain acts upon the extended boot configuration
+supplied as part of launch, performing configuration tasks for preparing the
+other domains for the hypervisor to commence running them.
+
+Common Boot Configurations
+--------------------------
+
+Looking across those who have expressed interest in, or discussed a need for,
+launching multiple domains at host boot, the Hyperlaunch approach is to provide
+the means to start nearly any combination of domains. Below is an enumerated
+selection of common boot configurations for reference in the following section.
+
+Dynamic Launch with a Highly-Privileged Domain 0
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Hyperlaunch Classic: Dom0
+    This configuration mimics the classic Xen start and domain construction
+    where a single domain is constructed with all privileges and functions for
+    managing hardware and running virtualization toolstack software.
+
+Hyperlaunch Classic: Extended Launch Dom0
+    This configuration is where a Dom0 is started via a Boot Domain that runs
+    first. This is for cases where some preprocessing in a less privileged domain
+    is required before starting the all-privileged Domain 0.
+
+Hyperlaunch Classic: Basic Cloud
+    This configuration constructs a Dom0 that is started in parallel with some
+    number of workload domains.
+
+Hyperlaunch Classic: Cloud
+    This configuration builds a Dom0 and some number of workload domains, launched
+    via a Boot Domain that runs first.
+
+
+Static Launch Configurations: without a Domain 0 or a Control Domain
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Hyperlaunch Static: Basic
+    Simple static partitioning where all domains that can be run on this system are
+    built and started during host boot and where no domain is started with the
+    Control Domain permissions, making it impossible to create or start any
+    further new domains.
+
+Hyperlaunch Static: Standard
+    This is a variation of the “Hyperlaunch Static: Basic” static partitioning
+    configuration with the introduction of a Boot Domain. This configuration allows
+    a Boot Domain to apply extended configuration to the Initial Domains before
+    they are started and to sequence the order in which they start.
+
+Hyperlaunch Static: Disaggregated
+    This is a variation of the “Hyperlaunch Static: Standard” configuration that
+    retains the Boot Domain and illustrates that some functions can be
+    disaggregated to dedicated domains.
+
+Dynamic Launch of Disaggregated System Configurations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Hyperlaunch Dynamic: Hardware Domain
+    This configuration mimics the existing Xen feature late hardware domain with
+    the one difference being that the hardware domain is constructed by the
+    hypervisor at startup instead of later by Dom0.
+
+Hyperlaunch Dynamic: Flexible Disaggregation
+    This configuration is similar to the “Hyperlaunch Classic: Dom0” configuration
+    except that it includes starting a separate hardware domain during Xen startup.
+    It is also similar to the “Hyperlaunch Dynamic: Hardware Domain” configuration,
+    but it launches via a Boot Domain that runs first.
+
+Hyperlaunch Dynamic: Full Disaggregation
+    This configuration demonstrates how it is possible to start a fully
+    disaggregated system: the virtualization toolstack runs in a Control Domain,
+    separate from the domains responsible for managing hardware, XenStore, the Xen
+    Console and Crash functions, each launched via a Boot Domain.
+
+
+Example Use Cases and Configurations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following example use cases can be matched to configurations listed in the
+previous section.
+
+Use case: Modern cloud hypervisor
+"""""""""""""""""""""""""""""""""
+
+**Option:** Hyperlaunch Classic: Cloud
+
+This configuration will support strong isolation for virtual TPM domains and
+measured launch in support of attestation to infrastructure management, while
+allowing the use of existing Dom0 virtualization toolstack software.
+
+Use case: Edge device with security or safety requirements
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+**Option:** Hyperlaunch Static: Standard
+
+This configuration runs without requiring a highly-privileged Dom0, and enables
+extended VM configuration to be applied to the Initial VMs prior to launching
+them, optionally in a sequenced start.
+
+Use case: Client hypervisor
+"""""""""""""""""""""""""""
+
+**Option:** Hyperlaunch Dynamic: Flexible Disaggregation
+
+**Option:** Hyperlaunch Dynamic: Full Disaggregation
+
+These configurations enable dynamic client workloads, strong isolation for the
+domain running the virtualization toolstack software and each domain managing
+hardware, with PCI passthrough performed during host boot and support for
+measured launch.
+
+Hyperlaunch Disaggregated Launch
+--------------------------------
+
+
+Xen today has two primary permissions, *control domain* and *hardware
+domain*, and two functions, *console domain* and *xenstore domain*, that can be
+assigned to a domain. Traditionally, all of these permissions and functions are
+assigned to Dom0 at start and can then be delegated to other domains created by
+the toolstack in Dom0. With Hyperlaunch it becomes possible to assign these
+permissions and functions to any domain for which there is a definition
+provided at startup.
+
+Additionally, two further functions are introduced: the *recovery domain*,
+intended to assist with recovery from failures encountered starting VMs during
+host boot, and the *boot domain*, for performing aspects of domain construction
+during startup.
+
+Each of the above common boot configurations is supported by considering the
+set of initial domains and the assignment of Xen’s permissions and functions,
+including the ones introduced by Hyperlaunch, to these domains. These are
+discussed later; for now they are laid out in a table with a mapping to the
+common boot configurations. This table is not intended to be an exhaustive list
+of configurations and does not account for flask policy specified functions
+that are use case specific.
+
+In the table each number represents a separate domain being constructed by the
+Hyperlaunch construction path as Xen starts, and the designator ``{n}``
+signifies that there may be “n” additional domains constructed that do not have
+any special role for a general Xen system.
+
+::
+
+ +-------------------+------------------+-----------------------------------+
+ | Configuration     |    Permission    |            Function               |
+ |                   +------+------+----+------+--------+--------+----------+
+ |                   | None | Ctrl | HW | Boot |Recovery| Console| Xenstore |
+ +===================+======+======+====+======+========+========+==========+
+ | Classic: Dom0     |      |  0   | 0  |      |   0    |   0    |    0     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Classic: Extended |      |  1   | 1  |  0   |   1    |   1    |    1     |
+ | Launch Dom0       |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Classic:          | {n}  |  0   | 0  |      |   0    |   0    |    0     |
+ | Basic Cloud       |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Classic: Cloud    | {n}  |  1   | 1  |  0   |   1    |   1    |    1     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Static: Basic     | {n}  |      | 0  |      |   0    |   0    |    0     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Static: Standard  | {n}  |      | 1  |  0   |   1    |   1    |    1     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Static:           | {n}  |      | 2  |  0   |   3    |   4    |    1     |
+ | Disaggregated     |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Dynamic:          |      |  0   | 1  |      |   0    |   0    |    0     |
+ | Hardware Domain   |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Dynamic: Flexible | {n}  |  1   | 2  |  0   |   1    |   1    |    1     |
+ | Disaggregation    |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Dynamic: Full     | {n}  |  2   | 3  |  0   |   4    |   5    |    1     |
+ | Disaggregation    |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+
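Editorial sketch, not part of the patch: the role table above can be read as a small lookup structure, which makes it easy to check properties such as "the Static configurations assign no control domain". The names `ROLES`, `TABLE`, `roles_of`, `has_control_domain` and `role_holding_domains` below are hypothetical and are not Xen or Hyperlaunch interfaces; only the numbers are taken from the table, and the `{n}` column is left out.

```python
from typing import Dict, Optional, Tuple

# Role columns of the table, in order (Permission: ctrl, hw; Function: the rest).
ROLES = ("ctrl", "hw", "boot", "recovery", "console", "xenstore")

# One entry per table row. A number is the index of the initial domain that
# holds the role; None stands for a blank cell.
TABLE: Dict[str, Tuple[Optional[int], ...]] = {
    "Classic: Dom0":                    (0, 0, None, 0, 0, 0),
    "Classic: Extended Launch Dom0":    (1, 1, 0, 1, 1, 1),
    "Classic: Basic Cloud":             (0, 0, None, 0, 0, 0),
    "Classic: Cloud":                   (1, 1, 0, 1, 1, 1),
    "Static: Basic":                    (None, 0, None, 0, 0, 0),
    "Static: Standard":                 (None, 1, 0, 1, 1, 1),
    "Static: Disaggregated":            (None, 2, 0, 3, 4, 1),
    "Dynamic: Hardware Domain":         (0, 1, None, 0, 0, 0),
    "Dynamic: Flexible Disaggregation": (1, 2, 0, 1, 1, 1),
    "Dynamic: Full Disaggregation":     (2, 3, 0, 4, 5, 1),
}


def roles_of(config: str) -> Dict[str, Optional[int]]:
    """Map each role name to the domain index holding it (or None)."""
    return dict(zip(ROLES, TABLE[config]))


def has_control_domain(config: str) -> bool:
    """True when some initial domain is given the control domain permission."""
    return roles_of(config)["ctrl"] is not None


def role_holding_domains(config: str) -> int:
    """Number of distinct initial domains that hold at least one role."""
    return len({d for d in TABLE[config] if d is not None})


if __name__ == "__main__":
    for name in TABLE:
        print(name, has_control_domain(name), role_holding_domains(name))
```

Read this way, the three Static rows are exactly the rows without a control domain, and the Full Disaggregation row is the one that spreads the roles over the most domains.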
+Overview of Hyperlaunch Flow
+----------------------------
+
+Before delving into Hyperlaunch, a good basis to start with is an understanding
+of the current process to create a domain. A way to view this process starts
+with the core configuration which is the information the hypervisor requires to
+make the call to `domain_create`, followed by basic construction to provide the
+memory image to run, including the kernel and ramdisk. A subsequent step
+applies the extended configuration used by the toolstack to provide a domain
+with any additional configuration information. Until the extended configuration
+is completed, a domain has access to no resources except its allocated vcpus
+and memory. The exception to this is Dom0, which the hypervisor explicitly
+grants control and access to all system resources, except for those that only
+the hypervisor should have control over. This exception for Dom0 is driven by
+the system structure with a monolithic Dom0 domain predating introduction of
+support for disaggregation into Xen, and the corresponding default assignment
+of multiple roles within the Xen system to Dom0.
+
+While not a different domain creation path, there does exist the Hardware
+Domain (hwdom), sometimes also referred to as late-Dom0. It is an early effort
+to disaggregate Dom0’s roles into a separate control domain and hardware
+domain. This capability is activated by the passing of a domain id to the
+`hardware_dom` kernel command line parameter, and the Xen hypervisor will then
+flag that domain id as the hardware domain. Later when the toolstack constructs
+a domain with that domain id as the requested domid, the hypervisor will
+transfer all device I/O from Dom0 to this domain. In addition it will also
+transfer the “host shutdown on domain shutdown” flag from Dom0 to the hardware
+domain.
+It is worth mentioning that this approach for disaggregation was
+created in this manner due to the inability of Xen to launch more than one
+domain at startup.
+
+Hyperlaunch Xen startup
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The Hyperlaunch approach’s primary focus is on how to assign the roles
+traditionally granted to Dom0 to one or more domains at host boot. While the
+statement is simple to make, the implications are not trivial by any means.
+This also explains why the Hyperlaunch approach is orthogonal to the existing
+dom0less capability. The dom0less capability focuses on enabling the launch of
+multiple domains in parallel with Dom0 at host boot. A corollary for dom0less
+is that systems that don’t require Dom0 after all guest domains have started
+are able to do the host boot without a Dom0, though it should be noted that it
+may be possible to start Dom0 at a later point. With Hyperlaunch, by contrast,
+the approach of separating Dom0’s roles requires the ability to launch multiple
+domains at host boot. The direct consequences of this approach are profound and
+provide a myriad of possible configurations, of which a sample of common boot
+configurations was already presented.
+
+To enable the Hyperlaunch approach a new alternative path for host boot within
+the hypervisor must be introduced. This alternative path effectively branches
+just before the current point of Dom0 construction and begins an alternate
+means of system construction. Whether this alternate path is taken is
+determined by inspection of the boot chain. If the bootloader has loaded a
+specific configuration, as described later, it will enable Xen to detect that a
+Hyperlaunch configuration has been provided. Once a Hyperlaunch configuration
+is detected, this alternate path can be thought of as occurring in phases:
+domain creation, domain preparation, and launch finalization.
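Editorial sketch, not part of the patch: the branch and the three phases named above can be modelled as a small trace-producing function. It draws on the phase descriptions in the subsections that follow (domains are created paused, and the preparation phase runs only when a Boot Domain is defined). `DomainConfig`, `BootChain` and `host_boot` are hypothetical names for illustration and do not correspond to Xen code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DomainConfig:
    """Hypothetical stand-in for one domain definition in the configuration."""
    name: str
    is_boot_domain: bool = False


@dataclass
class BootChain:
    """Hypothetical stand-in for what the bootloader handed to Xen.

    hyperlaunch_configs is None when no Hyperlaunch configuration was loaded.
    """
    hyperlaunch_configs: Optional[List[DomainConfig]] = None


def host_boot(chain: BootChain) -> List[str]:
    """Return the ordered steps taken at host boot, as a readable trace."""
    # Branch point: without a Hyperlaunch configuration, take the classic path.
    if chain.hyperlaunch_configs is None:
        return ["construct Dom0 (classic path)"]

    trace: List[str] = []

    # Phase 1, domain creation: every configured domain is built and left paused.
    for cfg in chain.hyperlaunch_configs:
        trace.append(f"create {cfg.name} (paused)")

    # Phase 2, domain preparation: optional, only when a Boot Domain is flagged.
    boot = next((c for c in chain.hyperlaunch_configs if c.is_boot_domain), None)
    if boot is not None:
        trace.append(f"unpause {boot.name} (domain preparation)")

    # Phase 3, launch finalization: always runs on the Hyperlaunch path.
    trace.append("launch finalization")
    return trace


if __name__ == "__main__":
    configs = [DomainConfig("boot", is_boot_domain=True), DomainConfig("hwdom")]
    for step in host_boot(BootChain(configs)):
        print(step)
```

The point of the sketch is only the control flow: one detection step selects the path, and the preparation phase is the single optional phase.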
+
+Domain Creation
+"""""""""""""""
+
+The domain creation phase begins with Xen parsing the bootloader provided
+material, to understand the content of the modules provided. It will then load
+any microcode or XSM policy it discovers. For each domain configuration Xen
+finds, it parses the configuration to construct the necessary domain definition
+to instantiate an instance of the domain and leave it in a paused state. When
+all domain configurations have been instantiated as domains, if one of them is
+flagged as the Boot Domain, that domain will be unpaused, starting the domain
+preparation phase. If there is no Boot Domain defined, then the domain
+preparation phase will be skipped and Xen will trigger the launch finalization
+phase.
+
+Domain Preparation Phase
+""""""""""""""""""""""""
+
+The domain preparation phase is an optional check point for the execution of a
+workload specific domain, the Boot Domain. While the Boot Domain is the first
+domain to run and has some degree of control over the system, it is extremely
+restricted in both system resource access and hypervisor operations. Its
+purpose is to:
+
+* Access the configuration provided by the bootloader
+* Finalize the configuration of the domains
+* Conduct any setup and launch related operations
+* Do an ordered unpause of domains that require an ordered start
+
+When the Boot Domain has completed, it will notify the hypervisor that it is
+done, triggering the launch finalization phase.
+
+
+Launch Finalization
+"""""""""""""""""""
+
+The hypervisor handles the launch finalization phase which is equivalent to the
+clean up phase.
As such the steps taken by the hypervisor, not necessarily in
+implementation order, are as follows:
+
+* Free the boot module chain
+* If a Boot Domain was used, reclaim Boot Domain resources
+* Unpause any domains still in a paused state
+* Boot Domain uses a reserved function thus can never be respawned
+
+While the focus thus far has been on how the Hyperlaunch capability will work,
+it is worth mentioning what it does not do or limit from occurring. It does not
+stop or inhibit the assigning of the control domain role, which gives the domain
+the ability to create, start, stop, restart, and destroy domains, or the
+hardware domain role, which gives access to all I/O devices except those that
+the hypervisor has reserved for itself. In particular it is still possible to
+construct a domain with all the privileged roles, i.e. a Dom0, with or without
+the domain id being zero. In fact what limitations are imposed now become fully
+configurable without the risk of circumvention by an all privileged domain.
+
+Structuring of Hyperlaunch
+--------------------------
+
+The structure of Hyperlaunch is built around the existing capabilities of the
+host boot protocol. This approach was driven by the objective not to require
+modifications to the boot loader. The only requirement is that the boot loader
+supports the Multiboot2 (MB2) protocol. For UEFI boot, our recommendation is to
+use GRUB.efi to load Xen and the initial domain materials via the multiboot2
+method. On Arm platforms, Hyperlaunch is compatible with the existing interface
+for boot into the hypervisor.
+
+
+x86 Multiboot2
+^^^^^^^^^^^^^^
+
+The MB2 protocol has no concept of a manifest to tell the initial kernel what
+is contained in the chain, leaving it to the kernel to impose a loading
+convention, use magic number identification, or both.
When considering the
+passing of multiple kernels, ramdisks, and domain configuration along with any
+existing modules already passed, there is no sane convention that could be
+imposed and magic number identification is nearly impossible when considering
+the objective not to impose unnecessary complication to the hypervisor.
+
+As alluded to previously, a manifest describing the contents in the MB2
+chain and how they relate within a Xen context is needed. To address this need
+the Launch Control Module (LCM) was designed to provide such a manifest. The
+LCM was designed to have a specific set of properties:
+
+* minimize the complexity of the parsing logic required by the hypervisor
+* allow for expanding and optional configuration fragments without breaking
+  backwards compatibility
+
+To enable automatic detection of a Hyperlaunch configuration, the LCM must be
+the first MB2 module in the MB2 module chain. The LCM is implemented using the
+Device Tree as defined in the Hyperlaunch Device Tree design document. With the
+LCM implemented in Device Tree, it has a magic number that enables the
+hypervisor to detect its presence when used in a Multiboot2 module chain. The
+hypervisor can confirm that it is a proper LCM Device Tree by checking for a
+compliant Hyperlaunch Device Tree. The Hyperlaunch Device Tree nodes are
+designed to allow:
+
+* the hypervisor to parse only those entries it understands,
+* packing of custom information for a custom boot domain,
+* the use of a new LCM with an older hypervisor,
+* and the use of an older LCM with a new hypervisor.
+
+Arm Device Tree
+^^^^^^^^^^^^^^^
+
+As discussed the LCM is in Device Tree format and was designed to co-exist in
+the Device Tree ecosystem, and in particular in parallel with dom0less Device
+Tree entries.
On Arm, Xen is already designed to boot from a host Device Tree
+description (dtb) file and the LCM entries can be embedded into this host dtb
+file. This makes detecting the LCM entries and supporting Hyperlaunch on Arm
+relatively straightforward. Relative to the described x86 approach, at the
+point where Xen inspects the first MB2 module, on Arm Xen will check if the top
+level LCM node exists in the host dtb file. If the LCM node does exist, then at
+that point it will enter the same code path as the x86 entry.
+
+Xen hypervisor
+^^^^^^^^^^^^^^
+
+The new host boot flow that will be introduced was discussed above at a high
+level. Within this new flow is the configuration parsing and domain creation
+phase, which is expanded upon here. The hypervisor will inspect the LCM for a
+config node and if found will iterate through all module nodes. The module
+nodes are used to identify if any modules contain microcode or an XSM policy.
+As it processes domain nodes, it will construct the domain using the node
+properties and the module nodes. Once it has completed iterating through all
+the entries in the LCM, if a constructed domain has the Boot Domain attribute,
+it will then be unpaused. Otherwise the hypervisor will start the launch
+finalization phase.
+
+Boot Domain
+^^^^^^^^^^^
+
+Traditionally domain creation was controlled by the user within the Dom0
+environment whereby custom toolstacks could be implemented to impose
+requirements on the process. The Boot Domain is a means to enable the user to
+continue to maintain a degree of that control over domain creation but within a
+limited privilege environment. The Boot Domain will have access to the LCM and
+the boot chain along with access to a subset of the hypercall operations. When
+the Boot Domain is finished it will notify the hypervisor through a hypercall
+op.
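The x86 detection step described in the Multiboot2 section (the LCM is the first module and is recognisable by its Device Tree magic number) can be sketched as follows. This is a hedged Python illustration rather than hypervisor code: `looks_like_device_tree` and `select_boot_path` are hypothetical names, and the sketch only checks the flattened Device Tree magic value `0xd00dfeed`. The document additionally requires the hypervisor to confirm that the tree is a compliant Hyperlaunch Device Tree, which is not modelled here.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # flattened Device Tree header magic, stored big-endian


def looks_like_device_tree(blob: bytes) -> bool:
    """True if the blob starts with the flattened Device Tree magic number."""
    if len(blob) < 4:
        return False
    (magic,) = struct.unpack(">I", blob[:4])
    return magic == FDT_MAGIC


def select_boot_path(modules: list) -> str:
    """Pick the host boot path from the first bootloader-provided module."""
    if modules and looks_like_device_tree(modules[0]):
        return "hyperlaunch"
    return "dom0"
```

Only the first module is inspected, which mirrors the requirement that the LCM be placed first in the MB2 module chain; a Device Tree appearing later in the chain would not trigger the Hyperlaunch path in this sketch.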
+
+Recovery Domain
+^^^^^^^^^^^^^^^
+
+With the existing Dom0 host boot path, when a failure occurs there are several
+assumptions that can safely be made to get the user to a console for
+troubleshooting. With the Hyperlaunch host boot path those assumptions can no
+longer be made, thus a means is needed to get the user to a console in the case
+of a recoverable failure. The recovery domain is configured by a domain
+configuration entry in the LCM, in the same manner as the other initial
+domains, and it will not be unpaused at launch finalization unless a failure is
+encountered starting the initial domains.
+
+Xen has existing support for a Crash Environment where memory can be reserved
+at host boot and a kernel loaded into it, to be jumped into at any point while
+the system is running when a crash is detected. The Recovery Domain
+functionality is a separate, complementary capability. The Crash Environment
+replaces the previously active hypervisor and running guests, and enables a
+process for mounting disks to write out log information prior to rebooting the
+system. In contrast, the Recovery Domain is able to use the functionality of
+the Xen hypervisor, that is still present and running, to perform recovery
+handling for errors encountered with starting the initial domains.
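The unpause rule for the Recovery Domain can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the design itself: `domains_to_unpause` and the dictionary keys are hypothetical, and the failure branch (unpausing only the Recovery Domain) is an assumption, since how the recovery domain is actually unpaused is still listed as undetermined in this document.

```python
def domains_to_unpause(domains, launch_failed):
    """Select which domains launch finalization would unpause.

    domains: list of dicts with 'name', 'paused' and 'recovery' keys.
    """
    if launch_failed:
        # Assumption: on failure only the Recovery Domain is started.
        return [d["name"] for d in domains if d["recovery"]]
    # Normal case: every still-paused domain except the Recovery Domain.
    return [d["name"] for d in domains if d["paused"] and not d["recovery"]]
```

In the normal case the Recovery Domain stays paused, matching the statement that it is not unpaused at launch finalization unless a failure is encountered.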
+
+Deferred Design
+"""""""""""""""
+
+To be determined:
+
+* Define what is detected as a crash
+* Explain how crash detection is performed and which components are involved
+* Explain how the recovery domain is unpaused
+* Explain how and when the resources assigned to the recovery domain are reclaimed
+* Define what the recovery domain is able to do
+* Determine what permissions the recovery domain requires to perform its job
+
+
+Control Domain
+^^^^^^^^^^^^^^
+
+The concept of the Control Domain already exists within Xen as a boolean,
+`is_privileged`, that governs access to many of the privileged interfaces of
+the hypervisor that support a domain running a virtualization system toolstack.
+Hyperlaunch will allow the `is_privileged` flag to be set on any domain that is
+created at launch, rather than only a Dom0. It may potentially be set on
+multiple domains.
+
+Hardware Domain
+^^^^^^^^^^^^^^^
+
+The Hardware Domain is also an existing concept for Xen that is enabled through
+the `is_hardware_domain` check. With Hyperlaunch the previous process of I/O
+accesses being assigned to Dom0 for later transfer to the hardware domain would
+no longer be required. Instead during the configuration phase the Xen
+hypervisor would directly assign the I/O accesses to the domain with the
+hardware domain permission bit enabled.
+
+Console Domain
+^^^^^^^^^^^^^^
+
+Traditionally the Xen console is assigned to the control domain and then
+reassignable by the toolstack to another domain. With Hyperlaunch it becomes
+possible to construct a boot configuration where there is no control domain or
+have a use case where the Xen console needs to be isolated. As such it becomes
+necessary to be able to designate which of the initial domains should be
+assigned the Xen console.
Therefore Hyperlaunch introduces the ability to
+specify an initial domain to which the console is assigned, along with a
+convention of ordered assignment for when there is no explicit assignment.
+
+Communication of Domain Configurations
+======================================
+
+There are several standard methods for an Operating System to access machine
+configuration and environment information: ACPI is common on x86 systems,
+whereas Device Tree is more typical on Arm platforms. There are currently
+implementations of both in Xen.
+
+* For dom0less, guest Device Trees are dynamically constructed by the
+  hypervisor to convey domain configuration data
+
+* For PVH dom0 on x86, ACPI tables are built by the hypervisor before the
+  domain is started
+
+Note that both of these mechanisms convey static data that is fixed prior to
+the point of domain construction. Hyperlaunch will retain both the existing
+ACPI and Device Tree methods.
+
+Communication of data between a Boot Domain and a Control Domain is of note
+since they may not be running concurrently: the method used will depend on
+their specific implementations, but one option available is to use Xen’s hypfs
+for transfer of basic data to support system bootstrap.
+
+-------------------------------------------------------------------------------
+
+Appendix
+========
+
+Appendix 1: Flow Sequence of Steps of a Hyperlaunch Boot
+--------------------------------------------------------
+
+Provided here is an ordered flow of a Hyperlaunch boot, highlighting the logic
+decision points. Not all branch points are recorded, specifically for the
+variety of error conditions that may occur. ::
+
+  1. Hypervisor Startup:
+  2a. (x86) Inspect first module provided by the bootloader
+      a. Is the module an LCM
+          i. YES: proceed with the Hyperlaunch host boot path
+          ii. NO: proceed with a Dom0 host boot path
+  2b.
(Arm) Inspect host dtb for `/chosen/hypervisor` node
+      a. Is the LCM present
+          i. YES: proceed with the Hyperlaunch host boot path
+          ii. NO: proceed with a Dom0/dom0less host boot path
+  3. Iterate through the LCM entries looking for the module description
+     entry
+      a. Check if any of the modules are microcode or policy and if so,
+         load
+  4. Iterate through the LCM entries processing all domain description
+     entries
+      a. Use the details from the Basic Configuration to call
+         `domain_create`
+      b. Record if a domain is flagged as the Boot Domain
+      c. Record if a domain is flagged as the Recovery Domain
+  5. Was a Boot Domain created
+      a. YES:
+          i. Attach console to Boot Domain
+          ii. Unpause Boot Domain
+          iii. Goto Boot Domain (step 6)
+      b. NO: Goto Launch Finalization (step 10)
+  6. Boot Domain:
+  7. Boot Domain comes online and may do any of the following actions
+      a. Process the LCM
+      b. Validate the MB2 chain
+      c. Make additional configuration settings for staged domains
+      d. Unpause any precursor domains
+      e. Set any runtime configurations
+  8. Boot Domain does any necessary cleanup
+  9. Boot Domain makes a hypercall op call to signal it is finished
+      i. Hypervisor reclaims all Boot Domain resources
+      ii. Hypervisor records that the Boot Domain ran
+      iii. Goto Launch Finalization (step 10)
+  10. Launch Finalization
+  11. If a configured domain was flagged to have the console, the
+      hypervisor assigns it
+  12. The hypervisor clears the LCM and bootloader loaded module,
+      reclaiming the memory
+  13.
The hypervisor iterates through domains unpausing any domain not
+      flagged as the recovery domain
+
+
+Appendix 2: Considerations in Naming the Hyperlaunch Feature
+------------------------------------------------------------
+
+* The term “Launch” is preferred over “Boot”
+
+        * Multiple individual component boots can occur in the new system start
+          process; Launch is preferable for describing the whole process
+        * Fortunately there is consensus in the current group of stakeholders
+          that the term “Launch” is good and appropriate
+
+* The names we define must support becoming meaningful and simple to use
+  outside the Xen community
+
+        * They must be able to be resolved quickly via search engine to a clear
+          explanation (e.g. Xen marketing material, documentation or wiki)
+        * We prefer that the terms be helpful for marketing communications
+        * Consequence: avoid the term “domain” which is Xen-specific and
+          requires a definition to be provided each time when used elsewhere
+
+
+* There is a need to communicate that Xen is capable of being used as a Static
+  Partitioning hypervisor
+
+        * The community members using and maintaining dom0less are the current
+          primary stakeholders for this
+
+* There is a need to communicate that the new launch functionality provides new
+  capabilities not available elsewhere, and is more than just supporting Static
+  Partitioning
+
+        * No other hypervisor known to the authors of this document is capable
+          of providing what Hyperlaunch will be able to do.
The launch sequence is
+          designed to:
+
+                * Remove dependency on a single, highly-privileged initial domain
+                * Allow the initial domains started to be independent and fully
+                  isolated from each other
+                * Support configurations where no further VMs can be launched
+                  once the initial domains have started
+                * Use a standard, extensible format for conveying VM
+                  configuration data
+                * Ensure that domain building of all initial domains is
+                  performed by the hypervisor from materials supplied by the
+                  bootloader
+                * Enable flexible configuration to be applied to all initial
+                  domains by an optional Boot Domain, that runs with limited
+                  privilege, before any other domain starts and obtains the VM
+                  configuration data from the bootloader materials via the
+                  hypervisor
+                * Enable measurements of all of the boot materials prior to
+                  their use, in a sequence with minimized privilege
+                * Support use-case-specific customized Boot Domains
+                * Complement the hypervisor’s existing ability to enforce
+                  policy-based Mandatory Access Control
+
+
+* “Static” and “Dynamic” have different and important meanings in different
+  communities
+
+        * Static and Dynamic Partitioning describe the ability to create new
+          virtual machines, or not, after the initial host boot process
+          completes
+        * Static and Dynamic Root of Trust describe the nature of the trust
+          chain for a measured launch. In this case Static is referring to the
+          fact that the trust chain is fixed and non-repeatable until the next
+          host reboot or shutdown.
Whereas Dynamic in this case refers to the
+          ability to conduct the measured launch at any time and potentially
+          multiple times before the next host reboot or shutdown.
+
+                * We will be using Hyperlaunch with both Static and Dynamic
+                  Roots of Trust, to launch both Static and Dynamically
+                  Partitioned Systems, and being clear about exactly which
+                  combination is being started will be very important (e.g. for
+                  certification processes)
+
+        * Consequence: uses of “Static” and “Dynamic” need to be qualified if
+          they are incorporated into the naming of this functionality
+
+                * This can be done by adding the preceding, stronger branded
+                  term: “Hyperlaunch”, before “Static” or “Dynamic”
+                * i.e. “Hyperlaunch Static” describes launch of a
+                  Statically Partitioned system
+                * and “Hyperlaunch Dynamic” describes launch of a
+                  Dynamically Partitioned system.
+                * In practice, this means that “Hyperlaunch Static” describes
+                  starting a Static Partitioned system where no new domains can
+                  be started later (i.e. no VM has the Control Domain
+                  permission), whereas “Hyperlaunch Dynamic” will launch some
+                  VM with the Control Domain permission, able to create VMs
+                  dynamically at a later point.
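The practical distinction drawn in the last bullet reduces to a single check on the launch configuration. The Python sketch below is illustrative only: `launch_style` is a hypothetical helper, and the permission bit values are taken from the comments in the Hyperlaunch device tree bindings examples (`PERMISSION_CONTROL` as `1 << 0`, `PERMISSION_HARDWARE` as `1 << 1`).

```python
PERMISSION_CONTROL = 1 << 0   # value as listed in the device tree bindings
PERMISSION_HARDWARE = 1 << 1  # value as listed in the device tree bindings


def launch_style(domain_permissions):
    """Classify a launch from the permissions value of each initial domain."""
    if any(p & PERMISSION_CONTROL for p in domain_permissions):
        # Some VM can create further VMs after the initial launch.
        return "Hyperlaunch Dynamic"
    return "Hyperlaunch Static"
```

For example, a configuration whose only privileged domain holds the hardware permission is still classified as “Hyperlaunch Static”, because no domain is able to create new VMs later.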
+
+**Naming Proposal:**
+
+* New Term: “Hyperlaunch”: the ability of a hypervisor to construct and start
+  one or more virtual machines at system launch, in the following manner:
+
+        * The hypervisor must build all of the domains that it starts at host
+          boot
+
+                * Similar to the way the dom0 domain is built by the hypervisor
+                  today, and how dom0less works: it will run a loop to build
+                  them all, driven from the configuration provided
+                * This is a requirement for ensuring that there is Strong
+                  Isolation between each of the initial VMs
+
+        * A single file containing the VM configs (“Launch Control Module”: LCM,
+          in Device Tree binary format) is provided to the hypervisor
+
+                * The hypervisor parses it and builds domains
+                * If the LCM config says that a Boot Domain should run first,
+                  then the LCM file itself is made available to the Boot Domain
+                  for it to parse and act on, to invoke operations via the
+                  hypervisor to apply additional configuration to the other VMs
+                  (i.e. executing a privilege-constrained toolstack)
+
+* New Term: “Hyperlaunch Static”: starts a Static Partitioned system, where
+  only the virtual machines started at system launch are running on the system
+
+* New Term: “Hyperlaunch Dynamic”: starts a system where virtual machines may
+  be dynamically added after the initial virtual machines have started.
+
+
+In the default configuration, Xen will be capable of both styles of Hyperlaunch
+from the same hypervisor binary which, when paired with its XSM flask, provides
+strong controls that enable fine grained system partitioning.
+
+
+* Retiring Term: “DomB”: will no longer be used to describe the optional first
+  domain that is started. It is replaced with the more general term: “Boot
+  Domain”.
+
+* Retiring Term: “Dom0less”: it is to be replaced with “Hyperlaunch Static”
+
+
+Appendix 3: Terminology
+-----------------------
+
+To help ensure clarity in reading this document, the following is the
+definition of terminology used within this document.
+
+
+Basic Configuration
+    the minimal information the hypervisor requires to instantiate a domain instance
+
+
+Boot Domain
+    a domain with limited privileges launched by the hypervisor during a
+    Multiple Domain Boot that runs as the first domain started. In the Hyperlaunch
+    architecture, it is responsible for assisting with higher level operations of
+    the domain setup process.
+
+
+Classic Launch
+    a backwards-compatible host boot that ends with the launch of a single domain (Dom0)
+
+
+Console Domain
+    a domain that has the Xen console assigned to it
+
+
+Control Domain
+    a privileged domain that has been granted Control Domain permissions which
+    are those that are required by the Xen toolstack for managing other domains.
+    These permissions are a subset of those that are granted to Dom0.
+
+
+Device Tree
+    a standardized data structure, with defined file formats, for describing
+    initial system configuration
+
+
+Disaggregation
+    the separation of system roles and responsibilities across multiple
+    connected components that work together to provide functionality
+
+
+Dom0
+    the highly-privileged, first and only domain started at host boot on a
+    conventional Xen system
+
+
+Dom0less
+    an existing feature of Xen on Arm that provides Multiple Domain Boot
+
+
+Domain
+    a running instance of a virtual machine (as the term is commonly used in
+    the Xen Community)
+
+DomB
+    the former name for Hyperlaunch
+
+
+Extended Configuration
+    any configuration options for a domain beyond its Basic Configuration
+
+
+Hardware Domain
+    a privileged domain that has been granted permissions to access and manage
+    host hardware.
These permissions are a subset of those that are granted to
+    Dom0.
+
+
+Host Boot
+    the system startup of Xen using the configuration provided by the bootloader
+
+
+Hyperlaunch
+    a flexible host boot that ends with the launch of one or more domains
+
+
+Initial Domain
+    a domain that is described in the LCM that is run as part of a multiple
+    domain boot. This includes the Boot Domain, Recovery Domain and all Launched
+    Domains.
+
+
+Late Hardware Domain
+    a Hardware Domain that is launched after host boot has already completed
+    with a running Dom0. When the Late Hardware Domain is started, Dom0
+    relinquishes and transfers the permissions to access and manage host hardware
+    to it.
+
+
+Launch Control Module (LCM)
+    a file supplied to the hypervisor by the bootloader that contains
+    configuration data for the hypervisor and the initial set of virtual machines
+    to be run at boot
+
+
+Launched Domain
+    a domain, aside from the boot domain and recovery domain, that is started as
+    part of a multiple domain boot and remains running once the boot process is
+    complete
+
+
+Multiple Domain Boot
+    a system configuration where the hypervisor and multiple virtual machines
+    are all launched when the host system hardware boots
+
+
+Recovery Domain
+    an optional fallback domain that the hypervisor may start in the event of a
+    detectable error encountered during the multiple domain boot process
+
+
+System Device Tree
+    the product of an Arm community project to extend Device Tree to
+    cover more aspects of initial system configuration
+
+
+Appendix 4: Copyright License
+-----------------------------
+
+This work is licensed under a Creative Commons Attribution 4.0 International
+License. A copy of this license may be obtained from the Creative Commons
+website (https://creativecommons.org/licenses/by/4.0/legalcode).
+
+| Contributions by:
+| Christopher Clark are Copyright © 2021 Star Lab Corporation
+| Daniel P.
Smith are Copyright © 2021 Apertus Solutions, LLC
-- 
2.25.1

From: Christopher Clark
To: xen-devel@lists.xenproject.org
Cc: "Daniel P.
Smith", andrew.cooper3@citrix.com, stefano.stabellini@xilinx.com, jgrall@amazon.com, Julien.grall.oss@gmail.com, iwj@xenproject.org, wl@xen.org, george.dunlap@citrix.com, jbeulich@suse.com, persaur@gmail.com, Bertrand.Marquis@arm.com, roger.pau@citrix.com, luca.fancellu@arm.com, paul@xen.org, adam.schwalm@starlab.io, scott.davis@starlab.io, Christopher Clark
Subject: [PATCH v4 2/2] docs/designs/launch: Hyperlaunch device tree
Date: Thu, 13 May 2021 20:41:01 -0700
Message-Id: <20210514034101.3683-3-christopher.w.clark@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210514034101.3683-1-christopher.w.clark@gmail.com>
References: <20210514034101.3683-1-christopher.w.clark@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: "Daniel P. Smith"

Adds a design document for Hyperlaunch device tree structure.

Signed-off-by: Christopher Clark
Signed-off-by: Daniel P. Smith
---
 .../designs/launch/hyperlaunch-devicetree.rst | 343 ++++++++++++++++++
 1 file changed, 343 insertions(+)
 create mode 100644 docs/designs/launch/hyperlaunch-devicetree.rst

diff --git a/docs/designs/launch/hyperlaunch-devicetree.rst b/docs/designs/launch/hyperlaunch-devicetree.rst
new file mode 100644
index 0000000000..f97d357407
--- /dev/null
+++ b/docs/designs/launch/hyperlaunch-devicetree.rst
@@ -0,0 +1,343 @@
+-------------------------------------
+Xen Hyperlaunch Device Tree Bindings
+-------------------------------------
+
+The Xen Hyperlaunch device tree adopts the dom0less device tree structure and
+extends it to meet the requirements for the Hyperlaunch capability. The primary
+difference is the introduction of the ``hypervisor`` node that is under the
+``/chosen`` node. A dedicated node was chosen because it:
+
+1. Reduces the need to walk over nodes that are not of interest, e.g. only
+   nodes of interest should be in ``/chosen/hypervisor``
+
+2.
Allows for the domain construction information to easily be sanitized by
+   simply removing the ``/chosen/hypervisor`` node.
+
+Example Configuration
+---------------------
+
+Below are two example device tree definitions for the hypervisor node. The
+first is an example of a multiboot-based configuration for x86 and the second
+is a module-based configuration for Arm.
+
+Multiboot x86 Configuration:
+""""""""""""""""""""""""""""
+
+::
+
+    hypervisor {
+        #address-cells = <1>;
+        #size-cells = <0>;
+        compatible = "hypervisor,xen";
+
+        // Configuration container
+        config {
+            compatible = "xen,config";
+
+            module {
+                compatible = "module,microcode", "multiboot,module";
+                mb-index = <1>;
+            };
+
+            module {
+                compatible = "module,xsm-policy", "multiboot,module";
+                mb-index = <2>;
+            };
+        };
+
+        // Boot Domain definition
+        domain {
+            compatible = "xen,domain";
+
+            domid = <0x7FF5>;
+
+            // FUNCTION_NONE            (0)
+            // FUNCTION_BOOT            (1 << 0)
+            // FUNCTION_CRASH           (1 << 1)
+            // FUNCTION_CONSOLE         (1 << 2)
+            // FUNCTION_XENSTORE        (1 << 30)
+            // FUNCTION_LEGACY_DOM0     (1 << 31)
+            functions = <0x00000001>;
+
+            memory = <0x0 0x20000>;
+            cpus = <1>;
+            module {
+                compatible = "module,kernel", "multiboot,module";
+                mb-index = <3>;
+            };
+
+            module {
+                compatible = "module,ramdisk", "multiboot,module";
+                mb-index = <4>;
+            };
+            module {
+                compatible = "module,config", "multiboot,module";
+                mb-index = <5>;
+            };
+        };
+
+        // Classic Dom0 definition
+        domain {
+            compatible = "xen,domain";
+
+            domid = <0>;
+
+            // PERMISSION_NONE          (0)
+            // PERMISSION_CONTROL       (1 << 0)
+            // PERMISSION_HARDWARE      (1 << 1)
+            permissions = <3>;
+
+            // FUNCTION_NONE            (0)
+            // FUNCTION_BOOT            (1 << 0)
+            // FUNCTION_CRASH           (1 << 1)
+            // FUNCTION_CONSOLE         (1 << 2)
+            // FUNCTION_XENSTORE        (1 << 30)
+            // FUNCTION_LEGACY_DOM0     (1 << 31)
+            functions = <0xC0000006>;
+
+            // MODE_PARAVIRTUALIZED     (1 << 0) /* PV | PVH/HVM */
+            // MODE_ENABLE_DEVICE_MODEL
(1 << 1) /* HVM | PVH */ + // MODE_LONG (1 << 2) /* 64 BIT | 32 BIT */ + mode =3D <5>; /* 64 BIT, PV */ +=20 + // UUID + domain-uuid =3D [B3 FB 98 FB 8F 9F 67 A3]; +=20 + cpus =3D <1>; + memory =3D <0x0 0x20000>; + security-id =3D =E2=80=9Cdom0_t; +=20 + module { + compatible =3D "module,kernel", "multiboot,module"; + mb-index =3D <6>; + bootargs =3D "console=3Dhvc0"; + }; + module { + compatible =3D "module,ramdisk", "multiboot,module"; + mb-index =3D <7>; + }; + }; + +The multiboot modules supplied when using the above config would be, in or= der: + +* (the above config, compiled) +* CPU microcode +* XSM policy +* kernel for boot domain +* ramdisk for boot domain +* boot domain configuration file +* kernel for the classic dom0 domain +* ramdisk for the classic dom0 domain + +Module Arm Configuration: +""""""""""""""""""""""""" + +:: + + hypervisor { + compatible =3D =E2=80=9Chypervisor,xen=E2=80=9D +=20 + // Configuration container + config { + compatible =3D "xen,config"; +=20 + module { + compatible =3D "module,microcode=E2=80=9D; + module-addr =3D <0x0000ff00 0x80>; + }; +=20 + module { + compatible =3D "module,xsm-policy"; + module-addr =3D <0x0000ff00 0x80>; +=20 + }; + }; +=20 + // Boot Domain definition + domain { + compatible =3D "xen,domain"; +=20 + domid =3D <0x7FF5>; +=20 + // FUNCTION_NONE (0) + // FUNCTION_BOOT (1 << 0) + // FUNCTION_CRASH (1 << 1) + // FUNCTION_CONSOLE (1 << 2) + // FUNCTION_XENSTORE (1 << 30) + // FUNCTION_LEGACY_DOM0 (1 << 31) + functions =3D <0x00000001>; +=20 + memory =3D <0x0 0x20000>; + cpus =3D <1>; + module { + compatible =3D "module,kernel"; + module-addr =3D <0x0000ff00 0x80>; + }; +=20 + module { + compatible =3D "module,ramdisk"; + module-addr =3D <0x0000ff00 0x80>; + }; + module { + compatible =3D "module,config"; + module-addr =3D <0x0000ff00 0x80>; + }; +=20 + // Classic Dom0 definition + domain@0 { + compatible =3D "xen,domain"; +=20 + domid =3D <0>; +=20 + // PERMISSION_NONE (0) + // PERMISSION_CONTROL (1 << 0) + // 
PERMISSION_HARDWARE (1 << 1) + permissions =3D <3>; +=20 + // FUNCTION_NONE (0) + // FUNCTION_BOOT (1 << 0) + // FUNCTION_CRASH (1 << 1) + // FUNCTION_CONSOLE (1 << 2) + // FUNCTION_XENSTORE (1 << 30) + // FUNCTION_LEGACY_DOM0 (1 << 31) + functions =3D <0xC0000006>; +=20 + // MODE_PARAVIRTUALIZED (1 << 0) /* PV | PVH/HVM */ + // MODE_ENABLE_DEVICE_MODEL (1 << 1) /* HVM | PVH */ + // MODE_LONG (1 << 2) /* 64 BIT | 32 BIT */ + mode =3D <5>; /* 64 BIT, PV */ +=20 + // UUID + domain-uuid =3D [B3 FB 98 FB 8F 9F 67 A3]; +=20 + cpus =3D <1>; + memory =3D <0x0 0x20000>; + security-id =3D =E2=80=9Cdom0_t=E2=80=9D; +=20 + module { + compatible =3D "module,kernel"; + module-addr =3D <0x0000ff00 0x80>; + bootargs =3D "console=3Dhvc0"; + }; + module { + compatible =3D "module,ramdisk"; + module-addr =3D <0x0000ff00 0x80>; + }; + }; + +The modules that would be supplied when using the above config would be: + +* (the above config, compiled into hardware tree) +* CPU microcode +* XSM policy +* kernel for boot domain +* ramdisk for boot domain +* boot domain configuration file +* kernel for the classic dom0 domain +* ramdisk for the classic dom0 domain + +The hypervisor device tree would be compiled into the hardware device tree= and +provided to Xen using the standard method currently in use. The remaining +modules would need to be loaded in the respective addresses specified in t= he +`module-addr` property. + + +The Hypervisor node +------------------- + +The hypervisor node is a top level container for the domains that will be = built +by hypervisor on start up. On the ``hypervisor`` node the ``compatible`` +property is used to identify the type of hypervisor node present.. + +compatible + Identifies the type of node. Required. + +The Config node +--------------- + +A config node is for detailing any modules that are of interest to Xen its= elf. +For example this would be where Xen would be informed of microcode or XSM +policy locations. 
+If the modules are multiboot modules and can be located by index within the
+module chain, the ``mb-index`` property should be used to specify the index in
+the multiboot module chain. If the module will be located by physical memory
+address, then the ``module-addr`` property should be used to identify the
+location and size of the module.
+
+compatible
+  Identifies the type of node. Required.
+
+The Domain node
+---------------
+
+A domain node is for describing the construction of a domain. It may provide a
+``domid`` property, which is used as the requested domain id for the domain. A
+value of “0” signifies that the next available domain id should be used, which
+is also the default behavior if the property is omitted. A domain configuration
+is therefore not able to request a domid of “0”. Beyond that, a domain node may
+have any of the following parameters:
+
+compatible
+  Identifies the type of node. Required.
+
+domid
+  Identifies the domid requested to assign to the domain. Required.
+
+permissions
+  This sets which Discretionary Access Control permissions
+  a domain is assigned. Optional, the default is none.
+
+functions
+  This identifies which system functions a domain will fulfill.
+  Optional, the default is none.
+
+.. note:: The ``functions`` bits that have been selected to indicate
+   ``FUNCTION_XENSTORE`` and ``FUNCTION_LEGACY_DOM0`` are the last two bits
+   (30, 31), so that should these features ever be fully retired, the flags
+   may be dropped without leaving a gap in the flag set.
+
+mode
+  The mode the domain will be executed under. Required.
+
+domain-uuid
+  A globally unique identifier for the domain. Optional,
+  the default is NULL.
+
+cpus
+  The number of vCPUs to be assigned to the domain. Optional,
+  the default is “1”.
+
+memory
+  The amount of memory to assign to the domain, in KB.
+  Required.
+
+security-id
+  The security identity to be assigned to the domain when XSM
+  is the access control mechanism being used. Optional,
+  the default is “domu_t”.
+
+The Module node
+---------------
+
+This node describes a boot module loaded by the boot loader. The required
+``compatible`` property follows the format ``module,<type>``, where type can
+be “kernel”, “ramdisk”, “device-tree”, “microcode”, “xsm-policy” or “config”.
+If the module is a multiboot module, the additional property string
+“multiboot,module” may be present. One of two properties is required to
+identify how to locate the module: ``mb-index``, used for multiboot modules,
+or ``module-addr``, used for memory address based location.
+
+compatible
+  This identifies what the module is and thus what the hypervisor
+  should use the module for during domain construction. Required.
+
+mb-index
+  This identifies the index for this module in the multiboot module chain.
+  Required for multiboot environments.
+
+module-addr
+  This identifies where in memory this module is located. Required for
+  non-multiboot environments.
+
+bootargs
+  This is used to provide the boot parameters to kernel modules.
+
+.. note:: The ``bootargs`` property is intended for situations where the same
+   kernel multiboot module is used for more than one domain.
-- 
2.25.1