Allows guest to boot from a vfio configured real dasd device.
Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
---
docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
pc-bios/s390-ccw/Makefile | 2 +-
pc-bios/s390-ccw/dasd-ipl.c | 249 +++++++++++++++++++++++++++++++++++++++++++
pc-bios/s390-ccw/dasd-ipl.h | 16 +++
pc-bios/s390-ccw/main.c | 4 +
pc-bios/s390-ccw/s390-arch.h | 13 +++
6 files changed, 415 insertions(+), 1 deletion(-)
create mode 100644 docs/devel/s390-dasd-ipl.txt
create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
create mode 100644 pc-bios/s390-ccw/dasd-ipl.h
diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
new file mode 100644
index 0000000..84ec7b8
--- /dev/null
+++ b/docs/devel/s390-dasd-ipl.txt
@@ -0,0 +1,132 @@
+*****************************
+***** s390 hardware IPL *****
+*****************************
+
+The s390 hardware IPL process consists of the following steps.
+
+1. A READ IPL ccw is constructed in memory location 0x0.
+ This ccw, by definition, reads the IPL1 record, which is located on the disk
+ at cylinder 0, track 0, record 1. Note that the chain flag is on in this ccw,
+ so when it completes, another ccw will be fetched and executed from memory
+ location 0x08.
+
+2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
+ IPL1 data is 24 bytes in length and consists of the following pieces of
+ information: [psw][read ccw][tic ccw]. When the machine executes the Read
+ IPL ccw, it reads the 24 bytes of IPL1 into memory starting at location
+ 0x0. Then the channel program at 0x08, which consists of a read ccw and a
+ tic ccw, is executed automatically because of the chain flag in the
+ original READ IPL ccw. The read ccw reads the IPL2 data into memory
+ and the TIC (Transfer In Channel) transfers control to the channel
+ program contained in the IPL2 data. The TIC channel command is the
+ equivalent of a branch/jump/goto instruction for channel programs.
+ NOTE: The ccws in IPL1 are defined by the architecture to be format 0.
+
+3. Execute IPL2.
+ The TIC ccw instruction at the end of the IPL1 channel program will begin
+ the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
+ process and will contain a larger channel program than IPL1. The point of
+ IPL2 is to find and load either the operating system or a small program that
+ loads the operating system from disk. At the end of this step all or some of
+ the real operating system is loaded into memory and we are ready to hand
+ control over to the guest operating system. At this point the guest
+ operating system is entirely responsible for loading any more data it might
+ need to function. NOTE: The IPL2 channel program might read data into memory
+ location 0 thereby overwriting the IPL1 psw and channel program. This is ok
+ as long as the data placed in location 0 contains a psw whose instruction
+ address points to the guest operating system code to execute at the end of
+ the IPL/boot process.
+ NOTE: The ccws in IPL2 are defined by the architecture to be format 0.
+
+4. Start executing the guest operating system.
+ The psw that was loaded into memory location 0 as part of the ipl process
+ should contain the needed flags for the operating system we have loaded. The
+ psw's instruction address will point to the location in memory where we want
+ to start executing the operating system. This psw is loaded (via LPSW
+ instruction) causing control to be passed to the operating system code.
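A minimal host-side sketch of the 24-byte IPL1 record from step 2, assuming the format-0 CCW byte layout (byte 0 command code, bytes 1-3 big-endian data address) and assuming 0x06 and 0x08 as the READ and TIC command codes. `parse_ipl1()` is a hypothetical helper, not part of the patch; it makes the same check as `check_ipl1()` and extracts the same TIC target that `read_ipl2_addr()` returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of the patch's CCW_CMD_DASD_READ and CCW_CMD_TIC. */
#define CMD_DASD_READ 0x06
#define CMD_TIC       0x08

/* Reads the big-endian 24-bit data address (bytes 1-3) of a format-0 CCW. */
static uint32_t ccw0_cda(const uint8_t *ccw)
{
    return ((uint32_t)ccw[1] << 16) | ((uint32_t)ccw[2] << 8) | ccw[3];
}

/*
 * Hypothetical helper: inspects an IPL1 record laid out as
 * [psw (bytes 0-7)][read ccw (bytes 8-15)][tic ccw (bytes 16-23)].
 * Returns false unless the CCWs are a READ followed by a TIC; otherwise
 * stores the TIC target, which is where the IPL2 channel program starts.
 */
static bool parse_ipl1(const uint8_t rec[24], uint32_t *ipl2_addr)
{
    assert(rec != NULL && ipl2_addr != NULL);

    const uint8_t *read_ccw = rec + 0x08;
    const uint8_t *tic_ccw = rec + 0x10;

    if (read_ccw[0] != CMD_DASD_READ || tic_ccw[0] != CMD_TIC) {
        return false;
    }
    *ipl2_addr = ccw0_cda(tic_ccw);
    return true;
}
```

The PSW in bytes 0-7 is deliberately ignored here; it only matters at the final LPSW, and IPL2 may overwrite it anyway.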
+
+In a non-virtualized environment this process, handled entirely by the hardware,
+is kicked off by the user initiating a "Load" procedure from the hardware
+management console. This "Load" procedure crafts a special "Read IPL" ccw in
+memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
+off the reading of IPL1 data. Since the channel program from IPL1 will be
+written immediately after the special "Read IPL" ccw, the IPL1 channel program
+will be executed immediately (the special read ccw has the chaining bit turned
+on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
+program to be executed automatically. After this sequence completes the "Load"
+procedure then loads the psw from 0x0.
+
+*****************************************
+***** How this all pertains to Qemu *****
+*****************************************
+
+In theory we should merely have to do the following to IPL/boot a guest
+operating system from a DASD device:
+
+1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
+2. Execute channel program at 0x0.
+3. LPSW 0x0.
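Step 1 of this idealized sequence amounts to packing the Read IPL ccw into its 8-byte format-0 form. This is an illustration only: `encode_ccw0()` is a hypothetical helper, 0x02 is assumed to be the Read IPL command code, and 0x40 is taken as the command-chaining flag in the flags byte. The byte count 0x18 matches the 24-byte IPL1 record that `make_readipl()` asks for.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_READ_IPL 0x02   /* assumed value of CCW_CMD_READ_IPL */
#define CCW0_FLAG_CC 0x40   /* command chaining bit in the flags byte */

/*
 * Packs a format-0 CCW into its 8-byte big-endian form:
 * byte 0 command code, bytes 1-3 data address, byte 4 flags,
 * byte 5 reserved (zero), bytes 6-7 byte count.
 */
static void encode_ccw0(uint8_t out[8], uint8_t cmd, uint32_t cda,
                        uint8_t flags, uint16_t count)
{
    assert(cda <= 0xFFFFFF);   /* format-0 data addresses are 24 bits wide */
    out[0] = cmd;
    out[1] = (uint8_t)(cda >> 16);
    out[2] = (uint8_t)(cda >> 8);
    out[3] = (uint8_t)cda;
    out[4] = flags;
    out[5] = 0;
    out[6] = (uint8_t)(count >> 8);
    out[7] = (uint8_t)count;
}
```

Calling it as `encode_ccw0(buf, CMD_READ_IPL, 0x0, CCW0_FLAG_CC, 0x18)` gives the chained form described above; passing 0 for `flags` gives the unchained variant that `make_readipl()` actually builds.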
+
+However, our emulation of the machine's channel program logic is missing one key
+feature that is required for this process to work: non-prefetch of ccw data.
+
+When we start a channel program we pass the channel subsystem parameters via an
+ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
+bit is on then Qemu is allowed to read the entire channel program from guest
+memory before it starts executing it. This means that any channel commands that
+read additional channel commands will not work as expected because the newly
+read commands will only exist in guest memory and NOT within Qemu's channel
+subsystem memory. Qemu's channel subsystem's implementation currently requires
+this bit to be on for all channel programs. This is a problem because the IPL
+process consists of transferring control from the "Read IPL" ccw immediately to
+the IPL1 channel program that was read by "Read IPL".
+
+Not being able to turn off prefetch will also prevent the TIC at the end of the
+IPL1 channel program from transferring control to the IPL2 channel program.
+
+Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
+transfers control to another channel program segment immediately after reading it
+from the disk. So we need to be able to handle this case.
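A deliberately tiny toy model can make the prefetch problem concrete. It is not Qemu or hardware code, and every name in it is hypothetical. A command that loads further commands from "disk" into the next memory slot only has an effect if the executor fetches each command from memory as it goes; an executor that copies the program before starting, as prefetch allows, keeps running the stale copy.

```c
#include <assert.h>
#include <string.h>

/* Toy commands: TOY_LOAD copies the disk block to mem[arg],
 * TOY_MARK records arg, TOY_END stops the program. */
enum { TOY_END = 0, TOY_LOAD = 1, TOY_MARK = 2 };

typedef struct {
    int op;
    int arg;
} ToyCmd;

#define TOY_MEM 8

/* What the toy disk holds: a follow-on command and a terminator. */
static const ToyCmd toy_disk[2] = { { TOY_MARK, 42 }, { TOY_END, 0 } };

/*
 * Runs the program starting at mem[0]. With prefetch != 0 the whole
 * program is copied up front, so commands written into memory by
 * TOY_LOAD are never seen. Returns the last TOY_MARK argument, or -1.
 */
static int toy_run(ToyCmd mem[TOY_MEM], int prefetch)
{
    ToyCmd snapshot[TOY_MEM];
    const ToyCmd *prog = mem;   /* live fetch: read straight from memory */
    int marked = -1;

    if (prefetch) {
        memcpy(snapshot, mem, sizeof(snapshot));
        prog = snapshot;        /* prefetch: execute the private copy */
    }
    for (int pc = 0; pc < TOY_MEM && prog[pc].op != TOY_END; pc++) {
        if (prog[pc].op == TOY_LOAD) {
            assert(prog[pc].arg >= 0 && prog[pc].arg + 2 <= TOY_MEM);
            memcpy(&mem[prog[pc].arg], toy_disk, sizeof(toy_disk));
        } else if (prog[pc].op == TOY_MARK) {
            marked = prog[pc].arg;
        }
    }
    return marked;
}
```

Starting from a memory image holding only `{ TOY_LOAD, 1 }`, the live fetch executes the loaded `TOY_MARK` and returns 42, while the prefetched run still sees `TOY_END` in slot 1 and returns -1.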
+
+**************************
+***** What Qemu does *****
+**************************
+
+Since we are forced to live with prefetch we cannot use the very simple IPL
+procedure we defined in the preceding section. So we compensate by doing the
+following.
+
+1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
+2. Execute "Read IPL" at 0x0.
+
+ So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
+
+3. Write a custom channel program that will seek to the IPL2 record and then
+ execute the READ ccw from IPL1 with its chain bit turned off, so that the
+ TIC to IPL2 is not followed yet. Normally the seek is not required because
+ after reading the IPL1 record the disk is automatically positioned to read
+ the very next record, which will be IPL2. But since we are not reading both
+ IPL1 and IPL2 as part of the same channel program, we must set the position
+ manually.
+
+4. Grab the target address of the TIC instruction from the IPL1 channel program.
+ This address is where the IPL2 channel program starts. (The bios reads it
+ from 0x10 before overwriting that area with the custom channel program.)
+
+ Now IPL2 is loaded into memory somewhere, and we know the address.
+
+5. Execute the IPL2 channel program at the address obtained in step #4.
+
+ Because this channel program can be dynamic, we must use a special algorithm
+ that detects a READ immediately followed by a TIC and breaks the ccw chain
+ by turning off the chain bit in the READ ccw. When control returns from the
+ kernel/hardware to the Qemu bios code, we immediately issue another start
+ subchannel at the TIC's target address. This causes the rest of the channel
+ program (starting from the TIC target) and all needed data to be refetched,
+ thereby stepping around the limitation that would otherwise prevent this
+ channel program from executing properly.
+
+ Now the operating system code is loaded somewhere in guest memory and the psw
+ in memory location 0x0 will point to entry code for the guest operating
+ system.
+
+6. LPSW 0x0.
+ LPSW transfers control to the guest operating system and we're done.
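The READ-followed-by-TIC split described above for the IPL2 channel program can be sketched on the host with decoded CCWs. This is a simplified model of what `dynamic_cp_fixup()` does, under stated assumptions: `Ccw` and `split_read_tic()` are hypothetical, the command codes 0x06, 0x86 and 0x08 are assumed values, and a length parameter is added so the scan cannot run off the end of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of CCW_CMD_DASD_READ, CCW_CMD_DASD_READ_MT, CCW_CMD_TIC. */
#define CMD_READ    0x06
#define CMD_READ_MT 0x86
#define CMD_TIC     0x08
#define CCW_LEN     8u   /* a format-0 CCW occupies 8 bytes */

/* Host-side stand-in for a decoded format-0 CCW (not the bios Ccw0 type). */
typedef struct {
    uint8_t cmd;
    uint32_t cda;   /* data address, or branch target for a TIC */
    bool chain;     /* command-chaining flag */
} Ccw;

/*
 * Walks prog[] (assumed to be loaded at guest address base): skips a TIC
 * that branches back to the CCW just before it, stops at the first CCW
 * without chaining, and splits the program at a READ immediately followed
 * by a TIC. On a split, the READ's chain flag is cleared and the TIC
 * target is stored in *next_cpa.
 */
static bool split_read_tic(Ccw *prog, size_t len, uint32_t base,
                           uint32_t *next_cpa)
{
    assert(prog != NULL && next_cpa != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t addr = base + (uint32_t)i * CCW_LEN;

        if (prog[i].cmd == CMD_TIC && prog[i].cda == addr - CCW_LEN) {
            continue;   /* e.g. the TIC that closes a Search ID loop */
        }
        if (!prog[i].chain) {
            return false;   /* end of this chain, nothing to split */
        }
        if ((prog[i].cmd == CMD_READ || prog[i].cmd == CMD_READ_MT) &&
            i + 1 < len && prog[i + 1].cmd == CMD_TIC) {
            *next_cpa = prog[i + 1].cda;
            prog[i].chain = false;
            return true;
        }
    }
    return false;
}
```

A caller would start the (now shortened) program and, if the function returned true, start the next segment at `*next_cpa`, which is the loop `run_dynamic_ccw_program()` performs around `do_cio()`.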
diff --git a/pc-bios/s390-ccw/Makefile b/pc-bios/s390-ccw/Makefile
index 12ad9c1..a048b6b 100644
--- a/pc-bios/s390-ccw/Makefile
+++ b/pc-bios/s390-ccw/Makefile
@@ -10,7 +10,7 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
.PHONY : all clean build-all
OBJECTS = start.o main.o bootmap.o jump2ipl.o sclp.o menu.o \
- virtio.o virtio-scsi.o virtio-blkdev.o libc.o cio.o
+ virtio.o virtio-scsi.o virtio-blkdev.o libc.o cio.o dasd-ipl.o
QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
new file mode 100644
index 0000000..b7ce6d9
--- /dev/null
+++ b/pc-bios/s390-ccw/dasd-ipl.c
@@ -0,0 +1,249 @@
+/*
+ * S390 IPL (boot) from a real DASD device via vfio framework.
+ *
+ * Copyright (c) 2018 Jason J. Herne <jjherne@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "libc.h"
+#include "s390-ccw.h"
+#include "s390-arch.h"
+#include "dasd-ipl.h"
+
+static char prefix_page[PAGE_SIZE * 2]
+ __attribute__((__aligned__(PAGE_SIZE * 2)));
+
+static void enable_prefixing(void)
+{
+ memcpy(&prefix_page, (void *)0, 4096);
+ set_prefix(ptr2u32(&prefix_page));
+}
+
+static void disable_prefixing(void)
+{
+ set_prefix(0);
+ /* Copy io interrupt info back to low core */
+ memcpy((void *)0xB8, prefix_page + 0xB8, 12);
+}
+
+static bool is_read_tic_ccw_chain(Ccw0 *ccw)
+{
+ Ccw0 *next_ccw = ccw + 1;
+
+ return ((ccw->cmd_code == CCW_CMD_DASD_READ ||
+ ccw->cmd_code == CCW_CMD_DASD_READ_MT) &&
+ ccw->chain && next_ccw->cmd_code == CCW_CMD_TIC);
+}
+
+static bool dynamic_cp_fixup(uint32_t ccw_addr, uint32_t *next_cpa)
+{
+ Ccw0 *cur_ccw = (Ccw0 *)(uint64_t)ccw_addr;
+ Ccw0 *tic_ccw;
+
+ while (true) {
+ /*
+ * Skip over an inline TIC, i.e. one that branches back to the CCW
+ * immediately preceding it (it might not have the chain bit on).
+ */
+ if (cur_ccw->cmd_code == CCW_CMD_TIC &&
+ cur_ccw->cda == ptr2u32(cur_ccw) - 8) {
+ cur_ccw += 1;
+ continue;
+ }
+
+ if (!cur_ccw->chain) {
+ break;
+ }
+ if (is_read_tic_ccw_chain(cur_ccw)) {
+ /*
+ * Breaking a chain of CCWs may alter the semantics or even the
+ * validity of a channel program. The heuristic implemented below
+ * seems to work well in practice for the channel programs
+ * generated by zipl.
+ */
+ tic_ccw = cur_ccw + 1;
+ *next_cpa = tic_ccw->cda;
+ cur_ccw->chain = 0;
+ return true;
+ }
+ cur_ccw += 1;
+ }
+ return false;
+}
+
+static int run_dynamic_ccw_program(SubChannelId schid, uint32_t cpa)
+{
+ bool has_next;
+ uint32_t next_cpa = 0;
+ int rc;
+
+ do {
+ has_next = dynamic_cp_fixup(cpa, &next_cpa);
+
+ print_int("executing ccw chain at ", cpa);
+ enable_prefixing();
+ rc = do_cio(schid, cpa, CCW_FMT0);
+ disable_prefixing();
+
+ if (rc) {
+ break;
+ }
+ cpa = next_cpa;
+ } while (has_next);
+
+ return rc;
+}
+
+static void make_readipl(void)
+{
+ Ccw0 *ccwIplRead = (Ccw0 *)0x00;
+
+ /* Create Read IPL ccw at address 0 */
+ ccwIplRead->cmd_code = CCW_CMD_READ_IPL;
+ ccwIplRead->cda = 0x00; /* Read into address 0x00 in main memory */
+ ccwIplRead->chain = 0; /* Chain flag */
+ ccwIplRead->count = 0x18; /* Read 0x18 bytes of data */
+}
+
+static void run_readipl(SubChannelId schid)
+{
+ if (do_cio(schid, 0x00, CCW_FMT0)) {
+ panic("dasd-ipl: Failed to run Read IPL channel program");
+ }
+}
+
+/*
+ * The architecture states that IPL1 data should consist of a psw followed by
+ * format-0 READ and TIC CCWs. Let's sanity check.
+ */
+static void check_ipl1(void)
+{
+ Ccw0 *ccwread = (Ccw0 *)0x08;
+ Ccw0 *ccwtic = (Ccw0 *)0x10;
+
+ if (ccwread->cmd_code != CCW_CMD_DASD_READ ||
+ ccwtic->cmd_code != CCW_CMD_TIC) {
+ panic("dasd-ipl: IPL1 data invalid. Is this disk really bootable?\n");
+ }
+}
+
+static void check_ipl2(uint32_t ipl2_addr)
+{
+ Ccw0 *ccw = u32toptr(ipl2_addr);
+
+ if (ipl2_addr == 0x00) {
+ panic("dasd-ipl: IPL2 address invalid. Is this disk really bootable?\n");
+ }
+ if (ccw->cmd_code == 0x00) {
+ panic("dasd-ipl: IPL2 ccw data invalid. Is this disk really bootable?\n");
+ }
+}
+
+static uint32_t read_ipl2_addr(void)
+{
+ Ccw0 *ccwtic = (Ccw0 *)0x10;
+
+ return ccwtic->cda;
+}
+
+static void ipl1_fixup(void)
+{
+ Ccw0 *ccwSeek = (Ccw0 *) 0x08;
+ Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
+ Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
+ Ccw0 *ccwRead = (Ccw0 *) 0x20;
+ CcwSeekData *seekData = (CcwSeekData *) 0x30;
+ CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
+
+ /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
+ memcpy(ccwRead, (void *)0x08, 16);
+
+ /* Disable chaining so we don't TIC to IPL2 channel program */
+ ccwRead->chain = 0x00;
+
+ ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
+ ccwSeek->cda = ptr2u32(seekData);
+ ccwSeek->chain = 1;
+ ccwSeek->count = sizeof(seekData);
+ seekData->reserved = 0x00;
+ seekData->cyl = 0x00;
+ seekData->head = 0x00;
+
+ ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
+ ccwSearchID->cda = ptr2u32(searchData);
+ ccwSearchID->chain = 1;
+ ccwSearchID->count = sizeof(searchData);
+ searchData->cyl = 0;
+ searchData->head = 0;
+ searchData->record = 2;
+
+ /* Go back to Search CCW if correct record not yet found */
+ ccwSearchTic->cmd_code = CCW_CMD_TIC;
+ ccwSearchTic->cda = ptr2u32(ccwSearchID);
+}
+
+static void run_ipl1(SubChannelId schid)
+{
+ uint32_t startAddr = 0x08;
+
+ if (do_cio(schid, startAddr, CCW_FMT0)) {
+ panic("dasd-ipl: Failed to run IPL1 channel program");
+ }
+}
+
+static void run_ipl2(SubChannelId schid, uint32_t addr)
+{
+ if (run_dynamic_ccw_program(schid, addr)) {
+ panic("dasd-ipl: Failed to run IPL2 channel program");
+ }
+}
+
+static void lpsw(void *psw_addr)
+{
+ PSWLegacy *pswl = (PSWLegacy *) psw_addr;
+
+ pswl->mask |= PSW_MASK_EAMODE; /* Force z-mode */
+ pswl->addr |= PSW_MASK_BAMODE;
+ asm volatile(" llgtr 0,0\n llgtr 1,1\n" /* Some OS's expect to be */
+ " llgtr 2,2\n llgtr 3,3\n" /* in 32-bit mode. Clear */
+ " llgtr 4,4\n llgtr 5,5\n" /* high part of regs to */
+ " llgtr 6,6\n llgtr 7,7\n" /* avoid messing up */
+ " llgtr 8,8\n llgtr 9,9\n" /* instructions that work */
+ " llgtr 10,10\n llgtr 11,11\n" /* in both addressing */
+ " llgtr 12,12\n llgtr 13,13\n" /* modes, like servc. */
+ " llgtr 14,14\n llgtr 15,15\n"
+ " lpsw %0\n"
+ : : "Q" (*pswl) : "cc");
+}
+
+/*
+ * Limitations in QEMU's CCW support complicate the IPL process. Details can
+ * be found in docs/devel/s390-dasd-ipl.txt
+ */
+void dasd_ipl(SubChannelId schid)
+{
+ uint32_t ipl2_addr;
+
+ /* Construct Read IPL CCW and run it to read IPL1 from boot disk */
+ make_readipl();
+ run_readipl(schid);
+ ipl2_addr = read_ipl2_addr();
+ check_ipl1();
+
+ /*
+ * Fixup IPL1 channel program to account for QEMU limitations, then run it
+ * to read IPL2 channel program from boot disk.
+ */
+ ipl1_fixup();
+ run_ipl1(schid);
+ check_ipl2(ipl2_addr);
+
+ /*
+ * Run IPL2 channel program to read operating system code from boot disk
+ * then transfer control to the guest operating system
+ */
+ run_ipl2(schid, ipl2_addr);
+ lpsw(0);
+}
diff --git a/pc-bios/s390-ccw/dasd-ipl.h b/pc-bios/s390-ccw/dasd-ipl.h
new file mode 100644
index 0000000..56bba82
--- /dev/null
+++ b/pc-bios/s390-ccw/dasd-ipl.h
@@ -0,0 +1,16 @@
+/*
+ * S390 IPL (boot) from a real DASD device via vfio framework.
+ *
+ * Copyright (c) 2018 Jason J. Herne <jjherne@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef DASD_IPL_H
+#define DASD_IPL_H
+
+void dasd_ipl(SubChannelId schid);
+
+#endif /* DASD_IPL_H */
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 5ee02c3..0a46339 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -13,6 +13,7 @@
#include "s390-ccw.h"
#include "cio.h"
#include "virtio.h"
+#include "dasd-ipl.h"
char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
static SubChannelId blk_schid = { .one = 1 };
@@ -210,6 +211,9 @@ int main(void)
cutype = cu_type(blk_schid) ;
switch (cutype) {
+ case CU_TYPE_DASD_3990:
+ dasd_ipl(blk_schid); /* no return */
+ break;
case CU_TYPE_VIRTIO:
virtio_setup();
zipl_load(); /* no return */
diff --git a/pc-bios/s390-ccw/s390-arch.h b/pc-bios/s390-ccw/s390-arch.h
index 47eaa04..0438d42 100644
--- a/pc-bios/s390-ccw/s390-arch.h
+++ b/pc-bios/s390-ccw/s390-arch.h
@@ -97,4 +97,17 @@ typedef struct LowCore {
extern const LowCore *lowcore;
+static inline void set_prefix(uint32_t address)
+{
+ asm volatile("spx %0" : : "m" (address) : "memory");
+}
+
+static inline uint32_t store_prefix(void)
+{
+ uint32_t address;
+
+ asm volatile("stpx %0" : "=m" (address));
+ return address;
+}
+
#endif
--
2.7.4
On Tue, 29 Jan 2019 08:29:22 -0500
"Jason J. Herne" <jjherne@linux.ibm.com> wrote:

> Allows guest to boot from a vfio configured real dasd device.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
> ---
> docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
> pc-bios/s390-ccw/Makefile | 2 +-
> pc-bios/s390-ccw/dasd-ipl.c | 249 +++++++++++++++++++++++++++++++++++++++++++
> pc-bios/s390-ccw/dasd-ipl.h | 16 +++
> pc-bios/s390-ccw/main.c | 4 +
> pc-bios/s390-ccw/s390-arch.h | 13 +++
> 6 files changed, 415 insertions(+), 1 deletion(-)
> create mode 100644 docs/devel/s390-dasd-ipl.txt

This file should probably be added to the s390-ccw boot section in
MAINTAINERS (the other new files are already covered.)

> create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
> create mode 100644 pc-bios/s390-ccw/dasd-ipl.h
On 01/29/2019 08:29 AM, Jason J. Herne wrote:
> Allows guest to boot from a vfio configured real dasd device.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
> ---
> docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
> pc-bios/s390-ccw/Makefile | 2 +-
> pc-bios/s390-ccw/dasd-ipl.c | 249 +++++++++++++++++++++++++++++++++++++++++++
> pc-bios/s390-ccw/dasd-ipl.h | 16 +++
> pc-bios/s390-ccw/main.c | 4 +
> pc-bios/s390-ccw/s390-arch.h | 13 +++
> 6 files changed, 415 insertions(+), 1 deletion(-)
> create mode 100644 docs/devel/s390-dasd-ipl.txt
> create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
> create mode 100644 pc-bios/s390-ccw/dasd-ipl.h
...snip...
> diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
> new file mode 100644
> index 0000000..b7ce6d9
> --- /dev/null
> +++ b/pc-bios/s390-ccw/dasd-ipl.c
> @@ -0,0 +1,249 @@
...snip...
> +static void ipl1_fixup(void)
> +{
> + Ccw0 *ccwSeek = (Ccw0 *) 0x08;
> + Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
> + Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
> + Ccw0 *ccwRead = (Ccw0 *) 0x20;
> + CcwSeekData *seekData = (CcwSeekData *) 0x30;
> + CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
> +
> + /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
> + memcpy(ccwRead, (void *)0x08, 16);
> +
> + /* Disable chaining so we don't TIC to IPL2 channel program */
> + ccwRead->chain = 0x00;
> +
> + ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
> + ccwSeek->cda = ptr2u32(seekData);
> + ccwSeek->chain = 1;
> + ccwSeek->count = sizeof(seekData);
This needs to be sizeof(*seekData)
> + seekData->reserved = 0x00;
> + seekData->cyl = 0x00;
> + seekData->head = 0x00;
> +
> + ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
> + ccwSearchID->cda = ptr2u32(searchData);
> + ccwSearchID->chain = 1;
> + ccwSearchID->count = sizeof(searchData);
sizeof(*searchData)
I notice that vfio sees the count for each of these as 8 bytes despite
them being packed structs of 6 or 5 bytes.
> + searchData->cyl = 0;
> + searchData->head = 0;
> + searchData->record = 2;
> +
> + /* Go back to Search CCW if correct record not yet found */
> + ccwSearchTic->cmd_code = CCW_CMD_TIC;
> + ccwSearchTic->cda = ptr2u32(ccwSearchID);
> +}
...snip...
On 2/20/19 9:52 PM, Eric Farman wrote:
>
>
> On 01/29/2019 08:29 AM, Jason J. Herne wrote:
>> Allows guest to boot from a vfio configured real dasd device.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
>> ---
>> docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
>> pc-bios/s390-ccw/Makefile | 2 +-
>> pc-bios/s390-ccw/dasd-ipl.c | 249 +++++++++++++++++++++++++++++++++++++++++++
>> pc-bios/s390-ccw/dasd-ipl.h | 16 +++
>> pc-bios/s390-ccw/main.c | 4 +
>> pc-bios/s390-ccw/s390-arch.h | 13 +++
>> 6 files changed, 415 insertions(+), 1 deletion(-)
>> create mode 100644 docs/devel/s390-dasd-ipl.txt
>> create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
>> create mode 100644 pc-bios/s390-ccw/dasd-ipl.h
>
> ...snip...
>
>> diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
>> new file mode 100644
>> index 0000000..b7ce6d9
>> --- /dev/null
>> +++ b/pc-bios/s390-ccw/dasd-ipl.c
>> @@ -0,0 +1,249 @@
>
> ...snip...
>
>> +static void ipl1_fixup(void)
>> +{
>> + Ccw0 *ccwSeek = (Ccw0 *) 0x08;
>> + Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
>> + Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
>> + Ccw0 *ccwRead = (Ccw0 *) 0x20;
>> + CcwSeekData *seekData = (CcwSeekData *) 0x30;
>> + CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
>> +
>> + /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
>> + memcpy(ccwRead, (void *)0x08, 16);
>> +
>> + /* Disable chaining so we don't TIC to IPL2 channel program */
>> + ccwRead->chain = 0x00;
>> +
>> + ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
>> + ccwSeek->cda = ptr2u32(seekData);
>> + ccwSeek->chain = 1;
>> + ccwSeek->count = sizeof(seekData);
>
> This needs to be sizeof(*seekData)
>
Good catch! Thanks. C can be such a pain sometimes. It should do what I WANT... not what I
SAY :-).
>> + seekData->reserved = 0x00;
>> + seekData->cyl = 0x00;
>> + seekData->head = 0x00;
>> +
>> + ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
>> + ccwSearchID->cda = ptr2u32(searchData);
>> + ccwSearchID->chain = 1;
>> + ccwSearchID->count = sizeof(searchData);
>
> sizeof(*searchData)
>
> I notice that vfio sees the count for each of these as 8 bytes despite them being packed
> structs of 6 or 5 bytes.
>
...snip...
--
-- Jason J. Herne (jjherne@linux.ibm.com)
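The `sizeof(seekData)` slip discussed above is easy to reproduce on a host. The sketch assumes packed argument structs with the 6- and 5-byte sizes mentioned in the review; the type names, field names and layouts here are assumptions, not copied from the patch. `sizeof` applied to the pointer yields the pointer width (8 bytes on a 64-bit build, matching the count vfio reported), while `sizeof(*p)` yields the argument length the CCW count field should carry.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layouts for the Seek and Search ID Equal arguments. */
typedef struct {
    uint16_t reserved;
    uint16_t cyl;
    uint16_t head;
} __attribute__((packed)) SeekData;

typedef struct {
    uint16_t cyl;
    uint16_t head;
    uint8_t record;
} __attribute__((packed)) SearchIdData;

/* Packing removes padding, so these are the on-the-wire argument sizes. */
static_assert(sizeof(SeekData) == 6, "packed seek argument is 6 bytes");
static_assert(sizeof(SearchIdData) == 5, "packed search argument is 5 bytes");

/* Correct CCW counts: size of the pointed-to object, not of the pointer. */
static uint16_t ccw_count_for_seek(const SeekData *p)
{
    return (uint16_t)sizeof(*p);
}

static uint16_t ccw_count_for_search(const SearchIdData *p)
{
    return (uint16_t)sizeof(*p);
}
```

Because `sizeof` never evaluates its operand, `sizeof(*p)` is safe even before the pointer is dereferenced elsewhere, which is why the one-character fix is all that `ipl1_fixup()` needs.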
On Tue, 29 Jan 2019 08:29:22 -0500
"Jason J. Herne" <jjherne@linux.ibm.com> wrote:
> Allows guest to boot from a vfio configured real dasd device.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
> ---
> docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
> pc-bios/s390-ccw/Makefile | 2 +-
> pc-bios/s390-ccw/dasd-ipl.c | 249 +++++++++++++++++++++++++++++++++++++++++++
> pc-bios/s390-ccw/dasd-ipl.h | 16 +++
> pc-bios/s390-ccw/main.c | 4 +
> pc-bios/s390-ccw/s390-arch.h | 13 +++
> 6 files changed, 415 insertions(+), 1 deletion(-)
> create mode 100644 docs/devel/s390-dasd-ipl.txt
> create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
> create mode 100644 pc-bios/s390-ccw/dasd-ipl.h
>
> diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
> new file mode 100644
> index 0000000..84ec7b8
> --- /dev/null
> +++ b/docs/devel/s390-dasd-ipl.txt
> @@ -0,0 +1,132 @@
> +*****************************
> +***** s390 hardware IPL *****
> +*****************************
> +
> +The s390 hardware IPL process consists of the following steps.
> +
> +1. A READ IPL ccw is constructed in memory location 0x0.
> + This ccw, by definition, reads the IPL1 record which is located on the disk
> + at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
> + so when it is complete another ccw will be fetched and executed from memory
> + location 0x08.
> +
> +2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
> + IPL1 data is 24 bytes in length and consists of the following pieces of
> + information: [psw][read ccw][tic ccw]. When the machine executes the Read
> + IPL ccw it read the 24-bytes of IPL1 to be read into memory starting at
> + location 0x0. Then the ccw program at 0x08 which consists of a read
> + ccw and a tic ccw is automatically executed because of the chain flag from
> + the original READ IPL ccw. The read ccw will read the IPL2 data into memory
> + and the TIC (Tranfer In Channel) will transfer control to the channel
> + program contained in the IPL2 data. The TIC channel command is the
> + equivalent of a branch/jump/goto instruction for channel programs.
> + NOTE: The ccws in IPL1 are defined by the architecture to be format 0.
> +
> +3. Execute IPL2.
> + The TIC ccw instruction at the end of the IPL1 channel program will begin
> + the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
> + process and will contain a larger channel program than IPL1. The point of
> + IPL2 is to find and load either the operating system or a small program that
> + loads the operating system from disk. At the end of this step all or some of
> + the real operating system is loaded into memory and we are ready to hand
> + control over to the guest operating system. At this point the guest
> + operating system is entirely responsible for loading any more data it might
> + need to function. NOTE: The IPL2 channel program might read data into memory
> + location 0 thereby overwriting the IPL1 psw and channel program. This is ok
> + as long as the data placed in location 0 contains a psw whose instruction
> + address points to the guest operating system code to execute at the end of
> + the IPL/boot process.
> + NOTE: The ccws in IPL2 are defined by the architecture to be format 0.
> +
> +4. Start executing the guest operating system.
> + The psw that was loaded into memory location 0 as part of the IPL process
> + should contain the needed flags for the operating system we have loaded. The
> + psw's instruction address will point to the location in memory where we want
> + to start executing the operating system. This psw is loaded (via LPSW
> + instruction) causing control to be passed to the operating system code.
> +
> +In a non-virtualized environment this process, handled entirely by the hardware,
> +is kicked off by the user initiating a "Load" procedure from the hardware
> +management console. This "Load" procedure crafts a special "Read IPL" ccw in
> +memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
> +off the reading of IPL1 data. Since the channel program from IPL1 will be
> +written immediately after the special "Read IPL" ccw, the IPL1 channel program
> +will be executed immediately (the special read ccw has the chaining bit turned
> +on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
> +program to be executed automatically. After this sequence completes the "Load"
> +procedure then loads the psw from 0x0.
Nice summary!
> +
> +*****************************************
> +***** How this all pertains to Qemu *****
s/Qemu/QEMU/
(also below)
> +*****************************************
> +
> +In theory we should merely have to do the following to IPL/boot a guest
> +operating system from a DASD device:
> +
> +1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
> +2. Execute channel program at 0x0.
> +3. LPSW 0x0.
> +
> +However, our emulation of the machine's channel program logic is missing one key
> +feature that is required for this process to work: non-prefetch of ccw data.
> +
> +When we start a channel program we pass the channel subsystem parameters via an
> +ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
> +bit is on then Qemu is allowed to read the entire channel program from guest
> +memory before it starts executing it. This means that any channel commands that
> +read additional channel commands will not work as expected because the newly
> +read commands will only exist in guest memory and NOT within Qemu's channel
> +subsystem memory. Qemu's channel subsystem's implementation currently requires
But isn't that the vfio-ccw backend, rather than the channel subsystem
implementation?
> +this bit to be on for all channel programs. This is a problem because the IPL
> +process consists of transferring control from the "Read IPL" ccw immediately to
> +the IPL1 channel program that was read by "Read IPL".
> +
> +Not being able to turn off prefetch will also prevent the TIC at the end of the
> +IPL1 channel program from transferring control to the IPL2 channel program.
> +
> +Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
> +transfers control to another channel program segment immediately after reading it
> +from the disk. So we need to be able to handle this case.
> +
> +**************************
> +***** What Qemu does *****
> +**************************
> +
> +Since we are forced to live with prefetch we cannot use the very simple IPL
> +procedure we defined in the preceding section. So we compensate by doing the
> +following.
> +
> +1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
> +2. Execute "Read IPL" at 0x0.
> +
> + So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
> +
> +3. Write a custom channel program that will seek to the IPL2 record and then
> + execute the READ and TIC ccws from IPL1. Normally the seek is not required
> + because after reading the IPL1 record the disk is automatically positioned
> + to read the very next record, which will be IPL2. But since we are not reading
> + both IPL1 and IPL2 as part of the same channel program we must manually set
> + the position.
> +
> +4. Grab the target address of the TIC instruction from the IPL1 channel program.
> + This address is where the IPL2 channel program starts.
> +
> + Now IPL2 is loaded into memory somewhere, and we know the address.
> +
> +5. Execute the IPL2 channel program at the address obtained in step #4.
> +
> + Because this channel program can be dynamic, we must use a special algorithm
> + that detects a READ immediately followed by a TIC and breaks the ccw chain
> + by turning off the chain bit in the READ ccw. When control is returned from
> + the kernel/hardware to the Qemu bios code we immediately issue another start
> + subchannel to execute the remaining TIC instruction. This causes the entire
> + channel program (starting from the TIC) and all needed data to be refetched,
> + thereby stepping around the limitation that would otherwise prevent this
> + channel program from executing properly.
> +
> + Now the operating system code is loaded somewhere in guest memory and the psw
> + in memory location 0x0 will point to entry code for the guest operating
> + system.
> +
> +6. LPSW 0x0.
> + LPSW transfers control to the guest operating system and we're done.
Also a good explanation of the procedure here!
(...)
> +static int run_dynamic_ccw_program(SubChannelId schid, uint32_t cpa)
> +{
> + bool has_next;
> + uint32_t next_cpa = 0;
> + int rc;
> +
> + do {
> + has_next = dynamic_cp_fixup(cpa, &next_cpa);
> +
> + print_int("executing ccw chain at ", cpa);
Do you want to keep the unconditional print here? Or make it a
debug_print_int, and maybe an unconditional print on error?
> + enable_prefixing();
> + rc = do_cio(schid, cpa, CCW_FMT0);
> + disable_prefixing();
> +
> + if (rc) {
> + break;
> + }
> + cpa = next_cpa;
> + } while (has_next);
> +
> + return rc;
> +}
Code looks fine after a quick browse.
On 2/4/19 7:02 AM, Cornelia Huck wrote:
> On Tue, 29 Jan 2019 08:29:22 -0500
> "Jason J. Herne" <jjherne@linux.ibm.com> wrote:
>
>> Allows guest to boot from a vfio configured real dasd device.
>>
>> Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
>> ---
>> docs/devel/s390-dasd-ipl.txt | 132 +++++++++++++++++++++++
>> pc-bios/s390-ccw/Makefile | 2 +-
>> pc-bios/s390-ccw/dasd-ipl.c | 249 +++++++++++++++++++++++++++++++++++++++++++
>> pc-bios/s390-ccw/dasd-ipl.h | 16 +++
>> pc-bios/s390-ccw/main.c | 4 +
>> pc-bios/s390-ccw/s390-arch.h | 13 +++
>> 6 files changed, 415 insertions(+), 1 deletion(-)
>> create mode 100644 docs/devel/s390-dasd-ipl.txt
>> create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
>> create mode 100644 pc-bios/s390-ccw/dasd-ipl.h
>>
>> diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
>> new file mode 100644
>> index 0000000..84ec7b8
>> --- /dev/null
>> +++ b/docs/devel/s390-dasd-ipl.txt
>> @@ -0,0 +1,132 @@
>> +*****************************
>> +***** s390 hardware IPL *****
>> +*****************************
>> +
>> +The s390 hardware IPL process consists of the following steps.
>> +
>> +1. A READ IPL ccw is constructed in memory location 0x0.
>> + This ccw, by definition, reads the IPL1 record which is located on the disk
>> + at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
>> + so when it is complete another ccw will be fetched and executed from memory
>> + location 0x08.
>> +
>> +2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
>> + IPL1 data is 24 bytes in length and consists of the following pieces of
>> + information: [psw][read ccw][tic ccw]. When the machine executes the Read
>> + IPL ccw it reads the 24 bytes of IPL1 into memory starting at
>> + location 0x0. Then the ccw program at 0x08, which consists of a read
>> + ccw and a tic ccw, is automatically executed because of the chain flag from
>> + the original READ IPL ccw. The read ccw will read the IPL2 data into memory
>> + and the TIC (Transfer In Channel) will transfer control to the channel
>> + program contained in the IPL2 data. The TIC channel command is the
>> + equivalent of a branch/jump/goto instruction for channel programs.
>> + NOTE: The ccws in IPL1 are defined by the architecture to be format 0.
>> +
>> +3. Execute IPL2.
>> + The TIC ccw instruction at the end of the IPL1 channel program will begin
>> + the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
>> + process and will contain a larger channel program than IPL1. The point of
>> + IPL2 is to find and load either the operating system or a small program that
>> + loads the operating system from disk. At the end of this step all or some of
>> + the real operating system is loaded into memory and we are ready to hand
>> + control over to the guest operating system. At this point the guest
>> + operating system is entirely responsible for loading any more data it might
>> + need to function. NOTE: The IPL2 channel program might read data into memory
>> + location 0 thereby overwriting the IPL1 psw and channel program. This is ok
>> + as long as the data placed in location 0 contains a psw whose instruction
>> + address points to the guest operating system code to execute at the end of
>> + the IPL/boot process.
>> + NOTE: The ccws in IPL2 are defined by the architecture to be format 0.
>> +
>> +4. Start executing the guest operating system.
>> + The psw that was loaded into memory location 0 as part of the IPL process
>> + should contain the needed flags for the operating system we have loaded. The
>> + psw's instruction address will point to the location in memory where we want
>> + to start executing the operating system. This psw is loaded (via LPSW
>> + instruction) causing control to be passed to the operating system code.
>> +
>> +In a non-virtualized environment this process, handled entirely by the hardware,
>> +is kicked off by the user initiating a "Load" procedure from the hardware
>> +management console. This "Load" procedure crafts a special "Read IPL" ccw in
>> +memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
>> +off the reading of IPL1 data. Since the channel program from IPL1 will be
>> +written immediately after the special "Read IPL" ccw, the IPL1 channel program
>> +will be executed immediately (the special read ccw has the chaining bit turned
>> +on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
>> +program to be executed automatically. After this sequence completes the "Load"
>> +procedure then loads the psw from 0x0.
>
> Nice summary!
>
>> +
>> +*****************************************
>> +***** How this all pertains to Qemu *****
>
> s/Qemu/QEMU/
>
> (also below)
>
Fixed.
>> +*****************************************
>> +
>> +In theory we should merely have to do the following to IPL/boot a guest
>> +operating system from a DASD device:
>> +
>> +1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
>> +2. Execute channel program at 0x0.
>> +3. LPSW 0x0.
>> +
>> +However, our emulation of the machine's channel program logic is missing one key
>> +feature that is required for this process to work: non-prefetch of ccw data.
>> +
>> +When we start a channel program we pass the channel subsystem parameters via an
>> +ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
>> +bit is on then Qemu is allowed to read the entire channel program from guest
>> +memory before it starts executing it. This means that any channel commands that
>> +read additional channel commands will not work as expected because the newly
>> +read commands will only exist in guest memory and NOT within Qemu's channel
>> +subsystem memory. Qemu's channel subsystem's implementation currently requires
>
> But isn't that the vfio-ccw backend, rather than the channel subsystem
> implementation?
>
Yep, you're right. I'll clarify this.
>> +this bit to be on for all channel programs. This is a problem because the IPL
>> +process consists of transferring control from the "Read IPL" ccw immediately to
>> +the IPL1 channel program that was read by "Read IPL".
>> +
>> +Not being able to turn off prefetch will also prevent the TIC at the end of the
>> +IPL1 channel program from transferring control to the IPL2 channel program.
>> +
>> +Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
>> +transfers control to another channel program segment immediately after reading it
>> +from the disk. So we need to be able to handle this case.
>> +
>> +**************************
>> +***** What Qemu does *****
>> +**************************
>> +
>> +Since we are forced to live with prefetch we cannot use the very simple IPL
>> +procedure we defined in the preceding section. So we compensate by doing the
>> +following.
>> +
>> +1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
>> +2. Execute "Read IPL" at 0x0.
>> +
>> + So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
>> +
>> +3. Write a custom channel program that will seek to the IPL2 record and then
>> + execute the READ and TIC ccws from IPL1. Normally the seek is not required
>> + because after reading the IPL1 record the disk is automatically positioned
>> + to read the very next record, which will be IPL2. But since we are not reading
>> + both IPL1 and IPL2 as part of the same channel program we must manually set
>> + the position.
>> +
>> +4. Grab the target address of the TIC instruction from the IPL1 channel program.
>> + This address is where the IPL2 channel program starts.
>> +
>> + Now IPL2 is loaded into memory somewhere, and we know the address.
>> +
>> +5. Execute the IPL2 channel program at the address obtained in step #4.
>> +
>> + Because this channel program can be dynamic, we must use a special algorithm
>> + that detects a READ immediately followed by a TIC and breaks the ccw chain
>> + by turning off the chain bit in the READ ccw. When control is returned from
>> + the kernel/hardware to the Qemu bios code we immediately issue another start
>> + subchannel to execute the remaining TIC instruction. This causes the entire
>> + channel program (starting from the TIC) and all needed data to be refetched,
>> + thereby stepping around the limitation that would otherwise prevent this
>> + channel program from executing properly.
>> +
>> + Now the operating system code is loaded somewhere in guest memory and the psw
>> + in memory location 0x0 will point to entry code for the guest operating
>> + system.
>> +
>> +6. LPSW 0x0.
>> + LPSW transfers control to the guest operating system and we're done.
>
> Also a good explanation of the procedure here!
>
> (...)
>
>> +static int run_dynamic_ccw_program(SubChannelId schid, uint32_t cpa)
>> +{
>> + bool has_next;
>> + uint32_t next_cpa = 0;
>> + int rc;
>> +
>> + do {
>> + has_next = dynamic_cp_fixup(cpa, &next_cpa);
>> +
>> + print_int("executing ccw chain at ", cpa);
>
> Do you want to keep the unconditional print here? Or make it a
> debug_print_int, and maybe an unconditional print on error?
>
Personally, I like having this here unconditionally. If things hang up or go wrong this
lets us know if it was before or after we jumped into actual guest OS code. I know I could
make it debug only, but having it all the time means better first failure data capture.
--
-- Jason J. Herne (jjherne@linux.ibm.com)