[PATCH] spapr: Rework hash<->radix transitions at CAS

Greg Kurz posted 1 patch 4 years, 2 months ago
Test docker-quick@centos7 passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/158160831807.3339719.7059822505220975954.stgit@bahia.lan
Maintainers: David Gibson <david@gibson.dropbear.id.au>
hw/ppc/spapr.c         |   25 +++++++++++++++-----
hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
include/hw/ppc/spapr.h |    1 +
3 files changed, 44 insertions(+), 41 deletions(-)
[PATCH] spapr: Rework hash<->radix transitions at CAS
Posted by Greg Kurz 4 years, 2 months ago
Until the CAS negotiation is over, an HPT can be allocated on three
different paths:

1) during machine reset if the host doesn't support radix,

2) during CAS if the guest wants hash and doesn't support HPT resizing,
   in which case we pre-emptively resize the HPT to accomodate maxram,

3) during CAS if no CAS reboot was requested, the guest wants hash but
   we're currently configured for radix.

Depending on the various combinations of host or guest MMU support,
HPT resizing guest support and the possibility of a CAS reboot, it
is quite hard to know which of these allocates the HPT that will
be ultimately used by the guest that wants to do hash. Also, some of
them have bugs:

- 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
  and thus doesn't update the VRMA size, even though we've just extended
  the HPT. Not sure what issues this can cause,

- 3) doesn't check for HPT resizing support and will always allocate a
  small HPT based on the initial RAM size. This caps the total amount of
  RAM the guest can see, especially if maxram is much higher than the
  initial ram.

We only support guests that do CAS and we already assume that the HPT
isn't being used when we do the pre-emptive resizing at CAS. It thus
seems reasonable to only allocate the HPT at the end of CAS, when no
CAS reboot was requested.

Consolidate the logic so that we only create the HPT during 3), ie.
when we're done with the CAS reboot cycles, and ensure HPT resizing
is taken into account. This fixes the radix->hash transition for
all cases.

The guest can theoretically call CAS several times, without a CAS
reboot in between. Linux guests don't do that, but better safe than
sorry, let's ensure we can also handle the symmetrical hash->radix
transition correctly: free the HPT and set the GR bit in PATE.
An helper is introduced for the latter since this is already what
we do during machine reset when going for radix.

As a bonus, this removes one user of spapr->cas_reboot, which we
want to get rid of in the future.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c         |   25 +++++++++++++++-----
 hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
 include/hw/ppc/spapr.h |    1 +
 3 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 828e2cc1359a..88bc0e4e3ca1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 {
     int hpt_shift;
 
+    /*
+     * HPT resizing is a bit of a special case, because when enabled
+     * we assume an HPT guest will support it until it says it
+     * doesn't, instead of assuming it won't support it until it says
+     * it does.  Strictly speaking that approach could break for
+     * guests which don't make a CAS call, but those are so old we
+     * don't care about them.  Without that assumption we'd have to
+     * make at least a temporary allocation of an HPT sized for max
+     * memory, which could be impossibly difficult under KVM HV if
+     * maxram is large.
+     */
     if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
-        || (spapr->cas_reboot
-            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
+        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
         hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
     } else {
         uint64_t current_ram_size;
@@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
     return 0;
 }
 
+void spapr_reset_patb_entry(SpaprMachineState *spapr)
+{
+    spapr->patb_entry = PATE1_GR;
+    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
+}
+
 static void spapr_machine_reset(MachineState *machine)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(machine);
@@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
          * without a HPT because KVM will start them in radix mode.
          * Set the GR bit in PATE so that we know there is no HPT.
          */
-        spapr->patb_entry = PATE1_GR;
-        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
-    } else {
-        spapr_setup_hpt_and_vrma(spapr);
+        spapr_reset_patb_entry(spapr);
     }
 
     qemu_devices_reset();
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index b8bb66b5c0d4..57ddf0fa6d05 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     bool raw_mode_supported = false;
     bool guest_xive;
     CPUState *cs;
+    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
 
     /* CAS is supposed to be called early when only the boot vCPU is active. */
     CPU_FOREACH(cs) {
@@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 
     guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
 
-    /*
-     * HPT resizing is a bit of a special case, because when enabled
-     * we assume an HPT guest will support it until it says it
-     * doesn't, instead of assuming it won't support it until it says
-     * it does.  Strictly speaking that approach could break for
-     * guests which don't make a CAS call, but those are so old we
-     * don't care about them.  Without that assumption we'd have to
-     * make at least a temporary allocation of an HPT sized for max
-     * memory, which could be impossibly difficult under KVM HV if
-     * maxram is large.
-     */
-    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
-        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
-
-        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
-            error_report(
-                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
-            exit(1);
-        }
-
-        if (spapr->htab_shift < maxshift) {
-            /* Guest doesn't know about HPT resizing, so we
-             * pre-emptively resize for the maximum permitted RAM.  At
-             * the point this is called, nothing should have been
-             * entered into the existing HPT */
-            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
-            push_sregs_to_kvm_pr(spapr);
-        }
-    }
-
     /* NOTE: there are actually a number of ov5 bits where input from the
      * guest is always zero, and the platform/QEMU enables them independently
      * of guest input. To model these properly we'd want some sort of mask,
@@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             error_report("Guest requested unavailable MMU mode (hash).");
             exit(EXIT_FAILURE);
         }
+        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
+            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
+            error_report(
+                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
+            exit(1);
+        }
     }
     spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
     spapr_ovec_cleanup(ov1_guest);
@@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
         void *fdt;
         SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
 
-        /* If spapr_machine_reset() did not set up a HPT but one is necessary
-         * (because the guest isn't going to use radix) then set it up here. */
-        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
-            /* legacy hash or new hash: */
-            spapr_setup_hpt_and_vrma(spapr);
+        if (!guest_radix) {
+            /*
+             * Either spapr_machine_reset() did not set up a HPT but one
+             * is necessary (because the guest isn't going to use radix),
+             * or the guest doesn't know about HPT resizing and we need to
+             * pre-emptively resize for the maximum permitted RAM. Set it
+             * up here. At the point this is called, nothing should have
+             * been entered into the existing HPT.
+             */
+            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
+                /* legacy hash or new hash: */
+                spapr_setup_hpt_and_vrma(spapr);
+                push_sregs_to_kvm_pr(spapr);
+            }
+        } else {
+            spapr_free_hpt(spapr);
+            spapr_reset_patb_entry(spapr);
         }
 
         if (fdt_bufsize < sizeof(hdr)) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 09110961a589..9d88b5596481 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
+void spapr_reset_patb_entry(SpaprMachineState *spapr);
 #endif /* HW_SPAPR_H */


Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Posted by David Gibson 4 years, 2 months ago
On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> Until the CAS negotiation is over, an HPT can be allocated on three
> different paths:
> 
> 1) during machine reset if the host doesn't support radix,
> 
> 2) during CAS if the guest wants hash and doesn't support HPT resizing,
>    in which case we pre-emptively resize the HPT to accomodate maxram,
> 
> 3) during CAS if no CAS reboot was requested, the guest wants hash but
>    we're currently configured for radix.
> 
> Depending on the various combinations of host or guest MMU support,
> HPT resizing guest support and the possibility of a CAS reboot, it
> is quite hard to know which of these allocates the HPT that will
> be ultimately used by the guest that wants to do hash. Also, some of
> them have bugs:
> 
> - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
>   and thus doesn't update the VRMA size, even though we've just extended
>   the HPT. Not sure what issues this can cause,
> 
> - 3) doesn't check for HPT resizing support and will always allocate a
>   small HPT based on the initial RAM size. This caps the total amount of
>   RAM the guest can see, especially if maxram is much higher than the
>   initial ram.
> 
> We only support guests that do CAS and we already assume that the HPT
> isn't being used when we do the pre-emptive resizing at CAS. It thus
> seems reasonable to only allocate the HPT at the end of CAS, when no
> CAS reboot was requested.
> 
> Consolidate the logic so that we only create the HPT during 3), ie.
> when we're done with the CAS reboot cycles, and ensure HPT resizing
> is taken into account. This fixes the radix->hash transition for
> all cases.

Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
need the HPT at all times there, or there's nowhere to put VRMA
entries, so we can't run even in real mode.

> The guest can theoretically call CAS several times, without a CAS
> reboot in between. Linux guests don't do that, but better safe than
> sorry, let's ensure we can also handle the symmetrical hash->radix
> transition correctly: free the HPT and set the GR bit in PATE.
> An helper is introduced for the latter since this is already what
> we do during machine reset when going for radix.
> 
> As a bonus, this removes one user of spapr->cas_reboot, which we
> want to get rid of in the future.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/ppc/spapr.c         |   25 +++++++++++++++-----
>  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
>  include/hw/ppc/spapr.h |    1 +
>  3 files changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 828e2cc1359a..88bc0e4e3ca1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>  {
>      int hpt_shift;
>  
> +    /*
> +     * HPT resizing is a bit of a special case, because when enabled
> +     * we assume an HPT guest will support it until it says it
> +     * doesn't, instead of assuming it won't support it until it says
> +     * it does.  Strictly speaking that approach could break for
> +     * guests which don't make a CAS call, but those are so old we
> +     * don't care about them.  Without that assumption we'd have to
> +     * make at least a temporary allocation of an HPT sized for max
> +     * memory, which could be impossibly difficult under KVM HV if
> +     * maxram is large.
> +     */
>      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> -        || (spapr->cas_reboot
> -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
>          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
>      } else {
>          uint64_t current_ram_size;
> @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
>      return 0;
>  }
>  
> +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> +{
> +    spapr->patb_entry = PATE1_GR;
> +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> +}
> +
>  static void spapr_machine_reset(MachineState *machine)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
>           * without a HPT because KVM will start them in radix mode.
>           * Set the GR bit in PATE so that we know there is no HPT.
>           */
> -        spapr->patb_entry = PATE1_GR;
> -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> -    } else {
> -        spapr_setup_hpt_and_vrma(spapr);
> +        spapr_reset_patb_entry(spapr);
>      }
>  
>      qemu_devices_reset();
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index b8bb66b5c0d4..57ddf0fa6d05 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>      bool raw_mode_supported = false;
>      bool guest_xive;
>      CPUState *cs;
> +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
>  
>      /* CAS is supposed to be called early when only the boot vCPU is active. */
>      CPU_FOREACH(cs) {
> @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>  
>      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
>  
> -    /*
> -     * HPT resizing is a bit of a special case, because when enabled
> -     * we assume an HPT guest will support it until it says it
> -     * doesn't, instead of assuming it won't support it until it says
> -     * it does.  Strictly speaking that approach could break for
> -     * guests which don't make a CAS call, but those are so old we
> -     * don't care about them.  Without that assumption we'd have to
> -     * make at least a temporary allocation of an HPT sized for max
> -     * memory, which could be impossibly difficult under KVM HV if
> -     * maxram is large.
> -     */
> -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> -
> -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> -            error_report(
> -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> -            exit(1);
> -        }
> -
> -        if (spapr->htab_shift < maxshift) {
> -            /* Guest doesn't know about HPT resizing, so we
> -             * pre-emptively resize for the maximum permitted RAM.  At
> -             * the point this is called, nothing should have been
> -             * entered into the existing HPT */
> -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> -            push_sregs_to_kvm_pr(spapr);
> -        }
> -    }
> -
>      /* NOTE: there are actually a number of ov5 bits where input from the
>       * guest is always zero, and the platform/QEMU enables them independently
>       * of guest input. To model these properly we'd want some sort of mask,
> @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>              error_report("Guest requested unavailable MMU mode (hash).");
>              exit(EXIT_FAILURE);
>          }
> +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> +            error_report(
> +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> +            exit(1);
> +        }
>      }
>      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
>      spapr_ovec_cleanup(ov1_guest);
> @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>          void *fdt;
>          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
>  
> -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> -         * (because the guest isn't going to use radix) then set it up here. */
> -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> -            /* legacy hash or new hash: */
> -            spapr_setup_hpt_and_vrma(spapr);
> +        if (!guest_radix) {
> +            /*
> +             * Either spapr_machine_reset() did not set up a HPT but one
> +             * is necessary (because the guest isn't going to use radix),
> +             * or the guest doesn't know about HPT resizing and we need to
> +             * pre-emptively resize for the maximum permitted RAM. Set it
> +             * up here. At the point this is called, nothing should have
> +             * been entered into the existing HPT.
> +             */
> +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> +                /* legacy hash or new hash: */
> +                spapr_setup_hpt_and_vrma(spapr);
> +                push_sregs_to_kvm_pr(spapr);
> +            }
> +        } else {
> +            spapr_free_hpt(spapr);
> +            spapr_reset_patb_entry(spapr);
>          }
>  
>          if (fdt_bufsize < sizeof(hdr)) {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 09110961a589..9d88b5596481 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>  hwaddr spapr_get_rtas_addr(void);
> +void spapr_reset_patb_entry(SpaprMachineState *spapr);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Posted by Greg Kurz 4 years, 2 months ago
On Fri, 14 Feb 2020 09:28:35 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > Until the CAS negotiation is over, an HPT can be allocated on three
> > different paths:
> > 
> > 1) during machine reset if the host doesn't support radix,
> > 
> > 2) during CAS if the guest wants hash and doesn't support HPT resizing,
> >    in which case we pre-emptively resize the HPT to accomodate maxram,
> > 
> > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> >    we're currently configured for radix.
> > 
> > Depending on the various combinations of host or guest MMU support,
> > HPT resizing guest support and the possibility of a CAS reboot, it
> > is quite hard to know which of these allocates the HPT that will
> > be ultimately used by the guest that wants to do hash. Also, some of
> > them have bugs:
> > 
> > - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
> >   and thus doesn't update the VRMA size, even though we've just extended
> >   the HPT. Not sure what issues this can cause,
> > 
> > - 3) doesn't check for HPT resizing support and will always allocate a
> >   small HPT based on the initial RAM size. This caps the total amount of
> >   RAM the guest can see, especially if maxram is much higher than the
> >   initial ram.
> > 
> > We only support guests that do CAS and we already assume that the HPT
> > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > seems reasonable to only allocate the HPT at the end of CAS, when no
> > CAS reboot was requested.
> > 
> > Consolidate the logic so that we only create the HPT during 3), ie.
> > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > is taken into account. This fixes the radix->hash transition for
> > all cases.
> 
> Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> need the HPT at all times there, or there's nowhere to put VRMA
> entries, so we can't run even in real mode.
> 

Well it happens to be working anyway because KVM automatically
creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
if QEMU didn't do so already... Would a comment to emphasize this
be enough or do you prefer I don't drop the HPT allocation currently
performed at machine reset ?

> > The guest can theoretically call CAS several times, without a CAS
> > reboot in between. Linux guests don't do that, but better safe than
> > sorry, let's ensure we can also handle the symmetrical hash->radix
> > transition correctly: free the HPT and set the GR bit in PATE.
> > An helper is introduced for the latter since this is already what
> > we do during machine reset when going for radix.
> > 
> > As a bonus, this removes one user of spapr->cas_reboot, which we
> > want to get rid of in the future.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  hw/ppc/spapr.c         |   25 +++++++++++++++-----
> >  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
> >  include/hw/ppc/spapr.h |    1 +
> >  3 files changed, 44 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 828e2cc1359a..88bc0e4e3ca1 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> >  {
> >      int hpt_shift;
> >  
> > +    /*
> > +     * HPT resizing is a bit of a special case, because when enabled
> > +     * we assume an HPT guest will support it until it says it
> > +     * doesn't, instead of assuming it won't support it until it says
> > +     * it does.  Strictly speaking that approach could break for
> > +     * guests which don't make a CAS call, but those are so old we
> > +     * don't care about them.  Without that assumption we'd have to
> > +     * make at least a temporary allocation of an HPT sized for max
> > +     * memory, which could be impossibly difficult under KVM HV if
> > +     * maxram is large.
> > +     */
> >      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> > -        || (spapr->cas_reboot
> > -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> > +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
> >          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> >      } else {
> >          uint64_t current_ram_size;
> > @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
> >      return 0;
> >  }
> >  
> > +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> > +{
> > +    spapr->patb_entry = PATE1_GR;
> > +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > +}
> > +
> >  static void spapr_machine_reset(MachineState *machine)
> >  {
> >      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> > @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
> >           * without a HPT because KVM will start them in radix mode.
> >           * Set the GR bit in PATE so that we know there is no HPT.
> >           */
> > -        spapr->patb_entry = PATE1_GR;
> > -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > -    } else {
> > -        spapr_setup_hpt_and_vrma(spapr);
> > +        spapr_reset_patb_entry(spapr);
> >      }
> >  
> >      qemu_devices_reset();
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index b8bb66b5c0d4..57ddf0fa6d05 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >      bool raw_mode_supported = false;
> >      bool guest_xive;
> >      CPUState *cs;
> > +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> >  
> >      /* CAS is supposed to be called early when only the boot vCPU is active. */
> >      CPU_FOREACH(cs) {
> > @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >  
> >      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
> >  
> > -    /*
> > -     * HPT resizing is a bit of a special case, because when enabled
> > -     * we assume an HPT guest will support it until it says it
> > -     * doesn't, instead of assuming it won't support it until it says
> > -     * it does.  Strictly speaking that approach could break for
> > -     * guests which don't make a CAS call, but those are so old we
> > -     * don't care about them.  Without that assumption we'd have to
> > -     * make at least a temporary allocation of an HPT sized for max
> > -     * memory, which could be impossibly difficult under KVM HV if
> > -     * maxram is large.
> > -     */
> > -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > -
> > -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> > -            error_report(
> > -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > -            exit(1);
> > -        }
> > -
> > -        if (spapr->htab_shift < maxshift) {
> > -            /* Guest doesn't know about HPT resizing, so we
> > -             * pre-emptively resize for the maximum permitted RAM.  At
> > -             * the point this is called, nothing should have been
> > -             * entered into the existing HPT */
> > -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> > -            push_sregs_to_kvm_pr(spapr);
> > -        }
> > -    }
> > -
> >      /* NOTE: there are actually a number of ov5 bits where input from the
> >       * guest is always zero, and the platform/QEMU enables them independently
> >       * of guest input. To model these properly we'd want some sort of mask,
> > @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >              error_report("Guest requested unavailable MMU mode (hash).");
> >              exit(EXIT_FAILURE);
> >          }
> > +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> > +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > +            error_report(
> > +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > +            exit(1);
> > +        }
> >      }
> >      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
> >      spapr_ovec_cleanup(ov1_guest);
> > @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> >          void *fdt;
> >          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> >  
> > -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> > -         * (because the guest isn't going to use radix) then set it up here. */
> > -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> > -            /* legacy hash or new hash: */
> > -            spapr_setup_hpt_and_vrma(spapr);
> > +        if (!guest_radix) {
> > +            /*
> > +             * Either spapr_machine_reset() did not set up a HPT but one
> > +             * is necessary (because the guest isn't going to use radix),
> > +             * or the guest doesn't know about HPT resizing and we need to
> > +             * pre-emptively resize for the maximum permitted RAM. Set it
> > +             * up here. At the point this is called, nothing should have
> > +             * been entered into the existing HPT.
> > +             */
> > +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> > +                /* legacy hash or new hash: */
> > +                spapr_setup_hpt_and_vrma(spapr);
> > +                push_sregs_to_kvm_pr(spapr);
> > +            }
> > +        } else {
> > +            spapr_free_hpt(spapr);
> > +            spapr_reset_patb_entry(spapr);
> >          }
> >  
> >          if (fdt_bufsize < sizeof(hdr)) {
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 09110961a589..9d88b5596481 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> >  
> >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> >  hwaddr spapr_get_rtas_addr(void);
> > +void spapr_reset_patb_entry(SpaprMachineState *spapr);
> >  #endif /* HW_SPAPR_H */
> > 
> 

Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Posted by David Gibson 4 years, 2 months ago
On Fri, Feb 14, 2020 at 07:19:00PM +0100, Greg Kurz wrote:
> On Fri, 14 Feb 2020 09:28:35 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > > Until the CAS negotiation is over, an HPT can be allocated on three
> > > different paths:
> > > 
> > > 1) during machine reset if the host doesn't support radix,
> > > 
> > > 2) during CAS if the guest wants hash and doesn't support HPT resizing,
> > >    in which case we pre-emptively resize the HPT to accomodate maxram,
> > > 
> > > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> > >    we're currently configured for radix.
> > > 
> > > Depending on the various combinations of host or guest MMU support,
> > > HPT resizing guest support and the possibility of a CAS reboot, it
> > > is quite hard to know which of these allocates the HPT that will
> > > be ultimately used by the guest that wants to do hash. Also, some of
> > > them have bugs:
> > > 
> > > - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
> > >   and thus doesn't update the VRMA size, even though we've just extended
> > >   the HPT. Not sure what issues this can cause,
> > > 
> > > - 3) doesn't check for HPT resizing support and will always allocate a
> > >   small HPT based on the initial RAM size. This caps the total amount of
> > >   RAM the guest can see, especially if maxram is much higher than the
> > >   initial ram.
> > > 
> > > We only support guests that do CAS and we already assume that the HPT
> > > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > > seems reasonable to only allocate the HPT at the end of CAS, when no
> > > CAS reboot was requested.
> > > 
> > > Consolidate the logic so that we only create the HPT during 3), ie.
> > > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > > is taken into account. This fixes the radix->hash transition for
> > > all cases.
> > 
> > Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> > need the HPT at all times there, or there's nowhere to put VRMA
> > entries, so we can't run even in real mode.
> > 
> 
> Well it happens to be working anyway because KVM automatically
> creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
> if QEMU didn't do so already... Would a comment to emphasize this
> be enough or do you prefer I don't drop the HPT allocation currently
> performed at machine reset ?

Relying on the automatic allocation is not a good idea.  With host
kernels before HPT resizing, once that automatic allocation happens,
we can't change the HPT size *at all*, even with a reset or CAS.

So, yes, the current code is annoyingly complex, but it's that way for
a reason.

> > > The guest can theoretically call CAS several times, without a CAS
> > > reboot in between. Linux guests don't do that, but better safe than
> > > sorry, let's ensure we can also handle the symmetrical hash->radix
> > > transition correctly: free the HPT and set the GR bit in PATE.
> > > An helper is introduced for the latter since this is already what
> > > we do during machine reset when going for radix.
> > > 
> > > As a bonus, this removes one user of spapr->cas_reboot, which we
> > > want to get rid of in the future.
> > > 
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > ---
> > >  hw/ppc/spapr.c         |   25 +++++++++++++++-----
> > >  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
> > >  include/hw/ppc/spapr.h |    1 +
> > >  3 files changed, 44 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 828e2cc1359a..88bc0e4e3ca1 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> > >  {
> > >      int hpt_shift;
> > >  
> > > +    /*
> > > +     * HPT resizing is a bit of a special case, because when enabled
> > > +     * we assume an HPT guest will support it until it says it
> > > +     * doesn't, instead of assuming it won't support it until it says
> > > +     * it does.  Strictly speaking that approach could break for
> > > +     * guests which don't make a CAS call, but those are so old we
> > > +     * don't care about them.  Without that assumption we'd have to
> > > +     * make at least a temporary allocation of an HPT sized for max
> > > +     * memory, which could be impossibly difficult under KVM HV if
> > > +     * maxram is large.
> > > +     */
> > >      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> > > -        || (spapr->cas_reboot
> > > -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> > > +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
> > >          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > >      } else {
> > >          uint64_t current_ram_size;
> > > @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
> > >      return 0;
> > >  }
> > >  
> > > +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> > > +{
> > > +    spapr->patb_entry = PATE1_GR;
> > > +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > +}
> > > +
> > >  static void spapr_machine_reset(MachineState *machine)
> > >  {
> > >      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> > > @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
> > >           * without a HPT because KVM will start them in radix mode.
> > >           * Set the GR bit in PATE so that we know there is no HPT.
> > >           */
> > > -        spapr->patb_entry = PATE1_GR;
> > > -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > -    } else {
> > > -        spapr_setup_hpt_and_vrma(spapr);
> > > +        spapr_reset_patb_entry(spapr);
> > >      }
> > >  
> > >      qemu_devices_reset();
> > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > index b8bb66b5c0d4..57ddf0fa6d05 100644
> > > --- a/hw/ppc/spapr_hcall.c
> > > +++ b/hw/ppc/spapr_hcall.c
> > > @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >      bool raw_mode_supported = false;
> > >      bool guest_xive;
> > >      CPUState *cs;
> > > +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > >  
> > >      /* CAS is supposed to be called early when only the boot vCPU is active. */
> > >      CPU_FOREACH(cs) {
> > > @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >  
> > >      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
> > >  
> > > -    /*
> > > -     * HPT resizing is a bit of a special case, because when enabled
> > > -     * we assume an HPT guest will support it until it says it
> > > -     * doesn't, instead of assuming it won't support it until it says
> > > -     * it does.  Strictly speaking that approach could break for
> > > -     * guests which don't make a CAS call, but those are so old we
> > > -     * don't care about them.  Without that assumption we'd have to
> > > -     * make at least a temporary allocation of an HPT sized for max
> > > -     * memory, which could be impossibly difficult under KVM HV if
> > > -     * maxram is large.
> > > -     */
> > > -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > -
> > > -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> > > -            error_report(
> > > -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > -            exit(1);
> > > -        }
> > > -
> > > -        if (spapr->htab_shift < maxshift) {
> > > -            /* Guest doesn't know about HPT resizing, so we
> > > -             * pre-emptively resize for the maximum permitted RAM.  At
> > > -             * the point this is called, nothing should have been
> > > -             * entered into the existing HPT */
> > > -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> > > -            push_sregs_to_kvm_pr(spapr);
> > > -        }
> > > -    }
> > > -
> > >      /* NOTE: there are actually a number of ov5 bits where input from the
> > >       * guest is always zero, and the platform/QEMU enables them independently
> > >       * of guest input. To model these properly we'd want some sort of mask,
> > > @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >              error_report("Guest requested unavailable MMU mode (hash).");
> > >              exit(EXIT_FAILURE);
> > >          }
> > > +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> > > +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > +            error_report(
> > > +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > +            exit(1);
> > > +        }
> > >      }
> > >      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
> > >      spapr_ovec_cleanup(ov1_guest);
> > > @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > >          void *fdt;
> > >          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> > >  
> > > -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> > > -         * (because the guest isn't going to use radix) then set it up here. */
> > > -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> > > -            /* legacy hash or new hash: */
> > > -            spapr_setup_hpt_and_vrma(spapr);
> > > +        if (!guest_radix) {
> > > +            /*
> > > +             * Either spapr_machine_reset() did not set up a HPT but one
> > > +             * is necessary (because the guest isn't going to use radix),
> > > +             * or the guest doesn't know about HPT resizing and we need to
> > > +             * pre-emptively resize for the maximum permitted RAM. Set it
> > > +             * up here. At the point this is called, nothing should have
> > > +             * been entered into the existing HPT.
> > > +             */
> > > +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> > > +                /* legacy hash or new hash: */
> > > +                spapr_setup_hpt_and_vrma(spapr);
> > > +                push_sregs_to_kvm_pr(spapr);
> > > +            }
> > > +        } else {
> > > +            spapr_free_hpt(spapr);
> > > +            spapr_reset_patb_entry(spapr);
> > >          }
> > >  
> > >          if (fdt_bufsize < sizeof(hdr)) {
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index 09110961a589..9d88b5596481 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> > >  
> > >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > >  hwaddr spapr_get_rtas_addr(void);
> > > +void spapr_reset_patb_entry(SpaprMachineState *spapr);
> > >  #endif /* HW_SPAPR_H */
> > > 
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Posted by Greg Kurz 4 years, 1 month ago
On Wed, 19 Feb 2020 10:21:05 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Feb 14, 2020 at 07:19:00PM +0100, Greg Kurz wrote:
> > On Fri, 14 Feb 2020 09:28:35 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > > > Until the CAS negotiation is over, an HPT can be allocated on three
> > > > different paths:
> > > > 
> > > > 1) during machine reset if the host doesn't support radix,
> > > > 
> > > > 2) during CAS if the guest wants hash and doesn't support HPT resizing,
> > > >    in which case we pre-emptively resize the HPT to accomodate maxram,
> > > > 
> > > > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> > > >    we're currently configured for radix.
> > > > 
> > > > Depending on the various combinations of host or guest MMU support,
> > > > HPT resizing guest support and the possibility of a CAS reboot, it
> > > > is quite hard to know which of these allocates the HPT that will
> > > > be ultimately used by the guest that wants to do hash. Also, some of
> > > > them have bugs:
> > > > 
> > > > - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
> > > >   and thus doesn't update the VRMA size, even though we've just extended
> > > >   the HPT. Not sure what issues this can cause,
> > > > 
> > > > - 3) doesn't check for HPT resizing support and will always allocate a
> > > >   small HPT based on the initial RAM size. This caps the total amount of
> > > >   RAM the guest can see, especially if maxram is much higher than the
> > > >   initial ram.
> > > > 
> > > > We only support guests that do CAS and we already assume that the HPT
> > > > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > > > seems reasonable to only allocate the HPT at the end of CAS, when no
> > > > CAS reboot was requested.
> > > > 
> > > > Consolidate the logic so that we only create the HPT during 3), ie.
> > > > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > > > is taken into account. This fixes the radix->hash transition for
> > > > all cases.
> > > 
> > > Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> > > need the HPT at all times there, or there's nowhere to put VRMA
> > > entries, so we can't run even in real mode.
> > > 
> > 
> > Well it happens to be working anyway because KVM automatically
> > creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
> > if QEMU didn't do so already... Would a comment to emphasize this
> > be enough or do you prefer I don't drop the HPT allocation currently
> > performed at machine reset ?
> 
> Relying on the automatic allocation is not a good idea.  With host
> kernels before HPT resizing, once that automatic allocation happens,
> we can't change the HPT size *at all*, even with a reset or CAS.
> 

Ah ok I see. With these older host kernels, we need QEMU to allocate the
HPT to fit ms->maxram_size, which KVM doesn't know about, or we'll have
troubles with VMs that would need a bigger HPT. And I guess we want to
support bigger VMs with pre-4.11 host kernels.

> So, yes, the current code is annoyingly complex, but it's that way for
> a reason.
> 

My motivation here is to get rid of CAS reboot... it definitely needs more
thinking on my side.

> > > > The guest can theoretically call CAS several times, without a CAS
> > > > reboot in between. Linux guests don't do that, but better safe than
> > > > sorry, let's ensure we can also handle the symmetrical hash->radix
> > > > transition correctly: free the HPT and set the GR bit in PATE.
> > > > An helper is introduced for the latter since this is already what
> > > > we do during machine reset when going for radix.
> > > > 
> > > > As a bonus, this removes one user of spapr->cas_reboot, which we
> > > > want to get rid of in the future.
> > > > 
> > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > ---
> > > >  hw/ppc/spapr.c         |   25 +++++++++++++++-----
> > > >  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
> > > >  include/hw/ppc/spapr.h |    1 +
> > > >  3 files changed, 44 insertions(+), 41 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index 828e2cc1359a..88bc0e4e3ca1 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> > > >  {
> > > >      int hpt_shift;
> > > >  
> > > > +    /*
> > > > +     * HPT resizing is a bit of a special case, because when enabled
> > > > +     * we assume an HPT guest will support it until it says it
> > > > +     * doesn't, instead of assuming it won't support it until it says
> > > > +     * it does.  Strictly speaking that approach could break for
> > > > +     * guests which don't make a CAS call, but those are so old we
> > > > +     * don't care about them.  Without that assumption we'd have to
> > > > +     * make at least a temporary allocation of an HPT sized for max
> > > > +     * memory, which could be impossibly difficult under KVM HV if
> > > > +     * maxram is large.
> > > > +     */
> > > >      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> > > > -        || (spapr->cas_reboot
> > > > -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> > > > +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
> > > >          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > >      } else {
> > > >          uint64_t current_ram_size;
> > > > @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
> > > >      return 0;
> > > >  }
> > > >  
> > > > +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> > > > +{
> > > > +    spapr->patb_entry = PATE1_GR;
> > > > +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > > +}
> > > > +
> > > >  static void spapr_machine_reset(MachineState *machine)
> > > >  {
> > > >      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> > > > @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
> > > >           * without a HPT because KVM will start them in radix mode.
> > > >           * Set the GR bit in PATE so that we know there is no HPT.
> > > >           */
> > > > -        spapr->patb_entry = PATE1_GR;
> > > > -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > > -    } else {
> > > > -        spapr_setup_hpt_and_vrma(spapr);
> > > > +        spapr_reset_patb_entry(spapr);
> > > >      }
> > > >  
> > > >      qemu_devices_reset();
> > > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > > index b8bb66b5c0d4..57ddf0fa6d05 100644
> > > > --- a/hw/ppc/spapr_hcall.c
> > > > +++ b/hw/ppc/spapr_hcall.c
> > > > @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > >      bool raw_mode_supported = false;
> > > >      bool guest_xive;
> > > >      CPUState *cs;
> > > > +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > >  
> > > >      /* CAS is supposed to be called early when only the boot vCPU is active. */
> > > >      CPU_FOREACH(cs) {
> > > > @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > >  
> > > >      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
> > > >  
> > > > -    /*
> > > > -     * HPT resizing is a bit of a special case, because when enabled
> > > > -     * we assume an HPT guest will support it until it says it
> > > > -     * doesn't, instead of assuming it won't support it until it says
> > > > -     * it does.  Strictly speaking that approach could break for
> > > > -     * guests which don't make a CAS call, but those are so old we
> > > > -     * don't care about them.  Without that assumption we'd have to
> > > > -     * make at least a temporary allocation of an HPT sized for max
> > > > -     * memory, which could be impossibly difficult under KVM HV if
> > > > -     * maxram is large.
> > > > -     */
> > > > -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > > -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > > -
> > > > -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> > > > -            error_report(
> > > > -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > > -            exit(1);
> > > > -        }
> > > > -
> > > > -        if (spapr->htab_shift < maxshift) {
> > > > -            /* Guest doesn't know about HPT resizing, so we
> > > > -             * pre-emptively resize for the maximum permitted RAM.  At
> > > > -             * the point this is called, nothing should have been
> > > > -             * entered into the existing HPT */
> > > > -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> > > > -            push_sregs_to_kvm_pr(spapr);
> > > > -        }
> > > > -    }
> > > > -
> > > >      /* NOTE: there are actually a number of ov5 bits where input from the
> > > >       * guest is always zero, and the platform/QEMU enables them independently
> > > >       * of guest input. To model these properly we'd want some sort of mask,
> > > > @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > >              error_report("Guest requested unavailable MMU mode (hash).");
> > > >              exit(EXIT_FAILURE);
> > > >          }
> > > > +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> > > > +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > > +            error_report(
> > > > +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > > +            exit(1);
> > > > +        }
> > > >      }
> > > >      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
> > > >      spapr_ovec_cleanup(ov1_guest);
> > > > @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > >          void *fdt;
> > > >          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> > > >  
> > > > -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> > > > -         * (because the guest isn't going to use radix) then set it up here. */
> > > > -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> > > > -            /* legacy hash or new hash: */
> > > > -            spapr_setup_hpt_and_vrma(spapr);
> > > > +        if (!guest_radix) {
> > > > +            /*
> > > > +             * Either spapr_machine_reset() did not set up a HPT but one
> > > > +             * is necessary (because the guest isn't going to use radix),
> > > > +             * or the guest doesn't know about HPT resizing and we need to
> > > > +             * pre-emptively resize for the maximum permitted RAM. Set it
> > > > +             * up here. At the point this is called, nothing should have
> > > > +             * been entered into the existing HPT.
> > > > +             */
> > > > +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> > > > +                /* legacy hash or new hash: */
> > > > +                spapr_setup_hpt_and_vrma(spapr);
> > > > +                push_sregs_to_kvm_pr(spapr);
> > > > +            }
> > > > +        } else {
> > > > +            spapr_free_hpt(spapr);
> > > > +            spapr_reset_patb_entry(spapr);
> > > >          }
> > > >  
> > > >          if (fdt_bufsize < sizeof(hdr)) {
> > > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > > index 09110961a589..9d88b5596481 100644
> > > > --- a/include/hw/ppc/spapr.h
> > > > +++ b/include/hw/ppc/spapr.h
> > > > @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> > > >  
> > > >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > > >  hwaddr spapr_get_rtas_addr(void);
> > > > +void spapr_reset_patb_entry(SpaprMachineState *spapr);
> > > >  #endif /* HW_SPAPR_H */
> > > > 
> > > 
> > 
> 
> 
> 

Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Posted by David Gibson 4 years, 1 month ago
On Mon, Feb 24, 2020 at 12:18:27PM +0100, Greg Kurz wrote:
> On Wed, 19 Feb 2020 10:21:05 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Feb 14, 2020 at 07:19:00PM +0100, Greg Kurz wrote:
> > > On Fri, 14 Feb 2020 09:28:35 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > 
> > > > On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > > > > Until the CAS negotiation is over, an HPT can be allocated on three
> > > > > different paths:
> > > > > 
> > > > > 1) during machine reset if the host doesn't support radix,
> > > > > 
> > > > > 2) during CAS if the guest wants hash and doesn't support HPT resizing,
> > > > >    in which case we pre-emptively resize the HPT to accomodate maxram,
> > > > > 
> > > > > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> > > > >    we're currently configured for radix.
> > > > > 
> > > > > Depending on the various combinations of host or guest MMU support,
> > > > > HPT resizing guest support and the possibility of a CAS reboot, it
> > > > > is quite hard to know which of these allocates the HPT that will
> > > > > be ultimately used by the guest that wants to do hash. Also, some of
> > > > > them have bugs:
> > > > > 
> > > > > - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
> > > > >   and thus doesn't update the VRMA size, even though we've just extended
> > > > >   the HPT. Not sure what issues this can cause,
> > > > > 
> > > > > - 3) doesn't check for HPT resizing support and will always allocate a
> > > > >   small HPT based on the initial RAM size. This caps the total amount of
> > > > >   RAM the guest can see, especially if maxram is much higher than the
> > > > >   initial ram.
> > > > > 
> > > > > We only support guests that do CAS and we already assume that the HPT
> > > > > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > > > > seems reasonable to only allocate the HPT at the end of CAS, when no
> > > > > CAS reboot was requested.
> > > > > 
> > > > > Consolidate the logic so that we only create the HPT during 3), ie.
> > > > > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > > > > is taken into account. This fixes the radix->hash transition for
> > > > > all cases.
> > > > 
> > > > Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> > > > need the HPT at all times there, or there's nowhere to put VRMA
> > > > entries, so we can't run even in real mode.
> > > > 
> > > 
> > > Well it happens to be working anyway because KVM automatically
> > > creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
> > > if QEMU didn't do so already... Would a comment to emphasize this
> > > be enough or do you prefer I don't drop the HPT allocation currently
> > > performed at machine reset ?
> > 
> > Relying on the automatic allocation is not a good idea.  With host
> > kernels before HPT resizing, once that automatic allocation happens,
> > we can't change the HPT size *at all*, even with a reset or CAS.
> > 
> 
> Ah ok I see. With these older host kernels, we need QEMU to allocate the
> HPT to fit ms->maxram_size, which KVM doesn't know about, or we'll have
> troubles with VMs that would need a bigger HPT.

Exactly.

> And I guess we want to
> support bigger VMs with pre-4.11 host kernels.

Well, with the usual sizing rules, a 16MiB HPT only supports a 2GiB
guest, so it doesn't have to be a particularly large VM to trip this.

> 
> > So, yes, the current code is annoyingly complex, but it's that way for
> > a reason.
> > 
> 
> My motivation here is to get rid of CAS reboot... it definitely needs more
> thinking on my side.

Yeah, I think so.

> 
> > > > > The guest can theoretically call CAS several times, without a CAS
> > > > > reboot in between. Linux guests don't do that, but better safe than
> > > > > sorry, let's ensure we can also handle the symmetrical hash->radix
> > > > > transition correctly: free the HPT and set the GR bit in PATE.
> > > > > An helper is introduced for the latter since this is already what
> > > > > we do during machine reset when going for radix.
> > > > > 
> > > > > As a bonus, this removes one user of spapr->cas_reboot, which we
> > > > > want to get rid of in the future.
> > > > > 
> > > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > > ---
> > > > >  hw/ppc/spapr.c         |   25 +++++++++++++++-----
> > > > >  hw/ppc/spapr_hcall.c   |   59 ++++++++++++++++++++----------------------------
> > > > >  include/hw/ppc/spapr.h |    1 +
> > > > >  3 files changed, 44 insertions(+), 41 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 828e2cc1359a..88bc0e4e3ca1 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> > > > >  {
> > > > >      int hpt_shift;
> > > > >  
> > > > > +    /*
> > > > > +     * HPT resizing is a bit of a special case, because when enabled
> > > > > +     * we assume an HPT guest will support it until it says it
> > > > > +     * doesn't, instead of assuming it won't support it until it says
> > > > > +     * it does.  Strictly speaking that approach could break for
> > > > > +     * guests which don't make a CAS call, but those are so old we
> > > > > +     * don't care about them.  Without that assumption we'd have to
> > > > > +     * make at least a temporary allocation of an HPT sized for max
> > > > > +     * memory, which could be impossibly difficult under KVM HV if
> > > > > +     * maxram is large.
> > > > > +     */
> > > > >      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> > > > > -        || (spapr->cas_reboot
> > > > > -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> > > > > +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
> > > > >          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > > >      } else {
> > > > >          uint64_t current_ram_size;
> > > > > @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void *opaque)
> > > > >      return 0;
> > > > >  }
> > > > >  
> > > > > +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> > > > > +{
> > > > > +    spapr->patb_entry = PATE1_GR;
> > > > > +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > > > +}
> > > > > +
> > > > >  static void spapr_machine_reset(MachineState *machine)
> > > > >  {
> > > > >      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> > > > > @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
> > > > >           * without a HPT because KVM will start them in radix mode.
> > > > >           * Set the GR bit in PATE so that we know there is no HPT.
> > > > >           */
> > > > > -        spapr->patb_entry = PATE1_GR;
> > > > > -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> > > > > -    } else {
> > > > > -        spapr_setup_hpt_and_vrma(spapr);
> > > > > +        spapr_reset_patb_entry(spapr);
> > > > >      }
> > > > >  
> > > > >      qemu_devices_reset();
> > > > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > > > > index b8bb66b5c0d4..57ddf0fa6d05 100644
> > > > > --- a/hw/ppc/spapr_hcall.c
> > > > > +++ b/hw/ppc/spapr_hcall.c
> > > > > @@ -1677,6 +1677,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > > >      bool raw_mode_supported = false;
> > > > >      bool guest_xive;
> > > > >      CPUState *cs;
> > > > > +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > > >  
> > > > >      /* CAS is supposed to be called early when only the boot vCPU is active. */
> > > > >      CPU_FOREACH(cs) {
> > > > > @@ -1739,36 +1740,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > > >  
> > > > >      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
> > > > >  
> > > > > -    /*
> > > > > -     * HPT resizing is a bit of a special case, because when enabled
> > > > > -     * we assume an HPT guest will support it until it says it
> > > > > -     * doesn't, instead of assuming it won't support it until it says
> > > > > -     * it does.  Strictly speaking that approach could break for
> > > > > -     * guests which don't make a CAS call, but those are so old we
> > > > > -     * don't care about them.  Without that assumption we'd have to
> > > > > -     * make at least a temporary allocation of an HPT sized for max
> > > > > -     * memory, which could be impossibly difficult under KVM HV if
> > > > > -     * maxram is large.
> > > > > -     */
> > > > > -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > > > -        int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> > > > > -
> > > > > -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> > > > > -            error_report(
> > > > > -                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > > > -            exit(1);
> > > > > -        }
> > > > > -
> > > > > -        if (spapr->htab_shift < maxshift) {
> > > > > -            /* Guest doesn't know about HPT resizing, so we
> > > > > -             * pre-emptively resize for the maximum permitted RAM.  At
> > > > > -             * the point this is called, nothing should have been
> > > > > -             * entered into the existing HPT */
> > > > > -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> > > > > -            push_sregs_to_kvm_pr(spapr);
> > > > > -        }
> > > > > -    }
> > > > > -
> > > > >      /* NOTE: there are actually a number of ov5 bits where input from the
> > > > >       * guest is always zero, and the platform/QEMU enables them independently
> > > > >       * of guest input. To model these properly we'd want some sort of mask,
> > > > > @@ -1806,6 +1777,12 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > > >              error_report("Guest requested unavailable MMU mode (hash).");
> > > > >              exit(EXIT_FAILURE);
> > > > >          }
> > > > > +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> > > > > +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> > > > > +            error_report(
> > > > > +                "h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
> > > > > +            exit(1);
> > > > > +        }
> > > > >      }
> > > > >      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
> > > > >      spapr_ovec_cleanup(ov1_guest);
> > > > > @@ -1838,11 +1815,23 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> > > > >          void *fdt;
> > > > >          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> > > > >  
> > > > > -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> > > > > -         * (because the guest isn't going to use radix) then set it up here. */
> > > > > -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> > > > > -            /* legacy hash or new hash: */
> > > > > -            spapr_setup_hpt_and_vrma(spapr);
> > > > > +        if (!guest_radix) {
> > > > > +            /*
> > > > > +             * Either spapr_machine_reset() did not set up a HPT but one
> > > > > +             * is necessary (because the guest isn't going to use radix),
> > > > > +             * or the guest doesn't know about HPT resizing and we need to
> > > > > +             * pre-emptively resize for the maximum permitted RAM. Set it
> > > > > +             * up here. At the point this is called, nothing should have
> > > > > +             * been entered into the existing HPT.
> > > > > +             */
> > > > > +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < maxshift) {
> > > > > +                /* legacy hash or new hash: */
> > > > > +                spapr_setup_hpt_and_vrma(spapr);
> > > > > +                push_sregs_to_kvm_pr(spapr);
> > > > > +            }
> > > > > +        } else {
> > > > > +            spapr_free_hpt(spapr);
> > > > > +            spapr_reset_patb_entry(spapr);
> > > > >          }
> > > > >  
> > > > >          if (fdt_bufsize < sizeof(hdr)) {
> > > > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > > > index 09110961a589..9d88b5596481 100644
> > > > > --- a/include/hw/ppc/spapr.h
> > > > > +++ b/include/hw/ppc/spapr.h
> > > > > @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
> > > > >  
> > > > >  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
> > > > >  hwaddr spapr_get_rtas_addr(void);
> > > > > +void spapr_reset_patb_entry(SpaprMachineState *spapr);
> > > > >  #endif /* HW_SPAPR_H */
> > > > > 
> > > > 
> > > 
> > 
> > 
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[PATCH] spapr: Make spapr_reallocate_hpt() static
Posted by Greg Kurz 4 years, 2 months ago
Its only users are in spapr.c.

Signed-off-by: Greg Kurz <groug@kaod.org>
---

I realized just after posting that spapr_reallocate_hpt() didn't need to
be extern anymore. Maybe worth squashing this into the "spapr: Rework
hash<->radix transitions at CAS" patch ?
---
 hw/ppc/spapr.c         |    4 ++--
 include/hw/ppc/spapr.h |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 88bc0e4e3ca1..1537866533cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1519,8 +1519,8 @@ void spapr_free_hpt(SpaprMachineState *spapr)
     close_htab_fd(spapr);
 }
 
-void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
-                          Error **errp)
+static void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
+                                 Error **errp)
 {
     long rc;
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9d88b5596481..0be8abc70744 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -821,8 +821,6 @@ void spapr_hotplug_req_add_by_count_indexed(SpaprDrcType drc_type,
 void spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
                                                uint32_t count, uint32_t index);
 int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
-void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
-                          Error **errp);
 void spapr_clear_pending_events(SpaprMachineState *spapr);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,