Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
higher tick rate due to high-resolution timers and concerns about timer
interrupt overhead and cascading effects in the timer wheel.
However, improvements have been made to the timer wheel algorithm since
then, particularly in eliminating cascading effects at the cost of minor
timekeeping inaccuracies. More details are available at
https://lwn.net/Articles/646950/. This removes the original concern about
cascading, and the reliance on high-resolution timers is not applicable
to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
With the introduction of the EEVDF scheduler, users can specify custom
slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
(10ms tick interval), base_slice is ineffective. Workloads like stress-ng
that do not voluntarily yield the CPU run for ~10ms before switching out.
Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
task latency, but this effect is lost due to the coarse 10ms tick.
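
The tick arithmetic above can be sketched with a deliberately simplified model. It assumes preemption of a CPU-bound task is only evaluated on a periodic tick, that the slice starts on a tick boundary, and that nothing else (wakeup preemption, HRTICK) intervenes. The function names are hypothetical and exist only for this illustration.

```python
import math


def tick_period_ms(hz: int) -> float:
    """Length of one periodic tick in milliseconds for a given CONFIG_HZ."""
    return 1000.0 / hz


def effective_run_ms(slice_ms: float, hz: int) -> float:
    """Simplified model (an assumption, not kernel code): a task that never
    yields keeps running until the first tick at or after its slice expires."""
    tick = tick_period_ms(hz)
    return math.ceil(slice_ms / tick) * tick


# Compare the default 3ms base_slice and a custom 2ms slice at both tick rates.
for hz in (100, 1000):
    for slice_ms in (3.0, 2.0):
        run = effective_run_ms(slice_ms, hz)
        print(f"HZ={hz:<4} slice={slice_ms}ms -> runs ~{run:.0f}ms before a switch")
```

Under this model both slices collapse to ~10ms at HZ=100, while at HZ=1000 the 3ms and 2ms slices remain distinct, which is the behaviour the paragraph above describes.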
By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
and user-defined slices work as expected. Benchmark results support this
change:
Latency improvements in schbench with EEVDF under stress-ng-induced noise:

Scheduler        CONFIG_HZ  Custom Slice  99th Percentile Latency (relative to HZ=100)
-----------------------------------------------------------------------------------
EEVDF            1000       No            0.30x
EEVDF            1000       2 ms          0.29x
EEVDF (default)  100        No            1.00x

Switching to HZ=1000 reduces the 99th percentile latency in schbench by
~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
for ~3ms before being preempted, compared to ~10ms with HZ=100. As a
result, schbench gets CPU time sooner, reducing its latency.

Daytrader Performance:

Daytrader results show variation mostly within standard deviation; the
one result beyond it (30 u, 1 i) is an improvement, indicating no
significant regression.

Workload (Users/Instances)   Throughput 1000HZ vs 100HZ (Std Dev%)
--------------------------------------------------------------------------
30 u, 1 i                    +3.01% (1.62%)
60 u, 1 i                    +1.46% (2.69%)
90 u, 1 i                    -1.33% (3.09%)
30 u, 2 i                    -1.20% (1.71%)
30 u, 3 i                    -0.07% (1.33%)

Avg. Response Time: No Change (=)

pgbench select queries:

Metric                       1000HZ vs 100HZ (Std Dev%)
------------------------------------------------------------------
Average TPS Change           +2.16% (1.27%)
Average Latency Change       -2.21% (1.21%)

Average TPS: higher is better
Average Latency: lower is better

pgbench shows both throughput and latency improvements beyond standard
deviation.
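
As a rough cross-check of the Daytrader and pgbench tables, the sketch below flags every result whose relative change is larger than its reported standard deviation. The figures are copied from the tables; using one standard deviation as the cut-off mirrors the wording of this message and is not a formal significance test. The labels and the helper name are made up for the example.

```python
# (label, relative change in %, standard deviation in %), copied from the tables.
results = [
    ("daytrader 30u/1i throughput", +3.01, 1.62),
    ("daytrader 60u/1i throughput", +1.46, 2.69),
    ("daytrader 90u/1i throughput", -1.33, 3.09),
    ("daytrader 30u/2i throughput", -1.20, 1.71),
    ("daytrader 30u/3i throughput", -0.07, 1.33),
    ("pgbench average TPS", +2.16, 1.27),
    ("pgbench average latency", -2.21, 1.21),
]


def exceeds_noise(change_pct: float, stddev_pct: float) -> bool:
    """True when the magnitude of the change is larger than one std dev."""
    return abs(change_pct) > stddev_pct


beyond = [label for label, change, sd in results if exceeds_noise(change, sd)]
for label in beyond:
    print("beyond std dev:", label)
```

With these numbers, three entries are flagged: the 30 u / 1 i Daytrader throughput gain and both pgbench metrics. No throughput decrease exceeds its standard deviation.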
Given these results and the improvements in timer wheel implementation,
increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
base_slice and allows fine-tuned scheduling for latency-sensitive
workloads.
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
---
arch/powerpc/configs/powernv_defconfig | 2 +-
arch/powerpc/configs/ppc64_defconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 6b6d7467fecf..8abf17d26b3a 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -46,7 +46,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_IDLE=y
-CONFIG_HZ_100=y
+CONFIG_HZ_1000=y
CONFIG_BINFMT_MISC=m
CONFIG_PPC_TRANSACTIONAL_MEM=y
CONFIG_PPC_UV=y
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 5fa154185efa..45d437e4c62e 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -57,7 +57,7 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_PMAC64=y
-CONFIG_HZ_100=y
+CONFIG_HZ_1000=y
CONFIG_PPC_TRANSACTIONAL_MEM=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
--
2.47.0
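
To confirm which tick rate a defconfig selects after applying the patch, a small check like the following may help. It is only a sketch: the helper name and the sample text are hypothetical, and it assumes the tick rate is chosen by a single `CONFIG_HZ_<n>=y` line, as in the hunks above.

```python
import re


def selected_hz(defconfig_text: str):
    """Return the HZ value chosen by a CONFIG_HZ_<n>=y line, or None if the
    file has no such line (the Kconfig default would then apply)."""
    match = re.search(r"^CONFIG_HZ_(\d+)=y$", defconfig_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Sample fragment modelled on the powernv_defconfig hunk in this patch.
sample = """CONFIG_CPU_IDLE=y
CONFIG_HZ_1000=y
CONFIG_BINFMT_MISC=m
"""

hz = selected_hz(sample)
print(f"HZ={hz}, tick period = {1000 / hz:.0f} ms")
```

In practice the text would be read from arch/powerpc/configs/powernv_defconfig or arch/powerpc/configs/ppc64_defconfig in a kernel tree.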
Hi Maddy,
Ping.
Any thoughts on this? Can it be picked up?
Thanks,
Madadi Vineeth Reddy
* Madadi Vineeth Reddy <vineethr@linux.ibm.com> [2025-03-30 13:17:34]:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
> By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
> and user-defined slices work as expected. Benchmark results support this
> change:
>
> Latency improvements in schbench with EEVDF under stress-ng-induced noise:
>
> Scheduler CONFIG_HZ Custom Slice 99th Percentile Latency (µs)
> --------------------------------------------------------------------
> EEVDF 1000 No 0.30x
> EEVDF 1000 2 ms 0.29x
> EEVDF (default) 100 No 1.00x
>
> Switching to HZ=1000 reduces the 99th percentile latency in schbench by
> ~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
> for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
> schbench gets CPU time sooner, reducing its latency.
>
> Daytrader Performance:
>
> Daytrader results show minor variation within standard deviation,
> indicating no significant regression.
>
> Workload (Users/Instances) Throughput 1000HZ vs 100HZ (Std Dev%)
> --------------------------------------------------------------------------
> 30 u, 1 i +3.01% (1.62%)
> 60 u, 1 i +1.46% (2.69%)
> 90 u, 1 i –1.33% (3.09%)
> 30 u, 2 i -1.20% (1.71%)
> 30 u, 3 i –0.07% (1.33%)
>
> Avg. Response Time: No Change (=)
>
> pgbench select queries:
>
> Metric 1000HZ vs 100HZ (Std Dev%)
> ------------------------------------------------------------------
> Average TPS Change +2.16% (1.27%)
> Average Latency Change –2.21% (1.21%)
>
> Average TPS: Higher the better
> Average Latency: Lower the better
>
> pgbench shows both throughput and latency improvements beyond standard
> deviation.
>
> Given these results and the improvements in timer wheel implementation,
> increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
> base_slice and allows fine-tuned scheduling for latency-sensitive
> workloads.
>
> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Good work Vineeth,
As you pointed out, the base slice is 3ms, and having the base slice be a
multiple of the tick will help. The numbers also support this change.
Looks good to me.
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
--
Thanks and Regards
Srikar Dronamraju
On Sun, Mar 30, 2025 at 01:17:34PM +0530, Madadi Vineeth Reddy wrote:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
> By increasing CONFIG_HZ to 1000 (1ms tick), base_slice is properly honored,
> and user-defined slices work as expected. Benchmark results support this
> change:
>
> Latency improvements in schbench with EEVDF under stress-ng-induced noise:
>
> Scheduler CONFIG_HZ Custom Slice 99th Percentile Latency (µs)
> --------------------------------------------------------------------
> EEVDF 1000 No 0.30x
> EEVDF 1000 2 ms 0.29x
> EEVDF (default) 100 No 1.00x
>
NIT: default value on top would be a little less confusing.
> Switching to HZ=1000 reduces the 99th percentile latency in schbench by
> ~70%. This improvement occurs because, with HZ=1000, stress-ng tasks run
> for ~3ms before yielding, compared to ~10ms with HZ=100. As a result,
> schbench gets CPU time sooner, reducing its latency.
>
> Daytrader Performance:
>
> Daytrader results show minor variation within standard deviation,
> indicating no significant regression.
>
> Workload (Users/Instances) Throughput 1000HZ vs 100HZ (Std Dev%)
> --------------------------------------------------------------------------
> 30 u, 1 i +3.01% (1.62%)
> 60 u, 1 i +1.46% (2.69%)
> 90 u, 1 i –1.33% (3.09%)
> 30 u, 2 i -1.20% (1.71%)
> 30 u, 3 i –0.07% (1.33%)
>
> Avg. Response Time: No Change (=)
>
> pgbench select queries:
>
> Metric 1000HZ vs 100HZ (Std Dev%)
> ------------------------------------------------------------------
> Average TPS Change +2.16% (1.27%)
> Average Latency Change –2.21% (1.21%)
>
> Average TPS: Higher the better
> Average Latency: Lower the better
>
> pgbench shows both throughput and latency improvements beyond standard
> deviation.
>
> Given these results and the improvements in timer wheel implementation,
> increasing CONFIG_HZ to 1000 ensures that powerpc benefits from EEVDF’s
> base_slice and allows fine-tuned scheduling for latency-sensitive
> workloads.
>
> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
LGTM
Reviewed-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com>
On 3/30/25 13:17, Madadi Vineeth Reddy wrote:
> Commit 030bdc3fd080 ("powerpc/defconfigs: Set HZ=100 on pseries and ppc64
> defconfigs") lowered CONFIG_HZ from 250 to 100, citing reduced need for a
> higher tick rate due to high-resolution timers and concerns about timer
> interrupt overhead and cascading effects in the timer wheel.
>
> However, improvements have been made to the timer wheel algorithm since
> then, particularly in eliminating cascading effects at the cost of minor
> timekeeping inaccuracies. More details are available here
> https://lwn.net/Articles/646950/. This removes the original concern about
> cascading, and the reliance on high-resolution timers is not applicable
> to the scheduler, which still depends on periodic ticks set by CONFIG_HZ.
>
> With the introduction of the EEVDF scheduler, users can specify custom
> slices for workloads. The default base_slice is 3ms, but with CONFIG_HZ=100
> (10ms tick interval), base_slice is ineffective. Workloads like stress-ng
> that do not voluntarily yield the CPU run for ~10ms before switching out.
> Additionally, setting a custom slice below 3ms (e.g., 2ms) should lower
> task latency, but this effect is lost due to the coarse 10ms tick.
>
It makes sense since base_slice is the only tunable available under EEVDF.
This would allow the users to make use of it.
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>