[Xen-devel] [PATCH 0/4] x86: reduce errors in frequency calculations / unrelated cleanup

Jan Beulich posted 4 patches 4 years, 1 month ago
Only 0 patches received!
[Xen-devel] [PATCH 0/4] x86: reduce errors in frequency calculations / unrelated cleanup
Posted by Jan Beulich 4 years, 1 month ago
While looking into ways to increase the accuracy of the clock speeds
we work with (in particular by possibly obtaining information from
hardware rather than measuring), I first of all noticed some
avoidable rounding errors in some of our calculations. That's what
patches 2 and 3 are intended to deal with. Patch 1 is preparatory
cleanup, while patch 4 simply addresses something I got annoyed by
yet another time while doing this investigation.

1: APIC: adjust types and comments in calibrate_APIC_clock()
2: time: reduce rounding errors in calculations
3: APIC: reduce rounding errors in calculations
4: APIC: restrict certain messages to BSP

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/4] x86/APIC: adjust types and comments in calibrate_APIC_clock()
Posted by Jan Beulich 4 years, 1 month ago
First and foremost the comment talking about potential underflow being
taken care of by using signed long type variables was true only on
32-bit, which we've not been supporting for quite some time. Drop the
comment and change all involved types to unsigned. Take the opportunity
and also replace bus_cycle's fixed width type.

Additionally there's no point using an "arbitrary (but long enough)
timeout" here. Just use the maximum possible value; Linux does so too,
just as an additional data point.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1207,21 +1207,19 @@ static void wait_tick_pvh(void)
 static int __init calibrate_APIC_clock(void)
 {
     unsigned long long t1, t2;
-    long tt1, tt2;
-    long result;
-    int i;
+    unsigned long tt1, tt2, result;
+    unsigned int i;
     unsigned long bus_freq; /* KAF: pointer-size avoids compile warns. */
-    u32 bus_cycle;          /* length of one bus cycle in pico-seconds */
-    const int LOOPS = HZ/10;
+    unsigned int bus_cycle; /* length of one bus cycle in pico-seconds */
+    const unsigned int LOOPS = HZ/10;
 
     apic_printk(APIC_VERBOSE, "calibrating APIC timer ...\n");
 
     /*
-     * Put whatever arbitrary (but long enough) timeout
-     * value into the APIC clock, we just want to get the
-     * counter running for calibration.
+     * Setup the APIC counter to maximum. There is no way the lapic
+     * can underflow in the 100ms detection time frame.
      */
-    __setup_APIC_LVTT(1000000000);
+    __setup_APIC_LVTT(0xffffffff);
 
     if ( !xen_guest )
         /*
@@ -1251,14 +1249,6 @@ static int __init calibrate_APIC_clock(v
     tt2 = apic_read(APIC_TMCCT);
     t2 = rdtsc_ordered();
 
-    /*
-     * The APIC bus clock counter is 32 bits only, it
-     * might have overflown, but note that we use signed
-     * longs, thus no extra care needed.
-     *
-     * underflown to be exact, as the timer counts down ;)
-     */
-
     result = (tt1-tt2)*APIC_DIVISOR/LOOPS;
 
     apic_printk(APIC_VERBOSE, "..... CPU clock speed is %ld.%04ld MHz.\n",


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/4] x86/APIC: adjust types and comments in calibrate_APIC_clock()
Posted by Andrew Cooper 4 years, 1 month ago
On 13/03/2020 09:25, Jan Beulich wrote:
> First and foremost the comment talking about potential underflow being
> taken care of by using signed long type variables was true only on
> 32-bit, which we've not been supporting for quite some time. Drop the
> comment and change all involved types to unsigned. Take the opportunity
> and also replace bus_cycle's fixed width type.
>
> Additionally there's no point using an "arbitrary (but long enough)
> timeout" here. Just use the maximum possible value; Linux does so too,
> just as an additional data point.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/4] x86/time: reduce rounding errors in calculations
Posted by Jan Beulich 4 years, 1 month ago
Plain (unsigned) integer division simply truncates the results. The
overall errors are smaller though if we use proper rounding. (Extend
this to the purely cosmetic aspect of time.c's freq_string(), which
before this change I've frequently observed to report e.g. NN.999MHz
HPET clock speeds.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
We could switch at least the first div/rem pair in calibrate_APIC_clock()
to use do_div(), but this would imply switching bus_freq (and then also
result) to unsigned int (as the divisor has to be 32-bit). While I think
there's pretty little risk for bus frequencies to go beyond 4GHz in the
next so many years, I still wasn't certain enough this would be a welcome
change.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1261,7 +1261,9 @@ static int __init calibrate_APIC_clock(v
     /* set up multipliers for accurate timer code */
     bus_freq   = result*HZ;
     bus_cycle  = (u32) (1000000000000LL/bus_freq); /* in pico seconds */
+    bus_cycle += (1000000000000UL % bus_freq) * 2 > bus_freq;
     bus_scale  = (1000*262144)/bus_cycle;
+    bus_scale += ((1000 * 262144) % bus_cycle) * 2 > bus_cycle;
 
     apic_printk(APIC_VERBOSE, "..... bus_scale = %#x\n", bus_scale);
     /* reset APIC to zero timeout value */
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -799,9 +799,9 @@ u64 __init hpet_setup(void)
     hpet_resume(hpet_boot_cfg);
 
     hpet_rate = 1000000000000000ULL; /* 10^15 */
-    (void)do_div(hpet_rate, hpet_period);
+    last = do_div(hpet_rate, hpet_period);
 
-    return hpet_rate;
+    return hpet_rate + (last * 2 > hpet_period);
 }
 
 void hpet_resume(u32 *boot_cfg)
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -275,7 +275,10 @@ static char *freq_string(u64 freq)
 {
     static char s[20];
     unsigned int x, y;
-    y = (unsigned int)do_div(freq, 1000000) / 1000;
+
+    if ( do_div(freq, 1000) > 500 )
+        ++freq;
+    y = (unsigned int)do_div(freq, 1000);
     x = (unsigned int)freq;
     snprintf(s, sizeof(s), "%u.%03uMHz", x, y);
     return s;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/4] x86/time: reduce rounding errors in calculations
Posted by Andrew Cooper 4 years, 1 month ago
On 13/03/2020 09:25, Jan Beulich wrote:
> Plain (unsigned) integer division simply truncates the results. The
> overall errors are smaller though if we use proper rounding. (Extend
> this to the purely cosmetic aspect of time.c's freq_string(), which
> before this change I've frequently observed to report e.g. NN.999MHz
> HPET clock speeds.)
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> We could switch at least the first div/rem pair in calibrate_APIC_clock()
> to use do_div(), but this would imply switching bus_freq (and then also
> result) to unsigned int (as the divisor has to be 32-bit). While I think
> there's pretty little risk for bus frequencies to go beyond 4GHz in the
> next so many years, I still wasn't certain enough this would be a welcome
> change.

Honestly, do_div() is very easy to get wrong (and in security relevant
ways in Linux).  I'd advocate for phasing its use out, irrespective of
this frequency concern.

As for 4GHz, I think its unlikely, but better to be safe in the code.

>
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -1261,7 +1261,9 @@ static int __init calibrate_APIC_clock(v
>      /* set up multipliers for accurate timer code */
>      bus_freq   = result*HZ;
>      bus_cycle  = (u32) (1000000000000LL/bus_freq); /* in pico seconds */
> +    bus_cycle += (1000000000000UL % bus_freq) * 2 > bus_freq;

These two differ in signedness of the numerator.  GCC seems to cope with
combining the two into a single div instruction, but I think we should
be consistent with the constant nevertheless.

Otherwise, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/4] x86/time: reduce rounding errors in calculations
Posted by Jan Beulich 4 years, 1 month ago
On 13.03.2020 16:14, Andrew Cooper wrote:
> On 13/03/2020 09:25, Jan Beulich wrote:
>> Plain (unsigned) integer division simply truncates the results. The
>> overall errors are smaller though if we use proper rounding. (Extend
>> this to the purely cosmetic aspect of time.c's freq_string(), which
>> before this change I've frequently observed to report e.g. NN.999MHz
>> HPET clock speeds.)
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> We could switch at least the first div/rem pair in calibrate_APIC_clock()
>> to use do_div(), but this would imply switching bus_freq (and then also
>> result) to unsigned int (as the divisor has to be 32-bit). While I think
>> there's pretty little risk for bus frequencies to go beyond 4GHz in the
>> next so many years, I still wasn't certain enough this would be a welcome
>> change.
> 
> Honestly, do_div() is very easy to get wrong (and in security relevant
> ways in Linux).  I'd advocate for phasing its use out, irrespective of
> this frequency concern.
> 
> As for 4GHz, I think its unlikely, but better to be safe in the code.
> 
>>
>> --- a/xen/arch/x86/apic.c
>> +++ b/xen/arch/x86/apic.c
>> @@ -1261,7 +1261,9 @@ static int __init calibrate_APIC_clock(v
>>      /* set up multipliers for accurate timer code */
>>      bus_freq   = result*HZ;
>>      bus_cycle  = (u32) (1000000000000LL/bus_freq); /* in pico seconds */
>> +    bus_cycle += (1000000000000UL % bus_freq) * 2 > bus_freq;
> 
> These two differ in signedness of the numerator.  GCC seems to cope with
> combining the two into a single div instruction, but I think we should
> be consistent with the constant nevertheless.

IOW you'd like me to change the other line too, to have a UL
suffix? If so, at that point I'd drop the stray cast, too.

> Otherwise, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks, but please let me know if the above is a correct
understanding of mine.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/4] x86/time: reduce rounding errors in calculations
Posted by Andrew Cooper 4 years, 1 month ago
On 16/03/2020 08:58, Jan Beulich wrote:
> On 13.03.2020 16:14, Andrew Cooper wrote:
>> On 13/03/2020 09:25, Jan Beulich wrote:
>>> Plain (unsigned) integer division simply truncates the results. The
>>> overall errors are smaller though if we use proper rounding. (Extend
>>> this to the purely cosmetic aspect of time.c's freq_string(), which
>>> before this change I've frequently observed to report e.g. NN.999MHz
>>> HPET clock speeds.)
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> We could switch at least the first div/rem pair in calibrate_APIC_clock()
>>> to use do_div(), but this would imply switching bus_freq (and then also
>>> result) to unsigned int (as the divisor has to be 32-bit). While I think
>>> there's pretty little risk for bus frequencies to go beyond 4GHz in the
>>> next so many years, I still wasn't certain enough this would be a welcome
>>> change.
>>
>> Honestly, do_div() is very easy to get wrong (and in security relevant
>> ways in Linux).  I'd advocate for phasing its use out, irrespective of
>> this frequency concern.
>>
>> As for 4GHz, I think its unlikely, but better to be safe in the code.
>>
>>>
>>> --- a/xen/arch/x86/apic.c
>>> +++ b/xen/arch/x86/apic.c
>>> @@ -1261,7 +1261,9 @@ static int __init calibrate_APIC_clock(v
>>>      /* set up multipliers for accurate timer code */
>>>      bus_freq   = result*HZ;
>>>      bus_cycle  = (u32) (1000000000000LL/bus_freq); /* in pico seconds */
>>> +    bus_cycle += (1000000000000UL % bus_freq) * 2 > bus_freq;
>>
>> These two differ in signedness of the numerator.  GCC seems to cope with
>> combining the two into a single div instruction, but I think we should
>> be consistent with the constant nevertheless.
> 
> IOW you'd like me to change the other line too, to have a UL
> suffix? If so, at that point I'd drop the stray cast, too.

That would be fine yes.

~Andrew

> 
>> Otherwise, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Thanks, but please let me know if the above is a correct
> understanding of mine.
> 
> Jan
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 3/4] x86/APIC: reduce rounding errors in calculations
Posted by Jan Beulich 4 years, 1 month ago
Dividing by HZ/10 just to subsequently multiply by HZ again in all uses
of the respective variable is pretty pointlessly introducing rounding
(really: truncation) errors. While transforming the respective
expressions it became apparent that "result" would be left unused except
for its use as function return value. As the sole caller of the function
doesn't look at the returned value, simply convert the function to have
"void" return type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1204,14 +1204,14 @@ static void wait_tick_pvh(void)
  * APIC irq that way.
  */
 
-static int __init calibrate_APIC_clock(void)
+static void __init calibrate_APIC_clock(void)
 {
     unsigned long long t1, t2;
-    unsigned long tt1, tt2, result;
+    unsigned long tt1, tt2;
     unsigned int i;
     unsigned long bus_freq; /* KAF: pointer-size avoids compile warns. */
     unsigned int bus_cycle; /* length of one bus cycle in pico-seconds */
-    const unsigned int LOOPS = HZ/10;
+#define LOOPS_FRAC 10U      /* measure for one tenth of a second */
 
     apic_printk(APIC_VERBOSE, "calibrating APIC timer ...\n");
 
@@ -1238,9 +1238,9 @@ static int __init calibrate_APIC_clock(v
     tt1 = apic_read(APIC_TMCCT);
 
     /*
-     * Let's wait LOOPS ticks:
+     * Let's wait HZ / LOOPS_FRAC ticks:
      */
-    for (i = 0; i < LOOPS; i++)
+    for (i = 0; i < HZ / LOOPS_FRAC; i++)
         if ( !xen_guest )
             wait_8254_wraparound();
         else
@@ -1249,17 +1249,16 @@ static int __init calibrate_APIC_clock(v
     tt2 = apic_read(APIC_TMCCT);
     t2 = rdtsc_ordered();
 
-    result = (tt1-tt2)*APIC_DIVISOR/LOOPS;
+    bus_freq = (tt1 - tt2) * APIC_DIVISOR * LOOPS_FRAC;
 
-    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %ld.%04ld MHz.\n",
-                ((long)(t2 - t1) / LOOPS) / (1000000 / HZ),
-                ((long)(t2 - t1) / LOOPS) % (1000000 / HZ));
+    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %lu.%04lu MHz.\n",
+                ((unsigned long)(t2 - t1) * LOOPS_FRAC) / 1000000,
+                ((unsigned long)(t2 - t1) * LOOPS_FRAC / 100) % 10000);
 
     apic_printk(APIC_VERBOSE, "..... host bus clock speed is %ld.%04ld MHz.\n",
-                result / (1000000 / HZ), result % (1000000 / HZ));
+                bus_freq / 1000000, (bus_freq / 100) % 10000);
 
     /* set up multipliers for accurate timer code */
-    bus_freq   = result*HZ;
     bus_cycle  = (u32) (1000000000000LL/bus_freq); /* in pico seconds */
     bus_cycle += (1000000000000UL % bus_freq) * 2 > bus_freq;
     bus_scale  = (1000*262144)/bus_cycle;
@@ -1269,7 +1268,7 @@ static int __init calibrate_APIC_clock(v
     /* reset APIC to zero timeout value */
     __setup_APIC_LVTT(0);
 
-    return result;
+#undef LOOPS_FRAC
 }
 
 void __init setup_boot_APIC_clock(void)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 3/4] x86/APIC: reduce rounding errors in calculations
Posted by Andrew Cooper 4 years, 1 month ago
On 13/03/2020 09:26, Jan Beulich wrote:
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -1249,17 +1249,16 @@ static int __init calibrate_APIC_clock(v
>      tt2 = apic_read(APIC_TMCCT);
>      t2 = rdtsc_ordered();
>  
> -    result = (tt1-tt2)*APIC_DIVISOR/LOOPS;
> +    bus_freq = (tt1 - tt2) * APIC_DIVISOR * LOOPS_FRAC;
>  
> -    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %ld.%04ld MHz.\n",
> -                ((long)(t2 - t1) / LOOPS) / (1000000 / HZ),
> -                ((long)(t2 - t1) / LOOPS) % (1000000 / HZ));
> +    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %lu.%04lu MHz.\n",
> +                ((unsigned long)(t2 - t1) * LOOPS_FRAC) / 1000000,
> +                ((unsigned long)(t2 - t1) * LOOPS_FRAC / 100) % 10000);

If I'm not mistaken, this expression only works because of the L->R
associativity (given the same precedence of * and /) grouping it as
((t2-t1) * 10)  / 100 as opposed to (t2-t1) * (10 / 100), where the
latter would truncate to 0.  I think some extra brackets would help for
clarity.

That said, what is wrong with keeping the original form?  I realise this
is only for a printk(), but the div instruction can't be shared between
the two halves.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 3/4] x86/APIC: reduce rounding errors in calculations
Posted by Jan Beulich 4 years, 1 month ago
On 13.03.2020 16:50, Andrew Cooper wrote:
> On 13/03/2020 09:26, Jan Beulich wrote:
>> --- a/xen/arch/x86/apic.c
>> +++ b/xen/arch/x86/apic.c
>> @@ -1249,17 +1249,16 @@ static int __init calibrate_APIC_clock(v
>>      tt2 = apic_read(APIC_TMCCT);
>>      t2 = rdtsc_ordered();
>>  
>> -    result = (tt1-tt2)*APIC_DIVISOR/LOOPS;
>> +    bus_freq = (tt1 - tt2) * APIC_DIVISOR * LOOPS_FRAC;
>>  
>> -    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %ld.%04ld MHz.\n",
>> -                ((long)(t2 - t1) / LOOPS) / (1000000 / HZ),
>> -                ((long)(t2 - t1) / LOOPS) % (1000000 / HZ));
>> +    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %lu.%04lu MHz.\n",
>> +                ((unsigned long)(t2 - t1) * LOOPS_FRAC) / 1000000,
>> +                ((unsigned long)(t2 - t1) * LOOPS_FRAC / 100) % 10000);
> 
> If I'm not mistaken, this expression only works because of the L->R
> associativity (given the same precedence of * and /) grouping it as
> ((t2-t1) * 10)  / 100 as opposed to (t2-t1) * (10 / 100), where the
> latter would truncate to 0.  I think some extra brackets would help for
> clarity.

Well, yes, done. The alternative would have been to drop more of
them.

> That said, what is wrong with keeping the original form?

The same as elsewhere in this patch, and as said in the description -
there's been pointless rounding (really: truncation) errors here from
first dividing by HZ (to be precise: by HZ/10) and then effectively
multiplying by this value again. The original division-only argument
would not be affected afaict, but the remainder one would. Furthermore
I'd like to avoid having to retain the LOOPS constant.

>  I realise this
> is only for a printk(), but the div instruction can't be shared between
> the two halves.

This being an __init function, I don't think the number of divisions
is a concern here, the more that the compiler - with the divisor
being a constant - will convert them to multiplications anyway.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 3/4] x86/APIC: reduce rounding errors in calculations
Posted by Andrew Cooper 4 years, 1 month ago
On 16/03/2020 09:07, Jan Beulich wrote:
> On 13.03.2020 16:50, Andrew Cooper wrote:
>> On 13/03/2020 09:26, Jan Beulich wrote:
>>> --- a/xen/arch/x86/apic.c
>>> +++ b/xen/arch/x86/apic.c
>>> @@ -1249,17 +1249,16 @@ static int __init calibrate_APIC_clock(v
>>>      tt2 = apic_read(APIC_TMCCT);
>>>      t2 = rdtsc_ordered();
>>>  
>>> -    result = (tt1-tt2)*APIC_DIVISOR/LOOPS;
>>> +    bus_freq = (tt1 - tt2) * APIC_DIVISOR * LOOPS_FRAC;
>>>  
>>> -    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %ld.%04ld MHz.\n",
>>> -                ((long)(t2 - t1) / LOOPS) / (1000000 / HZ),
>>> -                ((long)(t2 - t1) / LOOPS) % (1000000 / HZ));
>>> +    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %lu.%04lu MHz.\n",
>>> +                ((unsigned long)(t2 - t1) * LOOPS_FRAC) / 1000000,
>>> +                ((unsigned long)(t2 - t1) * LOOPS_FRAC / 100) % 10000);
>>
>> If I'm not mistaken, this expression only works because of the L->R
>> associativity (given the same precedence of * and /) grouping it as
>> ((t2-t1) * 10)  / 100 as opposed to (t2-t1) * (10 / 100), where the
>> latter would truncate to 0.  I think some extra brackets would help for
>> clarity.
> 
> Well, yes, done. The alternative would have been to drop more of
> them.
> 
>> That said, what is wrong with keeping the original form?
> 
> The same as elsewhere in this patch, and as said in the description -
> there's been pointless rounding (really: truncation) errors here from
> first dividing by HZ (to be precise: by HZ/10) and then effectively
> multiplying by this value again. The original division-only argument
> would not be affected afaict, but the remainder one would. Furthermore
> I'd like to avoid having to retain the LOOPS constant.
> 
>>   I realise this
>> is only for a printk(), but the div instruction can't be shared between
>> the two halves.
> 
> This being an __init function, I don't think the number of divisions
> is a concern here, the more that the compiler - with the divisor
> being a constant - will convert them to multiplications anyway.

Good point.  Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Jan Beulich 4 years, 1 month ago
All CPUs get an equal setting of EOI broadcast suppression; no need to
log one message per CPU, even if it's only in verbose APIC mode.

Only the BSP is eligible to possibly get ExtINT enabled; no need to log
that it gets disabled on all APs, even if - again - it's only in verbose
APIC mode.

Take the opportunity and introduce a "bsp" parameter to the function, to
stop using smp_processor_id() to tell BSP from APs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -499,7 +499,7 @@ static void resume_x2apic(void)
     __enable_x2apic();
 }
 
-void setup_local_APIC(void)
+void setup_local_APIC(bool bsp)
 {
     unsigned long oldvalue, value, maxlvt;
     int i, j;
@@ -598,8 +598,8 @@ void setup_local_APIC(void)
     if ( directed_eoi_enabled )
     {
         value |= APIC_SPIV_DIRECTED_EOI;
-        apic_printk(APIC_VERBOSE, "Suppress EOI broadcast on CPU#%d\n",
-                    smp_processor_id());
+        if ( bsp )
+            apic_printk(APIC_VERBOSE, "Suppressing EOI broadcast\n");
     }
 
     apic_write(APIC_SPIV, value);
@@ -615,21 +615,22 @@ void setup_local_APIC(void)
      * TODO: set up through-local-APIC from through-I/O-APIC? --macro
      */
     value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
-    if (!smp_processor_id() && (pic_mode || !value)) {
+    if (bsp && (pic_mode || !value)) {
         value = APIC_DM_EXTINT;
         apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n",
                     smp_processor_id());
     } else {
         value = APIC_DM_EXTINT | APIC_LVT_MASKED;
-        apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
-                    smp_processor_id());
+        if (bsp)
+            apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
+                        smp_processor_id());
     }
     apic_write(APIC_LVT0, value);
 
     /*
      * only the BP should see the LINT1 NMI signal, obviously.
      */
-    if (!smp_processor_id())
+    if (bsp)
         value = APIC_DM_NMI;
     else
         value = APIC_DM_NMI | APIC_LVT_MASKED;
@@ -663,7 +664,7 @@ void setup_local_APIC(void)
         printk("Leaving ESR disabled.\n");
     }
 
-    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
+    if (nmi_watchdog == NMI_LOCAL_APIC && !bsp)
         setup_apic_nmi_watchdog();
     apic_pm_activate();
 }
@@ -1474,7 +1475,7 @@ int __init APIC_init_uniprocessor (void)
     physids_clear(phys_cpu_present_map);
     physid_set(boot_cpu_physical_apicid, phys_cpu_present_map);
 
-    setup_local_APIC();
+    setup_local_APIC(true);
 
     if (nmi_watchdog == NMI_LOCAL_APIC)
         check_nmi_watchdog();
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -191,7 +191,7 @@ static void smp_callin(void)
      */
     Dprintk("CALLIN, before setup_local_APIC().\n");
     x2apic_ap_setup();
-    setup_local_APIC();
+    setup_local_APIC(false);
 
     /* Save our processor parameters. */
     if ( !smp_store_cpu_info(cpu) )
@@ -1165,7 +1165,7 @@ void __init smp_prepare_cpus(void)
     verify_local_APIC();
 
     connect_bsp_APIC();
-    setup_local_APIC();
+    setup_local_APIC(true);
 
     if ( !skip_ioapic_setup && nr_ioapics )
         setup_IO_APIC();
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -169,7 +169,7 @@ extern int verify_local_APIC (void);
 extern void cache_APIC_registers (void);
 extern void sync_Arb_IDs (void);
 extern void init_bsp_APIC (void);
-extern void setup_local_APIC (void);
+extern void setup_local_APIC(bool bsp);
 extern void init_apic_mappings (void);
 extern void smp_local_timer_interrupt (struct cpu_user_regs *regs);
 extern void setup_boot_APIC_clock (void);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Andrew Cooper 4 years, 1 month ago
On 13/03/2020 09:26, Jan Beulich wrote:
> All CPUs get an equal setting of EOI broadcast suppression; no need to
> log one message per CPU, even if it's only in verbose APIC mode.
>
> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
> that it gets disabled on all APs, even if - again - it's only in verbose
> APIC mode.

How true is this in practice?

I know it is certainly the recommended configuration, but interrupt
remapping can definitely point ExtINT at arbitrary CPUs, and IIRC, the
MP spec simply says that "only one CPU should be configured to receive
ExtINT".

I know we definitely have bugs with ExiINT handling, but I haven't had
time to debug further.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Jan Beulich 4 years, 1 month ago
On 13.03.2020 17:18, Andrew Cooper wrote:
> On 13/03/2020 09:26, Jan Beulich wrote:
>> All CPUs get an equal setting of EOI broadcast suppression; no need to
>> log one message per CPU, even if it's only in verbose APIC mode.
>>
>> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
>> that it gets disabled on all APs, even if - again - it's only in verbose
>> APIC mode.
> 
> How true is this in practice?

I guess you read "eligible" as "in theory", whereas it is meant as "with
how our [and in particular this] code is working right now". Even if we
decided to switch the CPU to handle ExtINT, it still wouldn't need to be
one message per CPU - it would suffice to issue the message for the one
CPU getting ExtINT enabled.

Jan

> I know it is certainly the recommended configuration, but interrupt
> remapping can definitely point ExtINT at arbitrary CPUs, and IIRC, the
> MP spec simply says that "only one CPU should be configured to receive
> ExtINT".
> 
> I know we definitely have bugs with ExiINT handling, but I haven't had
> time to debug further.
> 
> ~Andrew
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Ping: [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Jan Beulich 4 years ago
On 16.03.2020 10:10, Jan Beulich wrote:
> On 13.03.2020 17:18, Andrew Cooper wrote:
>> On 13/03/2020 09:26, Jan Beulich wrote:
>>> All CPUs get an equal setting of EOI broadcast suppression; no need to
>>> log one message per CPU, even if it's only in verbose APIC mode.
>>>
>>> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
>>> that it gets disabled on all APs, even if - again - it's only in verbose
>>> APIC mode.
>>
>> How true is this in practice?
> 
> I guess you read "eligible" as "in theory", whereas it is meant as "with
> how our [and in particular this] code is working right now". Even if we
> decided to switch the CPU to handle ExtINT, it still wouldn't need to be
> one message per CPU - it would suffice to issue the message for the one
> CPU getting ExtINT enabled.

Are you okay with the above explanation, and hence willing to give an
ack? If not, what alternative suggestion do you have to limit this
particular part of the log spam on very-many-CPU systems?

Jan

Re: [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Andrew Cooper 4 years ago
On 13/03/2020 09:26, Jan Beulich wrote:
> All CPUs get an equal setting of EOI broadcast suppression; no need to
> log one message per CPU, even if it's only in verbose APIC mode.
>
> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
> that it gets disabled on all APs, even if - again - it's only in verbose
> APIC mode.
>
> Take the opportunity and introduce a "bsp" parameter to the function, to
> stop using smp_processor_id() to tell BSP from APs.

On further consideration, this is introducing a latent bug.

In a theoretical world where we could take the BSP offline, it is still
the CPU with the ID 0 which needs various of these things setting back up.

You could argue that we could move ExtINT/NMI handling to a different
CPU, and in this case, BSP still isn't the right distinction.  We'd want
something to signify "the processor which is the target of legacy
interrupts", as in such a case, it would specifically no longer be the
CPU we booted on.

OTOH, the adjustment for the NMI watchdog does look to be different. 
AFAICT, that is for deferring the watchdog setup until later in boot, at
which point "the BSP" is the appropriate distinction to use.  (That said
- I'm not sure why anything should need delaying.  I suspect this is
misplaced code to begin with.)

As for the messages being printed, I think that is fine to restrict to
the BSP.

A conversation on LKML has revealed why LVT0.MASK gets sampled - it is
to distinguish between the two virtual wire modes.  LVT0.MASK needs to
stay masked on the BSP if the firmware configured it like that, because
the PIC is wired through an IO-APIC pin which ultimately ends up
delivering an MSI ExtINT interrupt, rather than using the dedicates
sideband bus message to emulate the legacy ExtINT/LINT0 line.

~Andrew

Re: [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Jan Beulich 4 years ago
On 02.04.2020 19:55, Andrew Cooper wrote:
> On 13/03/2020 09:26, Jan Beulich wrote:
>> All CPUs get an equal setting of EOI broadcast suppression; no need to
>> log one message per CPU, even if it's only in verbose APIC mode.
>>
>> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
>> that it gets disabled on all APs, even if - again - it's only in verbose
>> APIC mode.
>>
>> Take the opportunity and introduce a "bsp" parameter to the function, to
>> stop using smp_processor_id() to tell BSP from APs.
> 
> On further consideration, this is introducing a latent bug.
> 
> In a theoretical world where we could take the BSP offline, it is still
> the CPU with the ID 0 which needs various of these things setting back up.

No. If we took the BSP offline, no CPU with ID 0 would remain
(until such time that the ID would be re-used).

> You could argue that we could move ExtINT/NMI handling to a different
> CPU, and in this case, BSP still isn't the right distinction.  We'd want
> something to signify "the processor which is the target of legacy
> interrupts", as in such a case, it would specifically no longer be the
> CPU we booted on.

I see nothing wrong with calling this new "focus" processor the
BSP again. Post boot the significance of BSP, afaict, reduces
to aspects that aren't really tied to having been the processor
to lead bootup.

The important point is - if we were to allow offlining the BSP,
whichever other one would take its position would _not_ come
through setup_local_APIC(). The adjustments needs would need
doing by other means (quite likely by splitting out some of the
code into a new helper). Hence in setup_local_APIC() itself the
BSP / non-BSP distinction is correct.

> OTOH, the adjustment for the NMI watchdog does look to be different. 
> AFAICT, that is for deferring the watchdog setup until later in boot, at
> which point "the BSP" is the appropriate distinction to use.  (That said
> - I'm not sure why anything should need delaying.  I suspect this is
> misplaced code to begin with.)

Well, in any event - an unrelated aspect, and I think the switch
from smp_processor_id() to bsp is correct there as well.

> As for the messages being printed, I think that is fine to restrict to
> the BSP.

As per above I understand this isn't an ack for the patch. However,
I then don't understand what concrete changes you mean me to make
for it to stand a chance of getting your ack.

Jan

Re: [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Roger Pau Monné 3 years, 11 months ago
On Fri, Mar 13, 2020 at 10:26:47AM +0100, Jan Beulich wrote:
> All CPUs get an equal setting of EOI broadcast suppression; no need to
> log one message per CPU, even if it's only in verbose APIC mode.
> 
> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
> that it gets disabled on all APs, even if - again - it's only in verbose
> APIC mode.
> 
> Take the opportunity and introduce a "bsp" parameter to the function, to
> stop using smp_processor_id() to tell BSP from APs.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

LGTM:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

AFAICT this doesn't introduce any functional change in APIC setup or
behavior, the only functional change is the log message reduction.
Might be good to add a note to that effect to make this clear, since
the change from smp_processor_id() -> bsp might make this not obvious.

> 
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -499,7 +499,7 @@ static void resume_x2apic(void)
>      __enable_x2apic();
>  }
>  
> -void setup_local_APIC(void)
> +void setup_local_APIC(bool bsp)
>  {
>      unsigned long oldvalue, value, maxlvt;
>      int i, j;
> @@ -598,8 +598,8 @@ void setup_local_APIC(void)
>      if ( directed_eoi_enabled )
>      {
>          value |= APIC_SPIV_DIRECTED_EOI;
> -        apic_printk(APIC_VERBOSE, "Suppress EOI broadcast on CPU#%d\n",
> -                    smp_processor_id());
> +        if ( bsp )
> +            apic_printk(APIC_VERBOSE, "Suppressing EOI broadcast\n");
>      }
>  
>      apic_write(APIC_SPIV, value);
> @@ -615,21 +615,22 @@ void setup_local_APIC(void)
>       * TODO: set up through-local-APIC from through-I/O-APIC? --macro
>       */
>      value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
> -    if (!smp_processor_id() && (pic_mode || !value)) {
> +    if (bsp && (pic_mode || !value)) {
>          value = APIC_DM_EXTINT;
>          apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n",
>                      smp_processor_id());
>      } else {
>          value = APIC_DM_EXTINT | APIC_LVT_MASKED;
> -        apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
> -                    smp_processor_id());
> +        if (bsp)
> +            apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
> +                        smp_processor_id());

You might want to also drop the CPU#%d from the above messages, since
they would only be printed for the BSP.

>      }
>      apic_write(APIC_LVT0, value);
>  
>      /*
>       * only the BP should see the LINT1 NMI signal, obviously.
>       */
> -    if (!smp_processor_id())
> +    if (bsp)
>          value = APIC_DM_NMI;
>      else
>          value = APIC_DM_NMI | APIC_LVT_MASKED;

This would be shorter as:

value = APIC_DM_NMI | (bsp ? 0 : APIC_LVT_MASKED);

Not specially trilled anyway.

Thanks, Roger.

Re: [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Jan Beulich 3 years, 11 months ago
On 14.05.2020 12:01, Roger Pau Monné wrote:
> On Fri, Mar 13, 2020 at 10:26:47AM +0100, Jan Beulich wrote:
>> All CPUs get an equal setting of EOI broadcast suppression; no need to
>> log one message per CPU, even if it's only in verbose APIC mode.
>>
>> Only the BSP is eligible to possibly get ExtINT enabled; no need to log
>> that it gets disabled on all APs, even if - again - it's only in verbose
>> APIC mode.
>>
>> Take the opportunity and introduce a "bsp" parameter to the function, to
>> stop using smp_processor_id() to tell BSP from APs.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> LGTM:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> AFAICT this doesn't introduce any functional change in APIC setup or
> behavior, the only functional change is the log message reduction.
> Might be good to add a note to that effect to make this clear, since
> the change from smp_processor_id() -> bsp might make this not obvious.

I've added "No functional change from this" to the last paragraph.

>> --- a/xen/arch/x86/apic.c
>> +++ b/xen/arch/x86/apic.c
>> @@ -499,7 +499,7 @@ static void resume_x2apic(void)
>>      __enable_x2apic();
>>  }
>>  
>> -void setup_local_APIC(void)
>> +void setup_local_APIC(bool bsp)
>>  {
>>      unsigned long oldvalue, value, maxlvt;
>>      int i, j;
>> @@ -598,8 +598,8 @@ void setup_local_APIC(void)
>>      if ( directed_eoi_enabled )
>>      {
>>          value |= APIC_SPIV_DIRECTED_EOI;
>> -        apic_printk(APIC_VERBOSE, "Suppress EOI broadcast on CPU#%d\n",
>> -                    smp_processor_id());
>> +        if ( bsp )
>> +            apic_printk(APIC_VERBOSE, "Suppressing EOI broadcast\n");
>>      }
>>  
>>      apic_write(APIC_SPIV, value);
>> @@ -615,21 +615,22 @@ void setup_local_APIC(void)
>>       * TODO: set up through-local-APIC from through-I/O-APIC? --macro
>>       */
>>      value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
>> -    if (!smp_processor_id() && (pic_mode || !value)) {
>> +    if (bsp && (pic_mode || !value)) {
>>          value = APIC_DM_EXTINT;
>>          apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n",
>>                      smp_processor_id());
>>      } else {
>>          value = APIC_DM_EXTINT | APIC_LVT_MASKED;
>> -        apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
>> -                    smp_processor_id());
>> +        if (bsp)
>> +            apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
>> +                        smp_processor_id());
> 
> You might want to also drop the CPU#%d from the above messages, since
> they would only be printed for the BSP.

I want to specifically keep them, so that once (if ever) we introduce
hot-unplug support for the BSP, the same or similar messages can be
used and matched against earlier ones in the log.

>>      }
>>      apic_write(APIC_LVT0, value);
>>  
>>      /*
>>       * only the BP should see the LINT1 NMI signal, obviously.
>>       */
>> -    if (!smp_processor_id())
>> +    if (bsp)
>>          value = APIC_DM_NMI;
>>      else
>>          value = APIC_DM_NMI | APIC_LVT_MASKED;
> 
> This would be shorter as:
> 
> value = APIC_DM_NMI | (bsp ? 0 : APIC_LVT_MASKED);

Indeed, at the expense of larger code churn. Seems like an at least
partially unrelated change to me (at risk of obscuring the actual
purpose of the change here).

Jan

Re: [PATCH 4/4] x86/APIC: restrict certain messages to BSP
Posted by Roger Pau Monné 3 years, 11 months ago
On Thu, May 14, 2020 at 02:31:33PM +0200, Jan Beulich wrote:
> On 14.05.2020 12:01, Roger Pau Monné wrote:
> > On Fri, Mar 13, 2020 at 10:26:47AM +0100, Jan Beulich wrote:
> >> --- a/xen/arch/x86/apic.c
> >> +++ b/xen/arch/x86/apic.c
> >> @@ -499,7 +499,7 @@ static void resume_x2apic(void)
> >>      __enable_x2apic();
> >>  }
> >>  
> >> -void setup_local_APIC(void)
> >> +void setup_local_APIC(bool bsp)
> >>  {
> >>      unsigned long oldvalue, value, maxlvt;
> >>      int i, j;
> >> @@ -598,8 +598,8 @@ void setup_local_APIC(void)
> >>      if ( directed_eoi_enabled )
> >>      {
> >>          value |= APIC_SPIV_DIRECTED_EOI;
> >> -        apic_printk(APIC_VERBOSE, "Suppress EOI broadcast on CPU#%d\n",
> >> -                    smp_processor_id());
> >> +        if ( bsp )
> >> +            apic_printk(APIC_VERBOSE, "Suppressing EOI broadcast\n");
> >>      }
> >>  
> >>      apic_write(APIC_SPIV, value);
> >> @@ -615,21 +615,22 @@ void setup_local_APIC(void)
> >>       * TODO: set up through-local-APIC from through-I/O-APIC? --macro
> >>       */
> >>      value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
> >> -    if (!smp_processor_id() && (pic_mode || !value)) {
> >> +    if (bsp && (pic_mode || !value)) {
> >>          value = APIC_DM_EXTINT;
> >>          apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n",
> >>                      smp_processor_id());
> >>      } else {
> >>          value = APIC_DM_EXTINT | APIC_LVT_MASKED;
> >> -        apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
> >> -                    smp_processor_id());
> >> +        if (bsp)
> >> +            apic_printk(APIC_VERBOSE, "masked ExtINT on CPU#%d\n",
> >> +                        smp_processor_id());
> > 
> > You might want to also drop the CPU#%d from the above messages, since
> > they would only be printed for the BSP.
> 
> I want to specifically keep them, so that once (if ever) we introduce
> hot-unplug support for the BSP, the same or similar messages can be
> used and matched against earlier ones in the log.
> 
> >>      }
> >>      apic_write(APIC_LVT0, value);
> >>  
> >>      /*
> >>       * only the BP should see the LINT1 NMI signal, obviously.
> >>       */
> >> -    if (!smp_processor_id())
> >> +    if (bsp)
> >>          value = APIC_DM_NMI;
> >>      else
> >>          value = APIC_DM_NMI | APIC_LVT_MASKED;
> > 
> > This would be shorter as:
> > 
> > value = APIC_DM_NMI | (bsp ? 0 : APIC_LVT_MASKED);
> 
> Indeed, at the expense of larger code churn. Seems like an at least
> partially unrelated change to me (at risk of obscuring the actual
> purpose of the change here).

FTR, I'm happy with both of the above and my RB stands.

Roger.