sched=null vwfi=native and call_rcu()

Stefano Stabellini posted 1 patch 2 weeks, 5 days ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/alpine.DEB.2.22.394.2201051615060.2060010@ubuntu-linux-20-04-desktop

sched=null vwfi=native and call_rcu()

Posted by Stefano Stabellini 2 weeks, 5 days ago
Hi all,

As you might remember, we have an outstanding issue with call_rcu() when
sched=null vwfi=native are used. That is because in that configuration
the CPU never goes idle so rcu_idle_enter() never gets called.

The issue was caught in the field and I managed to reproduce the problem
by doing the following:

xl destroy test
xl create ./test.cfg

Resulting in the following error:

# Parsing config from ./test.cfg
# (XEN) IRQ 54 is already used by domain 1

The test domain has 3 interrupts remapped to it and they don't get
released before the new domain creation is requested.
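To make the failure mode concrete, here is a toy C model. It assumes, as the call_rcu() workaround below suggests, that the old domain's IRQs are only released from an RCU callback. It also reduces a grace period to "every CPU has reported a quiescent state". All names are hypothetical; this is not Xen's rcupdate.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in Xen. */
#define TOY_NR_CPUS 2

typedef void (*toy_rcu_cb_t)(void *arg);

struct toy_rcu {
    bool quiescent[TOY_NR_CPUS]; /* has CPU i passed a quiescent state? */
    toy_rcu_cb_t pending_cb;     /* a single pending callback is enough here */
    void *pending_arg;
};

static int toy_irq_owner = -1;   /* domain owning the IRQ, -1 if free */

/* Deferred part of domain destruction: give the IRQ back. */
static void toy_release_irq_cb(void *arg)
{
    (void)arg;
    toy_irq_owner = -1;
}

/* Queue a callback and start a new grace period. */
static void toy_call_rcu(struct toy_rcu *r, toy_rcu_cb_t cb, void *arg)
{
    r->pending_cb = cb;
    r->pending_arg = arg;
    for (int i = 0; i < TOY_NR_CPUS; i++)
        r->quiescent[i] = false;
}

/* A CPU reports a quiescent state (e.g. it went idle). The callback
 * runs only once every CPU has reported. */
static void toy_report_qs(struct toy_rcu *r, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    r->quiescent[cpu] = true;
    for (int i = 0; i < TOY_NR_CPUS; i++)
        if (!r->quiescent[i])
            return;              /* grace period still open */
    if (r->pending_cb != NULL) {
        toy_rcu_cb_t cb = r->pending_cb;
        r->pending_cb = NULL;
        cb(r->pending_arg);
    }
}

/* Domain creation: fails while the old domain still owns the IRQ. */
static bool toy_assign_irq(int domid)
{
    if (toy_irq_owner != -1)
        return false;
    toy_irq_owner = domid;
    return true;
}
```

Suppose CPU 1 sits in guest context forever (sched=null, vwfi=native) and never calls toy_report_qs(). Then the callback never runs and toy_assign_irq() for the new domain keeps failing, which mirrors the "IRQ 54 is already used by domain 1" error.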

Just FYI, the hacky patch below seems to reliably work around the
problem in my environment.

Do you have any suggestions on what would be the right way to solve
this issue?

Cheers,

Stefano


diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3de..841a5cb3c9 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -289,6 +289,9 @@ void call_rcu(struct rcu_head *head,
         force_quiescent_state(rdp, &rcu_ctrlblk);
     }
     local_irq_restore(flags);
+    /* make sure that the CPU has a chance to check RCUs */
+    set_timer(&rdp->idle_timer, NOW() + SECONDS(1));
+    rdp->idle_timer_active = true;
 }
 
 /*
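The idea behind the hack can be illustrated with a small simulation. A CPU that never idles only gets a chance to look at pending RCU work when something interrupts it. Here that something is the one-second idle_timer armed by set_timer(). The model below is a hypothetical sketch in simulated milliseconds and does not reproduce Xen's timer or softirq handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, in simulated milliseconds. */
struct toy_cpu {
    bool idles;          /* goes idle (reports a quiescent state) at t=0 */
    long timer_deadline; /* when a timer pulls it into the RCU check; -1 = none */
};

/* Return the first time by which every CPU has passed a quiescent
 * state, i.e. the grace period can end, or -1 if that has not
 * happened by 'horizon'. */
static long toy_grace_period_end(const struct toy_cpu *cpus, int n, long horizon)
{
    for (long now = 0; now <= horizon; now++) {
        bool all_qs = true;
        for (int i = 0; i < n; i++) {
            bool qs = cpus[i].idles ||
                      (cpus[i].timer_deadline >= 0 &&
                       now >= cpus[i].timer_deadline);
            if (!qs)
                all_qs = false;
        }
        if (all_qs)
            return now;
    }
    return -1;
}
```

Without a timer on the busy CPU, the grace period never ends within the horizon. With a 1000 ms deadline, it ends at 1000.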


Re: sched=null vwfi=native and call_rcu()

Posted by Julien Grall 2 weeks, 4 days ago

On 06/01/2022 00:40, Stefano Stabellini wrote:
> Hi all,

Hi,

> Do you have any suggestions on what would be the right way to solve
> this issue?

This issue and solution were discussed numerous times on the ML. In
short, we want to tell the RCU subsystem that CPUs running in guest
context are always quiescent. For more details, you can read the
previous thread (which also contains a link to the one before):

https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@codiax.se/
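The following is a minimal sketch of that idea, using a CPU bitmask and hypothetical names. CPUs currently in guest context are recorded in a "quiet" mask. They are simply not waited for when a grace period starts, because they cannot be inside a hypervisor RCU read-side critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; bit i of each mask stands for CPU i (i < 32). */
struct toy_rcu_ctrl {
    unsigned int quiet;   /* CPUs currently running guest code */
    unsigned int waiting; /* CPUs the current grace period still waits for */
};

/* CPUs in guest context cannot be inside a hypervisor RCU read-side
 * critical section, so the grace period does not wait for them. */
static void toy_start_grace_period(struct toy_rcu_ctrl *c, unsigned int online)
{
    c->waiting = online & ~c->quiet;
}

/* A CPU inside the hypervisor reports a quiescent state. */
static void toy_cpu_quiescent(struct toy_rcu_ctrl *c, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    c->waiting &= ~(1u << cpu);
}

static bool toy_grace_period_done(const struct toy_rcu_ctrl *c)
{
    return c->waiting == 0;
}
```

With CPU 1 in the guest, the grace period only needs CPU 0 to report. A vCPU that never leaves the guest therefore no longer blocks callbacks.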

Cheers,

-- 
Julien Grall

Re: sched=null vwfi=native and call_rcu()

Posted by Stefano Stabellini 2 weeks, 4 days ago
On Thu, 6 Jan 2022, Julien Grall wrote:
> This issue and solution were discussed numerous times on the ML. In short, we
> want to tell the RCU subsystem that CPUs running in guest context are always
> quiescent. For more details, you can read the previous thread (which also
> contains a link to the one before):
> 
> https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@codiax.se/

Thanks Julien for the pointer!

Dario, I forward-ported your three patches to staging:
https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet

I can confirm that they fix the bug. Note that I had to add a small
change on top to remove the ASSERT at the beginning of rcu_quiet_enter:
https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9

Would you be up for submitting them for upstreaming? I would prefer if
you send out the patches because I cannot claim to understand them
completely (except for the one doing renaming :-P )

I am also attaching the four patches for your convenience.

Re: sched=null vwfi=native and call_rcu()

Posted by Dario Faggioli 1 week, 3 days ago
On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
> Thanks Julien for the pointer!
> 
> Dario, I forward-ported your three patches to staging:
> https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet
> 
Hi Stefano!

I definitely would like to see the end of this issue, so thanks a lot
for your interest and your help with the patches.

> I can confirm that they fix the bug. Note that I had to add a small
> change on top to remove the ASSERT at the beginning of rcu_quiet_enter:
> https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9
> 
Yeah, that should be fine.

> Would you be up for submitting them for upstreaming? I would prefer if
> you send out the patches because I cannot claim to understand them
> completely (except for the one doing renaming :-P )
> 
Haha! So, I am up for properly submitting them, but there's one problem.
As you've probably gathered, the idea here is to use transitions into the
guest and into the hypervisor as RCU quiescence and "activation" points.

Now, on ARM, that just meant calling rcu_quiet_exit() in
enter_hypervisor_from_guest() and calling rcu_quiet_enter() in
leave_hypervisor_to_guest(). Nice and easy: even I (and I'm definitely
not an ARM person) could understand it, with some help from Julien,
and put the patches together.
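On ARM, the placement described above could be sketched as follows. The toy_* functions are hypothetical stand-ins for rcu_quiet_enter(), rcu_quiet_exit() and the two entry/exit helpers. The per-CPU state is reduced to a single flag. The assertions only encode this sketch's own invariant that the calls alternate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: one CPU, one flag. These toy_* names do not
 * exist in Xen. */
static bool toy_cpu_quiet;   /* true while the CPU runs guest code */

static void toy_rcu_quiet_enter(void)
{
    assert(!toy_cpu_quiet);  /* sketch invariant: enter/exit alternate */
    toy_cpu_quiet = true;
}

static void toy_rcu_quiet_exit(void)
{
    assert(toy_cpu_quiet);
    toy_cpu_quiet = false;
}

/* Trap from the guest: hypervisor code may now take RCU references. */
static void toy_enter_hypervisor_from_guest(void)
{
    toy_rcu_quiet_exit();
}

/* About to resume the guest: no hypervisor RCU references remain. */
static void toy_leave_hypervisor_to_guest(void)
{
    toy_rcu_quiet_enter();
}

/* Does a grace period have to wait for this CPU? */
static bool toy_rcu_must_wait_for_cpu(void)
{
    return !toy_cpu_quiet;
}
```

The x86 difficulty is that there is no single pair of functions like these. Every guest entry and exit path in the entry.S variants would need the equivalent calls.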

However, the problem is really arch-independent and, although it surfaces
less frequently there, it affects x86 as well. And on x86 the situation is
far less nice when it comes to identifying all the places from which to
call rcu_quiet_{enter,exit}().

And finding out where to put them, among the various functions in the
various entry.S variants, is where I stopped. The plan was to get back to
it but, shameful as it sounds, I have not managed to do that yet.

So, if anyone wants to help with this by suggesting potential good
spots, that would help a lot.

Alternatively, we can submit the series as ARM-only... But I fear that
the x86 side of things would then be easily forgotten. :-(

Thanks again and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Re: sched=null vwfi=native and call_rcu()

Posted by Stefano Stabellini 1 week, 3 days ago
On Fri, 14 Jan 2022, Dario Faggioli wrote:
> So, if anyone wants to help with this by suggesting potential good
> spots, that would help a lot.

Unfortunately I cannot volunteer due to time and also because I wouldn't
know where to look and I don't have a reproducer or a test environment
on x86. I would be flying blind.


> Alternatively, we can submit the series as ARM-only... But I fear that
> the x86 side of things would then be easily forgotten. :-(

I agree with you on this, but at the same time we are having problems
with customers in the field -- it is not like we can wait to solve the
problem on ARM any longer. And the issue is certainly far less likely to
happen on x86 (there is no vwfi=native, right?) In other words, I think
it is better to have half of the solution now to solve the worst part of
the problem than to wait more months for a full solution.

Re: sched=null vwfi=native and call_rcu()

Posted by George Dunlap 1 week ago

> On Jan 14, 2022, at 9:01 PM, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Fri, 14 Jan 2022, Dario Faggioli wrote:
>> Alternatively, we can submit the series as ARM-only... But I fear that
>> the x86 side of things would then be easily forgotten. :-(
> 
> I agree with you on this, but at the same time we are having problems
> with customers in the field -- it is not like we can wait to solve the
> problem on ARM any longer. And the issue is certainly far less likely to
> happen on x86 (there is no vwfi=native, right?) In other words, I think
> it is better to have half of the solution now to solve the worst part of
> the problem than to wait more months for a full solution.

An x86 equivalent of vwfi=native could be implemented easily, but AFAIK nobody has asked for it yet.  I agree that we need to fix it for ARM, and so in the absence of someone with the time to fix up the x86 side, I think fixing ARM-only is the way to go.

It would be good if we could add appropriate comments warning anyone who implements `hlt=native` on x86 about the problems they’ll face and how to fix them.  Not sure the best place to do that; in the VMX / SVM code that sets the exit for HLT &c?

 -George

Re: sched=null vwfi=native and call_rcu()

Posted by Juergen Gross 1 week ago
On 17.01.22 12:05, George Dunlap wrote:
> An x86 equivalent of vwfi=native could be implemented easily, but AFAIK nobody has asked for it yet.  I agree that we need to fix it for ARM, and so in the absence of someone with the time to fix up the x86 side, I think fixing ARM-only is the way to go.
> 
> It would be good if we could add appropriate comments warning anyone who implements `hlt=native` on x86 about the problems they’ll face and how to fix them.  Not sure the best place to do that; in the VMX / SVM code that sets the exit for HLT &c?

But wouldn't a guest in a busy loop on x86 with NULL scheduler suffer
from the same problem?

And wouldn't that be a problem for PV guests, too?


Juergen

Re: sched=null vwfi=native and call_rcu()

Posted by Dario Faggioli 6 days, 11 hours ago
On Mon, 2022-01-17 at 12:13 +0100, Juergen Gross wrote:
> On 17.01.22 12:05, George Dunlap wrote:
> > 
> > An x86 equivalent of vwfi=native could be implemented easily, but
> > AFAIK nobody has asked for it yet.  I agree that we need to fix it
> > for ARM, and so in the absence of someone with the time to fix up
> > the x86 side, I think fixing ARM-only is the way to go.
> > 
> > It would be good if we could add appropriate comments warning
> > anyone who implements `hlt=native` on x86 about the problems they’ll
> > face and how to fix them.  Not sure the best place to do that; in the
> > VMX / SVM code that sets the exit for HLT &c?
> 
> But wouldn't a guest in a busy loop on x86 with NULL scheduler suffer
> from the same problem?
> 
Right, and even more so with 'idle=poll' as a _guest_ kernel command
line parameter, IMO.

That does not change what happens when the guest issues an HLT, but it
drastically reduces how often it does so (or at least, it did the last
time I tried it).

So it's not exactly like vwfi=native on ARM but, on the other hand, it
is under the guest's control.

> And wouldn't that be a problem for PV guests, too?
> 
Yeah, that's one of the things that makes it tricky.

Regards

Re: sched=null vwfi=native and call_rcu()

Posted by Julien Grall 1 week ago
Hi,

On 17/01/2022 15:13, Juergen Gross wrote:
> On 17.01.22 12:05, George Dunlap wrote:
>>> On Jan 14, 2022, at 9:01 PM, Stefano Stabellini 
>>> <sstabellini@kernel.org> wrote:
>>> On Fri, 14 Jan 2022, Dario Faggioli wrote:
>>>> On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
>>>>> On Thu, 6 Jan 2022, Julien Grall wrote:
>>>> Alternatively, we can submit the series as ARM-only... But I fear that
>>>> the x86 side of things would then be easily forgotten. :-(
>>>
>>> I agree with you on this, but at the same time we are having problems
>>> with customers in the field -- it is not like we can wait to solve the
>>> problem on ARM any longer. And the issue is certainly far less likely to
>>> happen on x86 (there is no vwfi=native, right?) In other words, I think
>>> it is better to have half of the solution now to solve the worst part of
>>> the problem than to wait more months for a full solution.

Well, it all depends on how your guest OS works. A "malicious" guest that
configures the vCPU to busy-loop without WFI will cause the same problem
(this is one of the reasons why the NULL scheduler is not security
supported).

>>
>> An x86 equivalent of vwfi=native could be implemented easily, but
>> AFAIK nobody has asked for it yet.  I agree that we need to fix it for
>> ARM, and so in the absence of someone with the time to fix up the x86
>> side, I think fixing ARM-only is the way to go.
>>
>> It would be good if we could add appropriate comments warning anyone
>> who implements `hlt=native` on x86 about the problems they’ll face and
>> how to fix them.  Not sure the best place to do that; in the VMX / SVM
>> code that sets the exit for HLT &c?
> 
> But wouldn't a guest in a busy loop on x86 with NULL scheduler suffer
> from the same problem?

This is not a problem on x86 because there will always be an exit to the
hypervisor at a timed interval (IIRC for some rendezvous?). On Arm, using
the NULL scheduler results in a completely tickless hypervisor.
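The difference described above can be modelled by asking when the physical CPU first re-enters the hypervisor while the guest runs. This is a hypothetical sketch. Periodic exits stand in for whatever timed exit x86 always has, and the 100 ms period used below is made up. WFI either traps (the default vwfi behaviour) or stays in the guest (vwfi=native). Every other exit source, such as device interrupts, is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; all periods are in simulated milliseconds. */
struct toy_config {
    long periodic_exit_ms; /* <= 0: no timed exits (tickless hypervisor) */
    long guest_wfi_ms;     /* <= 0: guest busy-loops and never executes WFI */
    bool wfi_traps;        /* false models vwfi=native: WFI stays in the guest */
};

/* Return the first time at which the pCPU re-enters the hypervisor,
 * and so gets a chance to process pending RCU work, or -1 if that
 * does not happen by 'horizon'. */
static long toy_first_hypervisor_entry(const struct toy_config *c, long horizon)
{
    for (long now = 1; now <= horizon; now++) {
        if (c->periodic_exit_ms > 0 && now % c->periodic_exit_ms == 0)
            return now;
        if (c->wfi_traps && c->guest_wfi_ms > 0 && now % c->guest_wfi_ms == 0)
            return now;
    }
    return -1;
}
```

In this model, a tickless configuration never re-enters the hypervisor in two cases: the guest busy-loops, or WFI does not trap. With any timed exit, even a busy-looping guest is interrupted at the exit period.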

Cheers,

-- 
Julien Grall