Hi all,
As you might remember, we have an outstanding issue with call_rcu() when
sched=null vwfi=native are used. That is because in that configuration
the CPU never goes idle, so rcu_idle_enter() never gets called.
The issue was caught in the field and I managed to repro the problem
by doing the following:
xl destroy test
xl create ./test.cfg
Resulting in the following error:
# Parsing config from ./test.cfg
# (XEN) IRQ 54 is already used by domain 1
The test domain has 3 interrupts remapped to it and they don't get
released before the new domain creation is requested.
Just FYI, the below hacky patch seems to reliably work-around the
problem in my environment.
Do you have any suggestions on what would be the right way to solve
this issue?
Cheers,
Stefano
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3de..841a5cb3c9 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -289,6 +289,9 @@ void call_rcu(struct rcu_head *head,
         force_quiescent_state(rdp, &rcu_ctrlblk);
     }
     local_irq_restore(flags);
+    /* make sure that the CPU has a chance to check RCUs */
+    set_timer(&rdp->idle_timer, NOW() + SECONDS(1));
+    rdp->idle_timer_active = true;
 }
 
 /*
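To make the failure mode concrete, here is a toy model in C of why a queued call_rcu() callback can stall. It is a sketch under simplifying assumptions, not Xen's rcupdate.c: the struct and all model_* functions are hypothetical. It assumes that a grace period only waits for CPUs that have not called rcu_idle_enter(), and that a CPU reports a quiescent state only when it passes through the hypervisor. A CPU that stays in a guest with sched=null vwfi=native does neither, so the callback (in the report above, presumably the deferred part of domain destruction that frees the IRQs) never runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these names are hypothetical, not Xen's rcupdate.c. */
#define MODEL_NR_CPUS 4

struct model_rcu {
    bool idle[MODEL_NR_CPUS];     /* CPU called the rcu_idle_enter() analogue */
    bool need_qs[MODEL_NR_CPUS];  /* CPU still owes a quiescent state */
    bool gp_in_progress;
    int pending_callbacks;        /* queued via the call_rcu() analogue */
    int completed_callbacks;
};

/* Start a grace period: every non-idle CPU must pass a quiescent state. */
static void model_start_gp(struct model_rcu *r)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        r->need_qs[cpu] = !r->idle[cpu];
    r->gp_in_progress = true;
}

/*
 * Queue a callback and make sure a grace period is running for it.
 * Simplification: callbacks queued while a grace period is already running
 * are folded into it; real RCU makes them wait for the next one.
 */
static void model_call_rcu(struct model_rcu *r)
{
    r->pending_callbacks++;
    if (!r->gp_in_progress)
        model_start_gp(r);
}

/* A CPU passing through the hypervisor reports a quiescent state. */
static void model_report_qs(struct model_rcu *r, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    r->need_qs[cpu] = false;
}

/* Run the queued callbacks only if no CPU still owes a quiescent state. */
static bool model_try_complete_gp(struct model_rcu *r)
{
    if (!r->gp_in_progress)
        return false;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (r->need_qs[cpu])
            return false;   /* e.g. a CPU stuck in a guest with vwfi=native */
    r->completed_callbacks += r->pending_callbacks;
    r->pending_callbacks = 0;
    r->gp_in_progress = false;
    return true;
}
```

For example, with four CPUs where CPUs 1-3 stay in guests, a quiescent state on CPU 0 alone is not enough. The callback runs only after the other three report as well, or if they had been marked idle before the grace period started.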
On 06/01/2022 00:40, Stefano Stabellini wrote:
> Hi all,

Hi,

> [...]
> Do you have any suggestions on what would be the right way to solve
> this issue?

This issue and its solution were discussed numerous times on the ML. In
short, we want to tell the RCU that CPUs running in guest context are
always quiesced. For more details, you can read the previous thread
(which also contains a link to the one before):

https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@codiax.se/

Cheers,

--
Julien Grall
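Julien's suggestion can be thought of as growing the set of CPUs that a grace period ignores. Xen's RCU already leaves idle CPUs (those that called rcu_idle_enter()) out when it starts a grace period. The sketch below is a hypothetical bitmask model, not Xen code: it treats a CPU running guest code the same way, so the CPUs that must report are the online mask minus a "quiet" mask. All type and function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model with one bit per CPU; not Xen's data structures. */
typedef unsigned int cpumask_model_t;

struct quiet_rcu_model {
    cpumask_model_t online;   /* CPUs that exist */
    cpumask_model_t quiet;    /* idle CPUs *or* CPUs running guest code */
    cpumask_model_t pending;  /* CPUs that still owe a quiescent state */
};

/* Grace-period start: only online CPUs that are not quiet must report. */
static void qrm_start_gp(struct quiet_rcu_model *m)
{
    m->pending = m->online & ~m->quiet;
}

/* A CPU in the hypervisor reports a quiescent state. */
static void qrm_cpu_quiet(struct quiet_rcu_model *m, unsigned int cpu)
{
    assert(cpu < 8 * sizeof(cpumask_model_t));
    m->pending &= ~(1u << cpu);
}

/* The grace period ends once nobody is left in the pending mask. */
static bool qrm_gp_done(const struct quiet_rcu_model *m)
{
    return m->pending == 0;
}
```

With CPUs 1-3 in guest context (quiet mask 0xE) out of four online CPUs, only CPU 0 has to report before the grace period can end.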
On Thu, 6 Jan 2022, Julien Grall wrote:
> On 06/01/2022 00:40, Stefano Stabellini wrote:
> > Do you have any suggestions on what would be the right way to solve
> > this issue?
>
> This issue and its solution were discussed numerous times on the ML. In
> short, we want to tell the RCU that CPUs running in guest context are
> always quiesced. For more details, you can read the previous thread
> (which also contains a link to the one before):
>
> https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@codiax.se/

Thanks Julien for the pointer!

Dario, I forward-ported your three patches to staging:
https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet

I can confirm that they fix the bug. Note that I had to add a small
change on top to remove the ASSERT at the beginning of
rcu_quiet_enter:
https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9

Would you be up for submitting them for upstreaming? I would prefer if
you sent out the patches because I cannot claim to understand them
completely (except for the one doing the renaming :-P )

I am also attaching the four patches for your convenience.
On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
> Dario, I forward-ported your three patches to staging:
> https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet
>
Hi Stefano!

I definitely would like to see the end of this issue, so thanks a lot
for your interest and your help with the patches.

> I can confirm that they fix the bug. Note that I had to add a small
> change on top to remove the ASSERT at the beginning of
> rcu_quiet_enter:
> https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9
>
Yeah, that should be fine.

> Would you be up for submitting them for upstreaming? I would prefer if
> you sent out the patches because I cannot claim to understand them
> completely (except for the one doing the renaming :-P )
>
Haha! So, I am up for properly submitting, but there's one problem. As
you've probably gathered, the idea here is to use transitions toward the
guest and into the hypervisor as RCU quiescence and "activation"
points, respectively.

Now, on ARM, that just meant calling rcu_quiet_exit() in
enter_hypervisor_from_guest() and calling rcu_quiet_enter() in
leave_hypervisor_to_guest(). Nice and easy: even I, and I'm definitely
not an ARM person, could understand it (with some help from Julien) and
put the patches together.

However, the problem is really arch-independent and, despite not
surfacing as frequently, it affects x86 as well. And on x86 the
situation is not nearly as nice when it comes to identifying all the
places from which to call rcu_quiet_{enter,exit}().

Finding out where to put them, among the various functions that we
have in the various entry.S variants, is where I stopped. The plan was
to get back to it but, shameful as it sounds, I have not been able to
do that yet.

So, if anyone wants to help with this by suggesting potential good
spots, that would help a lot.

Alternatively, we can submit the series as ARM-only... But I fear that
the x86 side of things would then be easily forgotten. :-(

Thanks again and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
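The Arm placement Dario describes can be sketched as a pair of hooks around the guest/hypervisor boundary. In this toy model every name is a hypothetical stand-in (the model_ and hook_ prefixes mark them), and the real rcu_quiet_enter()/rcu_quiet_exit() in the forward-ported series may well differ. The two assumptions are that returning to the guest is itself a quiescent state, and that a CPU in guest context is left out of newly started grace periods.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: every name here is a hypothetical stand-in, not Xen code. */
#define HOOK_NR_CPUS 4

static bool in_guest[HOOK_NR_CPUS];  /* CPU is "quiet" for RCU purposes */
static bool owes_qs[HOOK_NR_CPUS];   /* CPU must report before the GP ends */

/* Analogue of rcu_quiet_enter(): guest code holds no hypervisor RCU
 * references, so returning to the guest counts as a quiescent state. */
static void model_rcu_quiet_enter(unsigned int cpu)
{
    assert(cpu < HOOK_NR_CPUS);
    in_guest[cpu] = true;
    owes_qs[cpu] = false;
}

/* Analogue of rcu_quiet_exit(): hypervisor code may use RCU again. */
static void model_rcu_quiet_exit(unsigned int cpu)
{
    assert(cpu < HOOK_NR_CPUS);
    in_guest[cpu] = false;
}

/* Stand-ins for the Arm entry/exit paths named in the thread. */
static void model_enter_hypervisor_from_guest(unsigned int cpu)
{
    model_rcu_quiet_exit(cpu);   /* first thing after trapping into Xen */
}

static void model_leave_hypervisor_to_guest(unsigned int cpu)
{
    model_rcu_quiet_enter(cpu);  /* last thing before resuming the guest */
}

/* A new grace period waits only for CPUs currently in the hypervisor. */
static void hook_model_start_gp(void)
{
    for (unsigned int cpu = 0; cpu < HOOK_NR_CPUS; cpu++)
        owes_qs[cpu] = !in_guest[cpu];
}

static bool hook_model_gp_done(void)
{
    for (unsigned int cpu = 0; cpu < HOOK_NR_CPUS; cpu++)
        if (owes_qs[cpu])
            return false;
    return true;
}
```

In this model, a vCPU that spins in a guest and never traps back into Xen never blocks a grace period. That is the property the thread is after. On x86 the hard part is not this logic but finding every entry and exit path in which to place the two hooks.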
On Fri, 14 Jan 2022, Dario Faggioli wrote:
> So, if anyone wants to help with this by suggesting potential good
> spots, that would help a lot.

Unfortunately I cannot volunteer. I don't have the time, I wouldn't
know where to look, and I don't have a reproducer or a test environment
on x86. I would be flying blind.

> Alternatively, we can submit the series as ARM-only... But I fear that
> the x86 side of things would then be easily forgotten. :-(

I agree with you on this, but at the same time we are having problems
with customers in the field; we cannot wait any longer to solve the
problem on ARM. And the issue is certainly far less likely to happen
on x86 (there is no vwfi=native, right?). In other words, I think it is
better to have half of the solution now, to solve the worst part of the
problem, than to wait more months for a full solution.
> On Jan 14, 2022, at 9:01 PM, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
> On Fri, 14 Jan 2022, Dario Faggioli wrote:
>> Alternatively, we can submit the series as ARM-only... But I fear that
>> the x86 side of things would then be easily forgotten. :-(
>
> I agree with you on this, but at the same time we are having problems
> with customers in the field; we cannot wait any longer to solve the
> problem on ARM. And the issue is certainly far less likely to happen
> on x86 (there is no vwfi=native, right?). In other words, I think it is
> better to have half of the solution now, to solve the worst part of the
> problem, than to wait more months for a full solution.

An x86 equivalent of vwfi=native could be implemented easily, but AFAIK
nobody has asked for it yet. I agree that we need to fix it for ARM, so,
in the absence of someone with the time to fix up the x86 side, I think
fixing ARM-only is the way to go.

It would be good if we could add appropriate comments warning anyone who
implements `hlt=native` on x86 about the problems they'll face and how
to fix them. I'm not sure where the best place for that is; perhaps in
the VMX / SVM code that sets up the exit for HLT, etc.?

-George
On 17.01.22 12:05, George Dunlap wrote:
> An x86 equivalent of vwfi=native could be implemented easily, but AFAIK
> nobody has asked for it yet. I agree that we need to fix it for ARM, so,
> in the absence of someone with the time to fix up the x86 side, I think
> fixing ARM-only is the way to go.
>
> It would be good if we could add appropriate comments warning anyone who
> implements `hlt=native` on x86 about the problems they'll face and how
> to fix them. I'm not sure where the best place for that is; perhaps in
> the VMX / SVM code that sets up the exit for HLT, etc.?

But wouldn't a guest in a busy loop on x86 with the NULL scheduler
suffer from the same problem?

And wouldn't that be a problem for PV guests, too?

Juergen
Hi,

On 17/01/2022 15:13, Juergen Gross wrote:
> On 17.01.22 12:05, George Dunlap wrote:
>>> On Jan 14, 2022, at 9:01 PM, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>> On Fri, 14 Jan 2022, Dario Faggioli wrote:
>>>> Alternatively, we can submit the series as ARM-only... But I fear that
>>>> the x86 side of things would then be easily forgotten. :-(
>>>
>>> I agree with you on this, but at the same time we are having problems
>>> with customers in the field; we cannot wait any longer to solve the
>>> problem on ARM. And the issue is certainly far less likely to happen
>>> on x86 (there is no vwfi=native, right?). In other words, I think it is
>>> better to have half of the solution now, to solve the worst part of the
>>> problem, than to wait more months for a full solution.

Well, it all depends on how your guest OS works. A "malicious" guest
that configures its vCPU to busy loop without WFI will cause the same
problem (this is one of the reasons why the NULL scheduler is not
security supported).

>> An x86 equivalent of vwfi=native could be implemented easily, but
>> AFAIK nobody has asked for it yet. I agree that we need to fix it for
>> ARM, so, in the absence of someone with the time to fix up the x86
>> side, I think fixing ARM-only is the way to go.
>>
>> It would be good if we could add appropriate comments warning anyone
>> who implements `hlt=native` on x86 about the problems they'll face and
>> how to fix them. I'm not sure where the best place for that is;
>> perhaps in the VMX / SVM code that sets up the exit for HLT, etc.?
>
> But wouldn't a guest in a busy loop on x86 with the NULL scheduler
> suffer from the same problem?

This is not a problem on x86 because there will always be an exit to
the hypervisor at a regular interval (IIRC for some rendezvous?). On
Arm, using the NULL scheduler results in a completely tickless
hypervisor.

Cheers,

--
Julien Grall
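Julien's x86/Arm contrast can be illustrated with a small simulation. It assumes, hypothetically, that every forced entry into the hypervisor lets a CPU report a quiescent state. It models each busy-looping vCPU only by how often it is pulled back into Xen, with 0 meaning never, as with a tickless Arm hypervisor under the null scheduler. The function name and the tick abstraction are illustrative, not Xen code. With periodic exits, the grace period ends once the CPU with the longest exit interval has exited once. If any CPU never exits, the grace period never ends.

```c
#include <assert.h>

/*
 * Hypothetical simulation, not Xen code. exit_period[cpu] is how often, in
 * ticks, a busy-looping vCPU on that pCPU is forced back into the hypervisor;
 * 0 means never (e.g. a tickless hypervisor with sched=null). Assumes every
 * forced entry lets the CPU report a quiescent state. Returns the tick at
 * which all CPUs have reported, or -1 if that does not happen by max_ticks.
 */
static int ticks_until_gp_done(const unsigned int *exit_period,
                               unsigned int nr_cpus, unsigned int max_ticks)
{
    assert(nr_cpus < 64);
    unsigned long long owed = (1ULL << nr_cpus) - 1;   /* one bit per CPU */

    for (unsigned int t = 1; t <= max_ticks; t++) {
        for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
            if (exit_period[cpu] != 0 && t % exit_period[cpu] == 0)
                owed &= ~(1ULL << cpu);               /* CPU reported */
        if (owed == 0)
            return (int)t;
    }
    return -1;
}
```

For instance, exit periods of {3, 5} ticks finish the grace period at tick 5, while {3, 0} never finishes. The second case is the situation Stefano hit on Arm.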
On Mon, 2022-01-17 at 12:13 +0100, Juergen Gross wrote:
> On 17.01.22 12:05, George Dunlap wrote:
> > It would be good if we could add appropriate comments warning
> > anyone who implements `hlt=native` on x86 about the problems they'll
> > face and how to fix them. I'm not sure where the best place for that
> > is; perhaps in the VMX / SVM code that sets up the exit for HLT, etc.?
>
> But wouldn't a guest in a busy loop on x86 with the NULL scheduler
> suffer from the same problem?
>
Right, and even more so with 'idle=poll' as a _guest_ kernel command
line parameter, IMO. That does not change what happens when the guest
issues an HLT, but it drastically reduces how often it does so (or at
least, it did the last time I tried it).

So it's not exactly like vwfi=native on ARM but, on the other hand, it
is under the guest's control.

> And wouldn't that be a problem for PV guests, too?
>
Yeah, that's one of the things that makes it tricky.

Regards
--
Dario Faggioli, Ph.D