[Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks

Juergen Gross posted 8 patches 32 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200213125449.14226-1-jgross@suse.com
docs/misc/xen-command-line.pandoc        |  9 ++++++
xen/arch/x86/domain.c                    |  9 ++++--
xen/arch/x86/io_apic.c                   | 53 ++++++++++++++++++++++++--------
xen/arch/x86/irq.c                       |  5 ++-
xen/arch/x86/mm/p2m-ept.c                |  4 +++
xen/arch/x86/msi.c                       |  4 ++-
xen/arch/x86/numa.c                      | 16 ++++++----
xen/arch/x86/time.c                      |  5 +++
xen/common/event_channel.c               |  3 +-
xen/common/grant_table.c                 | 39 +++++++++++++++++++++--
xen/common/keyhandler.c                  | 29 ++++++++++++++++-
xen/common/livepatch.c                   | 11 ++-----
xen/common/rangeset.c                    | 10 +++---
xen/common/sched/core.c                  |  7 +++++
xen/common/sched/cpupool.c               |  4 ++-
xen/common/sched/credit.c                | 31 ++++++++++++-------
xen/common/sched/credit2.c               | 20 +++++++-----
xen/common/sched/null.c                  | 50 +++++++++++++++++-------------
xen/common/sched/private.h               |  1 +
xen/common/sched/rt.c                    | 13 ++++----
xen/common/spinlock.c                    | 18 +++++++++--
xen/common/timer.c                       | 15 +++++----
xen/drivers/passthrough/amd/iommu_intr.c | 14 ++++++---
xen/drivers/passthrough/iommu.c          |  5 +++
xen/drivers/passthrough/pci.c            | 14 +++++++--
xen/drivers/vpci/msi.c                   |  5 ++-
xen/include/xen/keyhandler.h             | 26 ++++++++++++++++
xen/include/xen/rangeset.h               |  2 --
xen/include/xen/rwlock.h                 | 37 ++++++++++++++++++++++
29 files changed, 352 insertions(+), 107 deletions(-)

[Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks

Posted by Juergen Gross 32 weeks ago
Keyhandlers dumping hypervisor information to the console often need
to take locks while accessing data. In order to not block in case of
system inconsistencies it is convenient to use trylock variants when
obtaining the locks. On the other hand a busy system might easily
encounter held locks, so this patch series is adding special trylock
variants with a timeout used by keyhandlers.

Only the keyhandlers dumping out data are modified to use those new
lock variants. There might be still some potentially blocking locks
involved (e.g. the console lock for printing), but most critical
cases should be handled by this series.

The timeout per lock is 1 millisecond as default, the value can be
changed by boot and runtime parameter setting.

Patch 1: cleanup (can go in independently of all other patches)
Patch 2: fix in keyhandlers where domlist_read_lock was not taken when
         using for_each_domain() (can go in independently)
Patch 3: cleanup of lock handling in scheduling code (prerequisite
         patch for patch 5)
Patch 4: main infrastructure for trylocks with timeout plus adaption
         of keyhandlers directly in keyhandler.c
Patches 5-8: adaptions of other keyhandlers (split roughly by path)

Juergen Gross (8):
  xen: make rangeset_printk() static
  xen: add using domlist_read_lock in keyhandlers
  xen/sched: don't use irqsave locks in dumping functions
  xen: add locks with timeouts for keyhandlers
  xen/sched: use keyhandler locks when dumping data to console
  xen/common: use keyhandler locks when dumping data to console
  xen/drivers: use keyhandler locks when dumping data to console
  xen/x86: use keyhandler locks when dumping data to console

 docs/misc/xen-command-line.pandoc        |  9 ++++++
 xen/arch/x86/domain.c                    |  9 ++++--
 xen/arch/x86/io_apic.c                   | 53 ++++++++++++++++++++++++--------
 xen/arch/x86/irq.c                       |  5 ++-
 xen/arch/x86/mm/p2m-ept.c                |  4 +++
 xen/arch/x86/msi.c                       |  4 ++-
 xen/arch/x86/numa.c                      | 16 ++++++----
 xen/arch/x86/time.c                      |  5 +++
 xen/common/event_channel.c               |  3 +-
 xen/common/grant_table.c                 | 39 +++++++++++++++++++++--
 xen/common/keyhandler.c                  | 29 ++++++++++++++++-
 xen/common/livepatch.c                   | 11 ++-----
 xen/common/rangeset.c                    | 10 +++---
 xen/common/sched/core.c                  |  7 +++++
 xen/common/sched/cpupool.c               |  4 ++-
 xen/common/sched/credit.c                | 31 ++++++++++++-------
 xen/common/sched/credit2.c               | 20 +++++++-----
 xen/common/sched/null.c                  | 50 +++++++++++++++++-------------
 xen/common/sched/private.h               |  1 +
 xen/common/sched/rt.c                    | 13 ++++----
 xen/common/spinlock.c                    | 18 +++++++++--
 xen/common/timer.c                       | 15 +++++----
 xen/drivers/passthrough/amd/iommu_intr.c | 14 ++++++---
 xen/drivers/passthrough/iommu.c          |  5 +++
 xen/drivers/passthrough/pci.c            | 14 +++++++--
 xen/drivers/vpci/msi.c                   |  5 ++-
 xen/include/xen/keyhandler.h             | 26 ++++++++++++++++
 xen/include/xen/rangeset.h               |  2 --
 xen/include/xen/rwlock.h                 | 37 ++++++++++++++++++++++
 29 files changed, 352 insertions(+), 107 deletions(-)

-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks

Posted by Andrew Cooper 32 weeks ago
On 13/02/2020 12:54, Juergen Gross wrote:
> Keyhandlers dumping hypervisor information to the console often need
> to take locks while accessing data. In order to not block in case of
> system inconsistencies it is convenient to use trylock variants when
> obtaining the locks. On the other hand a busy system might easily
> encounter held locks, so this patch series is adding special trylock
> variants with a timeout used by keyhandlers.

This is a backwards step.

Keyhandlers are for debugging purposes.  When debugging it is far more 
important to get the requested data, than almost anything else.

The system will cope with a multi-second outage occurring approximately 
never.  A person debugging who can't get the data has no chance of 
fixing whatever problem they are looking for.

This series seems to be breaking the one critical usecase for 
keyhandlers, to fix what - not let debugging get in the way of the 
smooth running of the system?  A system in need of debugging in the 
first place has bigger problems than needing to run smoothly.

The only thing which should happen to improve system stability is for 
keyhandlers to disable the system watchdog while they are running, in 
case they happen to run for seconds of wallclock time. This is an issue 
which isn't addressed by the series, because once a keyhandler does get 
a lock, it keeps it until it is done.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks

Posted by Jan Beulich 32 weeks ago
On 13.02.2020 19:38, Andrew Cooper wrote:
> On 13/02/2020 12:54, Juergen Gross wrote:
>> Keyhandlers dumping hypervisor information to the console often need
>> to take locks while accessing data. In order to not block in case of
>> system inconsistencies it is convenient to use trylock variants when
>> obtaining the locks. On the other hand a busy system might easily
>> encounter held locks, so this patch series is adding special trylock
>> variants with a timeout used by keyhandlers.
> 
> This is a backwards step.
> 
> Keyhandlers are for debugging purposes.  When debugging it is far more 
> important to get the requested data, than almost anything else.
> 
> The system will cope with a multi-second outage occurring approximately 
> never.  A person debugging who can't get the data has no chance of 
> fixing whatever problem they are looking for.
> 
> This series seems to be breaking the one critical usecase for 
> keyhandlers, to fix what - not let debugging get in the way of the 
> smooth running of the system?  A system in need of debugging in the 
> first place has bigger problems than needing to run smoothly.

I certainly accept what you say further up, but I don't think this
last statement is universally true. There may be a single guest in
trouble, which - to find out about its state - some debugging keys
may want issuing. Disturbing the host and all other guests for this
is not a good idea imo.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks

Posted by Julien Grall 32 weeks ago
Hi Jan,

On 14/02/2020 09:37, Jan Beulich wrote:
> On 13.02.2020 19:38, Andrew Cooper wrote:
>> On 13/02/2020 12:54, Juergen Gross wrote:
>>> Keyhandlers dumping hypervisor information to the console often need
>>> to take locks while accessing data. In order to not block in case of
>>> system inconsistencies it is convenient to use trylock variants when
>>> obtaining the locks. On the other hand a busy system might easily
>>> encounter held locks, so this patch series is adding special trylock
>>> variants with a timeout used by keyhandlers.
>>
>> This is a backwards step.
>>
>> Keyhandlers are for debugging purposes.  When debugging it is far more
>> important to get the requested data, than almost anything else.
>>
>> The system will cope with a multi-second outage occurring approximately
>> never.  A person debugging who can't get the data has no chance of
>> fixing whatever problem they are looking for.
>>
>> This series seems to be breaking the one critical usecase for
>> keyhandlers, to fix what - not let debugging get in the way of the
>> smooth running of the system?  A system in need of debugging in the
>> first place has bigger problems than needing to run smoothly.
> 
> I certainly accept what you say further up, but I don't think this
> last statement is universally true. There may be a single guest in
> trouble, which - to find out about its state - some debugging keys
> may want issuing. Disturbing the host and all other guests for this
> is not a good idea imo.

This seems to suggest that you only want information for a single guest. 
Therefore using debugging keys was already a bad idea because it will 
disturb all the other guests.

For your setup, it might be worth considering to extend xenctx or 
introduce a way to dump information for a specific domain.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks

Posted by Jürgen Groß 32 weeks ago
On 13.02.20 19:38, Andrew Cooper wrote:
> On 13/02/2020 12:54, Juergen Gross wrote:
>> Keyhandlers dumping hypervisor information to the console often need
>> to take locks while accessing data. In order to not block in case of
>> system inconsistencies it is convenient to use trylock variants when
>> obtaining the locks. On the other hand a busy system might easily
>> encounter held locks, so this patch series is adding special trylock
>> variants with a timeout used by keyhandlers.
> 
> This is a backwards step.
> 
> Keyhandlers are for debugging purposes.  When debugging it is far more 
> important to get the requested data, than almost anything else.

Right.

> 
> The system will cope with a multi-second outage occurring approximately 
> never.  A person debugging who can't get the data has no chance of 
> fixing whatever problem they are looking for.

Right.

> This series seems to be breaking the one critical usecase for 
> keyhandlers, to fix what - not let debugging get in the way of the 
> smooth running of the system?  A system in need of debugging in the 
> first place has bigger problems than needing to run smoothly.

Okay, this warrants a longer default timeout.

A keyhandler blocking on a lock will produce exactly no further data,
and it will probably block other keyhandlers, too, due to hogging at
least one cpu completely.

With a longer lock timeout (1 second?) there is a much higher chance
that the keyhandler will finish its job producing more data than
without any timeout.

BTW, during development of my core scheduling series I was hit by that
problem multiple times. With the lock timeout I'd have spared dozens of
reboots.

> The only thing which should happen to improve system stability is for 
> keyhandlers to disable the system watchdog while they are running, in 
> case they happen to run for seconds of wallclock time. This is an issue 
> which isn't addressed by the series, because once a keyhandler does get 
> a lock, it keeps it until it is done.

Right, will add disabling the watchdog during keyhandler action.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel