From: Thomas Huth <thuth@redhat.com>
When you run a KVM guest with vhost-net and migrate that guest to
another host, and you immediately enable postcopy after starting the
migration, there is a high chance that the network connection of the
guest will no longer work on the destination side after the migration.
With a debug kernel v6.16.0, there is also a call trace that looks
like this:
FAULT_FLAG_ALLOW_RETRY missing 881
CPU: 6 UID: 0 PID: 549 Comm: kworker/6:2 Kdump: loaded Not tainted 6.16.0 #56 NONE
Hardware name: IBM 3931 LA1 400 (LPAR)
Workqueue: events irqfd_inject [kvm]
Call Trace:
[<00003173cbecc634>] dump_stack_lvl+0x104/0x168
[<00003173cca69588>] handle_userfault+0xde8/0x1310
[<00003173cc756f0c>] handle_pte_fault+0x4fc/0x760
[<00003173cc759212>] __handle_mm_fault+0x452/0xa00
[<00003173cc7599ba>] handle_mm_fault+0x1fa/0x6a0
[<00003173cc73409a>] __get_user_pages+0x4aa/0xba0
[<00003173cc7349e8>] get_user_pages_remote+0x258/0x770
[<000031734be6f052>] get_map_page+0xe2/0x190 [kvm]
[<000031734be6f910>] adapter_indicators_set+0x50/0x4a0 [kvm]
[<000031734be7f674>] set_adapter_int+0xc4/0x170 [kvm]
[<000031734be2f268>] kvm_set_irq+0x228/0x3f0 [kvm]
[<000031734be27000>] irqfd_inject+0xd0/0x150 [kvm]
[<00003173cc00c9ec>] process_one_work+0x87c/0x1490
[<00003173cc00dda6>] worker_thread+0x7a6/0x1010
[<00003173cc02dc36>] kthread+0x3b6/0x710
[<00003173cbed2f0c>] __ret_from_fork+0xdc/0x7f0
[<00003173cdd737ca>] ret_from_fork+0xa/0x30
3 locks held by kworker/6:2/549:
#0: 00000000800bc958 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ee/0x1490
#1: 000030f3d527fbd0 ((work_completion)(&irqfd->inject)){+.+.}-{0:0}, at: process_one_work+0x81c/0x1490
#2: 00000000f99862b0 (&mm->mmap_lock){++++}-{3:3}, at: get_map_page+0xa8/0x190 [kvm]
The "FAULT_FLAG_ALLOW_RETRY missing" indicates that handle_userfault()
saw a page fault request without the ALLOW_RETRY flag set, hence userfaultfd
cannot remotely resolve it (because the caller was asking for an immediate
resolution, aka, FAULT_FLAG_NOWAIT, while remote faults can take time).
With that, get_map_page() failed and the irq was lost.
We should not be strictly in an atomic environment here and the worker
should be sleepable (the call is done during an ioctl from userspace),
so we can allow adapter_indicators_set() to just sleep waiting for the
remote fault instead.
Link: https://issues.redhat.com/browse/RHEL-42486
Signed-off-by: Peter Xu <peterx@redhat.com>
[thuth: Assembled patch description and fixed some cosmetic issues]
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
Note: Instructions for reproducing the bug can be found in the ticket here:
https://issues.redhat.com/browse/RHEL-42486?focusedId=26661116#comment-26661116
arch/s390/kvm/interrupt.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 60c360c18690f..dcce826ae9875 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2777,12 +2777,19 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)

 static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
 {
+	struct mm_struct *mm = kvm->mm;
 	struct page *page = NULL;
+	int locked = 1;
+
+	if (mmget_not_zero(mm)) {
+		mmap_read_lock(mm);
+		get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
+				      &page, &locked);
+		if (locked)
+			mmap_read_unlock(mm);
+		mmput(mm);
+	}

-	mmap_read_lock(kvm->mm);
-	get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
-			      &page, NULL);
-	mmap_read_unlock(kvm->mm);
 	return page;
 }
--
2.50.1
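
The caller-side contract that the patch relies on can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_mm, toy_gup() and the other toy_* names are hypothetical stand-ins, not kernel APIs, and the model is simplified (the real get_user_pages_remote() may also re-acquire the lock and retry before it returns). The point shown is that once a pointer to "locked" is passed, the callee is allowed to drop the lock in order to wait, so the caller must unlock only if the flag is still set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for mm->mmap_lock (hypothetical): counts read holders. */
struct toy_mm {
	int readers;
};

static void toy_read_lock(struct toy_mm *mm)
{
	mm->readers++;
}

static void toy_read_unlock(struct toy_mm *mm)
{
	assert(mm->readers > 0);
	mm->readers--;
}

/*
 * Toy stand-in for a GUP-style lookup (hypothetical, not the kernel API).
 * must_wait models a fault that can only be resolved by sleeping, such as
 * a userfaultfd fault during postcopy.
 * Returns 1 if the page was obtained, -1 otherwise.
 */
static int toy_gup(struct toy_mm *mm, bool must_wait, int *locked)
{
	if (!must_wait)
		return 1;		/* fast path: lock stays held */
	if (locked == NULL)
		return -1;		/* caller forbids dropping the lock */
	toy_read_unlock(mm);		/* drop the lock while "waiting" */
	*locked = 0;			/* tell the caller the lock is gone */
	return 1;
}

/* Old calling pattern: locked == NULL, unconditional unlock. */
static int toy_get_page_old(struct toy_mm *mm, bool must_wait)
{
	int ret;

	toy_read_lock(mm);
	ret = toy_gup(mm, must_wait, NULL);
	toy_read_unlock(mm);
	return ret;
}

/* New calling pattern: pass &locked, unlock only if still held. */
static int toy_get_page_new(struct toy_mm *mm, bool must_wait)
{
	int locked = 1;
	int ret;

	toy_read_lock(mm);
	ret = toy_gup(mm, must_wait, &locked);
	if (locked)
		toy_read_unlock(mm);
	return ret;
}
```

In this model the old pattern fails whenever the fault needs to wait, which corresponds to the lost interrupt described above, while the new pattern succeeds and still leaves the lock balanced in both cases.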
CC Douglas, since Doug is looking into kvm_arch_set_irq_inatomic and this
might have implications.

On 21.08.25 at 17:23, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
[...]
On 8/21/25 5:23 PM, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.

Do we want to add this?

Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")
On 26/08/2025 13.43, Janosch Frank wrote:
> On 8/21/25 5:23 PM, Thomas Huth wrote:
>> From: Thomas Huth <thuth@redhat.com>
>>
>> When you run a KVM guest with vhost-net and migrate that guest to
>> another host, and you immediately enable postcopy after starting the
>> migration, there is a big chance that the network connection of the
>> guest won't work anymore on the destination side after the migration.
>
> Do we want to add this?
>
> Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages")

Yes, that sounds like a good idea, please add it when picking up the patch!

Thanks,
Thomas
On Thu, 21 Aug 2025 17:23:09 +0200 Thomas Huth <thuth@redhat.com> wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
[...]
> Link: https://issues.redhat.com/browse/RHEL-42486
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
On 8/21/25 5:23 PM, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
>
> When you run a KVM guest with vhost-net and migrate that guest to
> another host, and you immediately enable postcopy after starting the
> migration, there is a big chance that the network connection of the
> guest won't work anymore on the destination side after the migration.
[...]
> Link: https://issues.redhat.com/browse/RHEL-42486
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Acked-by: Janosch Frank <frankja@linux.ibm.com>