[PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP

fangyu.yu@linux.alibaba.com posted 1 patch 3 months, 2 weeks ago
There is a newer version of this series
[PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by fangyu.yu@linux.alibaba.com 3 months, 2 weeks ago
From: Fangyu Yu <fangyu.yu@linux.alibaba.com>

As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
regions. Using vma->vm_pgoff to derive the HPA here may therefore
produce incorrect mappings.

Instead, I/O mappings for such regions can be established on-demand
during g-stage page faults, making the upfront ioremap in this path
unnecessary.

Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
---
 arch/riscv/kvm/mmu.c | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 525fb5a330c0..84c04c8f0892 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 	/*
 	 * A memory region could potentially cover multiple VMAs, and
-	 * any holes between them, so iterate over all of them to find
-	 * out if we can map any of them right now.
+	 * any holes between them, so iterate over all of them.
 	 *
 	 *     +--------------------------------------------+
 	 * +---------------+----------------+   +----------------+
@@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		vm_end = min(reg_end, vma->vm_end);
 
 		if (vma->vm_flags & VM_PFNMAP) {
-			gpa_t gpa = base_gpa + (vm_start - hva);
-			phys_addr_t pa;
-
-			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
-			pa += vm_start - vma->vm_start;
-
 			/* IO region dirty page logging not allowed */
 			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
 				ret = -EINVAL;
 				goto out;
 			}
-
-			ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
-						    writable, false);
-			if (ret)
-				break;
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
 
-	if (change == KVM_MR_FLAGS_ONLY)
-		goto out;
-
-	if (ret)
-		kvm_riscv_mmu_iounmap(kvm, base_gpa, size);
-
 out:
 	mmap_read_unlock(current->mm);
 	return ret;
-- 
2.50.1
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Anup Patel 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 6:38 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> regions. Using vma->vm_pgoff to derive the HPA here may therefore
> produce incorrect mappings.
>
> Instead, I/O mappings for such regions can be established on-demand
> during g-stage page faults, making the upfront ioremap in this path
> unnecessary.
>
> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>

LGTM.

Queued it as fix for Linux-6.18

Reviewed-by: Anup Patel <anup@brainfault.org>

Thanks,
Anup

> ---
>  arch/riscv/kvm/mmu.c | 20 +-------------------
>  1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 525fb5a330c0..84c04c8f0892 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>
>         /*
>          * A memory region could potentially cover multiple VMAs, and
> -        * any holes between them, so iterate over all of them to find
> -        * out if we can map any of them right now.
> +        * any holes between them, so iterate over all of them.
>          *
>          *     +--------------------------------------------+
>          * +---------------+----------------+   +----------------+
> @@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>                 vm_end = min(reg_end, vma->vm_end);
>
>                 if (vma->vm_flags & VM_PFNMAP) {
> -                       gpa_t gpa = base_gpa + (vm_start - hva);
> -                       phys_addr_t pa;
> -
> -                       pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
> -                       pa += vm_start - vma->vm_start;
> -
>                         /* IO region dirty page logging not allowed */
>                         if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>                                 ret = -EINVAL;
>                                 goto out;
>                         }
> -
> -                       ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
> -                                                   writable, false);
> -                       if (ret)
> -                               break;
>                 }
>                 hva = vm_end;
>         } while (hva < reg_end);
>
> -       if (change == KVM_MR_FLAGS_ONLY)
> -               goto out;
> -
> -       if (ret)
> -               kvm_riscv_mmu_iounmap(kvm, base_gpa, size);
> -
>  out:
>         mmap_read_unlock(current->mm);
>         return ret;
> --
> 2.50.1
>
Re: Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by fangyu.yu@linux.alibaba.com 3 months, 2 weeks ago
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>>
>> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
>> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
>> regions. Using vma->vm_pgoff to derive the HPA here may therefore
>> produce incorrect mappings.
>>
>> Instead, I/O mappings for such regions can be established on-demand
>> during g-stage page faults, making the upfront ioremap in this path
>> unnecessary.
>>
>> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
>LGTM.
>
>Queued it as fix for Linux-6.18
>
>Reviewed-by: Anup Patel <anup@brainfault.org>
>
>Thanks,
>Anup
>

Hi Anup:

Thanks for the review.

Please note that this patch triggers two build warnings, which I have
fixed in v2 of the patch:
https://lore.kernel.org/linux-riscv/20251021142131.78796-1-fangyu.yu@linux.alibaba.com/

Thanks,
Fangyu
Re: Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Anup Patel 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 7:01 PM <fangyu.yu@linux.alibaba.com> wrote:
>
> >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >>
> >> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> >> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> >> regions. Using vma->vm_pgoff to derive the HPA here may therefore
> >> produce incorrect mappings.
> >>
> >> Instead, I/O mappings for such regions can be established on-demand
> >> during g-stage page faults, making the upfront ioremap in this path
> >> unnecessary.
> >>
> >> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >
> >LGTM.
> >
> >Queued it as fix for Linux-6.18
> >
> >Reviewed-by: Anup Patel <anup@brainfault.org>
> >
> >Thanks,
> >Anup
> >
>
> Hi Anup:
>
> Thanks for the review.
>
> Please note that this patch triggers two build warnings, which I have
> fixed in v2 of the patch:
> https://lore.kernel.org/linux-riscv/20251021142131.78796-1-fangyu.yu@linux.alibaba.com/
>

Can you send a separate patch with Fixes tag to this patch?

You can base the patch on riscv_kvm_next branch at:
https://github.com/kvm-riscv/linux.git

Regards,
Anup
Re: Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Anup Patel 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 8:55 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Fri, Oct 24, 2025 at 7:01 PM <fangyu.yu@linux.alibaba.com> wrote:
> >
> > >> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> > >>
> > >> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> > >> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> > >> regions. Using vma->vm_pgoff to derive the HPA here may therefore
> > >> produce incorrect mappings.
> > >>
> > >> Instead, I/O mappings for such regions can be established on-demand
> > >> during g-stage page faults, making the upfront ioremap in this path
> > >> unnecessary.
> > >>
> > >> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> > >> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> > >
> > >LGTM.
> > >
> > >Queued it as fix for Linux-6.18
> > >
> > >Reviewed-by: Anup Patel <anup@brainfault.org>
> > >
> > >Thanks,
> > >Anup
> > >
> >
> > Hi Anup:
> >
> > Thanks for the review.
> >
> > Please note that this patch triggers two build warnings, which I have
> > fixed in v2 of the patch:
> > https://lore.kernel.org/linux-riscv/20251021142131.78796-1-fangyu.yu@linux.alibaba.com/
> >
>
> Can you send a separate patch with Fixes tag to this patch?
>
> You can base the patch on riscv_kvm_next branch at:
> https://github.com/kvm-riscv/linux.git
>

For some reason, your v2 patch never showed up in my inbox :(

No need for a separate patch. I have picked your v2.

Regards,
Anup
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Guo Ren 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 6:08 AM <fangyu.yu@linux.alibaba.com> wrote:
>
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>
> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> regions. Using vma->vm_pgoff to derive the HPA here may therefore
> produce incorrect mappings.
>
> Instead, I/O mappings for such regions can be established on-demand
> during g-stage page faults, making the upfront ioremap in this path
> unnecessary.
>
> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
The Fixes tag should be 'commit aac6db75a9fc ("vfio/pci: Use
unmap_mapping_range()")'.

A stable tree necessitates minimizing the "Fixes tag" interference.

We also need to
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
For review.

> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---
>  arch/riscv/kvm/mmu.c | 20 +-------------------
>  1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 525fb5a330c0..84c04c8f0892 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>
>         /*
>          * A memory region could potentially cover multiple VMAs, and
> -        * any holes between them, so iterate over all of them to find
> -        * out if we can map any of them right now.
> +        * any holes between them, so iterate over all of them.
>          *
>          *     +--------------------------------------------+
>          * +---------------+----------------+   +----------------+
> @@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>                 vm_end = min(reg_end, vma->vm_end);
>
>                 if (vma->vm_flags & VM_PFNMAP) {
> -                       gpa_t gpa = base_gpa + (vm_start - hva);
> -                       phys_addr_t pa;
> -
> -                       pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
> -                       pa += vm_start - vma->vm_start;
> -
>                         /* IO region dirty page logging not allowed */
>                         if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>                                 ret = -EINVAL;
>                                 goto out;
>                         }
> -
> -                       ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
> -                                                   writable, false);
> -                       if (ret)
> -                               break;
Deferring the ioremap to the g-stage page fault looks good to me, as it
simplifies the implementation here.

Acked-by: Guo Ren <guoren@kernel.org>

>                 }
>                 hva = vm_end;
>         } while (hva < reg_end);
>
> -       if (change == KVM_MR_FLAGS_ONLY)
> -               goto out;
> -
> -       if (ret)
> -               kvm_riscv_mmu_iounmap(kvm, base_gpa, size);
> -
>  out:
>         mmap_read_unlock(current->mm);
>         return ret;
> --
> 2.50.1
>


-- 
Best Regards
 Guo Ren
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Anup Patel 3 months, 2 weeks ago
On Tue, Oct 21, 2025 at 8:58 PM Guo Ren <guoren@kernel.org> wrote:
>
> On Mon, Oct 20, 2025 at 6:08 AM <fangyu.yu@linux.alibaba.com> wrote:
> >
> > From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >
> > As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> > vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> > regions. Using vma->vm_pgoff to derive the HPA here may therefore
> > produce incorrect mappings.
> >
> > Instead, I/O mappings for such regions can be established on-demand
> > during g-stage page faults, making the upfront ioremap in this path
> > unnecessary.
> >
> > Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> The Fixes tag should be 'commit aac6db75a9fc ("vfio/pci: Use
> unmap_mapping_range()")'.
>
> A stable tree necessitates minimizing the "Fixes tag" interference.
>
> We also need to
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> For review.
>
> > Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> > ---
> >  arch/riscv/kvm/mmu.c | 20 +-------------------
> >  1 file changed, 1 insertion(+), 19 deletions(-)
> >
> > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > index 525fb5a330c0..84c04c8f0892 100644
> > --- a/arch/riscv/kvm/mmu.c
> > +++ b/arch/riscv/kvm/mmu.c
> > @@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >
> >         /*
> >          * A memory region could potentially cover multiple VMAs, and
> > -        * any holes between them, so iterate over all of them to find
> > -        * out if we can map any of them right now.
> > +        * any holes between them, so iterate over all of them.
> >          *
> >          *     +--------------------------------------------+
> >          * +---------------+----------------+   +----------------+
> > @@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >                 vm_end = min(reg_end, vma->vm_end);
> >
> >                 if (vma->vm_flags & VM_PFNMAP) {
> > -                       gpa_t gpa = base_gpa + (vm_start - hva);
> > -                       phys_addr_t pa;
> > -
> > -                       pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
> > -                       pa += vm_start - vma->vm_start;
> > -
> >                         /* IO region dirty page logging not allowed */
> >                         if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> >                                 ret = -EINVAL;
> >                                 goto out;
> >                         }
> > -
> > -                       ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
> > -                                                   writable, false);
> > -                       if (ret)
> > -                               break;
> Deferring the ioremap to the g-stage page fault looks good to me, as it
> simplifies the implementation here.
>
> Acked-by: Guo Ren <guoren@kernel.org>

I think you meant Reviewed-by and not Acked-by.

I have updated the Fixes tag at the time of merging.

Regards,
Anup
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Guo Ren 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 12:32 AM Anup Patel <anup@brainfault.org> wrote:
>
> On Tue, Oct 21, 2025 at 8:58 PM Guo Ren <guoren@kernel.org> wrote:
> >
> > On Mon, Oct 20, 2025 at 6:08 AM <fangyu.yu@linux.alibaba.com> wrote:
> > >
> > > From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> > >
> > > As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> > > vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> > > regions. Using vma->vm_pgoff to derive the HPA here may therefore
> > > produce incorrect mappings.
> > >
> > > Instead, I/O mappings for such regions can be established on-demand
> > > during g-stage page faults, making the upfront ioremap in this path
> > > unnecessary.
> > >
> > > Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> > The Fixes tag should be 'commit aac6db75a9fc ("vfio/pci: Use
> > unmap_mapping_range()")'.
> >
> > A stable tree necessitates minimizing the "Fixes tag" interference.
> >
> > We also need to
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > For review.
> >
> > > Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> > > ---
> > >  arch/riscv/kvm/mmu.c | 20 +-------------------
> > >  1 file changed, 1 insertion(+), 19 deletions(-)
> > >
> > > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> > > index 525fb5a330c0..84c04c8f0892 100644
> > > --- a/arch/riscv/kvm/mmu.c
> > > +++ b/arch/riscv/kvm/mmu.c
> > > @@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > >
> > >         /*
> > >          * A memory region could potentially cover multiple VMAs, and
> > > -        * any holes between them, so iterate over all of them to find
> > > -        * out if we can map any of them right now.
> > > +        * any holes between them, so iterate over all of them.
> > >          *
> > >          *     +--------------------------------------------+
> > >          * +---------------+----------------+   +----------------+
> > > @@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> > >                 vm_end = min(reg_end, vma->vm_end);
> > >
> > >                 if (vma->vm_flags & VM_PFNMAP) {
> > > -                       gpa_t gpa = base_gpa + (vm_start - hva);
> > > -                       phys_addr_t pa;
> > > -
> > > -                       pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
> > > -                       pa += vm_start - vma->vm_start;
> > > -
> > >                         /* IO region dirty page logging not allowed */
> > >                         if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> > >                                 ret = -EINVAL;
> > >                                 goto out;
> > >                         }
> > > -
> > > -                       ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
> > > -                                                   writable, false);
> > > -                       if (ret)
> > > -                               break;
> > Deferring the ioremap to the g-stage page fault looks good to me, as it
> > simplifies the implementation here.
> >
> > Acked-by: Guo Ren <guoren@kernel.org>
>
> I think you meant Reviewed-by and not Acked-by.
Yes,

Reviewed-by: Guo Ren <guoren@kernel.org>

>
> I have updated the Fixes tag at the time of merging.
Okay.

>
> Regards,
> Anup



-- 
Best Regards
 Guo Ren
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Jason Gunthorpe 3 months, 2 weeks ago
On Tue, Oct 21, 2025 at 08:27:50AM -0700, Guo Ren wrote:
> On Mon, Oct 20, 2025 at 6:08 AM <fangyu.yu@linux.alibaba.com> wrote:
> >
> > From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> >
> > As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> > vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> > regions. Using vma->vm_pgoff to derive the HPA here may therefore
> > produce incorrect mappings.

Assuming things about vm_pgoff is certainly incorrect. Handling it
during a normal fault by looking up the actual PTE seems OK to me too.

Jason
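
For readers following along, here is a minimal userspace sketch of the
arithmetic the removed hunk performed. It is an illustration only, not kernel
code: struct sketch_vma, SKETCH_PAGE_SHIFT and sketch_pa_from_pgoff() are
hypothetical names, and 4 KiB pages are assumed. The result is a real host
physical address only if vm_pgoff happens to hold a PFN. If a driver stores a
device file offset in vm_pgoff instead, the same formula silently yields an
unrelated address.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the few vm_area_struct fields involved. */
struct sketch_vma {
	uint64_t vm_start; /* first user virtual address covered by the VMA */
	uint64_t vm_end;   /* first address past the end of the VMA */
	uint64_t vm_pgoff; /* offset in pages; not necessarily a PFN */
};

/*
 * Mirrors the removed computation:
 *   pa  = vm_pgoff << PAGE_SHIFT;
 *   pa += vm_start - vma->vm_start;
 * The value is meaningful only when vm_pgoff really is a PFN.
 */
static uint64_t sketch_pa_from_pgoff(const struct sketch_vma *vma, uint64_t hva)
{
	assert(hva >= vma->vm_start && hva < vma->vm_end);
	return (vma->vm_pgoff << SKETCH_PAGE_SHIFT) + (hva - vma->vm_start);
}
```

With vm_pgoff set to a PFN the function returns the address inside the
device region. With vm_pgoff set to a plain file offset it returns a value that
only looks like one, which is why resolving the address at fault time is
safer.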
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by kernel test robot 3 months, 2 weeks ago
Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kvm/queue]
[also build test WARNING on kvm/next linus/master v6.18-rc2 next-20251020]
[cannot apply to kvm/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/fangyu-yu-linux-alibaba-com/RISC-V-KVM-Remove-automatic-I-O-mapping-for-VM_PFNMAP/20251020-210957
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link:    https://lore.kernel.org/r/20251020130801.68356-1-fangyu.yu%40linux.alibaba.com
patch subject: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
config: riscv-defconfig (https://download.01.org/0day-ci/archive/20251021/202510211010.XRaEeuBa-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 754ebc6ebb9fb9fbee7aef33478c74ea74949853)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251021/202510211010.XRaEeuBa-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510211010.XRaEeuBa-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/riscv/kvm/mmu.c:211:9: warning: variable 'vm_start' set but not used [-Wunused-but-set-variable]
     211 |                 hva_t vm_start, vm_end;
         |                       ^
>> arch/riscv/kvm/mmu.c:174:8: warning: variable 'base_gpa' set but not used [-Wunused-but-set-variable]
     174 |         gpa_t base_gpa;
         |               ^
   2 warnings generated.
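
Both warnings come from variables that were only read by the removed ioremap
code. The sketch below is a userspace model of the remaining loop with those
two variables dropped. It is one plausible shape for the cleanup, not
necessarily what v2 does. struct sketch_vma, sketch_find_intersection() (a
stand-in for find_vma_intersection()) and sketch_walk_region() are
hypothetical names, and the VMA array is assumed to be sorted by address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the VMA bounds used by the loop. */
struct sketch_vma {
	uint64_t vm_start;
	uint64_t vm_end;
};

/* Returns the first VMA overlapping [start, end), or NULL if none does. */
static const struct sketch_vma *
sketch_find_intersection(const struct sketch_vma *vmas, size_t n,
			 uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].vm_end > start && vmas[i].vm_start < end)
			return &vmas[i];
	return NULL;
}

/*
 * Walks every VMA that intersects [hva, reg_end) and returns how many were
 * visited. Only vm_end is needed to advance, so neither vm_start nor
 * base_gpa appears and nothing is set without being read.
 */
static size_t sketch_walk_region(const struct sketch_vma *vmas, size_t n,
				 uint64_t hva, uint64_t reg_end)
{
	size_t visited = 0;

	do {
		const struct sketch_vma *vma;
		uint64_t vm_end;

		vma = sketch_find_intersection(vmas, n, hva, reg_end);
		if (!vma)
			break;

		/* End of the intersection of this VMA with the region */
		vm_end = reg_end < vma->vm_end ? reg_end : vma->vm_end;
		visited++;
		hva = vm_end;
	} while (hva < reg_end);

	return visited;
}
```

The layout from the code comment (two adjacent VMAs, a hole, then a third VMA)
is walked in three iterations, because the lookup skips the hole on its own.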


vim +/vm_start +211 arch/riscv/kvm/mmu.c

99cdc6c18c2d815 Anup Patel          2021-09-27  167  
99cdc6c18c2d815 Anup Patel          2021-09-27  168  int kvm_arch_prepare_memory_region(struct kvm *kvm,
537a17b31493009 Sean Christopherson 2021-12-06  169  				const struct kvm_memory_slot *old,
537a17b31493009 Sean Christopherson 2021-12-06  170  				struct kvm_memory_slot *new,
99cdc6c18c2d815 Anup Patel          2021-09-27  171  				enum kvm_mr_change change)
99cdc6c18c2d815 Anup Patel          2021-09-27  172  {
d01495d4cffb327 Sean Christopherson 2021-12-06  173  	hva_t hva, reg_end, size;
d01495d4cffb327 Sean Christopherson 2021-12-06 @174  	gpa_t base_gpa;
d01495d4cffb327 Sean Christopherson 2021-12-06  175  	bool writable;
9d05c1fee837572 Anup Patel          2021-09-27  176  	int ret = 0;
9d05c1fee837572 Anup Patel          2021-09-27  177  
9d05c1fee837572 Anup Patel          2021-09-27  178  	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
9d05c1fee837572 Anup Patel          2021-09-27  179  			change != KVM_MR_FLAGS_ONLY)
99cdc6c18c2d815 Anup Patel          2021-09-27  180  		return 0;
9d05c1fee837572 Anup Patel          2021-09-27  181  
9d05c1fee837572 Anup Patel          2021-09-27  182  	/*
9d05c1fee837572 Anup Patel          2021-09-27  183  	 * Prevent userspace from creating a memory region outside of the GPA
9d05c1fee837572 Anup Patel          2021-09-27  184  	 * space addressable by the KVM guest GPA space.
9d05c1fee837572 Anup Patel          2021-09-27  185  	 */
537a17b31493009 Sean Christopherson 2021-12-06  186  	if ((new->base_gfn + new->npages) >=
dd82e35638d67f4 Anup Patel          2025-06-18  187  	    (kvm_riscv_gstage_gpa_size >> PAGE_SHIFT))
9d05c1fee837572 Anup Patel          2021-09-27  188  		return -EFAULT;
9d05c1fee837572 Anup Patel          2021-09-27  189  
d01495d4cffb327 Sean Christopherson 2021-12-06  190  	hva = new->userspace_addr;
d01495d4cffb327 Sean Christopherson 2021-12-06  191  	size = new->npages << PAGE_SHIFT;
d01495d4cffb327 Sean Christopherson 2021-12-06  192  	reg_end = hva + size;
d01495d4cffb327 Sean Christopherson 2021-12-06  193  	base_gpa = new->base_gfn << PAGE_SHIFT;
d01495d4cffb327 Sean Christopherson 2021-12-06  194  	writable = !(new->flags & KVM_MEM_READONLY);
d01495d4cffb327 Sean Christopherson 2021-12-06  195  
9d05c1fee837572 Anup Patel          2021-09-27  196  	mmap_read_lock(current->mm);
9d05c1fee837572 Anup Patel          2021-09-27  197  
9d05c1fee837572 Anup Patel          2021-09-27  198  	/*
9d05c1fee837572 Anup Patel          2021-09-27  199  	 * A memory region could potentially cover multiple VMAs, and
b35152e0fb35983 Fangyu Yu           2025-10-20  200  	 * any holes between them, so iterate over all of them.
9d05c1fee837572 Anup Patel          2021-09-27  201  	 *
9d05c1fee837572 Anup Patel          2021-09-27  202  	 *     +--------------------------------------------+
9d05c1fee837572 Anup Patel          2021-09-27  203  	 * +---------------+----------------+   +----------------+
9d05c1fee837572 Anup Patel          2021-09-27  204  	 * |   : VMA 1     |      VMA 2     |   |    VMA 3  :    |
9d05c1fee837572 Anup Patel          2021-09-27  205  	 * +---------------+----------------+   +----------------+
9d05c1fee837572 Anup Patel          2021-09-27  206  	 *     |               memory region                |
9d05c1fee837572 Anup Patel          2021-09-27  207  	 *     +--------------------------------------------+
9d05c1fee837572 Anup Patel          2021-09-27  208  	 */
9d05c1fee837572 Anup Patel          2021-09-27  209  	do {
fce11b667022766 Quan Zhou           2025-06-17  210  		struct vm_area_struct *vma;
9d05c1fee837572 Anup Patel          2021-09-27 @211  		hva_t vm_start, vm_end;
9d05c1fee837572 Anup Patel          2021-09-27  212  
fce11b667022766 Quan Zhou           2025-06-17  213  		vma = find_vma_intersection(current->mm, hva, reg_end);
fce11b667022766 Quan Zhou           2025-06-17  214  		if (!vma)
9d05c1fee837572 Anup Patel          2021-09-27  215  			break;
9d05c1fee837572 Anup Patel          2021-09-27  216  
9d05c1fee837572 Anup Patel          2021-09-27  217  		/*
9d05c1fee837572 Anup Patel          2021-09-27  218  		 * Mapping a read-only VMA is only allowed if the
9d05c1fee837572 Anup Patel          2021-09-27  219  		 * memory region is configured as read-only.
9d05c1fee837572 Anup Patel          2021-09-27  220  		 */
9d05c1fee837572 Anup Patel          2021-09-27  221  		if (writable && !(vma->vm_flags & VM_WRITE)) {
9d05c1fee837572 Anup Patel          2021-09-27  222  			ret = -EPERM;
9d05c1fee837572 Anup Patel          2021-09-27  223  			break;
9d05c1fee837572 Anup Patel          2021-09-27  224  		}
9d05c1fee837572 Anup Patel          2021-09-27  225  
9d05c1fee837572 Anup Patel          2021-09-27  226  		/* Take the intersection of this VMA with the memory region */
9d05c1fee837572 Anup Patel          2021-09-27  227  		vm_start = max(hva, vma->vm_start);
9d05c1fee837572 Anup Patel          2021-09-27  228  		vm_end = min(reg_end, vma->vm_end);
9d05c1fee837572 Anup Patel          2021-09-27  229  
9d05c1fee837572 Anup Patel          2021-09-27  230  		if (vma->vm_flags & VM_PFNMAP) {
9d05c1fee837572 Anup Patel          2021-09-27  231  			/* IO region dirty page logging not allowed */
537a17b31493009 Sean Christopherson 2021-12-06  232  			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
9d05c1fee837572 Anup Patel          2021-09-27  233  				ret = -EINVAL;
9d05c1fee837572 Anup Patel          2021-09-27  234  				goto out;
9d05c1fee837572 Anup Patel          2021-09-27  235  			}
9d05c1fee837572 Anup Patel          2021-09-27  236  		}
9d05c1fee837572 Anup Patel          2021-09-27  237  		hva = vm_end;
9d05c1fee837572 Anup Patel          2021-09-27  238  	} while (hva < reg_end);
9d05c1fee837572 Anup Patel          2021-09-27  239  
9d05c1fee837572 Anup Patel          2021-09-27  240  out:
9d05c1fee837572 Anup Patel          2021-09-27  241  	mmap_read_unlock(current->mm);
9d05c1fee837572 Anup Patel          2021-09-27  242  	return ret;
99cdc6c18c2d815 Anup Patel          2021-09-27  243  }
99cdc6c18c2d815 Anup Patel          2021-09-27  244  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by Daniel Henrique Barboza 3 months, 2 weeks ago

On 10/20/25 10:08 AM, fangyu.yu@linux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> 
> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> regions. Using vma->vm_pgoff to derive the HPA here may therefore
> produce incorrect mappings.
> 
> Instead, I/O mappings for such regions can be established on-demand
> during g-stage page faults, making the upfront ioremap in this path
> unnecessary.
> 
> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
> ---

Hi,

This patch fixes the issue observed by Drew in [1]. I was helping Drew
debug it using a QEMU guest inside an emulated RISC-V host with the
'virt' machine + IOMMU enabled.

Using the patches from [2], without the workaround patch (18), booting a
guest with a passed-through PCI device fails with a store/AMO access fault
and a kernel oops:

[    3.304776] Oops - store (or AMO) access fault [#1]
[    3.305159] Modules linked in:
[    3.305603] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4 #39
[    3.305988] Hardware name: riscv-virtio,qemu (DT)
[    3.306140] epc : __ew32+0x34/0xba
[    3.307910]  ra : e1000_irq_disable+0x1e/0x9a
[    3.307984] epc : ffffffff806ebfbe ra : ffffffff806ee3f8 sp : ff2000000000baf0
[    3.308022]  gp : ffffffff81719938 tp : ff600000018b8000 t0 : ff60000002c3b480
[    3.308055]  t1 : 0000000000000065 t2 : 3030206530303031 s0 : ff2000000000bb30
[    3.308086]  s1 : ff60000002a50a00 a0 : ff60000002a50fb8 a1 : 00000000000000d8
[    3.308118]  a2 : ffffffffffffffff a3 : 0000000000000002 a4 : 0000000000003000
[    3.308161]  a5 : ff200000001e00d8 a6 : 0000000000000008 a7 : 0000000000000038
[    3.308195]  s2 : ff60000002a50fb8 s3 : ff60000001865000 s4 : 00000000000000d8
[    3.308226]  s5 : ffffffffffffffff s6 : ff60000002a50a00 s7 : ffffffff812d2760
[    3.308258]  s8 : 0000000000000a00 s9 : 0000000000001000 s10: ff60000002a51000
[    3.308288]  s11: ff60000002a54000 t3 : ffffffff8172ec4f t4 : ffffffff8172ec4f
[    3.308475]  t5 : ffffffff8172ec50 t6 : ff2000000000b848
[    3.308763] status: 0000000200000120 badaddr: ff200000001e00d8 cause: 0000000000000007
[    3.308975] [<ffffffff806ebfbe>] __ew32+0x34/0xba
[    3.309196] [<ffffffff806ee3f8>] e1000_irq_disable+0x1e/0x9a
[    3.309241] [<ffffffff806f1e12>] e1000_probe+0x3b6/0xb50
[    3.309279] [<ffffffff80510554>] pci_device_probe+0x7e/0xf8
[    3.310001] [<ffffffff80610344>] really_probe+0x82/0x202
[    3.310409] [<ffffffff80610520>] __driver_probe_device+0x5c/0xd0
[    3.310622] [<ffffffff806105c0>] driver_probe_device+0x2c/0xb0
(...)


Further debugging showed that, as far as QEMU goes, the store fault happens in an
"unassigned io region", i.e. a region where no IO memory region is mapped by
any device. There are no IOMMU faults being logged and, at least as far as I've
observed, no IOMMU translation bugs on the QEMU side either.


Thanks for the fix!


Tested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>



[1] https://lore.kernel.org/all/20250920203851.2205115-38-ajones@ventanamicro.com/
[2] https://lore.kernel.org/all/20250920203851.2205115-20-ajones@ventanamicro.com/




>   arch/riscv/kvm/mmu.c | 20 +-------------------
>   1 file changed, 1 insertion(+), 19 deletions(-)
> 
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 525fb5a330c0..84c04c8f0892 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   
>   	/*
>   	 * A memory region could potentially cover multiple VMAs, and
> -	 * any holes between them, so iterate over all of them to find
> -	 * out if we can map any of them right now.
> +	 * any holes between them, so iterate over all of them.
>   	 *
>   	 *     +--------------------------------------------+
>   	 * +---------------+----------------+   +----------------+
> @@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   		vm_end = min(reg_end, vma->vm_end);
>   
>   		if (vma->vm_flags & VM_PFNMAP) {
> -			gpa_t gpa = base_gpa + (vm_start - hva);
> -			phys_addr_t pa;
> -
> -			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
> -			pa += vm_start - vma->vm_start;
> -
>   			/* IO region dirty page logging not allowed */
>   			if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>   				ret = -EINVAL;
>   				goto out;
>   			}
> -
> -			ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
> -						    writable, false);
> -			if (ret)
> -				break;
>   		}
>   		hva = vm_end;
>   	} while (hva < reg_end);
>   
> -	if (change == KVM_MR_FLAGS_ONLY)
> -		goto out;
> -
> -	if (ret)
> -		kvm_riscv_mmu_iounmap(kvm, base_gpa, size);
> -
>   out:
>   	mmap_read_unlock(current->mm);
>   	return ret;
Re: Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
Posted by fangyu.yu@linux.alibaba.com 3 months, 2 weeks ago
>> From: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> 
>> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
>> vm_pgoff may no longer be guaranteed to hold the PFN for VM_PFNMAP
>> regions. Using vma->vm_pgoff to derive the HPA here may therefore
>> produce incorrect mappings.
>> 
>> Instead, I/O mappings for such regions can be established on-demand
>> during g-stage page faults, making the upfront ioremap in this path
>> unnecessary.
>> 
>> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
>> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
>> ---
>
>Hi,
>
>This patch fixes the issue observed by Drew in [1]. I was helping Drew
>debug it using a QEMU guest inside an emulated risc-v host with the
>'virt' machine + IOMMU enabled.

Thank you for testing this patch.
As the oops below shows, the old HPA calculation (derived from
vma->vm_pgoff) was wrong, so the GVA ended up mapped to an incorrect HPA
and the access eventually raised a store/AMO access fault.
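
To make the failure mode concrete, here is a minimal user-space sketch of the
arithmetic that the patch removes. It is not kernel code: struct vma_sketch,
old_pfnmap_pa() and every address below are hypothetical stand-ins, and 4 KiB
pages are assumed. It only illustrates that the result is correct when
vm_pgoff holds a PFN and wrong when it holds something else, such as an offset
into the device file.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two vm_area_struct fields the old code read. */
struct vma_sketch {
	uint64_t vm_start; /* first HVA covered by the VMA */
	uint64_t vm_pgoff; /* offset in pages; not necessarily a PFN */
};

/* Mirrors the removed calculation, which treats vm_pgoff as the base PFN. */
static uint64_t old_pfnmap_pa(const struct vma_sketch *vma, uint64_t vm_start)
{
	uint64_t pa = vma->vm_pgoff << PAGE_SHIFT;

	pa += vm_start - vma->vm_start;
	return pa;
}

/* Illustrative values only: a made-up device region at HPA 0x40000000. */
static void demo(void)
{
	const uint64_t dev_hpa = UINT64_C(0x40000000);
	const uint64_t hva = UINT64_C(0x7f0000000000);

	/* Case 1: vm_pgoff really is the PFN, so the derived HPA is right. */
	struct vma_sketch pfn_in_pgoff = { hva, dev_hpa >> PAGE_SHIFT };
	assert(old_pfnmap_pa(&pfn_in_pgoff, hva + 0x1000) == dev_hpa + 0x1000);

	/* Case 2: vm_pgoff is a file offset (0 here), so the HPA is bogus. */
	struct vma_sketch offset_in_pgoff = { hva, 0 };
	assert(old_pfnmap_pa(&offset_in_pgoff, hva + 0x1000) != dev_hpa + 0x1000);
}
```

In the second case the g-stage entry would point at an HPA that has nothing to
do with the device, which is consistent with the guest store landing in a
region QEMU reports as unassigned.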

>
>Using the patches from [2], without the workaround patch (18), booting a
>guest with a passed-through PCI device fails with a store amo fault and a
>kernel oops:
>
>[    3.304776] Oops - store (or AMO) access fault [#1]
>[    3.305159] Modules linked in:
>[    3.305603] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4 #39
>[    3.305988] Hardware name: riscv-virtio,qemu (DT)
>[    3.306140] epc : __ew32+0x34/0xba
>[    3.307910]  ra : e1000_irq_disable+0x1e/0x9a
>[    3.307984] epc : ffffffff806ebfbe ra : ffffffff806ee3f8 sp : ff2000000000baf0
>[    3.308022]  gp : ffffffff81719938 tp : ff600000018b8000 t0 : ff60000002c3b480
>[    3.308055]  t1 : 0000000000000065 t2 : 3030206530303031 s0 : ff2000000000bb30
>[    3.308086]  s1 : ff60000002a50a00 a0 : ff60000002a50fb8 a1 : 00000000000000d8
>[    3.308118]  a2 : ffffffffffffffff a3 : 0000000000000002 a4 : 0000000000003000
>[    3.308161]  a5 : ff200000001e00d8 a6 : 0000000000000008 a7 : 0000000000000038
>[    3.308195]  s2 : ff60000002a50fb8 s3 : ff60000001865000 s4 : 00000000000000d8
>[    3.308226]  s5 : ffffffffffffffff s6 : ff60000002a50a00 s7 : ffffffff812d2760
>[    3.308258]  s8 : 0000000000000a00 s9 : 0000000000001000 s10: ff60000002a51000
>[    3.308288]  s11: ff60000002a54000 t3 : ffffffff8172ec4f t4 : ffffffff8172ec4f
>[    3.308475]  t5 : ffffffff8172ec50 t6 : ff2000000000b848
>[    3.308763] status: 0000000200000120 badaddr: ff200000001e00d8 cause: 0000000000000007
>[    3.308975] [<ffffffff806ebfbe>] __ew32+0x34/0xba
>[    3.309196] [<ffffffff806ee3f8>] e1000_irq_disable+0x1e/0x9a
>[    3.309241] [<ffffffff806f1e12>] e1000_probe+0x3b6/0xb50
>[    3.309279] [<ffffffff80510554>] pci_device_probe+0x7e/0xf8
>[    3.310001] [<ffffffff80610344>] really_probe+0x82/0x202
>[    3.310409] [<ffffffff80610520>] __driver_probe_device+0x5c/0xd0
>[    3.310622] [<ffffffff806105c0>] driver_probe_device+0x2c/0xb0
>(...)
>
>
>Further debugging showed that, as far as QEMU goes, the store fault happens in an
>"unassigned io region", i.e. a region where there's no IO memory region mapped by
>any device. There are no IOMMU faults being logged and, at least as far as I've
>observed, no IOMMU translation bugs on the QEMU side either.
>
>
>Thanks for the fix!
>
>
>Tested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>
>
>
>[1] https://lore.kernel.org/all/20250920203851.2205115-38-ajones@ventanamicro.com/
>[2] https://lore.kernel.org/all/20250920203851.2205115-20-ajones@ventanamicro.com/
>

Thanks,
Fangyu