Linear Address Masking[1] (LAM) modifies the checking that is applied to
64-bit linear addresses, allowing software to use the untranslated
address bits for metadata.
The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
this time.
Please review and consider applying.
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
v8:
- Drop redundant smp_mb() in prctl_enable_tagged_addr();
- Cleanup code around build_cr3();
- Fix commit messages;
- Selftests updates;
- Acked/Reviewed/Tested-bys from Alexander and Peter;
v7:
- Drop redundant smp_mb() in prctl_enable_tagged_addr();
- Cleanup code around build_cr3();
- Fix commit message;
- Fix indentation;
v6:
- Rebased onto v6.0-rc1
- LAM_U48 excluded from the patchset. Still available in the git tree;
- add ARCH_GET_MAX_TAG_BITS;
- Fix build without CONFIG_DEBUG_VM;
- Update comments;
- Reviewed/Tested-by from Alexander;
v5:
- Do not use switch_mm() in enable_lam_func()
- Use mb()/READ_ONCE() pair on LAM enabling;
- Add self-test by Weihong Zhang;
- Add comments;
v4:
- Fix untagged_addr() for LAM_U48;
- Remove no-threads restriction on LAM enabling;
- Fix mm_struct access from /proc/$PID/arch_status
- Fix LAM handling in initialize_tlbstate_and_flush()
- Pack tlb_state better;
- Comments and commit messages;
v3:
- Rebased onto v5.19-rc1
- Per-process enabling;
- API overhaul (again);
- Avoid branches and costly computations in the fast path;
- LAM_U48 is in optional patch.
v2:
- Rebased onto v5.18-rc1
- New arch_prctl(2)-based API
- Expose status of LAM (or other thread features) in
/proc/$PID/arch_status
[1] ISE, Chapter 10. https://cdrdv2.intel.com/v1/dl/getContent/671368
Kirill A. Shutemov (7):
x86/mm: Fix CR3_ADDR_MASK
x86: CPUID and CR3/CR4 flags for Linear Address Masking
mm: Pass down mm_struct to untagged_addr()
x86/mm: Handle LAM on context switch
x86/uaccess: Provide untagged_addr() and remove tags before address
check
x86/mm: Provide arch_prctl() interface for LAM
x86: Expose untagging mask in /proc/$PID/arch_status
Weihong Zhang (4):
selftests/x86/lam: Add malloc and tag-bits test cases for
linear-address masking
selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address
masking
selftests/x86/lam: Add io_uring test cases for linear-address masking
selftests/x86/lam: Add inherit test cases for linear-address masking
arch/arm64/include/asm/memory.h | 4 +-
arch/arm64/include/asm/signal.h | 2 +-
arch/arm64/include/asm/uaccess.h | 4 +-
arch/arm64/kernel/hw_breakpoint.c | 2 +-
arch/arm64/kernel/traps.c | 4 +-
arch/arm64/mm/fault.c | 10 +-
arch/sparc/include/asm/pgtable_64.h | 2 +-
arch/sparc/include/asm/uaccess_64.h | 2 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/mmu.h | 6 +
arch/x86/include/asm/mmu_context.h | 45 +
arch/x86/include/asm/processor-flags.h | 4 +-
arch/x86/include/asm/tlbflush.h | 35 +
arch/x86/include/asm/uaccess.h | 42 +-
arch/x86/include/uapi/asm/prctl.h | 4 +
arch/x86/include/uapi/asm/processor-flags.h | 6 +
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/fpu/xstate.c | 47 -
arch/x86/kernel/proc.c | 60 ++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/process_64.c | 64 +-
arch/x86/mm/tlb.c | 48 +-
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
drivers/gpu/drm/radeon/radeon_gem.c | 2 +-
drivers/infiniband/hw/mlx4/mr.c | 2 +-
drivers/media/common/videobuf2/frame_vector.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
.../staging/media/atomisp/pci/hmm/hmm_bo.c | 2 +-
drivers/tee/tee_shm.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 11 -
include/linux/uaccess.h | 15 +
lib/strncpy_from_user.c | 2 +-
lib/strnlen_user.c | 2 +-
mm/gup.c | 6 +-
mm/madvise.c | 2 +-
mm/mempolicy.c | 6 +-
mm/migrate.c | 2 +-
mm/mincore.c | 2 +-
mm/mlock.c | 4 +-
mm/mmap.c | 2 +-
mm/mprotect.c | 2 +-
mm/mremap.c | 2 +-
mm/msync.c | 2 +-
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/lam.c | 909 ++++++++++++++++++
virt/kvm/kvm_main.c | 2 +-
49 files changed, 1268 insertions(+), 122 deletions(-)
create mode 100644 arch/x86/kernel/proc.c
create mode 100644 tools/testing/selftests/x86/lam.c
--
2.35.1
On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> 64-bit linear addresses, allowing software to use of the untranslated
> address bits for metadata.
>
> The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
> this time.
>
> Please review and consider applying.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam

+Bharata, Ananth.

Do you folks have any feedback on the patchset?

Looks like AMD version of the tagged pointers feature does not get
traction as of now, but I want to be sure that the interface introduced
here can be suitable for your future plans.

Do you see anything in the interface that can prevent it to be extended to
the AMD feature?

--
Kiryl Shutsemau / Kirill A. Shutemov
Hi Kirill,
On 9/4/2022 6:30 AM, Kirill A. Shutemov wrote:
> On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
>> Linear Address Masking[1] (LAM) modifies the checking that is applied to
>> 64-bit linear addresses, allowing software to use of the untranslated
>> address bits for metadata.
>>
>> The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
>> this time.
>>
>> Please review and consider applying.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
>
> +Bharata, Ananth.
>
> Do you folks have any feedback on the patchset?
>
> Looks like AMD version of the tagged pointers feature does not get
> traction as of now, but I want to be sure that the interface introduced
> here can be suitable for your future plans.
>
> Do you see anything in the interface that can prevent it to be extended to
> the AMD feature?
The arch_prctl() extension is generic enough that it should be good.
The untagged_addr() macro looks like this from one of the callers:
start = untagged_addr(mm, start);
ffffffff814d39bb: 48 8b 8d 40 ff ff ff mov -0xc0(%rbp),%rcx
ffffffff814d39c2: 48 89 f2 mov %rsi,%rdx
ffffffff814d39c5: 48 c1 fa 3f sar $0x3f,%rdx
ffffffff814d39c9: 48 0b 91 50 03 00 00 or 0x350(%rcx),%rdx
ffffffff814d39d0: 48 21 f2 and %rsi,%rdx
ffffffff814d39d3: 49 89 d6 mov %rdx,%r14
Can this overhead of a few additional instructions be removed for
platforms that don't have the LAM feature? I haven't measured how much
overhead this effectively contributes in real workloads, but wonder if it
is worth optimizing for non-LAM platforms.
Regards,
Bharata.
On Mon, Sep 05, 2022 at 10:35:44AM +0530, Bharata B Rao wrote:
> Hi Kirill,
>
> On 9/4/2022 6:30 AM, Kirill A. Shutemov wrote:
> > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> >> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> >> 64-bit linear addresses, allowing software to use of the untranslated
> >> address bits for metadata.
> >>
> >> The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
> >> this time.
> >>
> >> Please review and consider applying.
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
> >
> > +Bharata, Ananth.
> >
> > Do you folks have any feedback on the patchset?
> >
> > Looks like AMD version of the tagged pointers feature does not get
> > traction as of now, but I want to be sure that the interface introduced
> > here can be suitable for your future plans.
> >
> > Do you see anything in the interface that can prevent it to be extended to
> > the AMD feature?
>
> The arch_prctl() extensions is generic enough that it should be good.
>
> The untagged_addr() macro looks like this from one of the callers:
>
> start = untagged_addr(mm, start);
> ffffffff814d39bb: 48 8b 8d 40 ff ff ff    mov    -0xc0(%rbp),%rcx
> ffffffff814d39c2: 48 89 f2                mov    %rsi,%rdx
> ffffffff814d39c5: 48 c1 fa 3f             sar    $0x3f,%rdx
> ffffffff814d39c9: 48 0b 91 50 03 00 00    or     0x350(%rcx),%rdx
> ffffffff814d39d0: 48 21 f2                and    %rsi,%rdx
> ffffffff814d39d3: 49 89 d6                mov    %rdx,%r14
>
> Can this overhead of a few additional instructions be removed for
> platforms that don't have LAM feature? I haven't measured how much
> overhead this effectively contributes to in real, but wonder if it is
> worth optimizing for non-LAM platforms.

I'm not sure how the optimization should look like. I guess we can stick
static_cpu_has() there, but I'm not convinced that adding jumps there will
be beneficial.

--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Sep 05, 2022 at 04:44:57PM +0300, Kirill A. Shutemov wrote:
> On Mon, Sep 05, 2022 at 10:35:44AM +0530, Bharata B Rao wrote:
> > Hi Kirill,
> >
> > On 9/4/2022 6:30 AM, Kirill A. Shutemov wrote:
> > > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > >> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > >> 64-bit linear addresses, allowing software to use of the untranslated
> > >> address bits for metadata.
> > >>
> > >> The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
> > >> this time.
> > >>
> > >> Please review and consider applying.
> > >>
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
> > >
> > > +Bharata, Ananth.
> > >
> > > Do you folks have any feedback on the patchset?
> > >
> > > Looks like AMD version of the tagged pointers feature does not get
> > > traction as of now, but I want to be sure that the interface introduced
> > > here can be suitable for your future plans.
> > >
> > > Do you see anything in the interface that can prevent it to be extended to
> > > the AMD feature?
> >
> > The arch_prctl() extensions is generic enough that it should be good.
> >
> > The untagged_addr() macro looks like this from one of the callers:
> >
> > start = untagged_addr(mm, start);
> > ffffffff814d39bb: 48 8b 8d 40 ff ff ff    mov    -0xc0(%rbp),%rcx
> > ffffffff814d39c2: 48 89 f2                mov    %rsi,%rdx
> > ffffffff814d39c5: 48 c1 fa 3f             sar    $0x3f,%rdx
> > ffffffff814d39c9: 48 0b 91 50 03 00 00    or     0x350(%rcx),%rdx
> > ffffffff814d39d0: 48 21 f2                and    %rsi,%rdx
> > ffffffff814d39d3: 49 89 d6                mov    %rdx,%r14
> >
> > Can this overhead of a few additional instructions be removed for
> > platforms that don't have LAM feature? I haven't measured how much
> > overhead this effectively contributes to in real, but wonder if it is
> > worth optimizing for non-LAM platforms.
>
> I'm not sure how the optimization should look like. I guess we can stick
> static_cpu_has() there, but I'm not convinced that adding jumps there will
> be beneficial.

I suppose the critical bit is the memory load. That can stall and then
you're sad. A jump_label is easy enough to add.
On Mon, Sep 05, 2022 at 04:30:25PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 05, 2022 at 04:44:57PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Sep 05, 2022 at 10:35:44AM +0530, Bharata B Rao wrote:
> > > Hi Kirill,
> > >
> > > On 9/4/2022 6:30 AM, Kirill A. Shutemov wrote:
> > > > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > > >> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > > >> 64-bit linear addresses, allowing software to use of the untranslated
> > > >> address bits for metadata.
> > > >>
> > > >> The patchset brings support for LAM for userspace addresses. Only LAM_U57 at
> > > >> this time.
> > > >>
> > > >> Please review and consider applying.
> > > >>
> > > >> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam
> > > >
> > > > +Bharata, Ananth.
> > > >
> > > > Do you folks have any feedback on the patchset?
> > > >
> > > > Looks like AMD version of the tagged pointers feature does not get
> > > > traction as of now, but I want to be sure that the interface introduced
> > > > here can be suitable for your future plans.
> > > >
> > > > Do you see anything in the interface that can prevent it to be extended to
> > > > the AMD feature?
> > >
> > > The arch_prctl() extensions is generic enough that it should be good.
> > >
> > > The untagged_addr() macro looks like this from one of the callers:
> > >
> > > start = untagged_addr(mm, start);
> > > ffffffff814d39bb: 48 8b 8d 40 ff ff ff mov -0xc0(%rbp),%rcx
> > > ffffffff814d39c2: 48 89 f2 mov %rsi,%rdx
> > > ffffffff814d39c5: 48 c1 fa 3f sar $0x3f,%rdx
> > > ffffffff814d39c9: 48 0b 91 50 03 00 00 or 0x350(%rcx),%rdx
> > > ffffffff814d39d0: 48 21 f2 and %rsi,%rdx
> > > ffffffff814d39d3: 49 89 d6 mov %rdx,%r14
> > >
> > > Can this overhead of a few additional instructions be removed for
> > > platforms that don't have LAM feature? I haven't measured how much
> > > overhead this effectively contributes to in real, but wonder if it is
> > > worth optimizing for non-LAM platforms.
> >
> > I'm not sure how the optimization should look like. I guess we can stick
> > static_cpu_has() there, but I'm not convinced that adding jumps there will
> > be beneficial.
>
> I suppose the critical bit is the memory load. That can stall and then
> you're sad. A jump_label is easy enough to add.
What about something like this?
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 803241dfc473..868d2730884b 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -30,8 +30,10 @@ static inline bool pagefault_disabled(void);
*/
#define untagged_addr(mm, addr) ({ \
u64 __addr = (__force u64)(addr); \
- s64 sign = (s64)__addr >> 63; \
- __addr &= (mm)->context.untag_mask | sign; \
+ if (static_cpu_has(X86_FEATURE_LAM)) { \
+ s64 sign = (s64)__addr >> 63; \
+ __addr &= (mm)->context.untag_mask | sign; \
+ } \
(__force __typeof__(addr))__addr; \
})
--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Sep 05, 2022 at 06:35:17PM +0300, Kirill A. Shutemov wrote:
> What about something like this?
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 803241dfc473..868d2730884b 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -30,8 +30,10 @@ static inline bool pagefault_disabled(void);
> */
> #define untagged_addr(mm, addr) ({ \
> u64 __addr = (__force u64)(addr); \
> - s64 sign = (s64)__addr >> 63; \
> - __addr &= (mm)->context.untag_mask | sign; \
> + if (static_cpu_has(X86_FEATURE_LAM)) { \
> + s64 sign = (s64)__addr >> 63; \
> + __addr &= (mm)->context.untag_mask | sign; \
> + } \
> (__force __typeof__(addr))__addr; \
> })
Well, if you go throught the trouble of adding it, might as well use a
regular static branch and only enable it once there's an actual user,
no?
On Mon, Sep 05, 2022 at 05:46:49PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 05, 2022 at 06:35:17PM +0300, Kirill A. Shutemov wrote:
> > What about something like this?
> >
> > diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> > index 803241dfc473..868d2730884b 100644
> > --- a/arch/x86/include/asm/uaccess.h
> > +++ b/arch/x86/include/asm/uaccess.h
> > @@ -30,8 +30,10 @@ static inline bool pagefault_disabled(void);
> > */
> > #define untagged_addr(mm, addr) ({ \
> > u64 __addr = (__force u64)(addr); \
> > - s64 sign = (s64)__addr >> 63; \
> > - __addr &= (mm)->context.untag_mask | sign; \
> > + if (static_cpu_has(X86_FEATURE_LAM)) { \
> > + s64 sign = (s64)__addr >> 63; \
> > + __addr &= (mm)->context.untag_mask | sign; \
> > + } \
> > (__force __typeof__(addr))__addr; \
> > })
>
> Well, if you go throught the trouble of adding it, might as well use a
> regular static branch and only enable it once there's an actual user,
> no?
Fair enough. How about this?
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 803241dfc473..1a03c65a9c0f 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -22,6 +22,8 @@ static inline bool pagefault_disabled(void);
#endif
#ifdef CONFIG_X86_64
+DECLARE_STATIC_KEY_FALSE(tagged_addr_key);
+
/*
* Mask out tag bits from the address.
*
@@ -30,8 +32,10 @@ static inline bool pagefault_disabled(void);
*/
#define untagged_addr(mm, addr) ({ \
u64 __addr = (__force u64)(addr); \
- s64 sign = (s64)__addr >> 63; \
- __addr &= (mm)->context.untag_mask | sign; \
+ if (static_branch_unlikely(&tagged_addr_key)) { \
+ s64 sign = (s64)__addr >> 63; \
+ __addr &= (mm)->context.untag_mask | sign; \
+ } \
(__force __typeof__(addr))__addr; \
})
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 337f80a0862f..63194bf43c9a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -742,6 +742,9 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
}
#endif
+DEFINE_STATIC_KEY_FALSE(tagged_addr_key);
+EXPORT_SYMBOL_GPL(tagged_addr_key);
+
static void enable_lam_func(void *mm)
{
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
@@ -813,6 +816,7 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
}
on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
+ static_branch_enable(&tagged_addr_key);
out:
mutex_unlock(&mm->context.lock);
mmap_write_unlock(mm);
--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Sep 05, 2022 at 07:47:08PM +0300, Kirill A. Shutemov wrote:
> Fair enough. How about this?
>
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 803241dfc473..1a03c65a9c0f 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -22,6 +22,8 @@ static inline bool pagefault_disabled(void);
> #endif
>
> #ifdef CONFIG_X86_64
> +DECLARE_STATIC_KEY_FALSE(tagged_addr_key);
> +
> /*
> * Mask out tag bits from the address.
> *
> @@ -30,8 +32,10 @@ static inline bool pagefault_disabled(void);
> */
> #define untagged_addr(mm, addr) ({ \
> u64 __addr = (__force u64)(addr); \
> - s64 sign = (s64)__addr >> 63; \
> - __addr &= (mm)->context.untag_mask | sign; \
> + if (static_branch_unlikely(&tagged_addr_key)) { \
> + s64 sign = (s64)__addr >> 63; \
> + __addr &= (mm)->context.untag_mask | sign; \
> + } \
> (__force __typeof__(addr))__addr; \
> })
>
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 337f80a0862f..63194bf43c9a 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -742,6 +742,9 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
> }
> #endif
>
> +DEFINE_STATIC_KEY_FALSE(tagged_addr_key);
So here you use the false-unlikely scenario, which seems suboptimal in
this case; I was thinking the false-likely case would generate better
code (see the comment in linux/jump_label.h).
> +EXPORT_SYMBOL_GPL(tagged_addr_key);
> +
> static void enable_lam_func(void *mm)
> {
> struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> @@ -813,6 +816,7 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> }
>
> on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
> + static_branch_enable(&tagged_addr_key);
> out:
> mutex_unlock(&mm->context.lock);
> mmap_write_unlock(mm);
Aside from the one nit above, this looks about right.
Hi Kirill,

On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> 64-bit linear addresses, allowing software to use of the untranslated
> address bits for metadata.

We discussed this internally, but didn't bubble up here.

Given that we are working on enabling Shared Virtual Addressing (SVA)
within the IOMMU. This permits user to share VA directly with the device,
and the device can participate even in fixing page-faults and such.

IOMMU enforces canonical addressing, since we are hijacking the top order
bits for meta-data, it will fail sanity check and we would return a failure
back to device on any page-faults from device.

It also complicates how device TLB and ATS work, and needs some major
improvements to detect device capability to accept tagged pointers, adjust
the devtlb to act accordingly.

Both are orthogonal features, but there is an intersection of both
that are fundamentally incompatible.

Its even more important, since an application might be using SVA under the
cover provided by some library that's used without their knowledge.

The path would be:

1. Ensure both LAM and SVM are incompatible by design, without major
   changes.
	- If LAM is enabled already and later SVM enabling is requested by
	  user, that should fail. and Vice versa.
	- Provide an API to user to ask for opt-out. Now they know they
	  must sanitize the pointers before sending to device, or the
	  working set is already isolated and needs no work.

2. I suppose for any syscalls that take tagged pointers you would maybe
   relax checks for how many bits to ignore for canonicallity. This is
   required so user don't need to do the same for everything sanitization
   before every syscall.

If you have it fail, the library might choose a less optimal path if one is
available.

Cheers,
Ashok
On Thu, Sep 01, 2022 at 05:45:08PM +0000, Ashok Raj wrote:
> Hi Kirill,
>
> On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > 64-bit linear addresses, allowing software to use of the untranslated
> > address bits for metadata.
>
> We discussed this internally, but didn't bubble up here.
>
> Given that we are working on enabling Shared Virtual Addressing (SVA)
> within the IOMMU. This permits user to share VA directly with the device,
> and the device can participate even in fixing page-faults and such.
>
> IOMMU enforces canonical addressing, since we are hijacking the top order
> bits for meta-data, it will fail sanity check and we would return a failure
> back to device on any page-faults from device.
>
> It also complicates how device TLB and ATS work, and needs some major
> improvements to detect device capability to accept tagged pointers, adjust
> the devtlb to act accordingly.
>
>
> Both are orthogonal features, but there is an intersection of both
> that are fundamentally incompatible.
>
> Its even more important, since an application might be using SVA under the
> cover provided by some library that's used without their knowledge.
>
> The path would be:
>
> 1. Ensure both LAM and SVM are incompatible by design, without major
> changes.
> - If LAM is enabled already and later SVM enabling is requested by
> user, that should fail. and Vice versa.
> - Provide an API to user to ask for opt-out. Now they know they
> must sanitize the pointers before sending to device, or the
> working set is already isolated and needs no work.
The patch below implements something like this. It is PoC, build-tested only.
To be honest, I hate it. It is clearly a layering violation. It feels
dirty. But I don't see any better way as we tie orthogonal features
together.
Also I have no idea how to do forced PASID allocation if LAM is already
enabled. What would the API have to look like?
Any comments?
> 2. I suppose for any syscalls that take tagged pointers you would maybe
> relax checks for how many bits to ignore for canonicallity. This is
> required so user don't need to do the same for everything sanitization
> before every syscall.
I don't quite follow this. For syscalls that allow tagged pointers, we do
untagged_addr() now. Not sure what else is needed.
> If you have it fail, the library might choose a less optimal path if one is
> available.
>
> Cheers,
> Ashok
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index a31e27b95b19..e5c04ced36c9 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -23,5 +23,6 @@
#define ARCH_GET_UNTAG_MASK 0x4001
#define ARCH_ENABLE_TAGGED_ADDR 0x4002
#define ARCH_GET_MAX_TAG_BITS 0x4003
+#define ARCH_ENABLE_TAGGED_ADDR_FORCED 0x4004
#endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 337f80a0862f..7d89a2fd1a55 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -774,7 +774,8 @@ static bool lam_u48_allowed(void)
#define LAM_U48_BITS 15
#define LAM_U57_BITS 6
-static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
+static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits,
+ bool forced)
{
int ret = 0;
@@ -793,6 +794,11 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
goto out;
}
+ if (pasid_valid(mm->pasid) && !forced) {
+ ret = -EBUSY;
+ goto out;
+ }
+
if (!nr_bits) {
ret = -EINVAL;
goto out;
@@ -910,7 +916,9 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
return put_user(task->mm->context.untag_mask,
(unsigned long __user *)arg2);
case ARCH_ENABLE_TAGGED_ADDR:
- return prctl_enable_tagged_addr(task->mm, arg2);
+ return prctl_enable_tagged_addr(task->mm, arg2, false);
+ case ARCH_ENABLE_TAGGED_ADDR_FORCED:
+ return prctl_enable_tagged_addr(task->mm, arg2, true);
case ARCH_GET_MAX_TAG_BITS: {
int nr_bits;
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..a6ec17de1937 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -4,6 +4,7 @@
*/
#include <linux/mutex.h>
#include <linux/sched/mm.h>
+#include <asm/mmu_context.h>
#include "iommu-sva-lib.h"
@@ -32,6 +33,15 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
return -EINVAL;
mutex_lock(&iommu_sva_lock);
+
+ /* Serialize against LAM enabling */
+ mutex_lock(&mm->context.lock);
+
+ if (mm_lam_cr3_mask(mm)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
/* Is a PASID already associated with this mm? */
if (pasid_valid(mm->pasid)) {
if (mm->pasid < min || mm->pasid >= max)
@@ -45,6 +55,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
else
mm_pasid_set(mm, pasid);
out:
+ mutex_unlock(&mm->context.lock);
mutex_unlock(&iommu_sva_lock);
return ret;
}
--
Kiryl Shutsemau / Kirill A. Shutemov
On Sun, Sep 04, 2022 at 03:39:52AM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 01, 2022 at 05:45:08PM +0000, Ashok Raj wrote:
> > Hi Kirill,
> >
> > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > > 64-bit linear addresses, allowing software to use of the untranslated
> > > address bits for metadata.
> >
> > We discussed this internally, but didn't bubble up here.
> >
> > Given that we are working on enabling Shared Virtual Addressing (SVA)
> > within the IOMMU. This permits user to share VA directly with the device,
> > and the device can participate even in fixing page-faults and such.
> >
> > IOMMU enforces canonical addressing, since we are hijacking the top order
> > bits for meta-data, it will fail sanity check and we would return a failure
> > back to device on any page-faults from device.
> >
> > It also complicates how device TLB and ATS work, and needs some major
> > improvements to detect device capability to accept tagged pointers, adjust
> > the devtlb to act accordingly.
> >
> >
> > Both are orthogonal features, but there is an intersection of both
> > that are fundamentally incompatible.
> >
> > Its even more important, since an application might be using SVA under the
> > cover provided by some library that's used without their knowledge.
> >
> > The path would be:
> >
> > 1. Ensure both LAM and SVM are incompatible by design, without major
> > changes.
> > - If LAM is enabled already and later SVM enabling is requested by
> > user, that should fail. and Vice versa.
> > - Provide an API to user to ask for opt-out. Now they know they
> > must sanitize the pointers before sending to device, or the
> > working set is already isolated and needs no work.
>
> The patch below implements something like this. It is PoC, build-tested only.
>
> To be honest, I hate it. It is clearly a layering violation. It feels
> dirty. But I don't see any better way as we tie orthogonal features
> together.
>
> Also I have no idea how to make forced PASID allocation if LAM enabled.
> What the API has to look like?
>
> Any comments?
Looking through it, it seems to be sane enough... I feel dirty too :-) but
I don't see a better way.
I'm Ccing JasonG since we are reworking the IOMMU interfaces right now, and
Jacob who is in the middle of some refactoring.
>
> > 2. I suppose for any syscalls that take tagged pointers you would maybe
> > relax checks for how many bits to ignore for canonicallity. This is
> > required so user don't need to do the same for everything sanitization
> > before every syscall.
>
> I'm not quite follow this. For syscalls that allow tagged pointers, we do
> untagged_addr() now. Not sure what else needed.
>
> > If you have it fail, the library might choose a less optimal path if one is
> > available.
> >
> > Cheers,
> > Ashok
>
> diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
> index a31e27b95b19..e5c04ced36c9 100644
> --- a/arch/x86/include/uapi/asm/prctl.h
> +++ b/arch/x86/include/uapi/asm/prctl.h
> @@ -23,5 +23,6 @@
> #define ARCH_GET_UNTAG_MASK 0x4001
> #define ARCH_ENABLE_TAGGED_ADDR 0x4002
> #define ARCH_GET_MAX_TAG_BITS 0x4003
> +#define ARCH_ENABLE_TAGGED_ADDR_FORCED 0x4004
>
> #endif /* _ASM_X86_PRCTL_H */
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 337f80a0862f..7d89a2fd1a55 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -774,7 +774,8 @@ static bool lam_u48_allowed(void)
> #define LAM_U48_BITS 15
> #define LAM_U57_BITS 6
>
> -static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> +static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits,
> + bool forced)
> {
> int ret = 0;
>
> @@ -793,6 +794,11 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> goto out;
> }
>
> + if (pasid_valid(mm->pasid) && !forced) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> if (!nr_bits) {
> ret = -EINVAL;
> goto out;
> @@ -910,7 +916,9 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> return put_user(task->mm->context.untag_mask,
> (unsigned long __user *)arg2);
> case ARCH_ENABLE_TAGGED_ADDR:
> - return prctl_enable_tagged_addr(task->mm, arg2);
> + return prctl_enable_tagged_addr(task->mm, arg2, false);
> + case ARCH_ENABLE_TAGGED_ADDR_FORCED:
> + return prctl_enable_tagged_addr(task->mm, arg2, true);
> case ARCH_GET_MAX_TAG_BITS: {
> int nr_bits;
>
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index 106506143896..a6ec17de1937 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -4,6 +4,7 @@
> */
> #include <linux/mutex.h>
> #include <linux/sched/mm.h>
> +#include <asm/mmu_context.h>
>
> #include "iommu-sva-lib.h"
>
> @@ -32,6 +33,15 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> return -EINVAL;
>
> mutex_lock(&iommu_sva_lock);
> +
> + /* Serialize against LAM enabling */
> + mutex_lock(&mm->context.lock);
> +
> + if (mm_lam_cr3_mask(mm)) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> /* Is a PASID already associated with this mm? */
> if (pasid_valid(mm->pasid)) {
> if (mm->pasid < min || mm->pasid >= max)
> @@ -45,6 +55,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> else
> mm_pasid_set(mm, pasid);
> out:
> + mutex_unlock(&mm->context.lock);
> mutex_unlock(&iommu_sva_lock);
> return ret;
> }
> --
> Kiryl Shutsemau / Kirill A. Shutemov
Hi Ashok,
On Fri, 9 Sep 2022 16:08:02 +0000, Ashok Raj <ashok_raj@linux.intel.com>
wrote:
> > diff --git a/arch/x86/include/uapi/asm/prctl.h
> > b/arch/x86/include/uapi/asm/prctl.h index a31e27b95b19..e5c04ced36c9
> > 100644 --- a/arch/x86/include/uapi/asm/prctl.h
> > +++ b/arch/x86/include/uapi/asm/prctl.h
> > @@ -23,5 +23,6 @@
> > #define ARCH_GET_UNTAG_MASK 0x4001
> > #define ARCH_ENABLE_TAGGED_ADDR 0x4002
> > #define ARCH_GET_MAX_TAG_BITS 0x4003
> > +#define ARCH_ENABLE_TAGGED_ADDR_FORCED 0x4004
> >
> > #endif /* _ASM_X86_PRCTL_H */
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index 337f80a0862f..7d89a2fd1a55 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -774,7 +774,8 @@ static bool lam_u48_allowed(void)
> > #define LAM_U48_BITS 15
> > #define LAM_U57_BITS 6
> >
> > -static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned
> > long nr_bits) +static int prctl_enable_tagged_addr(struct mm_struct
> > *mm, unsigned long nr_bits,
> > + bool forced)
> > {
> > int ret = 0;
> >
> > @@ -793,6 +794,11 @@ static int prctl_enable_tagged_addr(struct
> > mm_struct *mm, unsigned long nr_bits) goto out;
> > }
> >
> > + if (pasid_valid(mm->pasid) && !forced) {
I don't think this works since we have lazy pasid free. For example,
after all the devices did sva_unbind, mm->pasid will remain valid until
mmdrop(). LAM should be supported in this case.
Perhaps we could introduce another prctl flag for SVA, PR_GET_SVA?
Both the iommu driver and LAM can set/query the flag. LAM applications may
not be the only ones that want to know if shared virtual addressing is on.
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
Thanks,
Jacob
On 9/12/22 13:39, Jacob Pan wrote:
>>> + if (pasid_valid(mm->pasid) && !forced) {
> I don't think this works since we have lazy pasid free. for example,
> after all the devices did sva_unbind, mm->pasid we'll remain valid until
> mmdrop(). LAM should be supported in this case.
Nah, it works fine.
It just means that the rules are "you can't do LAM if your process
*EVER* got a PASID" instead of "you can't do LAM if you are actively
using your PASID".
We knew that PASID use would be a one-way trip for a process when we
moved to the simplified implementation. This is just more fallout from
that. It's fine.
> Perhaps, we could introduce another prctl flag for SVA, PR_GET_SVA?
> Both iommu driver and LAM can set/query the flag. LAM applications may not
> be the only ones want to know if share virtual addressing is on.
I don't think it's a good idea to add yet more UABI around this issue.
Won't the IOMMU folks eventually get their hardware in line with LAM?
Isn't this situation temporary?
Hi Dave,
On Mon, 12 Sep 2022 14:41:56 -0700, Dave Hansen <dave.hansen@intel.com>
wrote:
> On 9/12/22 13:39, Jacob Pan wrote:
> >>> + if (pasid_valid(mm->pasid) && !forced) {
> > I don't think this works since we have lazy pasid free. for example,
> > after all the devices did sva_unbind, mm->pasid we'll remain valid
> > until mmdrop(). LAM should be supported in this case.
>
> Nah, it works fine.
> It just means that the rules are "you can't do LAM if your process
> *EVER* got a PASID" instead of "you can't do LAM if you are actively
> using your PASID".
Sure, it works if you change the rules, but this case needs to be documented.
>
> We knew that PASID use would be a one-way trip for a process when we
> moved to the simplified implementation. This is just more fallout from
> that. It's fine.
>
Is LAM also a one-way trip?
> > Perhaps, we could introduce another prctl flag for SVA, PR_GET_SVA?
> > Both iommu driver and LAM can set/query the flag. LAM applications may
> > not be the only ones want to know if share virtual addressing is on.
>
> I don't think it's a good idea to add yet more UABI around this issue.
> Won't the IOMMU folks eventually get their hardware in line with LAM?
> Isn't this situation temporary?
Thanks,
Jacob
On Mon, Sep 12, 2022 at 03:55:02PM -0700, Jacob Pan wrote:
> Hi Dave,
>
> On Mon, 12 Sep 2022 14:41:56 -0700, Dave Hansen <dave.hansen@intel.com>
> wrote:
>
> > On 9/12/22 13:39, Jacob Pan wrote:
> > >>> + if (pasid_valid(mm->pasid) && !forced) {
> > > I don't think this works since we have lazy pasid free. for example,
> > > after all the devices did sva_unbind, mm->pasid we'll remain valid
> > > until mmdrop(). LAM should be supported in this case.
> >
> > Nah, it works fine.
> > It just means that the rules are "you can't do LAM if your process
> > *EVER* got a PASID" instead of "you can't do LAM if you are actively
> > using your PASID".
> Sure, it works if you change the rules, but this case needs to be documented.
>
> >
> > We knew that PASID use would be a one-way trip for a process when we
> > moved to the simplified implementation. This is just more fallout from
> > that. It's fine.
> >
> Is LAM also a one-way trip?
Yes.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Sep 12, 2022 at 02:41:56PM -0700, Dave Hansen wrote:
> On 9/12/22 13:39, Jacob Pan wrote:
> >>> + if (pasid_valid(mm->pasid) && !forced) {
> > I don't think this works since we have lazy pasid free. for example,
> > after all the devices did sva_unbind, mm->pasid we'll remain valid until
> > mmdrop(). LAM should be supported in this case.
>
> Nah, it works fine.
>
> It just means that the rules are "you can't do LAM if your process
> *EVER* got a PASID" instead of "you can't do LAM if you are actively
> using your PASID".
>
> We knew that PASID use would be a one-way trip for a process when we
> moved to the simplified implementation. This is just more fallout from
> that. It's fine.
Agree.
>
> > Perhaps, we could introduce another prctl flag for SVA, PR_GET_SVA?
> > Both iommu driver and LAM can set/query the flag. LAM applications may not
> > be the only ones want to know if share virtual addressing is on.
>
> I don't think it's a good idea to add yet more UABI around this issue.
> Won't the IOMMU folks eventually get their hardware in line with LAM?
> Isn't this situation temporary?
This is more than just an IOMMU change, since it involves the device end:
the ability to report the tagging feature, communicating the width to
ignore, etc. I suspect PCIe changes are required on the device end, which
would be a long pole.
I suspect this would be moderately permanent :-) Memory tagging is more
of a niche use case, and getting general IO devices in line has lots of
design touch points that make it difficult to close in short order.
On Sun, Sep 04, 2022 at 03:39:52AM +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 01, 2022 at 05:45:08PM +0000, Ashok Raj wrote:
> > Hi Kirill,
> >
> > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > > 64-bit linear addresses, allowing software to use of the untranslated
> > > address bits for metadata.
> >
> > [...]
> >
> > The path would be:
> >
> > 1. Ensure both LAM and SVM are incompatible by design, without major
> >    changes.
> >    - If LAM is enabled already and later SVM enabling is requested by
> >      user, that should fail. and Vice versa.
> >    - Provide an API to user to ask for opt-out. Now they know they
> >      must sanitize the pointers before sending to device, or the
> >      working set is already isolated and needs no work.
>
> The patch below implements something like this. It is PoC, build-tested only.
>
> To be honest, I hate it. It is clearly a layering violation. It feels
> dirty. But I don't see any better way as we tie orthogonal features
> together.
>
> Also I have no idea how to make forced PASID allocation if LAM enabled.
> What the API has to look like?

Jacob, Ashok, any comment on this part?

I expect in many cases LAM will be enabled very early in process start
(like before malloc is functional), and that makes PASID allocation
always fail.

Any way out?

--
Kiryl Shutsemau / Kirill A. Shutemov
Hi Kirill,
On Tue, 13 Sep 2022 01:49:30 +0300, "Kirill A. Shutemov"
<kirill.shutemov@linux.intel.com> wrote:
> On Sun, Sep 04, 2022 at 03:39:52AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Sep 01, 2022 at 05:45:08PM +0000, Ashok Raj wrote:
> > > Hi Kirill,
> > >
> > > On Tue, Aug 30, 2022 at 04:00:53AM +0300, Kirill A. Shutemov wrote:
> > > > Linear Address Masking[1] (LAM) modifies the checking that is
> > > > applied to 64-bit linear addresses, allowing software to use of the
> > > > untranslated address bits for metadata.
> > >
> > > We discussed this internally, but didn't bubble up here.
> > >
> > > Given that we are working on enabling Shared Virtual Addressing (SVA)
> > > within the IOMMU. This permits user to share VA directly with the
> > > device, and the device can participate even in fixing page-faults and
> > > such.
> > >
> > > IOMMU enforces canonical addressing, since we are hijacking the top
> > > order bits for meta-data, it will fail sanity check and we would
> > > return a failure back to device on any page-faults from device.
> > >
> > > It also complicates how device TLB and ATS work, and needs some major
> > > improvements to detect device capability to accept tagged pointers,
> > > adjust the devtlb to act accordingly.
> > >
> > >
> > > Both are orthogonal features, but there is an intersection of both
> > > that are fundamentally incompatible.
> > >
> > > Its even more important, since an application might be using SVA
> > > under the cover provided by some library that's used without their
> > > knowledge.
> > >
> > > The path would be:
> > >
> > > 1. Ensure both LAM and SVM are incompatible by design, without major
> > > changes.
> > > - If LAM is enabled already and later SVM enabling is
> > > requested by user, that should fail. and Vice versa.
> > > - Provide an API to user to ask for opt-out. Now they know
> > > they must sanitize the pointers before sending to device, or the
> > > working set is already isolated and needs no work.
> >
> > The patch below implements something like this. It is PoC, build-tested
> > only.
> >
> > To be honest, I hate it. It is clearly a layering violation. It feels
> > dirty. But I don't see any better way as we tie orthogonal features
> > together.
> >
> > Also I have no idea how to make forced PASID allocation if LAM enabled.
> > What the API has to look like?
>
> Jacob, Ashok, any comment on this part?
>
> I expect in many cases LAM will be enabled very early (like before malloc
> is functional) in process start and it makes PASID allocation always fail.
>
Is there a generic flag LAM can set on the mm?
We can't check an x86 feature in the IOMMU SVA API, i.e.:
@@ -32,6 +33,15 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
return -EINVAL;
mutex_lock(&iommu_sva_lock);
+
+ /* Serialize against LAM enabling */
+ mutex_lock(&mm->context.lock);
+
+ if (mm_lam_cr3_mask(mm)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
> Any way out?
>
Thanks,
Jacob
On Mon, Sep 12, 2022 at 05:08:09PM -0700, Jacob Pan wrote:
> Hi Kirill,
>
> On Tue, 13 Sep 2022 01:49:30 +0300, "Kirill A. Shutemov"
> <kirill.shutemov@linux.intel.com> wrote:
>
> [...]
>
> > I expect in many cases LAM will be enabled very early (like before malloc
> > is functional) in process start and it makes PASID allocation always fail.
> >
> Is there a generic flag LAM can set on the mm?

Hm. Not really.

I thought we could use untagged_addr(mm, -1UL) != -1UL as such a check,
but -1UL is a kernel address and untagged_addr() would not untag such an
address for LAM. I guess we can add a helper for this.

But the tagged address implementation is rather different across
platforms and the semantics can be hard to define: whether tagged
addresses are supported per-thread or per-process, or maybe globally.

Maybe just add an arch hook there? arch_can_alloc_pasid(mm) or something.

--
Kiryl Shutsemau / Kirill A. Shutemov
On Tue, Sep 13, 2022 at 01:49:30AM +0300, Kirill A. Shutemov wrote:
> On Sun, Sep 04, 2022 at 03:39:52AM +0300, Kirill A. Shutemov wrote:
> [...]
> > Also I have no idea how to make forced PASID allocation if LAM enabled.
> > What the API has to look like?
>
> Jacob, Ashok, any comment on this part?
>
> I expect in many cases LAM will be enabled very early (like before malloc
> is functional) in process start and it makes PASID allocation always fail.
>
> Any way out?

We need closure on this to proceed. Any clue?

--
Kiryl Shutsemau / Kirill A. Shutemov
On Wed, Sep 14, 2022 at 05:45:18PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 13, 2022 at 01:49:30AM +0300, Kirill A. Shutemov wrote:
> [...]
> > Jacob, Ashok, any comment on this part?
> >
> > I expect in many cases LAM will be enabled very early (like before malloc
> > is functional) in process start and it makes PASID allocation always fail.
> >
> > Any way out?
>
> We need closure on this to proceed. Any clue?

Failing PASID allocation seems like the right thing to do here. If the
application is explicitly allocating PASIDs, it can opt out using a
mechanism similar to the one you have for LAM enabling. So the user takes
responsibility for sanitizing pointers.

If some library is using an accelerator without application knowledge,
it would use the failure as a mechanism to take an alternate path if
one exists.

I don't know if both LAM and SVM need a separate forced opt-in (or
rather, I don't have an opinion). Is this what you were asking?

+ Joerg, JasonG in case they have an opinion.

Cheers,
Ashok
On Wed, Sep 14, 2022 at 08:11:19AM -0700, Ashok Raj wrote:
> On Wed, Sep 14, 2022 at 05:45:18PM +0300, Kirill A. Shutemov wrote:
> [...]
> > We need closure on this to proceed. Any clue?
>
> Failing PASID allocation seems like the right thing to do here. If the
> application is explicitly allocating PASIDs, it can opt out using a
> mechanism similar to the one you have for LAM enabling. So the user takes
> responsibility for sanitizing pointers.
>
> If some library is using an accelerator without application knowledge,
> it would use the failure as a mechanism to take an alternate path if
> one exists.
>
> I don't know if both LAM and SVM need a separate forced opt-in (or
> rather, I don't have an opinion). Is this what you were asking?
>
> + Joerg, JasonG in case they have an opinion.

My point is that the patch provides a way to override the LAM vs. PASID
mutual exclusion, but only if the PASID is allocated first. If we enable
LAM before a PASID is allocated, there's no way to forcefully allocate a
PASID, bypassing the LAM check. I think there should be one, no?

--
Kiryl Shutsemau / Kirill A. Shutemov
On Wed, Sep 14, 2022 at 06:18:18PM +0300, Kirill A. Shutemov wrote:
> [...]
> My point is that the patch provides a way to override the LAM vs. PASID
> mutual exclusion, but only if the PASID is allocated first. If we enable
> LAM before a PASID is allocated, there's no way to forcefully allocate a
> PASID, bypassing the LAM check. I think there should be one, no?

Yes, we should have one for force-enabling SVM too, if the application
asks for forgiveness.
On Wed, Sep 14, 2022 at 08:31:56AM -0700, Ashok Raj wrote:
> On Wed, Sep 14, 2022 at 06:18:18PM +0300, Kirill A. Shutemov wrote:
> > [...]
> > My point is that the patch provides a way to override the LAM vs. PASID
> > mutual exclusion, but only if the PASID is allocated first. If we enable
> > LAM before a PASID is allocated, there's no way to forcefully allocate a
> > PASID, bypassing the LAM check. I think there should be one, no?
>
> Yes, we should have one for force-enabling SVM too, if the application
> asks for forgiveness.

What is the right API here?

--
Kiryl Shutsemau / Kirill A. Shutemov
Hi Kirill,

On Wed, 14 Sep 2022 18:45:32 +0300, "Kirill A. Shutemov"
<kirill.shutemov@linux.intel.com> wrote:
> On Wed, Sep 14, 2022 at 08:31:56AM -0700, Ashok Raj wrote:
> > [...]
> > Yes, we should have one for force-enabling SVM too, if the application
> > asks for forgiveness.
>
> What is the right API here?

It seems very difficult to implement a UAPI for applications to override
at runtime. Currently, SVM bind is under the control of individual
drivers; it could happen at open() time or via some ioctl. Perhaps this
could be a platform policy set by a command-line option, e.g.
intel_iommu=sva_lam_coexist.

Thanks,
Jacob
On Wed, Sep 14, 2022 at 04:51:16PM -0700, Jacob Pan wrote: > Hi Kirill, > > On Wed, 14 Sep 2022 18:45:32 +0300, "Kirill A. Shutemov" > <kirill.shutemov@linux.intel.com> wrote: > > > On Wed, Sep 14, 2022 at 08:31:56AM -0700, Ashok Raj wrote: > > > On Wed, Sep 14, 2022 at 06:18:18PM +0300, Kirill A. Shutemov wrote: > > > > > > > > > > > > > > > > The patch below implements something like this. It is PoC, > > > > > > > > build-tested only. > > > > > > > > > > > > > > > > To be honest, I hate it. It is clearly a layering violation. > > > > > > > > It feels dirty. But I don't see any better way as we tie > > > > > > > > orthogonal features together. > > > > > > > > > > > > > > > > Also I have no idea how to make forced PASID allocation if > > > > > > > > LAM enabled. What the API has to look like? > > > > > > > > > > > > > > Jacob, Ashok, any comment on this part? > > > > > > > > > > > > > > I expect in many cases LAM will be enabled very early (like > > > > > > > before malloc is functinal) in process start and it makes PASID > > > > > > > allocation always fail. > > > > > > > > > > > > > > Any way out? > > > > > > > > > > > > We need closure on this to proceed. Any clue? > > > > > > > > > > Failing PASID allocation seems like the right thing to do here. If > > > > > the application is explicitly allocating PASID's it can opt-out > > > > > using the similar mechanism you have for LAM enabling. So user takes > > > > > responsibility for sanitizing pointers. > > > > > > > > > > If some library is using an accelerator without application > > > > > knowledge, that would use the failure as a mechanism to use an > > > > > alternate path if one exists. > > > > > > > > > > I don't know if both LAM and SVM need a separate forced opt-in (or i > > > > > don't have an opinion rather). Is this what you were asking? > > > > > > > > > > + Joerg, JasonG in case they have an opinion. > > > > > > > > My point is that the patch provides a way to override LAM vs. 
PASID > > > > mutual exclusion, but only if PASID allocated first. If we enabled > > > > LAM before PASID is allcoated there's no way to forcefully allocate > > > > PASID, bypassing LAM check. I think there should be one, no? > > > > > > Yes, we should have one for force enabling SVM too if the application > > > asks for forgiveness. > > > > What is the right API here? > > > It seems very difficult to implement a UAPI for the applications to > override at a runtime. Currently, SVM bind is under the control of > individual drivers. It could be at the time of open or some ioctl. > > Perhaps, this can be a platform policy via some commandline option. e.g. > intel_iommu=sva_lam_coexist. I think it has to be per-process, not a system-wide handle. Maybe a separate arch_prctl() to allow to enable LAM/SVM coexisting? It would cover both sides of the API, relaxing check for both. -- Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Sep 15, 2022 at 12:01:35PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 14, 2022 at 04:51:16PM -0700, Jacob Pan wrote:
> > Hi Kirill,
> >
> > On Wed, 14 Sep 2022 18:45:32 +0300, "Kirill A. Shutemov"
> > <kirill.shutemov@linux.intel.com> wrote:
> >
> > > On Wed, Sep 14, 2022 at 08:31:56AM -0700, Ashok Raj wrote:
> > > > On Wed, Sep 14, 2022 at 06:18:18PM +0300, Kirill A. Shutemov wrote:
> > > > > > > > >
> > > > > > > > > The patch below implements something like this. It is PoC,
> > > > > > > > > build-tested only.
> > > > > > > > >
> > > > > > > > > To be honest, I hate it. It is clearly a layering violation.
> > > > > > > > > It feels dirty. But I don't see any better way as we tie
> > > > > > > > > orthogonal features together.
> > > > > > > > >
> > > > > > > > > Also I have no idea how to make forced PASID allocation if
> > > > > > > > > LAM enabled. What the API has to look like?
> > > > > > > >
> > > > > > > > Jacob, Ashok, any comment on this part?
> > > > > > > >
> > > > > > > > I expect in many cases LAM will be enabled very early (like
> > > > > > > > before malloc is functional) in process start and it makes PASID
> > > > > > > > allocation always fail.
> > > > > > > >
> > > > > > > > Any way out?
> > > > > > >
> > > > > > > We need closure on this to proceed. Any clue?
> > > > > >
> > > > > > Failing PASID allocation seems like the right thing to do here. If
> > > > > > the application is explicitly allocating PASID's it can opt-out
> > > > > > using the similar mechanism you have for LAM enabling. So user takes
> > > > > > responsibility for sanitizing pointers.
> > > > > >
> > > > > > If some library is using an accelerator without application
> > > > > > knowledge, that would use the failure as a mechanism to use an
> > > > > > alternate path if one exists.
> > > > > >
> > > > > > I don't know if both LAM and SVM need a separate forced opt-in (or i
> > > > > > don't have an opinion rather). Is this what you were asking?
> > > > > >
> > > > > > + Joerg, JasonG in case they have an opinion.
> > > > >
> > > > > My point is that the patch provides a way to override LAM vs. PASID
> > > > > mutual exclusion, but only if PASID allocated first. If we enabled
> > > > > LAM before PASID is allocated there's no way to forcefully allocate
> > > > > PASID, bypassing LAM check. I think there should be one, no?
> > > >
> > > > Yes, we should have one for force enabling SVM too if the application
> > > > asks for forgiveness.
> > >
> > > What is the right API here?
> > >
> > It seems very difficult to implement a UAPI for the applications to
> > override at a runtime. Currently, SVM bind is under the control of
> > individual drivers. It could be at the time of open or some ioctl.
> >
> > Perhaps, this can be a platform policy via some commandline option. e.g.
> > intel_iommu=sva_lam_coexist.
>
> I think it has to be per-process, not a system-wide handle.
>
> Maybe a separate arch_prctl() to allow to enable LAM/SVM coexisting?
> It would cover both sides of the API, relaxing check for both.
Maybe something like the patch below. Build tested only.
I really struggle with naming here. Any suggestions on what XXX has to be
replaced with? I don't think it has to be limited to LAM as some other
tagging implementation may come later.
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 2fdb390040b5..0a38b52b7b5e 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -12,6 +12,8 @@
#define MM_CONTEXT_UPROBE_IA32 BIT(0)
/* vsyscall page is accessible on this MM */
#define MM_CONTEXT_HAS_VSYSCALL BIT(1)
+/* Allow LAM and SVM coexisting */
+#define MM_CONTEXT_XXX BIT(2)
/*
* x86 has arch-specific MMU state beyond what lives in mm_struct.
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 3736f41948e9..d4a0994e5bc7 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -113,6 +113,8 @@ static inline void mm_reset_untag_mask(struct mm_struct *mm)
mm->context.untag_mask = -1UL;
}
+#define arch_can_alloc_pasid(mm) \
+ (!mm_lam_cr3_mask(mm) || (mm->context.flags & MM_CONTEXT_XXX))
#else
static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index a31e27b95b19..3b77d51c7e6c 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -23,5 +23,6 @@
#define ARCH_GET_UNTAG_MASK 0x4001
#define ARCH_ENABLE_TAGGED_ADDR 0x4002
#define ARCH_GET_MAX_TAG_BITS 0x4003
+#define ARCH_XXX 0x4004
#endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9aa85e74e59e..111843c9dd40 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -793,6 +793,11 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
goto out;
}
+ if (pasid_valid(mm->pasid) && !(mm->context.flags & MM_CONTEXT_XXX)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
if (!nr_bits) {
ret = -EINVAL;
goto out;
@@ -911,6 +916,12 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
(unsigned long __user *)arg2);
case ARCH_ENABLE_TAGGED_ADDR:
return prctl_enable_tagged_addr(task->mm, arg2);
+ case ARCH_XXX:
+ if (mmap_write_lock_killable(task->mm))
+ return -EINTR;
+ task->mm->context.flags |= MM_CONTEXT_XXX;
+ mmap_write_unlock(task->mm);
+ return 0;
case ARCH_GET_MAX_TAG_BITS: {
int nr_bits;
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..ed76cdfa3e6b 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -2,6 +2,8 @@
/*
* Helpers for IOMMU drivers implementing SVA
*/
+#include <linux/mm.h>
+#include <linux/mmu_context.h>
#include <linux/mutex.h>
#include <linux/sched/mm.h>
@@ -31,7 +33,17 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
min == 0 || max < min)
return -EINVAL;
+ /* Serialize against address tagging enabling */
+ if (mmap_write_lock_killable(mm))
+ return -EINTR;
+
+ if (!arch_can_alloc_pasid(mm)) {
+ mmap_write_unlock(mm);
+ return -EBUSY;
+ }
+
mutex_lock(&iommu_sva_lock);
+
/* Is a PASID already associated with this mm? */
if (pasid_valid(mm->pasid)) {
if (mm->pasid < min || mm->pasid >= max)
@@ -46,6 +58,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
mm_pasid_set(mm, pasid);
out:
mutex_unlock(&iommu_sva_lock);
+ mmap_write_unlock(mm);
return ret;
}
EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index b9b970f7ab45..1649b080d844 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -28,4 +28,8 @@ static inline void leave_mm(int cpu) { }
# define task_cpu_possible(cpu, p) cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
#endif
+#ifndef arch_can_alloc_pasid
+#define arch_can_alloc_pasid(mm) true
+#endif
+
#endif
--
Kiryl Shutsemau / Kirill A. Shutemov
On 9/15/22 10:28, Kirill A. Shutemov wrote:
> + /* Serialize against address tagging enabling */
> + if (mmap_write_lock_killable(mm))
> + return -EINTR;
> +
> + if (!arch_can_alloc_pasid(mm)) {
> + mmap_write_unlock(mm);
> + return -EBUSY;
> + }
Shouldn't this actually be some kind of *device* check?
The question here is whether the gunk in the mm's address space is
compatible with the device.
* Can the device walk the page tables in use under the mm?
* Does the device interpret addresses the same way as the CPUs
using the mm?
The page table format is, right now, wholly determined at boot at the
latest. But, it probably wouldn't hurt to pretend like it _might_
change at runtime.
The address interpretation part is, of course, what LAM changes. It's
also arguable that it includes features like protection keys. I can
totally see a case where folks might want to be careful and disallow
device access to an mm where pkeys are in use.
On Wed, Sep 21, 2022 at 09:57:47AM -0700, Dave Hansen wrote:
> On 9/15/22 10:28, Kirill A. Shutemov wrote:
> > + /* Serialize against address tagging enabling */
> > + if (mmap_write_lock_killable(mm))
> > + return -EINTR;
> > +
> > + if (!arch_can_alloc_pasid(mm)) {
> > + mmap_write_unlock(mm);
> > + return -EBUSY;
> > + }
>
> Shouldn't this actually be some kind of *device* check?
The device will enable SVM only when it's capable of it, and performs all
the normal capability checks like PASID, ATS etc. before enabling it.
This is the final step before the mm is hooked up with the IOMMU.
>
> The question here is whether the gunk in the mm's address space is
> compatible with the device.
>
> * Can the device walk the page tables in use under the mm?
> * Does the device interpret addresses the same way as the CPUs
> using the mm?
>
> The page table format is, right now, wholly determined at boot at the
> latest. But, it probably wouldn't hurt to pretend like it _might_
> change at runtime.
>
> The address interpretation part is, of course, what LAM changes. It's
> also arguable that it includes features like protection keys. I can
> totally see a case where folks might want to be careful and disallow
> device access to an mm where pkeys are in use.
On 9/21/22 10:08, Ashok Raj wrote:
> On Wed, Sep 21, 2022 at 09:57:47AM -0700, Dave Hansen wrote:
>> On 9/15/22 10:28, Kirill A. Shutemov wrote:
>>> + /* Serialize against address tagging enabling */
>>> + if (mmap_write_lock_killable(mm))
>>> + return -EINTR;
>>> +
>>> + if (!arch_can_alloc_pasid(mm)) {
>>> + mmap_write_unlock(mm);
>>> + return -EBUSY;
>>> + }
>> Shouldn't this actually be some kind of *device* check?
> The device will enable svm only when its capable of it, and performs all
> the normal capability checks like PASID, ATS etc before enabling it.
> This is the final step before the mm is hooked up with the IOMMU.
What does that mean, though?
Are you saying that any device compatibility with an mm is solely
determined by the IOMMU in play, so the IOMMU code should host the mm
compatibility checks?
On Wed, Sep 21, 2022 at 10:11:46AM -0700, Dave Hansen wrote:
> Are you saying that any device compatibility with an mm is solely
> determined by the IOMMU in play, so the IOMMU code should host the mm
> compatibility checks?

Yes, exactly. Only the HW entity that walks the page tables needs to
understand their parsing rules and in this case that is only the IOMMU
block.

Jason
On Wed, Sep 21, 2022 at 03:11:34PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 21, 2022 at 10:11:46AM -0700, Dave Hansen wrote:
>
> > Are you saying that any device compatibility with an mm is solely
> > determined by the IOMMU in play, so the IOMMU code should host the mm
> > compatibility checks?
>
> Yes, exactly. Only the HW entity that walks the page tables needs to
> understand their parsing rules and in this case that is only the IOMMU
> block.

But device has to know what bits of the virtual address are significant to
handle device TLB lookup/store correctly, no?

--
Kiryl Shutsemau / Kirill A. Shutemov
On Fri, Sep 23, 2022 at 03:42:39AM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 21, 2022 at 03:11:34PM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 21, 2022 at 10:11:46AM -0700, Dave Hansen wrote:
> >
> > > Are you saying that any device compatibility with an mm is solely
> > > determined by the IOMMU in play, so the IOMMU code should host the mm
> > > compatibility checks?
> >
> > Yes, exactly. Only the HW entity that walks the page tables needs to
> > understand their parsing rules and in this case that is only the IOMMU
> > block.
>
> But device has to know what bits of the virtual address are significant to
> handle device TLB lookup/store correctly, no?

For a device that also cares about tagging, yes. But in the current
world we don't have such devices. IOMMU only knows about the shared cr3
we placed in the PASID table entries to walk page-tables. I hope the
page-tables don't factor the meta-data bits correct?

So I would assume an untagged pointer should just be fine for the IOMMU
to walk. IOMMU currently wants canonical addresses for VA.
On Thu, Sep 22, 2022 at 10:27:45PM -0700, Ashok Raj wrote:
> On Fri, Sep 23, 2022 at 03:42:39AM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 21, 2022 at 03:11:34PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 21, 2022 at 10:11:46AM -0700, Dave Hansen wrote:
> > >
> > > > Are you saying that any device compatibility with an mm is solely
> > > > determined by the IOMMU in play, so the IOMMU code should host the mm
> > > > compatibility checks?
> > >
> > > Yes, exactly. Only the HW entity that walks the page tables needs to
> > > understand their parsing rules and in this case that is only the IOMMU
> > > block.
> >
> > But device has to know what bits of the virtual address are significant to
> > handle device TLB lookup/store correctly, no?
>
> For a device that also cares about tagging, yes. But in the current
> world we don't have such devices. IOMMU only knows about the shared cr3
> we placed in the PASID table entries to walk page-tables. I hope the
> page-tables don't factor the meta-data bits correct?

Correct. Page tables contain only physical addresses, so they have no
exposure to tags in the virtual addresses.

> So I would assume an untagged pointer should just be fine for the IOMMU
> to walk. IOMMU currently wants canonical addresses for VA.

Right. But it means that LAM compatibility can be block on two layers:
IOMMU and device. IOMMU is not the only HW entity that has to be aware of
tagged pointers.

So where to put compatibility check that can cover both device and IOMMU,
considering that in the future they can get aware about tagged pointers?

--
Kiryl Shutsemau / Kirill A. Shutemov
On Fri, Sep 23, 2022 at 12:38:26PM +0300, Kirill A. Shutemov wrote:
> > So I would assume an untagged pointer should just be fine for the IOMMU
> > to walk. IOMMU currently wants canonical addresses for VA.
>
> Right. But it means that LAM compatibility can be block on two layers:
> IOMMU and device. IOMMU is not the only HW entity that has to be aware of
> tagged pointers.

Why does a device need to care about this? What do you imagine a
device doing with it?

The userspace should program the device with the tagged address, the
device should present the tagged address on the bus, the IOMMU should
translate the tagged address the same as the CPU by ignoring the upper
bits.

Jason
On 9/23/22 04:46, Jason Gunthorpe wrote:
> On Fri, Sep 23, 2022 at 12:38:26PM +0300, Kirill A. Shutemov wrote:
>>> So I would assume an untagged pointer should just be fine for the IOMMU
>>> to walk. IOMMU currently wants canonical addresses for VA.
>> Right. But it means that LAM compatibility can be block on two layers:
>> IOMMU and device. IOMMU is not the only HW entity that has to be aware of
>> tagged pointers.
> Why does a device need to care about this? What do you imagine a
> device doing with it?
>
> The userspace should program the device with the tagged address, the
> device should present the tagged address on the bus, the IOMMU should
> translate the tagged address the same as the CPU by ignoring the upper
> bits.

Is this how *every* access works? Every single device access to the
address space goes through the IOMMU?

I thought devices also cached address translation responses from the
IOMMU and stashed them in their own device-local TLB. If the device is
unaware of the tags, then how does device TLB invalidation work? Would
all device TLB flushes be full flushes of the devices TLB? If something
tried to use single-address invalidation, it would need to invalidate
every possible tag alias because the device wouldn't know that the tags
*are* tags instead of actual virtual addresses.
On Fri, Sep 23, 2022 at 07:18:42AM -0700, Dave Hansen wrote:
> On 9/23/22 04:46, Jason Gunthorpe wrote:
> > On Fri, Sep 23, 2022 at 12:38:26PM +0300, Kirill A. Shutemov wrote:
> >>> So I would assume an untagged pointer should just be fine for the IOMMU
> >>> to walk. IOMMU currently wants canonical addresses for VA.
> >> Right. But it means that LAM compatibility can be block on two layers:
> >> IOMMU and device. IOMMU is not the only HW entity that has to be aware of
> >> tagged pointers.
> > Why does a device need to care about this? What do you imagine a
> > device doing with it?
> >
> > The userspace should program the device with the tagged address, the
> > device should present the tagged address on the bus, the IOMMU should
> > translate the tagged address the same as the CPU by ignoring the upper
> > bits.
>
> Is this how *every* access works? Every single device access to the
> address space goes through the IOMMU?
>
> I thought devices also cached address translation responses from the
> IOMMU and stashed them in their own device-local TLB. If the device is
> unaware of the tags, then how does device TLB invalidation work? Would

This is coming a full circle now :-)

Since the device doesn't understand tagging, SVM and tagging aren't
compatible. If you need SVM, you can only send sanitized pointers to the
device period. In fact our page-request even looks for canonical checks
before doing the page-faulting.

> all device TLB flushes be full flushes of the devices TLB? If something
> tried to use single-address invalidation, it would need to invalidate
> every possible tag alias because the device wouldn't know that the tags
> *are* tags instead of actual virtual addresses.

Once tagging is extended into the PCI SIG, and devices know to work with
them, so will the IOMMU, then they can all play in the same field. Until
then they are isolated, or let SVM only work with untagged VA's.
On 9/23/22 08:28, Ashok Raj wrote:
>> I thought devices also cached address translation responses from the
>> IOMMU and stashed them in their own device-local TLB. If the device is
>> unaware of the tags, then how does device TLB invalidation work? Would
> This is coming a full circle now :-)
>
> Since the device doesn't understand tagging, SVM and tagging aren't
> compatible. If you need SVM, you can only send sanitized pointers to the
> device period. In fact our page-request even looks for canonical checks
> before doing the page-faulting.
>
>> all device TLB flushes be full flushes of the devices TLB? If something
>> tried to use single-address invalidation, it would need to invalidate
>> every possible tag alias because the device wouldn't know that the tags
>> *are* tags instead of actual virtual addresses.
> Once tagging is extended into the PCI SIG, and devices know to work with
> them, so will the IOMMU, then they can all play in the same field. Until
> then they are isolated, or let SVM only work with untagged VA's.

But, the point that Kirill and I were getting at is still that devices
*have* a role to play here. The idea that this can be hidden at the
IOMMU layer is pure fantasy. Right?
On Fri, Sep 23, 2022 at 08:31:13AM -0700, Dave Hansen wrote:
> On 9/23/22 08:28, Ashok Raj wrote:
> >> I thought devices also cached address translation responses from the
> >> IOMMU and stashed them in their own device-local TLB. If the device is
> >> unaware of the tags, then how does device TLB invalidation work? Would
> > This is coming a full circle now :-)
> >
> > Since the device doesn't understand tagging, SVM and tagging aren't
> > compatible. If you need SVM, you can only send sanitized pointers to the
> > device period. In fact our page-request even looks for canonical checks
> > before doing the page-faulting.
> >
> >> all device TLB flushes be full flushes of the devices TLB? If something
> >> tried to use single-address invalidation, it would need to invalidate
> >> every possible tag alias because the device wouldn't know that the tags
> >> *are* tags instead of actual virtual addresses.
> > Once tagging is extended into the PCI SIG, and devices know to work with
> > them, so will the IOMMU, then they can all play in the same field. Until
> > then they are isolated, or let SVM only work with untagged VA's.
>
> But, the point that Kirill and I were getting at is still that devices
> *have* a role to play here. The idea that this can be hidden at the
> IOMMU layer is pure fantasy. Right?

If you *can't* send tagged memory to the device, what is the role the
device need to play?

For now you can only send proper VA's that are canonical.
On 9/23/22 08:44, Ashok Raj wrote:
>> But, the point that Kirill and I were getting at is still that devices
>> *have* a role to play here. The idea that this can be hidden at the
>> IOMMU layer is pure fantasy. Right?
> If you *can't* send tagged memory to the device, what is the
> role the device need to play?
>
> For now you can only send proper VA's that are canonical.

Today, yes, you have to keep tagged addresses away from devices. They
must be sequestered in a place that only the CPU can find them.

The observation that Kirill and I had is that there are things that are
done solely on the device today -- like accessing a translated address
twice -- without IOMMU involvement. We were trying to figure out how
that would work in the future once tagged addresses are exposed to
devices and they implement all the new PCI magic.

After our private chat, I think the answer is that devices *have* a role
to play. Device-side logic must know how to untag memory before asking
for translation or even *deciding* to ask for address translation. But,
hopefully, communicating that untagging information to the device will
be done in a device-agnostic, standardized way, just how PASIDs or ATS
are handled today.
On Fri, Sep 23, 2022 at 09:23:40AM -0700, Dave Hansen wrote:
> After our private chat, I think the answer is that devices *have* a role
> to play. Device-side logic must know how to untag memory before asking
> for translation or even *deciding* to ask for address translation. But,
> hopefully, the communicating that untagging information to the device
> will be done in a device-agnostic, standardized way, just how PASIDs or
> ATS are handled today.

Right, that would be my hope, we will see what PCI SIG standardizes.

Jason
On Fri, Sep 23, 2022 at 07:18:42AM -0700, Dave Hansen wrote:
> On 9/23/22 04:46, Jason Gunthorpe wrote:
> > On Fri, Sep 23, 2022 at 12:38:26PM +0300, Kirill A. Shutemov wrote:
> >>> So I would assume an untagged pointer should just be fine for the IOMMU
> >>> to walk. IOMMU currently wants canonical addresses for VA.
> >> Right. But it means that LAM compatibility can be block on two layers:
> >> IOMMU and device. IOMMU is not the only HW entity that has to be aware of
> >> tagged pointers.
> > Why does a device need to care about this? What do you imagine a
> > device doing with it?
> >
> > The userspace should program the device with the tagged address, the
> > device should present the tagged address on the bus, the IOMMU should
> > translate the tagged address the same as the CPU by ignoring the upper
> > bits.
>
> Is this how *every* access works? Every single device access to the
> address space goes through the IOMMU?
>
> I thought devices also cached address translation responses from the
> IOMMU and stashed them in their own device-local TLB.

Ah, you are worried about invalidation.

There is an optional PCI feature called ATS that is this caching, and
it is mandatory if the IOMMU will use the CPU page table.

In ATS the invalidation is triggered by the iommu driver in a device
agnostic way.

The PCI spec has no provision to invalidate with a mask, only linear
chunks of address space can be invalidated.

Presumably when someone wants to teach the iommu to process LAM they
will also have to teach the PCI spec how to do LAM too.

I would not like to see a world where drivers have to deal with this.
ATS is completely transparent to the driver, it should stay that way.
Devices should handle LAM through some PCI SIG agnostic approach and
that will be contained entirely in the iommu driver.

Jason
On Fri, Sep 23, 2022 at 11:42:39AM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 23, 2022 at 07:18:42AM -0700, Dave Hansen wrote:
> > On 9/23/22 04:46, Jason Gunthorpe wrote:
> > > On Fri, Sep 23, 2022 at 12:38:26PM +0300, Kirill A. Shutemov wrote:
> > >>> So I would assume an untagged pointer should just be fine for the IOMMU
> > >>> to walk. IOMMU currently wants canonical addresses for VA.
> > >> Right. But it means that LAM compatibility can be block on two layers:
> > >> IOMMU and device. IOMMU is not the only HW entity that has to be aware of
> > >> tagged pointers.
> > > Why does a device need to care about this? What do you imagine a
> > > device doing with it?
> > >
> > > The userspace should program the device with the tagged address, the
> > > device should present the tagged address on the bus, the IOMMU should
> > > translate the tagged address the same as the CPU by ignoring the upper
> > > bits.
> >
> > Is this how *every* access works? Every single device access to the
> > address space goes through the IOMMU?
> >
> > I thought devices also cached address translation responses from the
> > IOMMU and stashed them in their own device-local TLB.
>
> Ah, you are worried about invalidation.
>
> There is an optional PCI feature called ATS that is this caching, and
> it is mandatory if the IOMMU will use the CPU page table.
>
> In ATS the invalidation is triggered by the iommu driver in a device
> agnostic way.
>
> The PCI spec has no provision to invalidate with a mask, only linear
> chunks of address space can be invalidated.

This part is currently being worked on in the PCI SIG. Once we have
something like this we can teach devices which portion of the VA to
mask. Of course this will take a while before the PCI SIG standardizes
it and we start seeing devices that support it.

BTW, this is a problem for all vendors that support SVM and LAM/MTT. I
see ARM in the SMMU 3.1 doc states it doesn't support MTT at the moment,
just like the Intel IOMMU. I hope the API we develop will work across
all vendors.
On Wed, Sep 21, 2022 at 10:11:46AM -0700, Dave Hansen wrote:
> On 9/21/22 10:08, Ashok Raj wrote:
> > On Wed, Sep 21, 2022 at 09:57:47AM -0700, Dave Hansen wrote:
> >> On 9/15/22 10:28, Kirill A. Shutemov wrote:
> >>> + /* Serialize against address tagging enabling */
> >>> + if (mmap_write_lock_killable(mm))
> >>> + return -EINTR;
> >>> +
> >>> + if (!arch_can_alloc_pasid(mm)) {
> >>> + mmap_write_unlock(mm);
> >>> + return -EBUSY;
> >>> + }
> >> Shouldn't this actually be some kind of *device* check?
> > The device will enable svm only when its capable of it, and performs all
> > the normal capability checks like PASID, ATS etc before enabling it.
> > This is the final step before the mm is hooked up with the IOMMU.
>
> What does that mean, though?
>
> Are you saying that any device compatibility with an mm is solely
> determined by the IOMMU in play, so the IOMMU code should host the mm
> compatibility checks?
>
To check if a device supports SVM-like capabilities, it needs to support
the following PCIe capabilities:
- PASID
- Page Request Interface (PRI) for dynamic page-faulting
- ATS - For quick VA->PA lookups.
The device purely works only with memory addresses and caches them in
its device TLB after a lookup via ATS.
When the device does ATS, it sends a translation request, and the IOMMU
will walk the page tables to give the PA back. The device can use it
until it gets an invalidation.
So the device doesn't need to know page-table formats. But if you use
tagged pointers, it's something you want to check device support for. I
don't think there are any plans right now to support something like the
following:
- Check device ability to work with tagged pointers.
- OS should configure the width to ignore etc
- Device TLB's properly handle the tagged portion without creating
aliasing etc.
In order for LAM and SVM to play nicely you need
#1 IOMMU support for tagged pointers
#2 Device ability to handle tagged pointers.
#2 above is an additional check to perform on top of the PASID, PRI, and
ATS checks we do today.
Cheers,
Ashok
> From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Sent: Friday, September 16, 2022 1:29 AM
>
> I really struggle with naming here. Any suggestions on what XXX has to be
> replaced with? I don't think it has to be limited to LAM as some other
> tagging implementation may come later.

What about ARCH_ENABLE_TAGGED_ADDR_CPU, to mark that the application
tags addresses only on the CPU and takes care to untag them before an
address is programmed into a device? With ARCH_ENABLE_TAGGED_ADDR_CPU,
LAM and SVA can then co-exist.

The original ARCH_ENABLE_TAGGED_ADDR means that tagged addresses are
used on both the CPU and the device. Enabling SVA on a device behind an
IOMMU which doesn't support LAM is then rejected if LAM has been
enabled, and vice versa.

Thanks
Kevin
On Thu, Sep 15, 2022 at 08:28:58PM +0300, Kirill A. Shutemov wrote:
> @@ -31,7 +33,17 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> min == 0 || max < min)
> return -EINVAL;
>
> + /* Serialize against address tagging enabling */
> + if (mmap_write_lock_killable(mm))
> + return -EINTR;
> +
> + if (!arch_can_alloc_pasid(mm)) {
> + mmap_write_unlock(mm);
> + return -EBUSY;
> + }
This has nothing to do with "allocating pasid"
Rather should be: "is the CPU page table compatible with the IOMMU HW
page table walker"
For this I would rather have a function that queries the format of the
page table under the mm_struct and we have enum values like
INTEL_NORMAL and INTEL_LAM as possible values.
The iommu driver will block incompatible page table formats, and when
it starts up it should assert something that blocks changing the
format.
Jason
On 9/20/22 06:14, Jason Gunthorpe wrote:
> For this I would rather have a function that queries the format of the
> page table under the mm_struct and we have enum values like
> INTEL_NORMAL and INTEL_LAM as possible values.
>
> The iommu driver will block incompatible page table formats, and when
> it starts up it should assert something that blocks changing the
> format.

That doesn't sound too bad. Except, please don't call it a "page table
format". The format of the page tables does not change with LAM. It's
entirely how the CPU interprets addresses that changes.

I guess someone could make an argument that, with LAM, there is a "hole"
that can't be filled in and _that_ constitutes a format change, but
that'd be a stretch.

The thing that matters when LAM is on is that some CPU might be stashing
addresses somewhere that only have meaning when you interpret them with
LAM rules. That's really a property of the mm, not of the page tables.

Oh, and please don't call things "INTEL_WHATEVER". It looks silly and
confuses the heck out of people when a second CPU vendor needs to use
the same code.
On Tue, Sep 20, 2022 at 09:06:32AM -0700, Dave Hansen wrote:
> On 9/20/22 06:14, Jason Gunthorpe wrote:
> > For this I would rather have a function that queries the format of the
> > page table under the mm_struct and we have enum values like
> > INTEL_NORMAL and INTEL_LAM as possible values.
> >
> > The iommu driver will block incompatible page table formats, and when
> > it starts up it should assert something that blocks changing the
> > format.
>
> That doesn't sound too bad. Except, please don't call it a "page table
> format". The format of the page tables does not change with LAM. It's
> entirely how the CPU interprets addresses that changes.

Sure it does. The rules for how the page table is walked change. The
actual bits stored in memory might not be different, but that doesn't
mean the format didn't change. If it didn't change we wouldn't have an
incompatibility with the IOMMU HW walker.

> Oh, and please don't call things "INTEL_WHATEVER". It looks silly and
> confuses the heck out of people when a second CPU vendor needs to use
> the same code.

I don't mean something where a second CPU vendor would re-use the code,
I mean literally to specify the exact format of the page tables, where
each CPU and configuration can get its own code. e.g. "Intel x86-64
format, 5 level, no LAM".

At the end of the day the iommu drivers need to know this information so
they can program the HW to walk the same tables. Today it is all
implicit that if you have an iommu driver then it can make assumptions
about the layout, but if we add an API we may as well be a bit more
explicit at the same time.

Jason
Hi Jason,

On Tue, 20 Sep 2022 13:27:27 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, Sep 20, 2022 at 09:06:32AM -0700, Dave Hansen wrote:
> > On 9/20/22 06:14, Jason Gunthorpe wrote:
> > > For this I would rather have a function that queries the format of the
> > > page table under the mm_struct and we have enum values like
> > > INTEL_NORMAL and INTEL_LAM as possible values.
> > >
> > > The iommu driver will block incompatible page table formats, and when
> > > it starts up it should assert something that blocks changing the
> > > format.
> >
> > That doesn't sound too bad. Except, please don't call it a "page table
> > format". The format of the page tables does not change with LAM. It's
> > entirely how the CPU interprets addresses that changes.
>
> Sure it does. The rules for how the page table is walked change. The
> actual bits stored in memory might not be different, but that doesn't
> mean the format didn't change. If it didn't change we wouldn't have an
> incompatibility with the IOMMU HW walker.

There are many CPU-IOMMU compatibility checks we do before SVA, e.g. we
check paging mode in sva_bind. We are delegating these checks to
arch/platform code. So why can't we let arch code decide how to convey
mm-IOMMU SVA compatibility? Let it be a flag (as in this patch) or some
callback.

Perhaps a more descriptive name,
s/arch_can_alloc_pasid(mm)/arch_can_support_sva(mm)/, is all we are
disagreeing on :)

Thanks,
Jacob
On Tue, Sep 20, 2022 at 11:41:04AM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Tue, 20 Sep 2022 13:27:27 -0300, Jason Gunthorpe <jgg@nvidia.com>
> wrote:
> > On Tue, Sep 20, 2022 at 09:06:32AM -0700, Dave Hansen wrote:
> > > On 9/20/22 06:14, Jason Gunthorpe wrote:
> > > > For this I would rather have a function that queries the format of
> > > > the page table under the mm_struct and we have enum values like
> > > > INTEL_NORMAL and INTEL_LAM as possible values.
> > > >
> > > > The iommu driver will block incompatible page table formats, and
> > > > when it starts up it should assert something that blocks changing
> > > > the format.
> > >
> > > That doesn't sound too bad. Except, please don't call it a "page
> > > table format". The format of the page tables does not change with
> > > LAM. It's entirely how the CPU interprets addresses that changes.
> >
> > Sure it does. The rules for how the page table is walked change. The
> > actual bits stored in memory might not be different, but that doesn't
> > mean the format didn't change. If it didn't change we wouldn't have an
> > incompatibility with the IOMMU HW walker.
>
> There are many CPU-IOMMU compatibility checks we do before SVA, e.g. we
> check paging mode in sva_bind. We are delegating these checks to
> arch/platform code. So why can't we let arch code decide how to convey
> mm-IOMMU SVA compatibility? Let it be a flag (as in this patch) or some
> callback.

In general I'm not so keen on arch unique code for general ideas like
this (ARM probably has the same issue), but sure it could work.

> Perhaps a more descriptive name,
> s/arch_can_alloc_pasid(mm)/arch_can_support_sva(mm)/, is all we are
> disagreeing on :)

Except that still isn't what it is doing. "sva" can mean lots of
things. You need to assert that the page table format is one of the
formats that the iommu understands and configure the iommu to match it.

It is a very simple question about what ruleset and memory layout
govern the page table memory used by the CPU.

And I think every CPU should be able to define a couple of their
configurations in some enum; most of the PTE handling code is all
hardwired, so I don't think we really support that many combinations
anyhow?

Jason
Hi Jason,

On Tue, 20 Sep 2022 15:50:33 -0300, Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, Sep 20, 2022 at 11:41:04AM -0700, Jacob Pan wrote:
> > There are many CPU-IOMMU compatibility checks we do before SVA, e.g.
> > we check paging mode in sva_bind. We are delegating these checks to
> > arch/platform code. So why can't we let arch code decide how to convey
> > mm-IOMMU SVA compatibility? Let it be a flag (as in this patch) or
> > some callback.
>
> In general I'm not so keen on arch unique code for general ideas like
> this (ARM probably has the same issue), but sure it could work.

Creating an abstraction seems to belong to a separate endeavor when we
have more than one user. For now, are you ok with the current approach?

> > Perhaps a more descriptive name,
> > s/arch_can_alloc_pasid(mm)/arch_can_support_sva(mm)/, is all we are
> > disagreeing on :)
>
> Except that still isn't what it is doing. "sva" can mean lots of
> things.

True, but sharing page tables is the root cause of the incompatibility.
IMHO, SVA means sharing page tables at the highest level.

> You need to assert that the page table format is one of the
> formats that the iommu understands and configure the iommu to match
> it. It is a very simple question about what ruleset and memory layout
> govern the page table memory used by the CPU.

The problem is more relevant to things like the canonical address
requirement than the page table format.

> And I think every CPU should be able to define a couple of their
> configurations in some enum; most of the PTE handling code is all
> hardwired, so I don't think we really support that many combinations
> anyhow?

Sounds like a nice but separate effort; I don't know enough here.

Thanks,
Jacob
On Tue, Sep 20, 2022 at 01:44:30PM -0700, Jacob Pan wrote:
> > In general I'm not so keen on arch unique code for general ideas like
> > this (ARM probably has the same issue), but sure it could work.
>
> Creating an abstraction seems to belong to a separate endeavor when we
> have more than one user. For now, are you ok with the current approach?

Sure, just don't give it a silly name like pasid or sva.

> > You need to assert that the page table format is one of the
> > formats that the iommu understands and configure the iommu to match
> > it. It is a very simple question about what ruleset and memory layout
> > govern the page table memory used by the CPU.
>
> The problem is more relevant to things like the canonical address
> requirement than the page table format.

The translation of a VA to a PA is entirely within the realm of the
page table format, IMHO. If the rules change then the format clearly
differs, even if your arch documents talk about "canonical address" or
something to define the different parsing behaviors.

Jason
On Tue, Sep 20, 2022 at 10:14:54AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 15, 2022 at 08:28:58PM +0300, Kirill A. Shutemov wrote:
>
> > @@ -31,7 +33,17 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> > min == 0 || max < min)
> > return -EINVAL;
> >
> > + /* Serialize against address tagging enabling */
> > + if (mmap_write_lock_killable(mm))
> > + return -EINTR;
> > +
> > + if (!arch_can_alloc_pasid(mm)) {
> > + mmap_write_unlock(mm);
> > + return -EBUSY;
> > + }
>
> This has nothing to do with "allocating pasid"
>
> Rather should be: "is the CPU page table compatible with the IOMMU HW
> page table walker"
Thanks Jason, this certainly looks like a more logical way to look at it
rather than the functional association of allocating pasids.
>
> For this I would rather have a function that queries the format of the
> page table under the mm_struct and we have enum values like
> INTEL_NORMAL and INTEL_LAM as possible values.
>
> The iommu driver will block incompatible page table formats, and when
> it starts up it should assert something that blocks changing the
> format.
>
> Jason
>