On Tue, Feb 25, 2025 at 10:00:36PM -0500, Rik van Riel wrote:
> Reduce code duplication by consolidating the decision point
> for whether to do individual invalidations or a full flush
> inside get_flush_tlb_info.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
After 4a02ed8e1cc3 ("x86/mm: Consolidate full flush threshold
decision"), we've seen data corruption in DMAd buffers when testing SVA.
From our preliminary analysis, it appears that get_flush_tlb_info()
modifies the start and end parameters for full TLB flushes (setting
start=0, end=TLB_FLUSH_ALL). However, the MMU notifier call at the end
of the function still uses the original parameters instead of the
updated info->start and info->end.
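
For reference, the consolidated check inside get_flush_tlb_info() looks
roughly like this (a simplified sketch, not the verbatim upstream code):

	/*
	 * Sketch: if invalidating the range page by page would take more
	 * invalidations than tlb_single_page_flush_ceiling allows, widen
	 * the flush to the whole address space.
	 */
	if ((end - start) >> stride_shift > tlb_single_page_flush_ceiling) {
		start = 0;
		end = TLB_FLUSH_ALL;
	}
	info->start = start;	/* the widened range is stored in *info ... */
	info->end = end;	/* ... the caller's start/end are untouched */

Since start and end are passed by value, the widening is only visible
through info->start and info->end, not in flush_tlb_mm_range()'s locals.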
The change below appears to solve the problem; however, we are not sure
whether this is the right way to fix it.
----8<----
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 39f80111e6f1..e66c7662c254 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 
 	put_flush_tlb_info();
 	put_cpu();
-	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
+	mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
 }
 
 static void do_flush_tlb_all(void *info)
--
2.51.0
On Tue, Sep 2, 2025 at 5:44 PM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
> On Tue, Feb 25, 2025 at 10:00:36PM -0500, Rik van Riel wrote:
> > Reduce code duplication by consolidating the decision point
> > for whether to do individual invalidations or a full flush
> > inside get_flush_tlb_info.
> >
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Tested-by: Michael Kelley <mhklinux@outlook.com>
> > Acked-by: Dave Hansen <dave.hansen@intel.com>
> > Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> > ---
> After 4a02ed8e1cc3 ("x86/mm: Consolidate full flush threshold
> decision"), we've seen data corruption in DMA'd buffers when testing SVA.

If it's not too much effort, you could try to see whether bumping
/sys/kernel/debug/x86/tlb_single_page_flush_ceiling to some relatively
large value (maybe something like 262144, which is 512*512) causes you
to see the same kind of issue before commit 4a02ed8e1cc3. (Note that
increasing tlb_single_page_flush_ceiling costs performance, and putting
an overly big value in there will probably make the system completely
unresponsive.)
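
For context, that debugfs file is backed by a variable in
arch/x86/mm/tlb.c that the full-flush check compares against; its
definition looks roughly like this (sketch, not verbatim):

	/*
	 * Ceiling, in pages, up to which ranged (per-page) flushes are
	 * used; above it, a full TLB flush is issued instead. Writable
	 * via /sys/kernel/debug/x86/tlb_single_page_flush_ceiling.
	 */
	unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;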
On Tue, Sep 02, 2025 at 06:31:49PM +0200, Jann Horn wrote:
> On Tue, Sep 2, 2025 at 5:44 PM Giovanni Cabiddu
> <giovanni.cabiddu@intel.com> wrote:
> > On Tue, Feb 25, 2025 at 10:00:36PM -0500, Rik van Riel wrote:
> > > Reduce code duplication by consolidating the decision point
> > > for whether to do individual invalidations or a full flush
> > > inside get_flush_tlb_info.
> > >
> > > Signed-off-by: Rik van Riel <riel@surriel.com>
> > > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > > Tested-by: Michael Kelley <mhklinux@outlook.com>
> > > Acked-by: Dave Hansen <dave.hansen@intel.com>
> > > Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> > > ---
> > After 4a02ed8e1cc3 ("x86/mm: Consolidate full flush threshold
> > decision"), we've seen data corruption in DMA'd buffers when testing SVA.
>
> If it's not too much effort, you could try to see whether bumping
> /sys/kernel/debug/x86/tlb_single_page_flush_ceiling to some relatively
> large value (maybe something like 262144, which is 512*512) causes you
> to see the same kind of issue before commit 4a02ed8e1cc3. (Note that
> increasing tlb_single_page_flush_ceiling costs performance, and putting
> an overly big value in there will probably make the system completely
> unresponsive.)

Thanks. We will try increasing tlb_single_page_flush_ceiling on a
kernel from before 4a02ed8e1cc3.

I'm not familiar with this code, but based on the commit message,
4a02ed8e1cc3 was intended to be a pure refactor. However, that doesn't
seem to be the case: before the commit,
mmu_notifier_arch_invalidate_secondary_tlbs() was getting the modified
values of start and end; after the commit it appears to use the
original values instead. Was this intentional?

Regards,

-- 
Giovanni
On Tue, Sep 2, 2025 at 5:44 PM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
> On Tue, Feb 25, 2025 at 10:00:36PM -0500, Rik van Riel wrote:
> > Reduce code duplication by consolidating the decision point
> > for whether to do individual invalidations or a full flush
> > inside get_flush_tlb_info.
> >
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Tested-by: Michael Kelley <mhklinux@outlook.com>
> > Acked-by: Dave Hansen <dave.hansen@intel.com>
> > Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> > ---
> After 4a02ed8e1cc3 ("x86/mm: Consolidate full flush threshold
> decision"), we've seen data corruption in DMA'd buffers when testing SVA.
>
> From our preliminary analysis, it appears that get_flush_tlb_info()
> modifies the start and end parameters for full TLB flushes (setting
> start=0, end=TLB_FLUSH_ALL). However, the MMU notifier call at the end
> of the function still uses the original parameters instead of the
> updated info->start and info->end.
>
> The change below appears to solve the problem; however, we are not sure
> whether this is the right way to fix it.
>
> ----8<----
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 39f80111e6f1..e66c7662c254 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>  	put_flush_tlb_info();
>  	put_cpu();
> -	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
>  }

I don't see why the IOMMU flush should be broadened just because the
CPU flush got broadened.

On x86, IOMMU flushes happen from arch_tlbbatch_add_pending() and
flush_tlb_mm_range(); the IOMMU folks might know better, but as far as
I know, there is nothing that elides IOMMU flushes depending on the
state of x86-internal flush generation tracking or such.

To me this looks like a change that is correct but makes it easier to
hit IOMMU flushing issues in other places.

Are you encountering these issues on an Intel system or an AMD system?
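
For readers following the thread: condensed sketches of the two x86
call paths mentioned above (prototypes abbreviated and possibly not
verbatim; the batching-path signature in particular has changed across
kernel versions):

	/* flush_tlb_mm_range(): ranged flush; the notifier gets the
	 * caller's original range */
	void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
				unsigned long end, unsigned int stride_shift,
				bool freed_tables)
	{
		/* ... CPU TLB flush, possibly widened internally ... */
		mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
	}

	/* arch_tlbbatch_add_pending(): batched unmap path; the notifier
	 * is called unconditionally, for the full address space */
	void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
				       struct mm_struct *mm, unsigned long uaddr)
	{
		/* ... queue the CPU-side flush for later ... */
		mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
	}

In neither path is the notifier gated on the CPU-side flush-generation
state, which supports the point above.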
I just noticed a few things that need clarification.

> On 2 Sep 2025, at 19:05, Jann Horn <jannh@google.com> wrote:
>
> On Tue, Sep 2, 2025 at 5:44 PM Giovanni Cabiddu
> <giovanni.cabiddu@intel.com> wrote:
>> [snip]
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 39f80111e6f1..e66c7662c254 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>>
>>  	put_flush_tlb_info();
>>  	put_cpu();
>> -	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
>> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
>>  }
>
> I don't see why the IOMMU flush should be broadened just because the
> CPU flush got broadened.

Agreed (as Rik also indicated now).

> On x86, IOMMU flushes happen from arch_tlbbatch_add_pending() and
> flush_tlb_mm_range(); the IOMMU folks might know better, but as far as
> I know, there is nothing that elides IOMMU flushes depending on the
> state of x86-internal flush generation tracking or such.
>
> To me this looks like a change that is correct but makes it easier to
> hit IOMMU flushing issues in other places.

This change is not correct. Do not reference info after calling
put_flush_tlb_info().

> Are you encountering these issues on an Intel system or an AMD system?

I would note that on AMD IOMMUs there is some masking functionality that
allows making large yet selective flushes. This means both that we should
not force the IOMMU flush range to be large just because we decided to do
so for the CPU (as you correctly said), and that there might be an
unrelated bug, like in the mask computation.
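
Concretely, regarding the "do not reference info after calling
put_flush_tlb_info()" point: if the notifier were ever meant to see the
widened range, the values would have to be captured before releasing
the per-CPU structure. A purely hypothetical sketch (variable names are
illustrative, not from the upstream source):

	unsigned long mn_start = info->start;	/* capture while 'info' */
	unsigned long mn_end = info->end;	/* is still owned        */

	put_flush_tlb_info();	/* 'info' must not be used past here */
	put_cpu();

	mmu_notifier_arch_invalidate_secondary_tlbs(mm, mn_start, mn_end);

Whether the IOMMU flush should be widened at all is a separate question,
answered in the negative elsewhere in this thread.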
On Wed, Sep 3, 2025 at 4:18 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> > On 2 Sep 2025, at 19:05, Jann Horn <jannh@google.com> wrote:
> > On x86, IOMMU flushes happen from arch_tlbbatch_add_pending() and
> > flush_tlb_mm_range(); the IOMMU folks might know better, but as far as
> > I know, there is nothing that elides IOMMU flushes depending on the
> > state of x86-internal flush generation tracking or such.
> >
> > To me this looks like a change that is correct but makes it easier to
> > hit IOMMU flushing issues in other places.
>
> This change is not correct. Do not reference info after calling
> put_flush_tlb_info().

To be clear, I worded that very confusingly; I meant that Rik's commit
4a02ed8e1cc3 looks correct to me.
On Tue, Sep 2, 2025 at 6:05 PM Jann Horn <jannh@google.com> wrote:
> To me this looks like a change that is correct but makes it easier to
> hit IOMMU flushing issues in other places.

(This was ambiguous; to be clear, I meant that 4a02ed8e1cc3 looks
correct to me, and that the suggested fix doesn't look to me like it
fixes anything.)
On 9/2/25 08:44, Giovanni Cabiddu wrote:
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 39f80111e6f1..e66c7662c254 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>  	put_flush_tlb_info();
>  	put_cpu();
> -	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
>  }

That does look like the right solution.

This is the downside of wrapping everything up in that 'info' struct;
it's not obvious that the canonical source of the start/end information
moved from those variables into the structure.

Rik, is that your read on it too?

In any case, Giovanni, do you want to send that as a "real" patch that
we can apply (changelog, SoB, Fixes, Cc: stable@, etc.)?
On Tue, 2025-09-02 at 08:50 -0700, Dave Hansen wrote:
> On 9/2/25 08:44, Giovanni Cabiddu wrote:
> > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> > index 39f80111e6f1..e66c7662c254 100644
> > --- a/arch/x86/mm/tlb.c
> > +++ b/arch/x86/mm/tlb.c
> > @@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> >
> >  	put_flush_tlb_info();
> >  	put_cpu();
> > -	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
> > +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
> >  }
>
> That does look like the right solution.
>
> This is the downside of wrapping everything up in that 'info' struct;
> it's not obvious that the canonical source of the start/end information
> moved from those variables into the structure.
>
> Rik, is that your read on it too?

In flush_tlb_mm_range(), we only need to flush from start to end.
The same should be true for the mmu notifier.

The only reason info->start and info->end can be modified is that a
global TLB flush can be faster than a ranged one. I can't think of any
reason why that should affect the range the mmu notifier needs to flush.

-- 
All Rights Reversed.
> On 2 Sep 2025, at 18:50, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 9/2/25 08:44, Giovanni Cabiddu wrote:
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 39f80111e6f1..e66c7662c254 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>>
>>  	put_flush_tlb_info();
>>  	put_cpu();
>> -	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
>> +	mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
>>  }
>
> That does look like the right solution.
>
> This is the downside of wrapping everything up in that 'info' struct;
> it's not obvious that the canonical source of the start/end information
> moved from those variables into the structure.

Just a reminder (which I am sure you know): the main motivation behind
this info struct is to allow the payload that is required for the TLB
shootdown to fit in a single cache line.

It would be nice (if possible) if callees like broadcast_tlb_flush()
took a const flush_tlb_info (I'm not encouraging lots of
constification, but maybe it's appropriate here) to prevent misuse of
flush_tlb_info.
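
A sketch of the direction suggested here (hypothetical prototype; the
exact set of consumers varies by kernel version):

	/*
	 * The producer, get_flush_tlb_info(), fills the struct; a
	 * consumer takes it const so the compiler rejects any write
	 * to the struct after it has been built.
	 */
	static void broadcast_tlb_flush(const struct flush_tlb_info *info);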
On 9/2/25 09:08, Nadav Amit wrote:
> It would be nice (if possible) if callees like broadcast_tlb_flush()
> took a const flush_tlb_info (I'm not encouraging lots of
> constification, but maybe it's appropriate here) to prevent misuse of
> flush_tlb_info.

Yeah, agreed. It would be nice if there was a clear and enforced line
between producers and consumers here.