[PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock

zhangchun posted 1 patch 1 year, 5 months ago
There is a newer version of this series
mm/highmem.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
[PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 5 months ago
Using kmap_high together with kmap_xxx or kunmap_xxx on different cores at the
same time may cause a deadlock. The issue is like this:

 CPU 0:                                                 CPU 1:
 kmap_high(){                                           kmap_xxx() {
               ...                                        irq_disable();
        spin_lock(&kmap_lock)
               ...
        map_new_virtual                                     ...
           flush_all_zero_pkmaps
              flush_tlb_kernel_range         /* CPU0 holds the kmap_lock */
                      smp_call_function_many         spin_lock(&kmap_lock)
                      ...                                   ....
        spin_unlock(&kmap_lock)
               ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1
has disabled irqs while waiting for the kmap_lock, so it cannot answer the IPI.
Fix this by releasing the kmap_lock before calling flush_tlb_kernel_range,
avoiding the kmap_lock deadlock.
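
Reduced to its essentials, the two paths look roughly like this (illustrative
sketch only, not the actual mm/highmem.c code; cpu0_path/cpu1_path are
placeholder names):

static DEFINE_SPINLOCK(kmap_lock);

/* CPU 0: kmap_high() -> map_new_virtual() -> flush_all_zero_pkmaps() */
static void cpu0_path(void)
{
	spin_lock(&kmap_lock);
	/*
	 * flush_tlb_kernel_range() broadcasts an IPI (smp_call_function_many)
	 * and spins until every other CPU has acknowledged it.
	 */
	flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
	spin_unlock(&kmap_lock);
}

/* CPU 1: a kmap/kunmap path that takes kmap_lock with irqs disabled */
static void cpu1_path(void)
{
	unsigned long flags;

	local_irq_save(flags);	/* IPIs can no longer be serviced here */
	spin_lock(&kmap_lock);	/* spins forever: CPU 0 holds the lock  */
	/* ... */
	spin_unlock(&kmap_lock);
	local_irq_restore(flags);
}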

Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
 mm/highmem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/highmem.c b/mm/highmem.c
index bd48ba4..841b370 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }
 
 void __kmap_flush_unused(void)
-- 
1.8.3.1

Re: [PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by Andrew Morton 1 year, 5 months ago
On Wed, 10 Jul 2024 20:20:28 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> Using kmap_high together with kmap_xxx or kunmap_xxx on different cores at the
> same time may cause a deadlock. The issue is like this:


What is kmap_XXX?

>  CPU 0:                                                 CPU 1:
>  kmap_high(){                                           kmap_xxx() {
>                ...                                        irq_disable();
>         spin_lock(&kmap_lock)
>                ...
>         map_new_virtual                                     ...
>            flush_all_zero_pkmaps
>               flush_tlb_kernel_range         /* CPU0 holds the kmap_lock */
>                       smp_call_function_many         spin_lock(&kmap_lock)
>                       ...                                   ....
>         spin_unlock(&kmap_lock)
>                ...
> 
> CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1
> has disabled irqs while waiting for the kmap_lock, so it cannot answer the IPI.
> Fix this by releasing the kmap_lock before calling flush_tlb_kernel_range,
> avoiding the kmap_lock deadlock.
> 
> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")

Wow, that's 15 years old.  Has the deadlock been observed?

> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>  		set_page_address(page, NULL);
>  		need_flush = 1;
>  	}
> -	if (need_flush)
> +	if (need_flush) {
> +		unlock_kmap();
>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +		lock_kmap();
> +	}
>  }

Why is dropping the lock like this safe?  What data is it protecting
and why is it OK to leave that data unprotected here?

[PATCH v3] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 4 months ago
Very sorry to disturb! Just a friendly ping!
-- 
1.8.3.1
[PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 5 months ago
Very sorry to disturb! Just a friendly ping to check in on the status of the
patch "mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid
kmap_high deadlock". Please let me know if any additional information is needed
from my side.

I sincerely look forward to your suggestions and guidance!

>>> >> --- a/mm/highmem.c
>>> >> +++ b/mm/highmem.c
>>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>> >>       set_page_address(page, NULL);
>>> >>       need_flush = 1;
>>> >>   }
>>> >> - if (need_flush)
>>> >> + if (need_flush) {
>>> >> +     unlock_kmap();
>>> >>       flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>>> >> +     lock_kmap();
>>> >> + }
>>> >>  }
>>> 
>>> >Why is dropping the lock like this safe?  What data is it protecting 
>>> >and why is it OK to leave that data unprotected here?
>>> 
>>> kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static variable).
>>> When flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) is called, it neither
>>> modifies nor reads these variables, so leaving that data unprotected here is safe.

>>No, the risk here is that when the lock is dropped, other threads will modify the data.  And when this thread (the one running
>>flush_all_zero_pkmaps()) retakes the lock, that data may now be unexpectedly altered.

>map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr]. While pkmap_count[last_pkmap_nr]
>is read and modified, the kmap_lock is not dropped.
>"if (!pkmap_count[last_pkmap_nr])" determines whether pkmap_count[last_pkmap_nr] is usable. If it is unusable, try again.

>Furthermore, the value of the static variable last_pkmap_nr is stored in a local variable (also named
>last_pkmap_nr) while the kmap_lock is held, so this is thread-safe.

>In an extreme case, if Thread A and Thread B work on the same last_pkmap_nr, Thread A releases the
>kmap_lock and calls flush_tlb_kernel_range, and Thread B then acquires the kmap_lock and modifies
>pkmap_count[last_pkmap_nr]. After Thread A completes flush_tlb_kernel_range, it re-checks
>pkmap_count[last_pkmap_nr]. If pkmap_count[last_pkmap_nr] != 0, Thread A continues with
>get_next_pkmap_nr and tries the next last_pkmap_nr.

>static inline unsigned long map_new_virtual(struct page *page)
>{
>        unsigned long vaddr;
>        int count;
>        unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
>        unsigned int color = get_pkmap_color(page);

>start:
>        ...
>                        flush_all_zero_pkmaps();// release kmap_lock, then acquire it
>                        count = get_pkmap_entries_count(color);
>                } 
>                ...
>                if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
>                        break;  /* Found a usable entry */
>                if (--count) 
>                        continue;
>
>               ...
>        vaddr = PKMAP_ADDR(last_pkmap_nr);
>        set_pte_at(&init_mm, vaddr,
>                   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
>
>        pkmap_count[last_pkmap_nr] = 1;
>        ...
>        return vaddr;
>}
   
-- 
1.8.3.1

Re: [PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by Andrew Morton 1 year, 4 months ago
On Fri, 19 Jul 2024 00:18:32 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> Very sorry to disturb! Just a friendly ping to check in on the status of the
> patch "mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid
> kmap_high deadlock". Please let me know if any additional information is
> needed from my side.
> 

Please update the patch to include a code comment which explains why
we're dropping kmap_lock at this point.  And please update the
changelog to explain why holding kmap_lock was not necessary at this
point and why this is a safe change.  Then resend.

[PATCH v3] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 3 months ago
 CPU 0:                                                 CPU 1:
 kmap_high(){                                           kmap_xxx() {
               ...                                        irq_disable();
        spin_lock(&kmap_lock)
               ...
        map_new_virtual                                     ...
           flush_all_zero_pkmaps
              flush_tlb_kernel_range         /* CPU0 holds the kmap_lock */
                      smp_call_function_many         spin_lock(&kmap_lock)
                      ...                                   ....
        spin_unlock(&kmap_lock)
               ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1 has disabled irqs
while waiting for the kmap_lock, so it cannot answer the IPI. Fix this by releasing the kmap_lock
before calling flush_tlb_kernel_range, avoiding the kmap_lock deadlock. Like this:

        if (need_flush) {
            unlock_kmap();
            flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
            lock_kmap();
        }

Dropping the lock is safe. kmap_lock is used to protect pkmap_count, pkmap_page_table and
last_pkmap_nr (a static variable). When flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP))
is called, it neither modifies nor reads these variables, so leaving that data unprotected here
is safe.
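
For reference, the state that kmap_lock serializes and the lock helpers are
roughly the following (abridged; exact declarations vary between kernel
versions):

static int pkmap_count[LAST_PKMAP];	/* reference count per pkmap slot */
pte_t *pkmap_page_table;		/* PTEs backing the pkmap area */
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(kmap_lock);
/* last_pkmap_nr (the scan position) is a static in the same file */

#ifdef ARCH_NEEDS_KMAP_HIGH_GET
#define lock_kmap()	spin_lock_irq(&kmap_lock)
#define unlock_kmap()	spin_unlock_irq(&kmap_lock)
#else
#define lock_kmap()	spin_lock(&kmap_lock)
#define unlock_kmap()	spin_unlock(&kmap_lock)
#endif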

map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr]. While pkmap_count[last_pkmap_nr]
is read and modified, the kmap_lock is not dropped.
"if (!pkmap_count[last_pkmap_nr])" determines whether pkmap_count[last_pkmap_nr] is usable. If it is unusable, try again.

Furthermore, the value of the static variable last_pkmap_nr is stored in a local variable (also named
last_pkmap_nr) while the kmap_lock is held, so this is thread-safe.

In an extreme case, if Thread A and Thread B work on the same last_pkmap_nr, Thread A releases the
kmap_lock and calls flush_tlb_kernel_range, and Thread B then acquires the kmap_lock and modifies
pkmap_count[last_pkmap_nr]. After Thread A completes flush_tlb_kernel_range, it re-checks
pkmap_count[last_pkmap_nr]. If pkmap_count[last_pkmap_nr] != 0, Thread A continues with
get_next_pkmap_nr and tries the next last_pkmap_nr.

static inline unsigned long map_new_virtual(struct page *page)
{
        unsigned long vaddr;
        int count;
        unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
        unsigned int color = get_pkmap_color(page);

start:
        ...
                        flush_all_zero_pkmaps();// release kmap_lock, then acquire it
                        count = get_pkmap_entries_count(color);
                }
                ...
                if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
                        break;  /* Found a usable entry */
                if (--count)
                        continue;

               ...
        vaddr = PKMAP_ADDR(last_pkmap_nr);
        set_pte_at(&init_mm, vaddr,
                   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

        pkmap_count[last_pkmap_nr] = 1;
        ...
        return vaddr;
}

Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
 mm/highmem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/highmem.c b/mm/highmem.c
index ef3189b..07f2c67 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -231,8 +231,18 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		/*
+		 * On an SMP system one CPU can hold the kmap_lock while waiting
+		 * for other CPUs to respond to the TLB-flush IPI. If another CPU
+		 * is spinning on the kmap_lock with interrupts disabled it can
+		 * never answer that IPI. Release the kmap_lock around
+		 * flush_tlb_kernel_range() to avoid this deadlock.
+		 */
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }
 
 void __kmap_flush_unused(void)
--
1.8.3.1
[PATCH v3] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 2 months ago
Very sorry to disturb! Just a friendly ping! This deadlock bug needs to be fixed!
If any additional info is needed, please contact me. Looking forward to your reply!

-- 
1.8.3.1
[PATCH v4] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 1 month ago
 CPU 0:                                                 CPU 1:
 kmap_high(){                                           kmap_xxx() {
               ...                                        irq_disable();
        spin_lock(&kmap_lock)
               ...
        map_new_virtual                                     ...
           flush_all_zero_pkmaps
              flush_tlb_kernel_range         /* CPU0 holds the kmap_lock */
                      smp_call_function_many         spin_lock(&kmap_lock)
                      ...                                   ....
        spin_unlock(&kmap_lock)
               ...

CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1 has disabled irqs
while waiting for the kmap_lock, so it cannot answer the IPI.

Fix this by releasing the kmap_lock before calling flush_tlb_kernel_range, avoiding the kmap_lock deadlock.

        if (need_flush) {
            unlock_kmap();
            flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
            lock_kmap();
        }

Dropping the lock like this is safe. kmap_lock is used to protect pkmap_count, pkmap_page_table and
last_pkmap_nr (a static variable). When flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP))
is called, it neither modifies nor reads these variables, so leaving that data unprotected here is safe.
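
For context, flush_all_zero_pkmaps() finishes every kmap_lock-protected update
before it reaches the flush, so only the TLB flush itself runs without the lock
(abridged sketch of the fixed function; see mm/highmem.c for the real code):

static void flush_all_zero_pkmaps(void)
{
	int i;
	int need_flush = 0;

	flush_cache_kmaps();

	for (i = 0; i < LAST_PKMAP; i++) {
		struct page *page;

		/* count == 1 means the entry is free but still mapped */
		if (pkmap_count[i] != 1)
			continue;
		pkmap_count[i] = 0;

		page = pte_page(pkmap_page_table[i]);
		pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
		set_page_address(page, NULL);
		need_flush = 1;
	}
	/* all pkmap_count/pte updates are done; only the TLB flush remains */
	if (need_flush) {
		unlock_kmap();
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		lock_kmap();
	}
}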

map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr].
While pkmap_count[last_pkmap_nr] is read and modified, the kmap_lock is not dropped.
"if (!pkmap_count[last_pkmap_nr])" determines whether pkmap_count[last_pkmap_nr] is usable. If it is unusable, try again.

Furthermore, the value of the static variable last_pkmap_nr is stored in a local variable (also named last_pkmap_nr) while the kmap_lock is held, so this is thread-safe.

In an extreme case, if Thread A and Thread B work on the same last_pkmap_nr, Thread A releases the kmap_lock
and calls flush_tlb_kernel_range, and Thread B then acquires the kmap_lock and modifies pkmap_count[last_pkmap_nr].
After Thread A completes flush_tlb_kernel_range, it re-checks pkmap_count[last_pkmap_nr].
If pkmap_count[last_pkmap_nr] != 0, Thread A continues with get_next_pkmap_nr and tries the next last_pkmap_nr.

static inline unsigned long map_new_virtual(struct page *page)
{
        unsigned long vaddr;
        int count;
        unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
        unsigned int color = get_pkmap_color(page);

start:
      ...
                        flush_all_zero_pkmaps();// release kmap_lock, then acquire it
                        count = get_pkmap_entries_count(color);
                }
                ...
                if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
                        break;  /* Found a usable entry */
                if (--count)
                        continue;

               ...
        vaddr = PKMAP_ADDR(last_pkmap_nr);
        set_pte_at(&init_mm, vaddr,
                   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

        pkmap_count[last_pkmap_nr] = 1;
        ...
        return vaddr;
}

Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")
Signed-off-by: zhangchun <zhang.chuna@h3c.com>
Co-developed-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Signed-off-by: zhangzhansheng <zhang.zhansheng@h3c.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: zhangzhengming <zhang.zhengming@h3c.com>
---
 mm/highmem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/highmem.c b/mm/highmem.c
index ef3189b..07f2c67 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -231,8 +231,18 @@ static void flush_all_zero_pkmaps(void)
 		set_page_address(page, NULL);
 		need_flush = 1;
 	}
-	if (need_flush)
+	if (need_flush) {
+		/*
+		 * On an SMP system one CPU can hold the kmap_lock while waiting
+		 * for other CPUs to respond to the TLB-flush IPI. If another CPU
+		 * is spinning on the kmap_lock with interrupts disabled it can
+		 * never answer that IPI. Release the kmap_lock around
+		 * flush_tlb_kernel_range() to avoid this deadlock.
+		 */
+		unlock_kmap();
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+		lock_kmap();
+	}
 }
 
 void __kmap_flush_unused(void)
--
1.8.3.1
[PATCH v4] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 10 months ago
Very sorry to disturb! Just a friendly ping! Half a year has passed, and this deadlock bug
still needs to be fixed! If any additional info is needed, please contact me. Looking forward to your reply!

-- 
1.8.3.1
[PATCH v3] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 2 months ago
Very sorry to disturb! Just a friendly ping! This deadlock bug needs to be fixed!
If any additional info is needed, please contact me. Looking forward to your reply!

-- 
1.8.3.1
[PATCH v3] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 3 months ago
Very sorry to disturb! Just a friendly ping!
-- 
1.8.3.1
[PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 5 months ago
>> >> --- a/mm/highmem.c
>> >> +++ b/mm/highmem.c
>> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>> >>       set_page_address(page, NULL);
>> >>       need_flush = 1;
>> >>   }
>> >> - if (need_flush)
>> >> + if (need_flush) {
>> >> +     unlock_kmap();
>> >>       flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> +     lock_kmap();
>> >> + }
>> >>  }
>> 
>> >Why is dropping the lock like this safe?  What data is it protecting 
>> >and why is it OK to leave that data unprotected here?
>> 
>> kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static variable).
>> When flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) is called, it neither
>> modifies nor reads these variables, so leaving that data unprotected here is safe.

>No, the risk here is that when the lock is dropped, other threads will modify the data.  And when this thread (the one running
>flush_all_zero_pkmaps()) retakes the lock, that data may now be unexpectedly altered.

map_new_virtual aims to find a usable entry pkmap_count[last_pkmap_nr]. While pkmap_count[last_pkmap_nr]
is read and modified, the kmap_lock is not dropped.
"if (!pkmap_count[last_pkmap_nr])" determines whether pkmap_count[last_pkmap_nr] is usable. If it is unusable, try again.

Furthermore, the value of the static variable last_pkmap_nr is stored in a local variable (also named
last_pkmap_nr) while the kmap_lock is held, so this is thread-safe.

In an extreme case, if Thread A and Thread B work on the same last_pkmap_nr, Thread A releases the
kmap_lock and calls flush_tlb_kernel_range, and Thread B then acquires the kmap_lock and modifies
pkmap_count[last_pkmap_nr]. After Thread A completes flush_tlb_kernel_range, it re-checks
pkmap_count[last_pkmap_nr]. If pkmap_count[last_pkmap_nr] != 0, Thread A continues with
get_next_pkmap_nr and tries the next last_pkmap_nr.

static inline unsigned long map_new_virtual(struct page *page)
{
        unsigned long vaddr;
        int count;
        unsigned int last_pkmap_nr; // local variable to store static variable last_pkmap_nr
        unsigned int color = get_pkmap_color(page);

start:
        ...
                        flush_all_zero_pkmaps();// release kmap_lock, then acquire it
                        count = get_pkmap_entries_count(color);
                } 
                ...
                if (!pkmap_count[last_pkmap_nr]) // pkmap_count[last_pkmap_nr] is used or not
                        break;  /* Found a usable entry */
                if (--count) 
                        continue;

               ...
        vaddr = PKMAP_ADDR(last_pkmap_nr);
        set_pte_at(&init_mm, vaddr,
                   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));

        pkmap_count[last_pkmap_nr] = 1;
        ...
        return vaddr;
}
   
-- 
1.8.3.1
[PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by zhangchun 1 year, 5 months ago
>> Using kmap_high together with kmap_xxx or kunmap_xxx on different cores at the
>> same time may cause a deadlock. The issue is like this:


>What is kmap_XXX?

kmap/kunmap.
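
For example, kunmap_high() takes the kmap_lock through lock_kmap_any(), which
is spin_lock_irqsave() on architectures that define ARCH_NEEDS_KMAP_HIGH_GET
(e.g. ARM), so it spins on the lock with interrupts disabled. Roughly (abridged
from mm/highmem.c):

void kunmap_high(struct page *page)
{
	unsigned long vaddr;
	unsigned long flags;

	/* spin_lock_irqsave() when ARCH_NEEDS_KMAP_HIGH_GET is defined */
	lock_kmap_any(flags);
	vaddr = (unsigned long)page_address(page);
	/* ... drop the pkmap_count reference, wake up any waiters ... */
	unlock_kmap_any(flags);
}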

>>  CPU 0:                                                 CPU 1:
>>  kmap_high(){                                           kmap_xxx() {
>>                ...                                        irq_disable();
>>         spin_lock(&kmap_lock)
>>                ...
>>         map_new_virtual                                     ...
>>            flush_all_zero_pkmaps
>>             flush_tlb_kernel_range        /* CPU0 holds the kmap_lock */
>>                 smp_call_function_many         spin_lock(&kmap_lock)
>>                       ...                                   ....
>>         spin_unlock(&kmap_lock)
>>                ...
>> 
>> CPU 0 holds the kmap_lock and waits for CPU 1 to respond to the IPI. But CPU 1
>> has disabled irqs while waiting for the kmap_lock, so it cannot answer the IPI.
>> Fix this by releasing the kmap_lock before calling flush_tlb_kernel_range,
>> avoiding the kmap_lock deadlock.
>> 
>> Fixes: 3297e760776a ("highmem: atomic highmem kmap page pinning")

>Wow, that's 15 years old.  Has the deadlock been observed?

Yes, we have seen a device crash for exactly this reason.
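
The deadlock can happen because on SMP flush_tlb_kernel_range() has to wait for
every other CPU to acknowledge an IPI before it returns. On x86, for example,
it reduces to roughly this (simplified sketch, not the exact arch code):

static void do_flush_tlb_all(void *info)
{
	__flush_tlb_all();	/* runs on each CPU from IPI context */
}

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	/* on_each_cpu() sends IPIs and waits until every CPU has run the callback */
	on_each_cpu(do_flush_tlb_all, NULL, 1);
}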
 
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
>>  		set_page_address(page, NULL);
>>  		need_flush = 1;
>>  	}
>> -	if (need_flush)
>> +	if (need_flush) {
>> +		unlock_kmap();
>>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +		lock_kmap();
>> +	}
>>  }

>Why is dropping the lock like this safe?  What data is it protecting and why is it OK to 
>leave that data unprotected here?

kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static variable).
When flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) is called, it neither modifies
nor reads these variables, so leaving that data unprotected here is safe.
-- 
1.8.3.1

Re: [PATCH v2] mm: Give up kmap_lock before calling flush_tlb_kernel_range to avoid kmap_high deadlock
Posted by Andrew Morton 1 year, 5 months ago
On Thu, 11 Jul 2024 15:07:56 +0800 zhangchun <zhang.chuna@h3c.com> wrote:

> >> --- a/mm/highmem.c
> >> +++ b/mm/highmem.c
> >> @@ -220,8 +220,11 @@ static void flush_all_zero_pkmaps(void)
> >>  		set_page_address(page, NULL);
> >>  		need_flush = 1;
> >>  	}
> >> -	if (need_flush)
> >> +	if (need_flush) {
> >> +		unlock_kmap();
> >>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> +		lock_kmap();
> >> +	}
> >>  }
> 
> >Why is dropping the lock like this safe?  What data is it protecting and why is it OK to 
> >leave that data unprotected here?
> 
> kmap_lock is used to protect pkmap_count, pkmap_page_table and last_pkmap_nr (a static variable).
> When flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)) is called, it neither modifies
> nor reads these variables, so leaving that data unprotected here is safe.

No, the risk here is that when the lock is dropped, other threads will
modify the data.  And when this thread (the one running
flush_all_zero_pkmaps()) retakes the lock, that data may now be
unexpectedly altered.