Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
when the page is already dirty (PTE_DIRTY is set). While this optimization
prevents unnecessary dirty page marking in normal memory management paths,
it breaks kexec on some platforms like NXP LS1043.
The issue occurs in the kexec code path:
1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
writable copy of the linear mapping
2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
are writable for the new kernel image copying
3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
4. When kexec tries to copy the new kernel image through the linear
mapping, it fails on read-only pages, causing the system to hang
after "Bye!"
The same issue affects hibernation which uses the same trans_pgd code path.
Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
kexec and hibernation, ensuring all pages in the temporary mapping are
writable regardless of their dirty state. This preserves the original
commit's optimization for normal memory management while fixing the
kexec/hibernation regression.
Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
---
arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 18543b603c77..ad4e5e4fcc91 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
* Resume will overwrite areas that may be marked
* read only (code, rodata). Clear the RDONLY bit from
* the temporary mappings we use during restore.
+ *
+ * For kexec/hibernation, we need writable access regardless
+ * of the page's dirty state, so force clear PTE_RDONLY.
*/
- __set_pte(dst_ptep, pte_mkwrite_novma(pte));
+ pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
+ pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+ __set_pte(dst_ptep, pte);
} else if (!pte_none(pte)) {
/*
* debug_pagealloc will removed the PTE_VALID bit if
@@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
*/
BUG_ON(!pfn_valid(pte_pfn(pte)));
- __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
+ pte = pte_mkvalid(pte);
+ pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
+ pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+ __set_pte(dst_ptep, pte);
}
}
--
2.51.2
Hi, Jianpeng,
Jianpeng Chang <jianpeng.chang.cn@windriver.com> writes:
> Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
> pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
> when the page is already dirty (PTE_DIRTY is set). While this optimization
> prevents unnecessary dirty page marking in normal memory management paths,
> it breaks kexec on some platforms like NXP LS1043.
>
> The issue occurs in the kexec code path:
> 1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
> writable copy of the linear mapping
> 2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
> are writable for the new kernel image copying
> 3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
> 4. When kexec tries to copy the new kernel image through the linear
> mapping, it fails on read-only pages, causing the system to hang
> after "Bye!"
>
> The same issue affects hibernation which uses the same trans_pgd code path.
>
> Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
> kexec and hibernation, ensuring all pages in the temporary mapping are
> writable regardless of their dirty state. This preserves the original
> commit's optimization for normal memory management while fixing the
> kexec/hibernation regression.
>
> Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
IMHO, this isn't the right "Fixes" tag. The original _copy_pte() code
should be the target of the fix.
> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
> ---
> arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index 18543b603c77..ad4e5e4fcc91 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
> * Resume will overwrite areas that may be marked
> * read only (code, rodata). Clear the RDONLY bit from
> * the temporary mappings we use during restore.
> + *
> + * For kexec/hibernation, we need writable access regardless
> + * of the page's dirty state, so force clear PTE_RDONLY.
> */
> - __set_pte(dst_ptep, pte_mkwrite_novma(pte));
> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
> + __set_pte(dst_ptep, pte);
Why not
__set_pte(dst_ptep, pte_mkwrite_novma(pte_mkdirty(pte)));
?
> } else if (!pte_none(pte)) {
> /*
> * debug_pagealloc will removed the PTE_VALID bit if
> @@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
> */
> BUG_ON(!pfn_valid(pte_pfn(pte)));
>
> - __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
> + pte = pte_mkvalid(pte);
> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
> + __set_pte(dst_ptep, pte);
> }
> }
---
Best Regards,
Huang, Ying
On 11/28/25 5:32 PM, Huang, Ying wrote:
>
> Hi, Jianpeng,
>
> Jianpeng Chang <jianpeng.chang.cn@windriver.com> writes:
>
>> Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
>> pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
>> when the page is already dirty (PTE_DIRTY is set). While this optimization
>> prevents unnecessary dirty page marking in normal memory management paths,
>> it breaks kexec on some platforms like NXP LS1043.
>>
>> The issue occurs in the kexec code path:
>> 1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
>> writable copy of the linear mapping
>> 2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
>> are writable for the new kernel image copying
>> 3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
>> 4. When kexec tries to copy the new kernel image through the linear
>> mapping, it fails on read-only pages, causing the system to hang
>> after "Bye!"
>>
>> The same issue affects hibernation which uses the same trans_pgd code path.
>>
>> Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
>> kexec and hibernation, ensuring all pages in the temporary mapping are
>> writable regardless of their dirty state. This preserves the original
>> commit's optimization for normal memory management while fixing the
>> kexec/hibernation regression.
>>
>> Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
> IMHO, this isn't the right "Fixes" tag. The original _copy_pte() code
> should be the fixing target.
Hi Ying,
According to my understanding, the Fixes tag should point to the commit
that directly introduced the issue. While _copy_pte() was introduced with
pte_mkwrite_novma() in commit 6ecc21bb432d, at that time
pte_mkwrite_novma() always cleared the PTE_RDONLY bit unconditionally,
and kexec worked correctly. Should we blame a change that was working
properly at the time, or am I missing something here?
Thanks,
Jianpeng
>
>> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
>> ---
>> arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
>> index 18543b603c77..ad4e5e4fcc91 100644
>> --- a/arch/arm64/mm/trans_pgd.c
>> +++ b/arch/arm64/mm/trans_pgd.c
>> @@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>> * Resume will overwrite areas that may be marked
>> * read only (code, rodata). Clear the RDONLY bit from
>> * the temporary mappings we use during restore.
>> + *
>> + * For kexec/hibernation, we need writable access regardless
>> + * of the page's dirty state, so force clear PTE_RDONLY.
>> */
>> - __set_pte(dst_ptep, pte_mkwrite_novma(pte));
>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>> + __set_pte(dst_ptep, pte);
> Why not
>
> __set_pte(dst_ptep, pte_mkwrite_novma(pte_mkdirty(pte)));
I agree that using pte_mkdirty() is indeed better: it makes the
modification clearer and avoids the helper function. I will change it.
>
> ?
>
>> } else if (!pte_none(pte)) {
>> /*
>> * debug_pagealloc will removed the PTE_VALID bit if
>> @@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>> */
>> BUG_ON(!pfn_valid(pte_pfn(pte)));
>>
>> - __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
>> + pte = pte_mkvalid(pte);
>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>> + __set_pte(dst_ptep, pte);
>> }
>> }
> ---
> Best Regards,
> Huang, Ying
Jianpeng Chang <jianpeng.chang.cn@windriver.com> writes:
> On 11/28/25 5:32 PM, Huang, Ying wrote:
>>
>> Hi, Jianpeng,
>>
>> Jianpeng Chang <jianpeng.chang.cn@windriver.com> writes:
>>
>>> Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
>>> pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
>>> when the page is already dirty (PTE_DIRTY is set). While this optimization
>>> prevents unnecessary dirty page marking in normal memory management paths,
>>> it breaks kexec on some platforms like NXP LS1043.
>>>
>>> The issue occurs in the kexec code path:
>>> 1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
>>> writable copy of the linear mapping
>>> 2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
>>> are writable for the new kernel image copying
>>> 3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
>>> 4. When kexec tries to copy the new kernel image through the linear
>>> mapping, it fails on read-only pages, causing the system to hang
>>> after "Bye!"
>>>
>>> The same issue affects hibernation which uses the same trans_pgd code path.
>>>
>>> Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
>>> kexec and hibernation, ensuring all pages in the temporary mapping are
>>> writable regardless of their dirty state. This preserves the original
>>> commit's optimization for normal memory management while fixing the
>>> kexec/hibernation regression.
>>>
>>> Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
>> IMHO, this isn't the right "Fixes" tag. The original _copy_pte() code
>> should be the fixing target.
>
> Hi Ying,
>
> According to my understanding, the Fixes tag should point to the
> commit that directly introduced the issue. While _copy_pte() was
> introduced with pte_mkwrite_novma() in commit 6ecc21bb432d, at that
> time pte_mkwrite_novma() always cleared the PTE_RDONLY
As I understand it, clearing PTE_RDONLY in pte_mkwrite_novma() was not
correct in the first place.
However, the "Fixes" tag is also used for patch backporting. For that
purpose, the original tag is better. So it's OK with me now.
> bit unconditionally, and kexec worked correctly. Should we blame a
> change that was working properly at the time, or am I missing
> something here?
>
>>
>>> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
>>> ---
>>> arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
>>> index 18543b603c77..ad4e5e4fcc91 100644
>>> --- a/arch/arm64/mm/trans_pgd.c
>>> +++ b/arch/arm64/mm/trans_pgd.c
>>> @@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>>> * Resume will overwrite areas that may be marked
>>> * read only (code, rodata). Clear the RDONLY bit from
>>> * the temporary mappings we use during restore.
>>> + *
>>> + * For kexec/hibernation, we need writable access regardless
>>> + * of the page's dirty state, so force clear PTE_RDONLY.
>>> */
>>> - __set_pte(dst_ptep, pte_mkwrite_novma(pte));
>>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>>> + __set_pte(dst_ptep, pte);
>> Why not
>>
>> __set_pte(dst_ptep, pte_mkwrite_novma(pte_mkdirty(pte)));
> I agree that using pte_mkdirty() is indeed better, makes the
> modification clearer and avoids the helper function. I will change it.
Thanks!
>> ?
>>
>>> } else if (!pte_none(pte)) {
>>> /*
>>> * debug_pagealloc will removed the PTE_VALID bit if
>>> @@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>>> */
>>> BUG_ON(!pfn_valid(pte_pfn(pte)));
>>>
>>> - __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
>>> + pte = pte_mkvalid(pte);
>>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>>> + __set_pte(dst_ptep, pte);
>>> }
>>> }
---
Best Regards,
Huang, Ying
On 27/11/25 9:13 AM, Jianpeng Chang wrote:
> Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
> pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
> when the page is already dirty (PTE_DIRTY is set). While this optimization
> prevents unnecessary dirty page marking in normal memory management paths,
> it breaks kexec on some platforms like NXP LS1043.
>
> The issue occurs in the kexec code path:
> 1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
> writable copy of the linear mapping
> 2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
> are writable for the new kernel image copying
> 3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
> 4. When kexec tries to copy the new kernel image through the linear
> mapping, it fails on read-only pages, causing the system to hang
> after "Bye!"
>
> The same issue affects hibernation which uses the same trans_pgd code path.
>
> Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
> kexec and hibernation, ensuring all pages in the temporary mapping are
> writable regardless of their dirty state. This preserves the original
> commit's optimization for normal memory management while fixing the
> kexec/hibernation regression.
>
> Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
> ---
> arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index 18543b603c77..ad4e5e4fcc91 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
> * Resume will overwrite areas that may be marked
> * read only (code, rodata). Clear the RDONLY bit from
> * the temporary mappings we use during restore.
> + *
> + * For kexec/hibernation, we need writable access regardless
> + * of the page's dirty state, so force clear PTE_RDONLY.
> */
Hence the pte_mkwrite_novma() helper cannot be used here as would normally
be expected. It might be a good idea to add that to the above comment.
> - __set_pte(dst_ptep, pte_mkwrite_novma(pte));
> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
> + __set_pte(dst_ptep, pte);
> } else if (!pte_none(pte)) {
> /*
> * debug_pagealloc will removed the PTE_VALID bit if
> @@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
> */
> BUG_ON(!pfn_valid(pte_pfn(pte)));
>
> - __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
> + pte = pte_mkvalid(pte);
Probably better to move pte_mkvalid() just after clearing PTE_RDONLY.
> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
> + __set_pte(dst_ptep, pte);
> }
> }
>
Just wondering if it would be worth adding a local helper for the set
PTE_WRITE --> clear PTE_RDONLY sequence, with a comment describing how it
differs from the now updated pte_mkwrite_novma() helper.
On 11/27/2025 1:41 PM, Anshuman Khandual wrote:
>
> On 27/11/25 9:13 AM, Jianpeng Chang wrote:
>> Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
>> pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
>> when the page is already dirty (PTE_DIRTY is set). While this optimization
>> prevents unnecessary dirty page marking in normal memory management paths,
>> it breaks kexec on some platforms like NXP LS1043.
>>
>> The issue occurs in the kexec code path:
>> 1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
>> writable copy of the linear mapping
>> 2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
>> are writable for the new kernel image copying
>> 3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
>> 4. When kexec tries to copy the new kernel image through the linear
>> mapping, it fails on read-only pages, causing the system to hang
>> after "Bye!"
>>
>> The same issue affects hibernation which uses the same trans_pgd code path.
>>
>> Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
>> kexec and hibernation, ensuring all pages in the temporary mapping are
>> writable regardless of their dirty state. This preserves the original
>> commit's optimization for normal memory management while fixing the
>> kexec/hibernation regression.
>>
>> Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
>> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
>> ---
>> arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
>> index 18543b603c77..ad4e5e4fcc91 100644
>> --- a/arch/arm64/mm/trans_pgd.c
>> +++ b/arch/arm64/mm/trans_pgd.c
>> @@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>> * Resume will overwrite areas that may be marked
>> * read only (code, rodata). Clear the RDONLY bit from
>> * the temporary mappings we use during restore.
>> + *
>> + * For kexec/hibernation, we need writable access regardless
>> + * of the page's dirty state, so force clear PTE_RDONLY.
>> */
> Hence pte_mkwrite_novma() helper cannot be used here as would have
> normally expected. Might be good idea to add to the above comment.
I'll add this to make it clear that we can't use pte_mkwrite_novma().
>
>> - __set_pte(dst_ptep, pte_mkwrite_novma(pte));
>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>> + __set_pte(dst_ptep, pte);
>> } else if (!pte_none(pte)) {
>> /*
>> * debug_pagealloc will removed the PTE_VALID bit if
>> @@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>> */
>> BUG_ON(!pfn_valid(pte_pfn(pte)));
>>
>> - __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
>> + pte = pte_mkvalid(pte);
> Probably better to move pte_mkvalid() just after clearing PTE_RDONLY.
I'll make this change.
>
>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>> + __set_pte(dst_ptep, pte);
>> }
>> }
>>
> Just wondering if it would be worth adding a local helper for the set
> PTE_WRITE --> clear PTE_RDONLY sequence describing its difference with
> now updated pte_mkwrite_novma() helper with the earlier comment.
Thank you for the review! I appreciate your suggestions.
You're right that a local helper would make the code more readable and
clearly document the difference from pte_mkwrite_novma().
I have a small concern about placing PTE manipulation functions outside of
pgtable.h: is that acceptable? Or would you prefer a local static inline
helper within trans_pgd.c, given its specific use case for kexec/hibernation?
I'll implement whichever approach you think is more appropriate in v2.
Best regards,
Jianpeng
On 27/11/25 3:54 PM, Chang, Jianpeng (CN) wrote:
>
> On 11/27/2025 1:41 PM, Anshuman Khandual wrote:
>>
>> On 27/11/25 9:13 AM, Jianpeng Chang wrote:
>>> Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in
>>> pte_mkwrite()") modified pte_mkwrite_novma() to only clear PTE_RDONLY
>>> when the page is already dirty (PTE_DIRTY is set). While this optimization
>>> prevents unnecessary dirty page marking in normal memory management paths,
>>> it breaks kexec on some platforms like NXP LS1043.
>>>
>>> The issue occurs in the kexec code path:
>>> 1. machine_kexec_post_load() calls trans_pgd_create_copy() to create a
>>> writable copy of the linear mapping
>>> 2. _copy_pte() calls pte_mkwrite_novma() to ensure all pages in the copy
>>> are writable for the new kernel image copying
>>> 3. With the new logic, clean pages (without PTE_DIRTY) remain read-only
>>> 4. When kexec tries to copy the new kernel image through the linear
>>> mapping, it fails on read-only pages, causing the system to hang
>>> after "Bye!"
>>>
>>> The same issue affects hibernation which uses the same trans_pgd code path.
>>>
>>> Fix this by explicitly clearing PTE_RDONLY in _copy_pte() for both
>>> kexec and hibernation, ensuring all pages in the temporary mapping are
>>> writable regardless of their dirty state. This preserves the original
>>> commit's optimization for normal memory management while fixing the
>>> kexec/hibernation regression.
>>>
>>> Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()")
>>> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
>>> ---
>>> arch/arm64/mm/trans_pgd.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
>>> index 18543b603c77..ad4e5e4fcc91 100644
>>> --- a/arch/arm64/mm/trans_pgd.c
>>> +++ b/arch/arm64/mm/trans_pgd.c
>>> @@ -40,8 +40,13 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>>> * Resume will overwrite areas that may be marked
>>> * read only (code, rodata). Clear the RDONLY bit from
>>> * the temporary mappings we use during restore.
>>> + *
>>> + * For kexec/hibernation, we need writable access regardless
>>> + * of the page's dirty state, so force clear PTE_RDONLY.
>>> */
>> Hence pte_mkwrite_novma() helper cannot be used here as would have
>> normally expected. Might be good idea to add to the above comment.
> I'll add this to make it clear that we can't use pte_mkwrite_novma().
>>
>>> - __set_pte(dst_ptep, pte_mkwrite_novma(pte));
>>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>>> + __set_pte(dst_ptep, pte);
>>> } else if (!pte_none(pte)) {
>>> /*
>>> * debug_pagealloc will removed the PTE_VALID bit if
>>> @@ -57,7 +62,10 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
>>> */
>>> BUG_ON(!pfn_valid(pte_pfn(pte)));
>>>
>>> - __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
>>> + pte = pte_mkvalid(pte);
>> Probably better to move pte_mkvalid() just after clearing PTE_RDONLY.
> I'll make this change.
>>
>>> + pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
>>> + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
>>> + __set_pte(dst_ptep, pte);
>>> }
>>> }
>>>
>> Just wondering if it would be worth adding a local helper for the set
>> PTE_WRITE --> clear PTE_RDONLY sequence describing its difference with
>> now updated pte_mkwrite_novma() helper with the earlier comment.
>
> Thank you for the review! I appreciate your suggestions.
>
> You're right that a local helper would make the code more readable and clearly
> document the difference from pte_mkwrite_novma().
>
> I have a small concern about placing PTE manipulation functions outside of
> pgtable.h - is this a good way? Or would you prefer this as a local static inline
> helper within trans_pgd.c given its specific use case for kexec/hibernation?
A local static helper function in trans_pgd.c, documenting the difference
from the standard pgtable helper pte_mkwrite_novma(), would be better as it
is not used elsewhere.
>
>
> I'll implement whichever approach you think is more appropriate in v2.
>
Let's wait some more time for other opinions as well.
>
> Best regards,
>
> Jianpeng
>