kernel/events/uprobes.c | 5 +++++ 1 file changed, 5 insertions(+)
We triggered the following error logs in syzkaller test:
BUG: Bad page state in process syz.7.38 pfn:1eff3
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1eff3
flags: 0x3fffff00004004(referenced|reserved|node=0|zone=1|lastcpupid=0x1fffff)
raw: 003fffff00004004 ffffe6c6c07bfcc8 ffffe6c6c07bfcc8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000fffffffe 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_unref_page_prepare+0x401/0x500
free_unref_page+0x6d/0x1b0
uprobe_write_opcode+0x460/0x8e0
install_breakpoint.part.0+0x51/0x80
register_for_each_vma+0x1d9/0x2b0
__uprobe_register+0x245/0x300
bpf_uprobe_multi_link_attach+0x29b/0x4f0
link_create+0x1e2/0x280
__sys_bpf+0x75f/0xac0
__x64_sys_bpf+0x1a/0x30
do_syscall_64+0x56/0x100
entry_SYSCALL_64_after_hwframe+0x78/0xe2
BUG: Bad rss-counter state mm:00000000452453e0 type:MM_FILEPAGES val:-1
The following syzkaller test case can be used to reproduce:
r2 = creat(&(0x7f0000000000)='./file0\x00', 0x8)
write$nbd(r2, &(0x7f0000000580)=ANY=[], 0x10)
r4 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x42, 0x0)
mmap$IORING_OFF_SQ_RING(&(0x7f0000ffd000/0x3000)=nil, 0x3000, 0x0, 0x12, r4, 0x0)
r5 = userfaultfd(0x80801)
ioctl$UFFDIO_API(r5, 0xc018aa3f, &(0x7f0000000040)={0xaa, 0x20})
r6 = userfaultfd(0x80801)
ioctl$UFFDIO_API(r6, 0xc018aa3f, &(0x7f0000000140))
ioctl$UFFDIO_REGISTER(r6, 0xc020aa00, &(0x7f0000000100)={{&(0x7f0000ffc000/0x4000)=nil, 0x4000}, 0x2})
ioctl$UFFDIO_ZEROPAGE(r5, 0xc020aa04, &(0x7f0000000000)={{&(0x7f0000ffd000/0x1000)=nil, 0x1000}})
r7 = bpf$PROG_LOAD(0x5, &(0x7f0000000140)={0x2, 0x3, &(0x7f0000000200)=ANY=[@ANYBLOB="1800000000120000000000000000000095"], &(0x7f0000000000)='GPL\x00', 0x7, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, @fallback=0x30, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x94)
bpf$BPF_LINK_CREATE_XDP(0x1c, &(0x7f0000000040)={r7, 0x0, 0x30, 0x1e, @val=@uprobe_multi={&(0x7f0000000080)='./file0\x00', &(0x7f0000000100)=[0x2], 0x0, 0x0, 0x1}}, 0x40)
The cause is that zero pfn is set to the pte without increasing the rss
count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
not increase accordingly. Then, the operation on the same pfn is performed
in uprobe_write_opcode()->__replace_page() to unconditional decrease the
rss count and old_folio's refcount.
Therefore, two bugs are introduced:
1. The rss count is incorrect, when process exit, the check_mm() report
error "Bad rss-count".
2. The reserved folio (zero folio) is freed when folio->refcount is zero,
then free_pages_prepare->free_page_is_bad() report error
"Bad page state".
Considering that uprobe hit the zero folio is a very rare scene, just
reject zero old folio immediately after get_user_page_vma_remote().
Fixes: 7396fa818d62 ("uprobes/core: Make background page replacement logic account for rss_stat counters")
Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
v2: Modified according to the comments of David and Oleg.
---
kernel/events/uprobes.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 46ddf3a2334d..daa46868faf3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -506,6 +506,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (ret <= 0)
goto put_old;
+ if (is_zero_page(old_page)) {
+ ret = -EINVAL;
+ goto put_old;
+ }
+
if (WARN(!is_register && PageCompound(old_page),
"uprobe unregister should never work on compound page\n")) {
ret = -EINVAL;
--
2.25.1
On 02/21, Tong Tiangen wrote:
>
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -506,6 +506,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> if (ret <= 0)
> goto put_old;
>
> + if (is_zero_page(old_page)) {
> + ret = -EINVAL;
> + goto put_old;
> + }
I agree with David, the subject looks a bit misleading.
And. I won't insist, this is cosmetic, but if you send V2 please consider
moving the "verify_opcode()" check down, after the is_zero_page/PageCompound
checks.
Oleg.
在 2025/2/21 23:28, Oleg Nesterov 写道:
> On 02/21, Tong Tiangen wrote:
>>
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -506,6 +506,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>> if (ret <= 0)
>> goto put_old;
>>
>> + if (is_zero_page(old_page)) {
>> + ret = -EINVAL;
>> + goto put_old;
>> + }
>
> I agree with David, the subject looks a bit misleading.
>
> And. I won't insist, this is cosmetic, but if you send V2 please consider
> moving the "verify_opcode()" check down, after the is_zero_page/PageCompound
> checks.
>
> Oleg.
OK, check the validity of the old page first and modify the subject in
v3 .
Thanks.
>
>
> .
在 2025/2/22 10:37, Tong Tiangen 写道:
>
>
> 在 2025/2/21 23:28, Oleg Nesterov 写道:
>> On 02/21, Tong Tiangen wrote:
>>>
>>> --- a/kernel/events/uprobes.c
>>> +++ b/kernel/events/uprobes.c
>>> @@ -506,6 +506,11 @@ int uprobe_write_opcode(struct arch_uprobe
>>> *auprobe, struct mm_struct *mm,
>>> if (ret <= 0)
>>> goto put_old;
>>>
>>> + if (is_zero_page(old_page)) {
>>> + ret = -EINVAL;
>>> + goto put_old;
>>> + }
>>
>> I agree with David, the subject looks a bit misleading.
>>
>> And. I won't insist, this is cosmetic, but if you send V2 please consider
>> moving the "verify_opcode()" check down, after the
>> is_zero_page/PageCompound
>> checks.
>>
>> Oleg.
>
> OK, check the validity of the old page first and modify the subject in
> v3 .
>
> Thanks.
I'm going to add a new patch to moving the "verify_opcode()" check down
, IIUC that "!PageAnon(old_page)" below also needs to be moved together,
and as David said this can be triggered by user space, so delete the use
of "WARN", as follows:
@@ -502,20 +502,16 @@ int uprobe_write_opcode(struct arch_uprobe
*auprobe, struct mm_struct *mm,
if (IS_ERR(old_page))
return PTR_ERR(old_page);
- ret = verify_opcode(old_page, vaddr, &opcode);
- if (ret <= 0)
+ ret = -EINVAL;
+ if (is_zero_page(old_page))
goto put_old;
- if (is_zero_page(old_page)) {
- ret = -EINVAL;
+ if (!is_register && (PageCompound(old_page) || !PageAnon(old_page)))
goto put_old;
- }
- if (WARN(!is_register && PageCompound(old_page),
- "uprobe unregister should never work on compound
page\n")) {
- ret = -EINVAL;
+ ret = verify_opcode(old_page, vaddr, &opcode);
+ if (ret <= 0)
goto put_old;
- }
/* We are going to replace instruction, update ref_ctr. */
if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
@@ -526,10 +522,6 @@ int uprobe_write_opcode(struct arch_uprobe
*auprobe, struct mm_struct *mm,
ref_ctr_updated = 1;
}
- ret = 0;
- if (!is_register && !PageAnon(old_page))
- goto put_old;
-
ret = anon_vma_prepare(vma);
Thanks.
>
>>
>>
>> .
>
> .
On 02/22, Tong Tiangen wrote:
>
>
> I'm going to add a new patch to moving the "verify_opcode()" check down
> , IIUC that "!PageAnon(old_page)" below also needs to be moved together,
No, no.
I forgot everything, but please don't do this. IIUC This is optimization
for the case when the probed file has int3 at this offset. We should not
skip update_ref_ctr() in this case, just we can avoid __replace_page().
> and as David said this can be triggered by user space, so delete the use
> of "WARN", as follows:
Hmm... I think that David meant the new WARN_ON() added by you in V1?
Please don't remove the old WARN(PageCompound(old_page) check. If userspace
can trigger this warning we need to to fix this code and add FOLL_SPLIT_PMD
unconditionally (and likely do something else).
I take my words back ;) Don't do the additional cleanups, just add the
is_zero_page(old_page) check right after get_user_page_vma_remote() and
update the subject/changelog as David suggests.
This function needs more cleanups anyway. Say, the usage of orig_page_huge
_looks_ obviously wrong even if (afaics) nothing bad can happen. It should
be reinitialized after "goto retry" or it should be checked before the
"orig_page = find_get_page()" code. The usage of gup_flags looks confusing
too. Lets do this later.
Oleg.
>
>
> @@ -502,20 +502,16 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe,
> struct mm_struct *mm,
> if (IS_ERR(old_page))
> return PTR_ERR(old_page);
>
> - ret = verify_opcode(old_page, vaddr, &opcode);
> - if (ret <= 0)
> + ret = -EINVAL;
> + if (is_zero_page(old_page))
> goto put_old;
>
> - if (is_zero_page(old_page)) {
> - ret = -EINVAL;
> + if (!is_register && (PageCompound(old_page) || !PageAnon(old_page)))
> goto put_old;
> - }
>
> - if (WARN(!is_register && PageCompound(old_page),
> - "uprobe unregister should never work on compound page\n"))
> {
> - ret = -EINVAL;
> + ret = verify_opcode(old_page, vaddr, &opcode);
> + if (ret <= 0)
> goto put_old;
> - }
>
> /* We are going to replace instruction, update ref_ctr. */
> if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
> @@ -526,10 +522,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe,
> struct mm_struct *mm,
> ref_ctr_updated = 1;
> }
>
> - ret = 0;
> - if (!is_register && !PageAnon(old_page))
> - goto put_old;
> -
> ret = anon_vma_prepare(vma);
>
> Thanks.
> >
> >>
> >>
> >>.
> >
> >.
>
On 21.02.25 02:50, Tong Tiangen wrote:
> We triggered the following error logs in syzkaller test:
>
> BUG: Bad page state in process syz.7.38 pfn:1eff3
> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1eff3
> flags: 0x3fffff00004004(referenced|reserved|node=0|zone=1|lastcpupid=0x1fffff)
> raw: 003fffff00004004 ffffe6c6c07bfcc8 ffffe6c6c07bfcc8 0000000000000000
> raw: 0000000000000000 0000000000000000 00000000fffffffe 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x32/0x50
> bad_page+0x69/0xf0
> free_unref_page_prepare+0x401/0x500
> free_unref_page+0x6d/0x1b0
> uprobe_write_opcode+0x460/0x8e0
> install_breakpoint.part.0+0x51/0x80
> register_for_each_vma+0x1d9/0x2b0
> __uprobe_register+0x245/0x300
> bpf_uprobe_multi_link_attach+0x29b/0x4f0
> link_create+0x1e2/0x280
> __sys_bpf+0x75f/0xac0
> __x64_sys_bpf+0x1a/0x30
> do_syscall_64+0x56/0x100
> entry_SYSCALL_64_after_hwframe+0x78/0xe2
>
> BUG: Bad rss-counter state mm:00000000452453e0 type:MM_FILEPAGES val:-1
>
> The following syzkaller test case can be used to reproduce:
>
> r2 = creat(&(0x7f0000000000)='./file0\x00', 0x8)
> write$nbd(r2, &(0x7f0000000580)=ANY=[], 0x10)
> r4 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x42, 0x0)
> mmap$IORING_OFF_SQ_RING(&(0x7f0000ffd000/0x3000)=nil, 0x3000, 0x0, 0x12, r4, 0x0)
> r5 = userfaultfd(0x80801)
> ioctl$UFFDIO_API(r5, 0xc018aa3f, &(0x7f0000000040)={0xaa, 0x20})
> r6 = userfaultfd(0x80801)
> ioctl$UFFDIO_API(r6, 0xc018aa3f, &(0x7f0000000140))
> ioctl$UFFDIO_REGISTER(r6, 0xc020aa00, &(0x7f0000000100)={{&(0x7f0000ffc000/0x4000)=nil, 0x4000}, 0x2})
> ioctl$UFFDIO_ZEROPAGE(r5, 0xc020aa04, &(0x7f0000000000)={{&(0x7f0000ffd000/0x1000)=nil, 0x1000}})
> r7 = bpf$PROG_LOAD(0x5, &(0x7f0000000140)={0x2, 0x3, &(0x7f0000000200)=ANY=[@ANYBLOB="1800000000120000000000000000000095"], &(0x7f0000000000)='GPL\x00', 0x7, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, @fallback=0x30, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x94)
> bpf$BPF_LINK_CREATE_XDP(0x1c, &(0x7f0000000040)={r7, 0x0, 0x30, 0x1e, @val=@uprobe_multi={&(0x7f0000000080)='./file0\x00', &(0x7f0000000100)=[0x2], 0x0, 0x0, 0x1}}, 0x40)
>
> The cause is that zero pfn is set to the pte without increasing the rss
> count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
> not increase accordingly. Then, the operation on the same pfn is performed
> in uprobe_write_opcode()->__replace_page() to unconditional decrease the
> rss count and old_folio's refcount.
>
> Therefore, two bugs are introduced:
> 1. The rss count is incorrect, when process exit, the check_mm() report
> error "Bad rss-count".
> 2. The reserved folio (zero folio) is freed when folio->refcount is zero,
> then free_pages_prepare->free_page_is_bad() report error
> "Bad page state".
Well, there is more, like triggering the
VM_WARN_ON_FOLIO(is_zero_folio(folio), folio);
in __folio_rmap_sanity_checks() I assume.
So maybe just call the patch
"uprobes: reject the share zeropage in uprobe_write_opcode)()"
Thanks!
--
Cheers,
David / dhildenb
在 2025/2/21 16:12, David Hildenbrand 写道:
> On 21.02.25 02:50, Tong Tiangen wrote:
>> We triggered the following error logs in syzkaller test:
>>
>> BUG: Bad page state in process syz.7.38 pfn:1eff3
>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
>> pfn:0x1eff3
>> flags:
>> 0x3fffff00004004(referenced|reserved|node=0|zone=1|lastcpupid=0x1fffff)
>> raw: 003fffff00004004 ffffe6c6c07bfcc8 ffffe6c6c07bfcc8
>> 0000000000000000
>> raw: 0000000000000000 0000000000000000 00000000fffffffe
>> 0000000000000000
>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.13.0-1ubuntu1.1 04/01/2014
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x32/0x50
>> bad_page+0x69/0xf0
>> free_unref_page_prepare+0x401/0x500
>> free_unref_page+0x6d/0x1b0
>> uprobe_write_opcode+0x460/0x8e0
>> install_breakpoint.part.0+0x51/0x80
>> register_for_each_vma+0x1d9/0x2b0
>> __uprobe_register+0x245/0x300
>> bpf_uprobe_multi_link_attach+0x29b/0x4f0
>> link_create+0x1e2/0x280
>> __sys_bpf+0x75f/0xac0
>> __x64_sys_bpf+0x1a/0x30
>> do_syscall_64+0x56/0x100
>> entry_SYSCALL_64_after_hwframe+0x78/0xe2
>>
>> BUG: Bad rss-counter state mm:00000000452453e0 type:MM_FILEPAGES
>> val:-1
>>
>> The following syzkaller test case can be used to reproduce:
>>
>> r2 = creat(&(0x7f0000000000)='./file0\x00', 0x8)
>> write$nbd(r2, &(0x7f0000000580)=ANY=[], 0x10)
>> r4 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00',
>> 0x42, 0x0)
>> mmap$IORING_OFF_SQ_RING(&(0x7f0000ffd000/0x3000)=nil, 0x3000, 0x0,
>> 0x12, r4, 0x0)
>> r5 = userfaultfd(0x80801)
>> ioctl$UFFDIO_API(r5, 0xc018aa3f, &(0x7f0000000040)={0xaa, 0x20})
>> r6 = userfaultfd(0x80801)
>> ioctl$UFFDIO_API(r6, 0xc018aa3f, &(0x7f0000000140))
>> ioctl$UFFDIO_REGISTER(r6, 0xc020aa00,
>> &(0x7f0000000100)={{&(0x7f0000ffc000/0x4000)=nil, 0x4000}, 0x2})
>> ioctl$UFFDIO_ZEROPAGE(r5, 0xc020aa04,
>> &(0x7f0000000000)={{&(0x7f0000ffd000/0x1000)=nil, 0x1000}})
>> r7 = bpf$PROG_LOAD(0x5, &(0x7f0000000140)={0x2, 0x3,
>> &(0x7f0000000200)=ANY=[@ANYBLOB="1800000000120000000000000000000095"],
>> &(0x7f0000000000)='GPL\x00', 0x7, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0,
>> @fallback=0x30, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x94)
>> bpf$BPF_LINK_CREATE_XDP(0x1c, &(0x7f0000000040)={r7, 0x0, 0x30,
>> 0x1e, @val=@uprobe_multi={&(0x7f0000000080)='./file0\x00',
>> &(0x7f0000000100)=[0x2], 0x0, 0x0, 0x1}}, 0x40)
>>
>> The cause is that zero pfn is set to the pte without increasing the rss
>> count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
>> not increase accordingly. Then, the operation on the same pfn is
>> performed
>> in uprobe_write_opcode()->__replace_page() to unconditional decrease the
>> rss count and old_folio's refcount.
>>
>> Therefore, two bugs are introduced:
>> 1. The rss count is incorrect, when process exit, the check_mm() report
>> error "Bad rss-count".
>> 2. The reserved folio (zero folio) is freed when folio->refcount is zero,
>> then free_pages_prepare->free_page_is_bad() report error
>> "Bad page state".
>
> Well, there is more, like triggering the
>
> VM_WARN_ON_FOLIO(is_zero_folio(folio), folio);
>
> in __folio_rmap_sanity_checks() I assume.
>
> So maybe just call the patch
>
> "uprobes: reject the share zeropage in uprobe_write_opcode)()"
>
> Thanks!
OK, This subject is more appropriate.
Thanks.
>
© 2016 - 2025 Red Hat, Inc.