syzkaller discovered the following crash: (kernel BUG)
[ 44.607039] ------------[ cut here ]------------
[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
<snip other registers, drop unreliable trace>
[ 44.617726] Call Trace:
[ 44.617926] <TASK>
[ 44.619284] userfaultfd_release+0xef/0x1b0
[ 44.620976] __fput+0x3f9/0xb60
[ 44.621240] fput_close_sync+0x110/0x210
[ 44.622222] __x64_sys_close+0x8f/0x120
[ 44.622530] do_syscall_64+0x5b/0x2f0
[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 44.623244] RIP: 0033:0x7f365bb3f227
The kernel panics because it detects a UFFD inconsistency during
userfaultfd_release_all(): a VMA that has a valid pointer in
vma->vm_userfaultfd_ctx, but no UFFD flags set in vma->vm_flags.

The inconsistency is introduced in ksm_madvise(): when the user calls
madvise() with MADV_UNMERGEABLE on a VMA that is registered for UFFD in
MINOR mode, it accidentally clears all flags stored in the upper 32 bits
of vma->vm_flags.
Assuming an x86_64 kernel build, unsigned long is 64 bits wide and
unsigned int and int are 32 bits wide. This setup causes the following
mishap during the &= ~VM_MERGEABLE assignment.

VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff, still of type unsigned int,
which is then converted to unsigned long before the & operation. This
conversion fills the upper 32 bits with 0s, as it is an unsigned
conversion (and even a signed conversion would not help, as the leading
bit is 0). The & operation thus ends up AND-ing vm_flags with
0x0000'0000'7fff'ffff instead of the intended 0xffff'ffff'7fff'ffff and
hence accidentally clears the upper 32 bits of its value.
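The conversion described above can be reproduced in user space. The following is a minimal sketch, not kernel code: it assumes an LP64 target such as x86_64 Linux, and the names OLD_VM_MERGEABLE, HYPOTHETICAL_HIGH_FLAG and clear_mergeable_old() are made up for illustration (bit 40 is an arbitrary stand-in for any flag stored above bit 31).

```c
#include <assert.h>

/* Assumption: LP64 target (e.g. x86_64 Linux), as in the analysis above. */
static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit unsigned long");
static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Pre-fix definition: the value does not fit in int, so its type is unsigned int. */
#define OLD_VM_MERGEABLE 0x80000000

/* Hypothetical flag living in the upper 32 bits (bit 40 chosen arbitrarily). */
#define HYPOTHETICAL_HIGH_FLAG (1UL << 40)

/* Hypothetical helper mirroring the "&= ~VM_MERGEABLE" clear in ksm_madvise(). */
static unsigned long clear_mergeable_old(unsigned long vm_flags)
{
	/*
	 * ~OLD_VM_MERGEABLE is 0x7fffffff of type unsigned int. It is
	 * zero-extended to 0x000000007fffffff before the AND, so every
	 * bit above bit 30 is cleared, not just bit 31.
	 */
	vm_flags &= ~OLD_VM_MERGEABLE;
	return vm_flags;
}
```

Calling clear_mergeable_old(HYPOTHETICAL_HIGH_FLAG | 0x80000000UL) returns 0 in this sketch: the high flag is lost together with bit 31.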
Fix it by changing the `VM_MERGEABLE` constant to unsigned long, using
the BIT() macro.
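As a rough user-space illustration of why the unsigned long constant behaves as intended, consider the sketch below. It assumes an LP64 target; the BIT() definition here is a local stand-in written for this example (the kernel provides its own), and NEW_VM_MERGEABLE and clear_mergeable_new() are hypothetical names.

```c
#include <assert.h>

/* Assumption: LP64 target (e.g. x86_64 Linux). */
static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit unsigned long");

/* Local stand-in for the kernel's BIT() macro; it yields an unsigned long. */
#define BIT(nr) (1UL << (nr))

/* Post-fix definition, as in the patch. */
#define NEW_VM_MERGEABLE BIT(31)

/* Hypothetical helper mirroring the "&= ~VM_MERGEABLE" clear in ksm_madvise(). */
static unsigned long clear_mergeable_new(unsigned long vm_flags)
{
	/*
	 * The operand of ~ is already 64 bits wide, so the mask is
	 * 0xffffffff7fffffff and only bit 31 is cleared.
	 */
	vm_flags &= ~NEW_VM_MERGEABLE;
	return vm_flags;
}
```

With this definition, clearing the flag from an all-ones value leaves 0xffffffff7fffffff, so flags above bit 31 survive.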
Note: other VM_* flags are not affected:
This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
all constants of type int. After the ~ operation they have a leading 1
bit and are thus sign-extended to unsigned long with leading 1s.
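The sign-extension case can be checked the same way. This is only a sketch under the same LP64 assumption; INT_TYPED_FLAG reuses the value of VM_NOHUGEPAGE from the diff below, and clear_int_typed_flag() is a hypothetical helper.

```c
#include <assert.h>

/* Assumption: LP64 target with 32-bit int (e.g. x86_64 Linux). */
static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit unsigned long");
static_assert(sizeof(int) == 4, "sketch assumes 32-bit int");

/* Same value as VM_NOHUGEPAGE: it fits in int, so its type is int. */
#define INT_TYPED_FLAG 0x40000000

/* Hypothetical helper clearing an int-typed flag from a 64-bit flags word. */
static unsigned long clear_int_typed_flag(unsigned long vm_flags)
{
	/*
	 * ~INT_TYPED_FLAG is a negative int (bit pattern 0xbfffffff).
	 * Converting it to unsigned long sign-extends it to
	 * 0xffffffffbfffffff, so the upper 32 bits are preserved.
	 */
	vm_flags &= ~INT_TYPED_FLAG;
	return vm_flags;
}
```

Here only bit 30 is cleared, which matches the observation that the int-typed constants do not exhibit the problem.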
Note 2:
After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:

[ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067

but the root cause (the flag drop) remains the same.
Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
include/linux/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..c6794d0e24eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
#define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
#define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+#define VM_MERGEABLE BIT(31) /* KSM may merge identical pages */
#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
#define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
On 10/1/25 11:03, Jakub Acs wrote:
> syzkaller discovered the following crash: (kernel BUG)
>
> [ 44.607039] ------------[ cut here ]------------
> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>
> <snip other registers, drop unreliable trace>
>
> [ 44.617726] Call Trace:
> [ 44.617926] <TASK>
> [ 44.619284] userfaultfd_release+0xef/0x1b0
> [ 44.620976] __fput+0x3f9/0xb60
> [ 44.621240] fput_close_sync+0x110/0x210
> [ 44.622222] __x64_sys_close+0x8f/0x120
> [ 44.622530] do_syscall_64+0x5b/0x2f0
> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 44.623244] RIP: 0033:0x7f365bb3f227
>
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
>
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
>
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
>
> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.
>
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
>
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
>
> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>
> but the root-cause (flag-drop) remains the same.
>
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Late to the party, but it seems to me the correct Fixes: should be
f8af4da3b4c1 ("ksm: the mm interface to ksm")
which introduced the flag and the buggy clearing code, no?

Commit 7677f7fd8be76 is just the one that makes it noticeable, right? But
there are other flags above bit 31, including pkeys etc. Sounds rather
dangerous if they can be cleared using a madvise.

So we can't amend the Fixes: now, but maybe we could advise stable to
backport to versions even older than those based on 7677f7fd8be76?
> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
> include/linux/mm.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..c6794d0e24eb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
> #define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
> #define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
> #define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
> -#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
> +#define VM_MERGEABLE BIT(31) /* KSM may merge identical pages */
>
> #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> #define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
On 11/6/25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
>> syzkaller discovered the following crash: (kernel BUG)
>>
>> [ 44.607039] ------------[ cut here ]------------
>> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
>> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
>> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>>
>> <snip other registers, drop unreliable trace>
>>
>> [ 44.617726] Call Trace:
>> [ 44.617926] <TASK>
>> [ 44.619284] userfaultfd_release+0xef/0x1b0
>> [ 44.620976] __fput+0x3f9/0xb60
>> [ 44.621240] fput_close_sync+0x110/0x210
>> [ 44.622222] __x64_sys_close+0x8f/0x120
>> [ 44.622530] do_syscall_64+0x5b/0x2f0
>> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 44.623244] RIP: 0033:0x7f365bb3f227
>>
>> Kernel panics because it detects UFFD inconsistency during
>> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
>> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>>
>> The inconsistency is caused in ksm_madvise(): when user calls madvise()
>> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
>> mode, it accidentally clears all flags stored in the upper 32 bits of
>> vma->vm_flags.
>>
>> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
>> and int are 32-bit wide. This setup causes the following mishap during
>> the &= ~VM_MERGEABLE assignment.
>>
>> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
>> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
>> promoted to unsigned long before the & operation. This promotion fills
>> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
>> even for a signed conversion, this wouldn't help as the leading bit is
>> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
>> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
>> the upper 32-bits of its value.
>>
>> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
>> BIT() macro.
>>
>> Note: other VM_* flags are not affected:
>> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
>> all constants of type int and after ~ operation, they end up with
>> leading 1 and are thus converted to unsigned long with leading 1s.
>>
>> Note 2:
>> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
>> no longer a kernel BUG, but a WARNING at the same place:
>>
>> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>>
>> but the root-cause (flag-drop) remains the same.
>>
>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
Clarification: flags with bits >31 did not exist at the time of f8af4da3b4c1
as they were only introduced later with 63c17fb8e5a4 ("mm/core,
x86/mm/pkeys: Store protection bits in high VMA flags") (v4.6), so that would
have been the most precise Fixes: commit. Sorry, Hugh :)

But that doesn't affect the stable backport efforts, where the oldest LTS is
5.4 anyway.
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
>
>> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Xu Xin <xu.xin16@zte.com.cn>
>> Cc: Chengming Zhou <chengming.zhou@linux.dev>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: stable@vger.kernel.org
>> ---
>> include/linux/mm.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1ae97a0b8ec7..c6794d0e24eb 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
>> #define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
>> #define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
>> #define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
>> -#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
>> +#define VM_MERGEABLE BIT(31) /* KSM may merge identical pages */
>>
>> #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
>> #define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
>
On Thu, Nov 06, 2025 at 11:39:28AM +0100, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
> > syzkaller discovered the following crash: (kernel BUG)
> >
> > [ 44.607039] ------------[ cut here ]------------
> > [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
> > [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> > [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> >
> > <snip other registers, drop unreliable trace>
> >
> > [ 44.617726] Call Trace:
> > [ 44.617926] <TASK>
> > [ 44.619284] userfaultfd_release+0xef/0x1b0
> > [ 44.620976] __fput+0x3f9/0xb60
> > [ 44.621240] fput_close_sync+0x110/0x210
> > [ 44.622222] __x64_sys_close+0x8f/0x120
> > [ 44.622530] do_syscall_64+0x5b/0x2f0
> > [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 44.623244] RIP: 0033:0x7f365bb3f227
> >
> > Kernel panics because it detects UFFD inconsistency during
> > userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> > to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> >
> > The inconsistency is caused in ksm_madvise(): when user calls madvise()
> > with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> > mode, it accidentally clears all flags stored in the upper 32 bits of
> > vma->vm_flags.
> >
> > Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> > and int are 32-bit wide. This setup causes the following mishap during
> > the &= ~VM_MERGEABLE assignment.
> >
> > VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> > After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> > promoted to unsigned long before the & operation. This promotion fills
> > upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> > even for a signed conversion, this wouldn't help as the leading bit is
> > 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> > instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> > the upper 32-bits of its value.
> >
> > Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> > BIT() macro.
> >
> > Note: other VM_* flags are not affected:
> > This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> > all constants of type int and after ~ operation, they end up with
> > leading 1 and are thus converted to unsigned long with leading 1s.
> >
> > Note 2:
> > After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> > no longer a kernel BUG, but a WARNING at the same place:
> >
> > [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> >
> > but the root-cause (flag-drop) remains the same.
> >
> > Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
>
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
>
Good point. It was a bit tricky to determine the correct "Fixes" tag, as
there were more candidates:

- the commit that initially introduced VM_MERGEABLE as a constant with
  a different inferred type than the other vm_flags constants
- the commit that first started using the upper 32 bits of vm_flags and
  did not make sure the constants are defined safely
- f8af4da3b4c1 indeed, as the one that makes the drop actually possible
- 7677f7fd8be76, which shows us a path where the drop manifests

Looking back, I agree f8af4da3b4c1 is the better option, but as you
said, that won't be changed now.

Nevertheless, I'll send the backports after a round of kselftests,
thanks for pointing this out.

Have a good day,
Jakub
On 06.11.25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
>> syzkaller discovered the following crash: (kernel BUG)
>>
>> [ 44.607039] ------------[ cut here ]------------
>> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
>> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
>> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>>
>> <snip other registers, drop unreliable trace>
>>
>> [ 44.617726] Call Trace:
>> [ 44.617926] <TASK>
>> [ 44.619284] userfaultfd_release+0xef/0x1b0
>> [ 44.620976] __fput+0x3f9/0xb60
>> [ 44.621240] fput_close_sync+0x110/0x210
>> [ 44.622222] __x64_sys_close+0x8f/0x120
>> [ 44.622530] do_syscall_64+0x5b/0x2f0
>> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 44.623244] RIP: 0033:0x7f365bb3f227
>>
>> Kernel panics because it detects UFFD inconsistency during
>> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
>> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>>
>> The inconsistency is caused in ksm_madvise(): when user calls madvise()
>> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
>> mode, it accidentally clears all flags stored in the upper 32 bits of
>> vma->vm_flags.
>>
>> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
>> and int are 32-bit wide. This setup causes the following mishap during
>> the &= ~VM_MERGEABLE assignment.
>>
>> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
>> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
>> promoted to unsigned long before the & operation. This promotion fills
>> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
>> even for a signed conversion, this wouldn't help as the leading bit is
>> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
>> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
>> the upper 32-bits of its value.
>>
>> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
>> BIT() macro.
>>
>> Note: other VM_* flags are not affected:
>> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
>> all constants of type int and after ~ operation, they end up with
>> leading 1 and are thus converted to unsigned long with leading 1s.
>>
>> Note 2:
>> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
>> no longer a kernel BUG, but a WARNING at the same place:
>>
>> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>>
>> but the root-cause (flag-drop) remains the same.
>>
>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
>
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
>
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
>
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
Yes, I agree.
--
Cheers
David
On Wed, 1 Oct 2025 09:03:52 +0000 Jakub Acs <acsjakub@amazon.de> wrote:
> syzkaller discovered the following crash: (kernel BUG)
>
> [ 44.607039] ------------[ cut here ]------------
> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>
> <snip other registers, drop unreliable trace>
>
> [ 44.617726] Call Trace:
> [ 44.617926] <TASK>
> [ 44.619284] userfaultfd_release+0xef/0x1b0
> [ 44.620976] __fput+0x3f9/0xb60
> [ 44.621240] fput_close_sync+0x110/0x210
> [ 44.622222] __x64_sys_close+0x8f/0x120
> [ 44.622530] do_syscall_64+0x5b/0x2f0
> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 44.623244] RIP: 0033:0x7f365bb3f227
>
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
>
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
>
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
>
> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.
Nice!
>
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
>
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
>
> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>
> but the root-cause (flag-drop) remains the same.
>
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Nit. It is recommended [1] to use 12 characters of the SHA-1 ID, but you are
using 13 characters.
> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
Nit: it would be nice to place this just after the 'Fixes:' tag.
Acked-by: SeongJae Park <sj@kernel.org>
[1] https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
Thanks,
SJ
[...]
On 01.10.25 11:03, Jakub Acs wrote:
> syzkaller discovered the following crash: (kernel BUG)
>
> [ 44.607039] ------------[ cut here ]------------
> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>
> <snip other registers, drop unreliable trace>
>
> [ 44.617726] Call Trace:
> [ 44.617926] <TASK>
> [ 44.619284] userfaultfd_release+0xef/0x1b0
> [ 44.620976] __fput+0x3f9/0xb60
> [ 44.621240] fput_close_sync+0x110/0x210
> [ 44.622222] __x64_sys_close+0x8f/0x120
> [ 44.622530] do_syscall_64+0x5b/0x2f0
> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 44.623244] RIP: 0033:0x7f365bb3f227
>
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
>
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
>
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
>
> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.
>
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
>
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
>
> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>
> but the root-cause (flag-drop) remains the same.
>
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Very likely we want to CC stable.
> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
IMHO no need to resend this one if Andrew can just pick this one up.
Then, you can send out patch #2 separately as commented in reply to
patch #2.
Thanks!
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb