lib/buildid.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
__build_id_parse() only works on page-backed storage. Its helper paths
eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
that do not map a regular file or lack valid address_space operations.
Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
lib/buildid.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index aaf61dfc0919..7131594cb071 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
int ret;
/* only works for page backed storage */
- if (!vma->vm_file)
+ if (!vma->vm_file ||
+ !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
+ !vma->vm_file->f_mapping->a_ops ||
+ !vma->vm_file->f_mapping->a_ops->read_folio)
return -EINVAL;
freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
--
2.43.0
Hi Jinchao, On Tue, Dec 23, 2025 at 06:32:07PM +0800, Jinchao Wang wrote: > __build_id_parse() only works on page-backed storage. Its helper paths > eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs > that do not map a regular file or lack valid address_space operations. > > Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> Check the previous discussion on this at https://lore.kernel.org/all/20251114193729.251892-1-ssranevjti@gmail.com/ The preferred solution was to use kernel_read() call instead of adding more such checks. Please check and test the patch at https://lore.kernel.org/20251222205859.3968077-1-shakeel.butt@linux.dev/
On Tue, Dec 23, 2025 at 11:05:49AM -0800, Shakeel Butt wrote: > Hi Jinchao, > > On Tue, Dec 23, 2025 at 06:32:07PM +0800, Jinchao Wang wrote: > > __build_id_parse() only works on page-backed storage. Its helper paths > > eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs > > that do not map a regular file or lack valid address_space operations. > > > > Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com > > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > > Check the previous discussion on this at > https://lore.kernel.org/all/20251114193729.251892-1-ssranevjti@gmail.com/ > > The preferred solution was to use kernel_read() call instead of adding > more such checks. Please check and test the patch at > https://lore.kernel.org/20251222205859.3968077-1-shakeel.butt@linux.dev/ > Thanks for the pointer. After reading the discussion and the patch, I agree with you. I also tested your patch, it fixes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
On Tue, 23 Dec 2025 18:32:07 +0800 Jinchao Wang <wangjinchao600@gmail.com> wrote:
> __build_id_parse() only works on page-backed storage. Its helper paths
> eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
> that do not map a regular file or lack valid address_space operations.
>
> Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
>
> ...
>
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
> int ret;
>
> /* only works for page backed storage */
> - if (!vma->vm_file)
> + if (!vma->vm_file ||
> + !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
> + !vma->vm_file->f_mapping->a_ops ||
> + !vma->vm_file->f_mapping->a_ops->read_folio)
> return -EINVAL;
>
> freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
Thanks. Seems this one needs additional paperwork.
I added the below:
Fixes: ad41251c290d ("lib/buildid: implement sleepable build_id_parse() API")
Tested-by: <syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com>
Link: https://lkml.kernel.org/r/694a67ab.050a0220.19928e.001c.GAE@google.com
Closes: https://lkml.kernel.org/r/693540fe.a70a0220.38f243.004c.GAE@google.com
Cc: <stable@vger.kernel.org>
and a large number of cc's which I scraped together from various
emails.
Could people please eyeball all of this and verify that everything is
good?
From: Jinchao Wang <wangjinchao600@gmail.com>
Subject: buildid: validate page-backed file before parsing build ID
Date: Tue, 23 Dec 2025 18:32:07 +0800
__build_id_parse() only works on page-backed storage. Its helper paths
eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
that do not map a regular file or lack valid address_space operations.
Link: https://lkml.kernel.org/r/20251223103214.2412446-1-wangjinchao600@gmail.com
Fixes: ad41251c290d ("lib/buildid: implement sleepable build_id_parse() API")
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Reported-by: <syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com>
Tested-by: <syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com>
Link: https://lkml.kernel.org/r/694a67ab.050a0220.19928e.001c.GAE@google.com
Closes: https://lkml.kernel.org/r/693540fe.a70a0220.38f243.004c.GAE@google.com
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Deepanshu Kartikey <kartikey406@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkman <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
lib/buildid.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/lib/buildid.c~buildid-validate-page-backed-file-before-parsing-build-id
+++ a/lib/buildid.c
@@ -288,7 +288,10 @@ static int __build_id_parse(struct vm_ar
int ret;
/* only works for page backed storage */
- if (!vma->vm_file)
+ if (!vma->vm_file ||
+ !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
+ !vma->vm_file->f_mapping->a_ops ||
+ !vma->vm_file->f_mapping->a_ops->read_folio)
return -EINVAL;
freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
_
On 12/23/25 18:29, Andrew Morton wrote: > On Tue, 23 Dec 2025 18:32:07 +0800 Jinchao Wang <wangjinchao600@gmail.com> wrote: > >> __build_id_parse() only works on page-backed storage. Its helper paths >> eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs >> that do not map a regular file or lack valid address_space operations. >> >> Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com >> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> >> >> ... >> >> --- a/lib/buildid.c >> +++ b/lib/buildid.c >> @@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, >> int ret; >> >> /* only works for page backed storage */ >> - if (!vma->vm_file) >> + if (!vma->vm_file || >> + !S_ISREG(file_inode(vma->vm_file)->i_mode) || >> + !vma->vm_file->f_mapping->a_ops || >> + !vma->vm_file->f_mapping->a_ops->read_folio) >> return -EINVAL; Just wondering, we are fine with MAP_PRIVATE files, right? I guess it's not about the actual content in the VMA (which might be different for a MAP_PRIVATE VMA), but only about the content of the mapped file. LGTM, although I wonder whether some of these these checks should be exposed as part of the read_cache_folio()/do_read_cache_folio() API. Like, having a helper function that tells us whether we can use do_read_cache_folio() against a given mapping+file. -- Cheers David
On Tue, Dec 30, 2025 at 2:11 PM David Hildenbrand (Red Hat) <david@kernel.org> wrote: > > On 12/23/25 18:29, Andrew Morton wrote: > > On Tue, 23 Dec 2025 18:32:07 +0800 Jinchao Wang <wangjinchao600@gmail.com> wrote: > > > >> __build_id_parse() only works on page-backed storage. Its helper paths > >> eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs > >> that do not map a regular file or lack valid address_space operations. > >> > >> Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com > >> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com> > >> > >> ... > >> > >> --- a/lib/buildid.c > >> +++ b/lib/buildid.c > >> @@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, > >> int ret; > >> > >> /* only works for page backed storage */ > >> - if (!vma->vm_file) > >> + if (!vma->vm_file || > >> + !S_ISREG(file_inode(vma->vm_file)->i_mode) || > >> + !vma->vm_file->f_mapping->a_ops || > >> + !vma->vm_file->f_mapping->a_ops->read_folio) > >> return -EINVAL; > > Just wondering, we are fine with MAP_PRIVATE files, right? I guess it's > not about the actual content in the VMA (which might be different for a > MAP_PRIVATE VMA), but only about the content of the mapped file. Yep, this code is fetching contents of a file that backs given VMA. > > > LGTM, although I wonder whether some of these these checks should be > exposed as part of the read_cache_folio()/do_read_cache_folio() API. > > Like, having a helper function that tells us whether we can use > do_read_cache_folio() against a given mapping+file. I agree, this seems to be leaking a lot of internal mm details into higher-level caller (__build_id_parse). Right now we try to fetch folio with filemap_get_folio() and if that succeeds, then we do read_cache_folio. Would it be possible for filemap_get_folio() to return error if the folio cannot be read using read_cache_folio()? Or maybe have a variant of filemap_get_folio() that would have this semantic? > > -- > Cheers > > David
On 1/5/26 23:52, Andrii Nakryiko wrote:
> On Tue, Dec 30, 2025 at 2:11 PM David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>>
>> On 12/23/25 18:29, Andrew Morton wrote:
>>> On Tue, 23 Dec 2025 18:32:07 +0800 Jinchao Wang <wangjinchao600@gmail.com> wrote:
>>>
>>>> __build_id_parse() only works on page-backed storage. Its helper paths
>>>> eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
>>>> that do not map a regular file or lack valid address_space operations.
>>>>
>>>> Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
>>>> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
>>>>
>>>> ...
>>>>
>>>> --- a/lib/buildid.c
>>>> +++ b/lib/buildid.c
>>>> @@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
>>>> int ret;
>>>>
>>>> /* only works for page backed storage */
>>>> - if (!vma->vm_file)
>>>> + if (!vma->vm_file ||
>>>> + !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
>>>> + !vma->vm_file->f_mapping->a_ops ||
>>>> + !vma->vm_file->f_mapping->a_ops->read_folio)
>>>> return -EINVAL;
>>
>> Just wondering, we are fine with MAP_PRIVATE files, right? I guess it's
>> not about the actual content in the VMA (which might be different for a
>> MAP_PRIVATE VMA), but only about the content of the mapped file.
>
> Yep, this code is fetching contents of a file that backs given VMA.
Good!
>
>>
>>
>> LGTM, although I wonder whether some of these these checks should be
>> exposed as part of the read_cache_folio()/do_read_cache_folio() API.
>>
>> Like, having a helper function that tells us whether we can use
>> do_read_cache_folio() against a given mapping+file.
>
> I agree, this seems to be leaking a lot of internal mm details into
> higher-level caller (__build_id_parse). Right now we try to fetch
> folio with filemap_get_folio() and if that succeeds, then we do
> read_cache_folio. Would it be possible for filemap_get_folio() to
> return error if the folio cannot be read using read_cache_folio()? Or
> maybe have a variant of filemap_get_folio() that would have this
> semantic?
Good question. But really, for files that always have everything in the pagecache,
there would not be a problem, right? I'm thinking about hugetlb, for example.
There, we never expect to fallback to do_read_cache_folio().
So maybe we could just teach do_read_cache_folio() to fail properly?
diff --git a/mm/filemap.c b/mm/filemap.c
index ebd75684cb0a7..3f81b8481af4c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4051,8 +4051,11 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
struct folio *folio;
int err;
- if (!filler)
+ if (!filler) {
+ if (!mapping->a_ops || !mapping->a_ops->read_folio)
+ return ERR_PTR(-EINVAL);
filler = mapping->a_ops->read_folio;
+ }
repeat:
folio = filemap_get_folio(mapping, index);
if (IS_ERR(folio)) {
Then __build_id_parse() would only check for the existence of vma->vm_file and maybe
the !S_ISREG(file_inode(vma->vm_file)->i_mode).
--
Cheers
David
On Tue, Jan 6, 2026 at 11:16 AM David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> On 1/5/26 23:52, Andrii Nakryiko wrote:
> > On Tue, Dec 30, 2025 at 2:11 PM David Hildenbrand (Red Hat)
> > <david@kernel.org> wrote:
> >>
> >> On 12/23/25 18:29, Andrew Morton wrote:
> >>> On Tue, 23 Dec 2025 18:32:07 +0800 Jinchao Wang <wangjinchao600@gmail.com> wrote:
> >>>
> >>>> __build_id_parse() only works on page-backed storage. Its helper paths
> >>>> eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
> >>>> that do not map a regular file or lack valid address_space operations.
> >>>>
> >>>> Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
> >>>> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> >>>>
> >>>> ...
> >>>>
> >>>> --- a/lib/buildid.c
> >>>> +++ b/lib/buildid.c
> >>>> @@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
> >>>> int ret;
> >>>>
> >>>> /* only works for page backed storage */
> >>>> - if (!vma->vm_file)
> >>>> + if (!vma->vm_file ||
> >>>> + !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
> >>>> + !vma->vm_file->f_mapping->a_ops ||
> >>>> + !vma->vm_file->f_mapping->a_ops->read_folio)
> >>>> return -EINVAL;
> >>
> >> Just wondering, we are fine with MAP_PRIVATE files, right? I guess it's
> >> not about the actual content in the VMA (which might be different for a
> >> MAP_PRIVATE VMA), but only about the content of the mapped file.
> >
> > Yep, this code is fetching contents of a file that backs given VMA.
>
> Good!
>
> >
> >>
> >>
> >> LGTM, although I wonder whether some of these these checks should be
> >> exposed as part of the read_cache_folio()/do_read_cache_folio() API.
> >>
> >> Like, having a helper function that tells us whether we can use
> >> do_read_cache_folio() against a given mapping+file.
> >
> > I agree, this seems to be leaking a lot of internal mm details into
> > higher-level caller (__build_id_parse). Right now we try to fetch
> > folio with filemap_get_folio() and if that succeeds, then we do
> > read_cache_folio. Would it be possible for filemap_get_folio() to
> > return error if the folio cannot be read using read_cache_folio()? Or
> > maybe have a variant of filemap_get_folio() that would have this
> > semantic?
>
> Good question. But really, for files that always have everything in the pagecache,
> there would not be a problem, right? I'm thinking about hugetlb, for example.
>
> There, we never expect to fallback to do_read_cache_folio().
>
> So maybe we could just teach do_read_cache_folio() to fail properly?
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ebd75684cb0a7..3f81b8481af4c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4051,8 +4051,11 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
> struct folio *folio;
> int err;
>
> - if (!filler)
> + if (!filler) {
> + if (!mapping->a_ops || !mapping->a_ops->read_folio)
> + return ERR_PTR(-EINVAL);
> filler = mapping->a_ops->read_folio;
> + }
> repeat:
> folio = filemap_get_folio(mapping, index);
> if (IS_ERR(folio)) {
>
> Then __build_id_parse() would only check for the existence of vma->vm_file and maybe
> the !S_ISREG(file_inode(vma->vm_file)->i_mode).
>
That would be great. But something like this was proposed earlier and
Matthew didn't particularly like this approach ([0]).
[0] https://lore.kernel.org/all/aReUv1kVACh3UKv-@casper.infradead.org/
>
> --
> Cheers
>
> David
On 1/10/26 00:43, Andrii Nakryiko wrote:
> On Tue, Jan 6, 2026 at 11:16 AM David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>>
>> On 1/5/26 23:52, Andrii Nakryiko wrote:
>>> On Tue, Dec 30, 2025 at 2:11 PM David Hildenbrand (Red Hat)
>>> <david@kernel.org> wrote:
>>>>
>>>> On 12/23/25 18:29, Andrew Morton wrote:
>>>>> On Tue, 23 Dec 2025 18:32:07 +0800 Jinchao Wang <wangjinchao600@gmail.com> wrote:
>>>>>
>>>>>> __build_id_parse() only works on page-backed storage. Its helper paths
>>>>>> eventually call mapping->a_ops->read_folio(), so explicitly reject VMAs
>>>>>> that do not map a regular file or lack valid address_space operations.
>>>>>>
>>>>>> Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
>>>>>> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> --- a/lib/buildid.c
>>>>>> +++ b/lib/buildid.c
>>>>>> @@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
>>>>>> int ret;
>>>>>>
>>>>>> /* only works for page backed storage */
>>>>>> - if (!vma->vm_file)
>>>>>> + if (!vma->vm_file ||
>>>>>> + !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
>>>>>> + !vma->vm_file->f_mapping->a_ops ||
>>>>>> + !vma->vm_file->f_mapping->a_ops->read_folio)
>>>>>> return -EINVAL;
>>>>
>>>> Just wondering, we are fine with MAP_PRIVATE files, right? I guess it's
>>>> not about the actual content in the VMA (which might be different for a
>>>> MAP_PRIVATE VMA), but only about the content of the mapped file.
>>>
>>> Yep, this code is fetching contents of a file that backs given VMA.
>>
>> Good!
>>
>>>
>>>>
>>>>
>>>> LGTM, although I wonder whether some of these these checks should be
>>>> exposed as part of the read_cache_folio()/do_read_cache_folio() API.
>>>>
>>>> Like, having a helper function that tells us whether we can use
>>>> do_read_cache_folio() against a given mapping+file.
>>>
>>> I agree, this seems to be leaking a lot of internal mm details into
>>> higher-level caller (__build_id_parse). Right now we try to fetch
>>> folio with filemap_get_folio() and if that succeeds, then we do
>>> read_cache_folio. Would it be possible for filemap_get_folio() to
>>> return error if the folio cannot be read using read_cache_folio()? Or
>>> maybe have a variant of filemap_get_folio() that would have this
>>> semantic?
>>
>> Good question. But really, for files that always have everything in the pagecache,
>> there would not be a problem, right? I'm thinking about hugetlb, for example.
>>
>> There, we never expect to fallback to do_read_cache_folio().
>>
>> So maybe we could just teach do_read_cache_folio() to fail properly?
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index ebd75684cb0a7..3f81b8481af4c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -4051,8 +4051,11 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
>> struct folio *folio;
>> int err;
>>
>> - if (!filler)
>> + if (!filler) {
>> + if (!mapping->a_ops || !mapping->a_ops->read_folio)
>> + return ERR_PTR(-EINVAL);
>> filler = mapping->a_ops->read_folio;
>> + }
>> repeat:
>> folio = filemap_get_folio(mapping, index);
>> if (IS_ERR(folio)) {
>>
>> Then __build_id_parse() would only check for the existence of vma->vm_file and maybe
>> the !S_ISREG(file_inode(vma->vm_file)->i_mode).
>>
>
> That would be great. But something like this was proposed earlier and
> Matthew didn't particularly like this approach ([0]).
>
> [0] https://lore.kernel.org/all/aReUv1kVACh3UKv-@casper.infradead.org/
Well, but on the higher level we don't know whether we have to even call
into read_cache_folio().
Again, hugetlb. Might be worth reproducing with hugetlb/shmem, and
making sure it keeps working even with your changes.
--
Cheers
David
© 2016 - 2026 Red Hat, Inc.