Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS on top of
each other.
This fix breaks composefs mounts, which sometimes need EROFS+ovl^2
(such setups have already been used in production for quite a long time).
One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.
After a long discussion in the GitHub issue [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because there is always the option
to use loopback devices as a fallback.
As a quick fix for the composefs regression for this cycle, instead of
bumping `s_stack_depth` for file-backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.
This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.
Essentially, we are allowing one extra unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.
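A minimal userspace sketch of the depth accounting described above,
assuming overlayfs sets its own depth to one more than its deepest layer
and refuses the mount once that exceeds FILESYSTEM_MAX_STACK_DEPTH (2).
The helper names are hypothetical and only model the arithmetic; they
are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* FILESYSTEM_MAX_STACK_DEPTH as stated in the commit message. */
#define MAX_STACK_DEPTH 2

/*
 * Hypothetical helper: s_stack_depth of a file-backed EROFS mount whose
 * backing file lives on a filesystem of depth `backing_depth`.
 * Old rule (d53cd891f0e4): backing depth + 1.  New rule: not incremented
 * (the new rule refuses backing_depth > 0 instead, so 0 is the only case).
 */
static int erofs_fileio_depth(int backing_depth, bool old_rule)
{
	return old_rule ? backing_depth + 1 : 0;
}

/* Hypothetical helper: overlayfs takes its deepest layer's depth plus one. */
static int ovl_depth(int deepest_layer_depth)
{
	return deepest_layer_depth + 1;
}

/*
 * Does "file-backed EROFS on ext4 (depth 0), then nr_ovl overlay levels"
 * stay within MAX_STACK_DEPTH?  Models the composefs EROFS+ovl^N layout.
 */
static bool composefs_stack_fits(int nr_ovl, bool old_rule)
{
	int depth = erofs_fileio_depth(0, old_rule);

	assert(nr_ovl >= 0);
	if (depth > MAX_STACK_DEPTH)
		return false;
	for (int i = 0; i < nr_ovl; i++) {
		depth = ovl_depth(depth);
		if (depth > MAX_STACK_DEPTH)
			return false;
	}
	return true;
}
```

Under the old rule composefs_stack_fits(2, true) is false (depths 1, 2,
3), while under the new rule composefs_stack_fits(2, false) is true
(depths 0, 1, 2): the EROFS level is the one left unaccounted.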
We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.
Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: Alexander Larsson <alexl@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Sheng Yong <shengyong1@xiaomi.com>
Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
v2:
- Update commit message (suggested by Amir in a 1-on-1 talk);
- Add proper `Reported-by:`.
fs/erofs/super.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 937a215f626c..0cf41ed7ced8 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -644,14 +644,20 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
* fs contexts (including its own) due to self-controlled RO
* accesses/contexts and no side-effect changes that need to
* context save & restore so it can reuse the current thread
- * context. However, it still needs to bump `s_stack_depth` to
- * avoid kernel stack overflow from nested filesystems.
+ * context.
+ * However, we still need to prevent kernel stack overflow due
+ * to filesystem nesting: just ensure that s_stack_depth is 0
+ * to disallow mounting EROFS on stacked filesystems.
+ * Note: s_stack_depth is not incremented here for now, since
+ * EROFS is the only fs supporting file-backed mounts for now.
+ * It MUST change if another fs plans to support them, which
+ * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH.
*/
if (erofs_is_fileio_mode(sbi)) {
- sb->s_stack_depth =
- file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
- if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
- erofs_err(sb, "maximum fs stacking depth exceeded");
+ inode = file_inode(sbi->dif0.file);
+ if (inode->i_sb->s_op == &erofs_sops ||
+ inode->i_sb->s_stack_depth) {
+ erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
return -ENOTBLK;
}
}
--
2.43.5
On 1/7/26 01:05, Gao Xiang wrote:
> ...
> + inode = file_inode(sbi->dif0.file);
> + if (inode->i_sb->s_op == &erofs_sops ||
Hi, Xiang
In Android APEX scenario, apex images formatted as EROFS are packed in
system.img which is also EROFS format. As a result, it will always fail
to do APEX-file-backed mount since `inode->i_sb->s_op == &erofs_sops'
is true.
Any thoughts to handle such scenario?
thanks,
shengyong
Hi Sheng,
On 2026/1/8 10:26, Sheng Yong wrote:
> On 1/7/26 01:05, Gao Xiang wrote:
>> ...
>> + inode = file_inode(sbi->dif0.file);
>> + if (inode->i_sb->s_op == &erofs_sops ||
>
> Hi, Xiang
>
> In Android APEX scenario, apex images formatted as EROFS are packed in
> system.img which is also EROFS format. As a result, it will always fail
> to do APEX-file-backed mount since `inode->i_sb->s_op == &erofs_sops'
> is true.
> Any thoughts to handle such scenario?
Sorry, I forgot about this popular case. I think it can be resolved
simply with the following diff:
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 0cf41ed7ced8..e93264034b5d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
*/
if (erofs_is_fileio_mode(sbi)) {
inode = file_inode(sbi->dif0.file);
- if (inode->i_sb->s_op == &erofs_sops ||
+ if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
inode->i_sb->s_stack_depth) {
erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
return -ENOTBLK;
"!sb->s_bdev" covers file-backed EROFS mounts and
(deprecated) fscache EROFS mounts. I will send v3 soon.
Thanks,
Gao Xiang
On 2026/1/8 10:32, Gao Xiang wrote:
> Hi Sheng,
>
> On 2026/1/8 10:26, Sheng Yong wrote:
>> On 1/7/26 01:05, Gao Xiang wrote:
>>> ...
>>> + inode = file_inode(sbi->dif0.file);
>>> + if (inode->i_sb->s_op == &erofs_sops ||
>>
>> Hi, Xiang
>>
>> In Android APEX scenario, apex images formatted as EROFS are packed in
>> system.img which is also EROFS format. As a result, it will always fail
>> to do APEX-file-backed mount since `inode->i_sb->s_op == &erofs_sops'
>> is true.
>> Any thoughts to handle such scenario?
>
> Sorry, I forgot this popular case, I think it can be simply resolved
> by the following diff:
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 0cf41ed7ced8..e93264034b5d 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> */
> if (erofs_is_fileio_mode(sbi)) {
> inode = file_inode(sbi->dif0.file);
> - if (inode->i_sb->s_op == &erofs_sops ||
> + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
Sorry, it should be `!inode->i_sb->s_bdev`; I've
fixed it in v3 RESEND:
https://lore.kernel.org/r/20260108030709.3305545-1-hsiangkao@linux.alibaba.com
Thanks,
Gao Xiang
On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
>
>
> On 2026/1/8 10:32, Gao Xiang wrote:
> > Hi Sheng,
> >
> > On 2026/1/8 10:26, Sheng Yong wrote:
> >> On 1/7/26 01:05, Gao Xiang wrote:
> >>> ...
> >>> + inode = file_inode(sbi->dif0.file);
> >>> + if (inode->i_sb->s_op == &erofs_sops ||
> >>
> >> Hi, Xiang
> >>
> >> In Android APEX scenario, apex images formatted as EROFS are packed in
> >> system.img which is also EROFS format. As a result, it will always fail
> >> to do APEX-file-backed mount since `inode->i_sb->s_op == &erofs_sops'
> >> is true.
> >> Any thoughts to handle such scenario?
> >
> > Sorry, I forgot this popular case, I think it can be simply resolved
> > by the following diff:
> >
> > diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> > index 0cf41ed7ced8..e93264034b5d 100644
> > --- a/fs/erofs/super.c
> > +++ b/fs/erofs/super.c
> > @@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> > */
> > if (erofs_is_fileio_mode(sbi)) {
> > inode = file_inode(sbi->dif0.file);
> > - if (inode->i_sb->s_op == &erofs_sops ||
> > + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
>
> Sorry it should be `!inode->i_sb->s_bdev`, I've
> fixed it in v3 RESEND:
A RESEND implies no changes since v3, so this is bad practice.
> https://lore.kernel.org/r/20260108030709.3305545-1-hsiangkao@linux.alibaba.com
>
Ouch! If the erofs maintainer got this condition wrong... twice...
Maybe it is better to use the helper instead of open-coding this non-trivial check?
if ((inode->i_sb->s_op == &erofs_sops &&
erofs_is_fileio_mode(EROFS_I_SB(inode)))
Thanks,
Amir.
Hi Amir,
On 2026/1/8 16:02, Amir Goldstein wrote:
> On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
...
>>>>
>>>> Hi, Xiang
>>>>
>>>> In Android APEX scenario, apex images formatted as EROFS are packed in
>>>> system.img which is also EROFS format. As a result, it will always fail
>>>> to do APEX-file-backed mount since `inode->i_sb->s_op == &erofs_sops'
>>>> is true.
>>>> Any thoughts to handle such scenario?
>>>
>>> Sorry, I forgot this popular case, I think it can be simply resolved
>>> by the following diff:
>>>
>>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>>> index 0cf41ed7ced8..e93264034b5d 100644
>>> --- a/fs/erofs/super.c
>>> +++ b/fs/erofs/super.c
>>> @@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>>> */
>>> if (erofs_is_fileio_mode(sbi)) {
>>> inode = file_inode(sbi->dif0.file);
>>> - if (inode->i_sb->s_op == &erofs_sops ||
>>> + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
>>
>> Sorry it should be `!inode->i_sb->s_bdev`, I've
>> fixed it in v3 RESEND:
>
> A RESEND implies no changes since v3, so this is bad practice.
>
>> https://lore.kernel.org/r/20260108030709.3305545-1-hsiangkao@linux.alibaba.com
>>
>
> Ouch! If the erofs maintainer got this condition wrong... twice...
> Maybe better using the helper instead of open coding this non trivial check?
>
> if ((inode->i_sb->s_op == &erofs_sops &&
> erofs_is_fileio_mode(EROFS_I_SB(inode)))
I thought about using that, but it excludes fscache as the
backing fs, so I suggest using !s_bdev directly to cover
both the file-backed mount and fscache cases.
Thanks,
Gao Xiang
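The difference between the helper-based condition and the `!s_bdev`
test can be sketched by modeling the three EROFS backends mentioned in
this thread: block device, file-backed, and (deprecated) fscache. This
only models the `s_op == &erofs_sops` clause and omits the
`s_stack_depth` one. Assumptions, following the description above: only
file-backed mode satisfies erofs_is_fileio_mode(), and neither
file-backed nor fscache superblocks have an `s_bdev`. The enum and
functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how a backing EROFS superblock was mounted. */
enum erofs_backend { EROFS_ON_BDEV, EROFS_ON_FILE, EROFS_ON_FSCACHE };

/* Models erofs_is_fileio_mode(EROFS_I_SB(inode)) on the backing sb. */
static bool backing_is_fileio(enum erofs_backend b)
{
	return b == EROFS_ON_FILE;
}

/* Models !inode->i_sb->s_bdev: true for both file-backed and fscache. */
static bool backing_has_no_bdev(enum erofs_backend b)
{
	assert(b == EROFS_ON_BDEV || b == EROFS_ON_FILE ||
	       b == EROFS_ON_FSCACHE);
	return b != EROFS_ON_BDEV;
}

/* Helper-based suggestion: refuse only a file-backed backing EROFS. */
static bool refused_by_helper(enum erofs_backend b)
{
	return backing_is_fileio(b);
}

/* v3 RESEND: refuse whenever the backing EROFS has no block device. */
static bool refused_by_no_bdev(enum erofs_backend b)
{
	return backing_has_no_bdev(b);
}
```

Both conditions accept EROFS on a block device and refuse file-backed
EROFS; they differ only for an fscache-backed EROFS, which only the
`!s_bdev` form refuses.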
On Thu, 8 Jan 2026 16:05:03 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> Hi Amir,
>
> On 2026/1/8 16:02, Amir Goldstein wrote:
> > On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> ...
>
> I was thought to use that, but it excludes fscache as the
> backing fs.. so I suggest to use !s_bdev directly to
> cover both file-backed mounts and fscache cases directly.
Is it worth just allocating each fs a 'stack needed' value and then
allowing the mount if the total is low enough?
This is equivalent to counting the recursion depth, but lets erofs only
add (say) 0.5.
Ideally you'd want to do static analysis to find the value to add,
but 'inspired guesswork' is probably good enough.
Isn't there also a big difference between recursive mounts (which need
to do read/write on the underlying file) and overlay mounts (which just
pass the request onto the lower filesystem)?
David
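The 'stack needed' idea can be sketched with integer half-level units,
so that EROFS can cost 0.5 of a level without floating point. The
per-filesystem costs and budgets below are illustrative guesses in the
spirit of the 'inspired guesswork' above, not measured values, and none
of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Costs in half-levels: 2 == one full stacking level as counted today. */
#define COST_OVERLAYFS     2	/* hypothetical: one full level */
#define COST_EROFS_FILEIO  1	/* hypothetical: the "say 0.5" above */

/*
 * Allow a mount if the summed cost of every stacked layer, from the
 * lowest upwards, stays within `budget` (also in half-levels).
 */
static bool stack_within_budget(const int *costs, size_t n, int budget)
{
	int total = 0;

	assert(costs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		total += costs[i];
		if (total > budget)
			return false;
	}
	return true;
}
```

With these weights, composefs EROFS+ovl^2 costs 1 + 2 + 2 = 5
half-levels (2.5 levels): it is refused by a budget of 4 (two full
levels, matching today's FILESYSTEM_MAX_STACK_DEPTH) and accepted by a
budget of 6.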
Hi David,
On 2026/1/8 18:26, David Laight wrote:
> On Thu, 8 Jan 2026 16:05:03 +0800
> Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
>> Hi Amir,
>>
>> On 2026/1/8 16:02, Amir Goldstein wrote:
>>> On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> ...
>>
>> I was thought to use that, but it excludes fscache as the
>> backing fs.. so I suggest to use !s_bdev directly to
>> cover both file-backed mounts and fscache cases directly.
>
> Is it worth just allocating each fs a 'stack needed' value and then
> allowing the mount if the total is low enough.
> This is equivalent to counting the recursion depth, but lets erofs only
> add (say) 0.5.
> Ideally you'd want to do static analysis to find the value to add,
> but 'inspired guesswork' is probably good enough.
That is a good alternative, but I can also see some practical
issues, such as how to evaluate stack usage under the block layer.
And the rule exposed to userspace becomes complex if we do it
that way.
>
> Isn't there also a big difference between recursive mounts (which need
> to do read/write on the underlying file) and overlay mounts (which just
> pass the request onto the lower filesystem).
As for EROFS, we only care about reads since that path is safe
enough, but I won't speak for write paths (sb_writers and journal
nesting, for example; I don't want to widen the discussion since
it's largely unrelated to this topic).
I agree, but as I said above, it makes the rule more complex, and
users would have no idea when a mount will succeed and when it
will fail.
Anyway, I think that with the current 16k kernel stack,
FILESYSTEM_MAX_STACK_DEPTH = 3 is safe enough and provides
an abundant margin for the underlying storage stack.
I have no idea how to prove it strictly, but I think it's
roughly possible to show the stack usage when reaching
the real backing fs (e.g. the remaining stack size at that
point), and FILESYSTEM_MAX_STACK_DEPTH = 2 was an arbitrary
choice too.
Thanks,
Gao Xiang
>
> David
>
>>
>> Thanks,
>> Gao Xiang
>>
>>>
>>> Thanks,
>>> Amir.
>>
>>
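David's suggestion above (give each filesystem a "stack needed" weight and allow a mount only while the running total stays within a budget) can be sketched in userspace C as follows. This is a minimal sketch under stated assumptions: the cost constants, `STACK_BUDGET_HALVES` and `stack_on()` are hypothetical and are not kernel APIs. Costs are kept in half-levels so that the "(say) 0.5" for EROFS stays an integer, since kernel code generally avoids floating point.

```c
#include <assert.h>

/*
 * Hypothetical per-filesystem stack costs in half-levels. These are
 * illustrative guesses, not measured values; the thread suggests
 * static analysis or "inspired guesswork" to derive real ones.
 */
enum {
	COST_EROFS_FILE = 1,	/* the "(say) 0.5" level for file-backed EROFS */
	COST_OVERLAYFS  = 2,	/* one full level, like today's s_stack_depth++ */
};

/* Hypothetical budget: 2.5 levels, expressed in half-levels. */
#define STACK_BUDGET_HALVES 5

/*
 * Accumulated cost of mounting a filesystem costing @own on top of a
 * lower filesystem whose accumulated cost is @lower (0 for a terminal
 * fs such as ext4 on a block device). Returns the new accumulated
 * cost, or -1 if the lower mount already failed or the budget would
 * be exceeded.
 */
static int stack_on(int lower, int own)
{
	assert(own >= 0);
	if (lower < 0 || lower + own > STACK_BUDGET_HALVES)
		return -1;
	return lower + own;
}
```

With these made-up weights, a composefs-like ovl^2 over file-backed EROFS totals 1 + 2 + 2 = 5 half-levels and fits, while three overlayfs levels total 6 and are refused. As Gao points out, the hard parts are choosing the weights (e.g. accounting for the block layer) and explaining the resulting rule to userspace.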
On Thu, Jan 8, 2026 at 9:05 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> Hi Amir,
>
> On 2026/1/8 16:02, Amir Goldstein wrote:
> > On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> ...
>
> >>>>
> >>>> Hi, Xiang
> >>>>
> >>>> In the Android APEX scenario, APEX images formatted as EROFS are packed in
> >>>> system.img, which is also in EROFS format. As a result, file-backed mounts
> >>>> of APEX images always fail since `inode->i_sb->s_op == &erofs_sops`
> >>>> is true.
> >>>> Any thoughts on how to handle this scenario?
> >>>
> >>> Sorry, I forgot this popular case, I think it can be simply resolved
> >>> by the following diff:
> >>>
> >>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> >>> index 0cf41ed7ced8..e93264034b5d 100644
> >>> --- a/fs/erofs/super.c
> >>> +++ b/fs/erofs/super.c
> >>> @@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> >>> */
> >>> if (erofs_is_fileio_mode(sbi)) {
> >>> inode = file_inode(sbi->dif0.file);
> >>> - if (inode->i_sb->s_op == &erofs_sops ||
> >>> + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
> >>
> >> Sorry it should be `!inode->i_sb->s_bdev`, I've
> >> fixed it in v3 RESEND:
> >
> > A RESEND implies no changes since v3, so this is bad practice.
> >
> >> https://lore.kernel.org/r/20260108030709.3305545-1-hsiangkao@linux.alibaba.com
> >>
> >
> > Ouch! If the erofs maintainer got this condition wrong... twice...
> > Maybe it's better to use the helper instead of open-coding this non-trivial check?
> >
> > if ((inode->i_sb->s_op == &erofs_sops &&
> > erofs_is_fileio_mode(EROFS_I_SB(inode)))
>
> I thought of using that, but it excludes fscache as the
> backing fs, so I suggest using !s_bdev directly to
> cover both the file-backed mount and fscache cases.
Your fs, your decision.
But what are you actually saying?
Are you saying that reading from file backed fscache has similar
stack usage to reading from file backed erofs?
Isn't filecache doing async file IO?
If we regard fscache as an extra unaccounted layer, because of all the
sync operations that it does, then we already allowed this setup a long
time ago, e.g. fscache+nfs+ovl^2.
This could be an argument to support the claim that stack usage of
file+erofs+ovl^2 should also be fine.
Thanks,
Amir.
On 2026/1/8 16:24, Amir Goldstein wrote:
> On Thu, Jan 8, 2026 at 9:05 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> Hi Amir,
>>
>> On 2026/1/8 16:02, Amir Goldstein wrote:
>>> On Thu, Jan 8, 2026 at 4:10 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> ...
>>
>>>>>>
>>>>>> Hi, Xiang
>>>>>>
>>>>>> In the Android APEX scenario, APEX images formatted as EROFS are packed in
>>>>>> system.img, which is also in EROFS format. As a result, file-backed mounts
>>>>>> of APEX images always fail since `inode->i_sb->s_op == &erofs_sops`
>>>>>> is true.
>>>>>> Any thoughts on how to handle this scenario?
>>>>>
>>>>> Sorry, I forgot this popular case, I think it can be simply resolved
>>>>> by the following diff:
>>>>>
>>>>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>>>>> index 0cf41ed7ced8..e93264034b5d 100644
>>>>> --- a/fs/erofs/super.c
>>>>> +++ b/fs/erofs/super.c
>>>>> @@ -655,7 +655,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>>>>> */
>>>>> if (erofs_is_fileio_mode(sbi)) {
>>>>> inode = file_inode(sbi->dif0.file);
>>>>> - if (inode->i_sb->s_op == &erofs_sops ||
>>>>> + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) ||
>>>>
>>>> Sorry it should be `!inode->i_sb->s_bdev`, I've
>>>> fixed it in v3 RESEND:
>>>
>>> A RESEND implies no changes since v3, so this is bad practice.
>>>
>>>> https://lore.kernel.org/r/20260108030709.3305545-1-hsiangkao@linux.alibaba.com
>>>>
>>>
>>> Ouch! If the erofs maintainer got this condition wrong... twice...
>>> Maybe it's better to use the helper instead of open-coding this non-trivial check?
>>>
>>> if ((inode->i_sb->s_op == &erofs_sops &&
>>> erofs_is_fileio_mode(EROFS_I_SB(inode)))
>>
>> I thought of using that, but it excludes fscache as the
>> backing fs, so I suggest using !s_bdev directly to
>> cover both the file-backed mount and fscache cases.
>
> Your fs, your decision.
>
> But what are you actually saying?
> Are you saying that reading from file backed fscache has similar
> stack usage to reading from file backed erofs?
Nope, I just don't want to be bothered with fscache in any
case since it's already deprecated; IOW, I don't want a setup
like this to work:
erofs (file-backed) + erofs(fscache) + ...
I just want to allow
erofs(APEX) + erofs(bdev) + ...
cases, since Android users use them,
in addition to
ovl^2 + erofs + ext4 / xfs /... (composefs, containerd and ...)
Does that make sense?
> Isn't filecache doing async file IO?
But as I said, AIO is not a must; it can still
fall back to sync I/O.
>
> If we regard fscache as an extra unaccounted layer, because of all the
> sync operations that it does, then we already allowed this setup a long
> time ago, e.g. fscache+nfs+ovl^2.
>
> This could be an argument to support the claim that stack usage of
> file+erofs+ovl^2 should also be fine.
Anyway, I'm not sure how many users really use that, so
I won't speak to it.
Thanks,
Gao Xiang
>
> Thanks,
> Amir.
Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS on top of
each other.
This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
(and such setups are already used in production for quite a long time).
One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.
After a long discussion on GitHub issues [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because there is always the option
to use loopback devices as a fallback.
As a quick fix for the composefs regression for this cycle, instead of
bumping `s_stack_depth` for file backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.
This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.
Essentially, we are allowing one extra unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.
We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.
Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Alexander Larsson <alexl@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Sheng Yong <shengyong1@xiaomi.com>
Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
v2->v3 RESEND:
- Exclude bdev-backed EROFS mounts since it will be a real terminal fs
as pointed out by Sheng Yong (APEX will rely on this);
- Preserve previous "Acked-by:" and "Tested-by:" since it's trivial.
fs/erofs/super.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 937a215f626c..5136cda5972a 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -644,14 +644,21 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
* fs contexts (including its own) due to self-controlled RO
* accesses/contexts and no side-effect changes that need to
* context save & restore so it can reuse the current thread
- * context. However, it still needs to bump `s_stack_depth` to
- * avoid kernel stack overflow from nested filesystems.
+ * context.
+ * However, we still need to prevent kernel stack overflow due
+ * to filesystem nesting: just ensure that s_stack_depth is 0
+ * to disallow mounting EROFS on stacked filesystems.
+ * Note: s_stack_depth is not incremented here for now, since
+ * EROFS is the only fs supporting file-backed mounts for now.
+ * It MUST change if another fs plans to support them, which
+ * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH.
*/
if (erofs_is_fileio_mode(sbi)) {
- sb->s_stack_depth =
- file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
- if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
- erofs_err(sb, "maximum fs stacking depth exceeded");
+ inode = file_inode(sbi->dif0.file);
+ if ((inode->i_sb->s_op == &erofs_sops &&
+ !inode->i_sb->s_bdev) ||
+ inode->i_sb->s_stack_depth) {
+ erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
return -ENOTBLK;
}
}
--
2.43.5
On Thu, Jan 08, 2026 at 11:07:09AM +0800, Gao Xiang wrote:
> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
> stack overflow when stacking an unlimited number of EROFS on top of
> each other.
>
> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
> (and such setups are already used in production for quite a long time).
>
> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
> from 2 to 3, but proving that this is safe in general is a high bar.
>
> After a long discussion on GitHub issues [1] about possible solutions,
> one conclusion is that there is no need to support nesting file-backed
> EROFS mounts on stacked filesystems, because there is always the option
> to use loopback devices as a fallback.
>
> As a quick fix for the composefs regression for this cycle, instead of
> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
> nesting file-backed EROFS over EROFS and over filesystems with
> `s_stack_depth` > 0.
>
> This works for all known file-backed mount use cases (composefs,
> containerd, and Android APEX for some Android vendors), and the fix is
> self-contained.
>
> Essentially, we are allowing one extra unaccounted fs stacking level of
> EROFS below stacking filesystems, but EROFS can only be used in the read
> path (i.e. overlayfs lower layers), which typically has much lower stack
> usage than the write path.
>
> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
> stack usage analysis or using alternative approaches, such as splitting
> the `s_stack_depth` limitation according to different combinations of
> stacking.
>
> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
> Reported-by: Timothée Ravier <tim@siosm.fr>
> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> Acked-by: Alexander Larsson <alexl@redhat.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Miklos Szeredi <mszeredi@redhat.com>
> Cc: Sheng Yong <shengyong1@xiaomi.com>
> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
Acked-by: Christian Brauner <brauner@kernel.org>
On 1/8/2026 11:07 AM, Gao Xiang wrote:
> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
> stack overflow when stacking an unlimited number of EROFS on top of
> each other.
>
> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
> (and such setups are already used in production for quite a long time).
>
> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
> from 2 to 3, but proving that this is safe in general is a high bar.
>
> After a long discussion on GitHub issues [1] about possible solutions,
> one conclusion is that there is no need to support nesting file-backed
> EROFS mounts on stacked filesystems, because there is always the option
> to use loopback devices as a fallback.
>
> As a quick fix for the composefs regression for this cycle, instead of
> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
> nesting file-backed EROFS over EROFS and over filesystems with
> `s_stack_depth` > 0.
>
> This works for all known file-backed mount use cases (composefs,
> containerd, and Android APEX for some Android vendors), and the fix is
> self-contained.
>
> Essentially, we are allowing one extra unaccounted fs stacking level of
> EROFS below stacking filesystems, but EROFS can only be used in the read
> path (i.e. overlayfs lower layers), which typically has much lower stack
> usage than the write path.
>
> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
> stack usage analysis or using alternative approaches, such as splitting
> the `s_stack_depth` limitation according to different combinations of
> stacking.
>
> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
> Reported-by: Timothée Ravier <tim@siosm.fr>
> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> Acked-by: Alexander Larsson <alexl@redhat.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Miklos Szeredi <mszeredi@redhat.com>
> Cc: Sheng Yong <shengyong1@xiaomi.com>
> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Thanks,
On Thu, Jan 8, 2026 at 11:07 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
> stack overflow when stacking an unlimited number of EROFS on top of
> each other.
>
> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
> (and such setups are already used in production for quite a long time).
>
> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
> from 2 to 3, but proving that this is safe in general is a high bar.
>
> After a long discussion on GitHub issues [1] about possible solutions,
> one conclusion is that there is no need to support nesting file-backed
> EROFS mounts on stacked filesystems, because there is always the option
> to use loopback devices as a fallback.
>
> As a quick fix for the composefs regression for this cycle, instead of
> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
> nesting file-backed EROFS over EROFS and over filesystems with
> `s_stack_depth` > 0.
>
> This works for all known file-backed mount use cases (composefs,
> containerd, and Android APEX for some Android vendors), and the fix is
> self-contained.
>
> Essentially, we are allowing one extra unaccounted fs stacking level of
> EROFS below stacking filesystems, but EROFS can only be used in the read
> path (i.e. overlayfs lower layers), which typically has much lower stack
> usage than the write path.
>
> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
> stack usage analysis or using alternative approaches, such as splitting
> the `s_stack_depth` limitation according to different combinations of
> stacking.
>
> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
> Reported-by: Timothée Ravier <tim@siosm.fr>
> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> Acked-by: Alexander Larsson <alexl@redhat.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Miklos Szeredi <mszeredi@redhat.com>
> Cc: Sheng Yong <shengyong1@xiaomi.com>
> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> ---
> v2->v3 RESEND:
> - Exclude bdev-backed EROFS mounts since it will be a real terminal fs
> as pointed out by Sheng Yong (APEX will rely on this);
>
> - Preserve previous "Acked-by:" and "Tested-by:" since it's trivial.
>
> fs/erofs/super.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 937a215f626c..5136cda5972a 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -644,14 +644,21 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> * fs contexts (including its own) due to self-controlled RO
> * accesses/contexts and no side-effect changes that need to
> * context save & restore so it can reuse the current thread
> - * context. However, it still needs to bump `s_stack_depth` to
> - * avoid kernel stack overflow from nested filesystems.
> + * context.
> + * However, we still need to prevent kernel stack overflow due
> + * to filesystem nesting: just ensure that s_stack_depth is 0
> + * to disallow mounting EROFS on stacked filesystems.
> + * Note: s_stack_depth is not incremented here for now, since
> + * EROFS is the only fs supporting file-backed mounts for now.
> + * It MUST change if another fs plans to support them, which
> + * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH.
> */
> if (erofs_is_fileio_mode(sbi)) {
> - sb->s_stack_depth =
> - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
> - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
> - erofs_err(sb, "maximum fs stacking depth exceeded");
> + inode = file_inode(sbi->dif0.file);
> + if ((inode->i_sb->s_op == &erofs_sops &&
> + !inode->i_sb->s_bdev) ||
> + inode->i_sb->s_stack_depth) {
> + erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
Hi Xiang
Do we need to print s_stack_depth here to distinguish which specific
problem case it is?
Otherwise LGTM based on my basic test, so
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Thanks!
> return -ENOTBLK;
> }
> }
> --
> 2.43.5
>
On 2026/1/8 17:28, Zhiguo Niu wrote:
> On Thu, Jan 8, 2026 at 11:07 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
>> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
>> stack overflow when stacking an unlimited number of EROFS on top of
>> each other.
>>
>> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
>> (and such setups are already used in production for quite a long time).
>>
>> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
>> from 2 to 3, but proving that this is safe in general is a high bar.
>>
>> After a long discussion on GitHub issues [1] about possible solutions,
>> one conclusion is that there is no need to support nesting file-backed
>> EROFS mounts on stacked filesystems, because there is always the option
>> to use loopback devices as a fallback.
>>
>> As a quick fix for the composefs regression for this cycle, instead of
>> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
>> nesting file-backed EROFS over EROFS and over filesystems with
>> `s_stack_depth` > 0.
>>
>> This works for all known file-backed mount use cases (composefs,
>> containerd, and Android APEX for some Android vendors), and the fix is
>> self-contained.
>>
>> Essentially, we are allowing one extra unaccounted fs stacking level of
>> EROFS below stacking filesystems, but EROFS can only be used in the read
>> path (i.e. overlayfs lower layers), which typically has much lower stack
>> usage than the write path.
>>
>> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
>> stack usage analysis or using alternative approaches, such as splitting
>> the `s_stack_depth` limitation according to different combinations of
>> stacking.
>>
>> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
>> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
>> Reported-by: Timothée Ravier <tim@siosm.fr>
>> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
>> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
>> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
>> Acked-by: Amir Goldstein <amir73il@gmail.com>
>> Acked-by: Alexander Larsson <alexl@redhat.com>
>> Cc: Christian Brauner <brauner@kernel.org>
>> Cc: Miklos Szeredi <mszeredi@redhat.com>
>> Cc: Sheng Yong <shengyong1@xiaomi.com>
>> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
>> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
>> ---
>> v2->v3 RESEND:
>> - Exclude bdev-backed EROFS mounts since it will be a real terminal fs
>> as pointed out by Sheng Yong (APEX will rely on this);
>>
>> - Preserve previous "Acked-by:" and "Tested-by:" since it's trivial.
>>
>> fs/erofs/super.c | 19 +++++++++++++------
>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>> index 937a215f626c..5136cda5972a 100644
>> --- a/fs/erofs/super.c
>> +++ b/fs/erofs/super.c
>> @@ -644,14 +644,21 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>> * fs contexts (including its own) due to self-controlled RO
>> * accesses/contexts and no side-effect changes that need to
>> * context save & restore so it can reuse the current thread
>> - * context. However, it still needs to bump `s_stack_depth` to
>> - * avoid kernel stack overflow from nested filesystems.
>> + * context.
>> + * However, we still need to prevent kernel stack overflow due
>> + * to filesystem nesting: just ensure that s_stack_depth is 0
>> + * to disallow mounting EROFS on stacked filesystems.
>> + * Note: s_stack_depth is not incremented here for now, since
>> + * EROFS is the only fs supporting file-backed mounts for now.
>> + * It MUST change if another fs plans to support them, which
>> + * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH.
>> */
>> if (erofs_is_fileio_mode(sbi)) {
>> - sb->s_stack_depth =
>> - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
>> - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
>> - erofs_err(sb, "maximum fs stacking depth exceeded");
>> + inode = file_inode(sbi->dif0.file);
>> + if ((inode->i_sb->s_op == &erofs_sops &&
>> + !inode->i_sb->s_bdev) ||
>> + inode->i_sb->s_stack_depth) {
>> + erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
> Hi Xiang
> Do we need to print s_stack_depth here to distinguish which specific
> problem case it is?
I don't want to complicate it (since it's just a short-term
solution, and EROFS is unaccounted, so s_stack_depth really
means nothing here) unless Android vendors really need it?
> Otherwise LGTM based on my basic test, so
>
> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Thanks for this too.
Thanks,
Gao Xiang
> Thanks!
>> return -ENOTBLK;
>> }
>> }
>> --
>> 2.43.5
>>
On 1/8/26 11:07, Gao Xiang wrote:
> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
> stack overflow when stacking an unlimited number of EROFS on top of
> each other.
>
> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
> (and such setups are already used in production for quite a long time).
>
> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
> from 2 to 3, but proving that this is safe in general is a high bar.
>
> After a long discussion on GitHub issues [1] about possible solutions,
> one conclusion is that there is no need to support nesting file-backed
> EROFS mounts on stacked filesystems, because there is always the option
> to use loopback devices as a fallback.
>
> As a quick fix for the composefs regression for this cycle, instead of
> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
> nesting file-backed EROFS over EROFS and over filesystems with
> `s_stack_depth` > 0.
>
> This works for all known file-backed mount use cases (composefs,
> containerd, and Android APEX for some Android vendors), and the fix is
> self-contained.
>
> Essentially, we are allowing one extra unaccounted fs stacking level of
> EROFS below stacking filesystems, but EROFS can only be used in the read
> path (i.e. overlayfs lower layers), which typically has much lower stack
> usage than the write path.
>
> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
> stack usage analysis or using alternative approaches, such as splitting
> the `s_stack_depth` limitation according to different combinations of
> stacking.
>
> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
> Reported-by: Timothée Ravier <tim@siosm.fr>
> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> Acked-by: Alexander Larsson <alexl@redhat.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Miklos Szeredi <mszeredi@redhat.com>
> Cc: Sheng Yong <shengyong1@xiaomi.com>
> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
I tested the APEX scenario on an Android phone. APEX images are
file-backed mounted correctly. And for a stacked APEX testcase,
it reports an error as expected.
thanks,
shengyong
> ---
> v2->v3 RESEND:
> - Exclude bdev-backed EROFS mounts since it will be a real terminal fs
> as pointed out by Sheng Yong (APEX will rely on this);
>
> - Preserve previous "Acked-by:" and "Tested-by:" since it's trivial.
>
> fs/erofs/super.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 937a215f626c..5136cda5972a 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -644,14 +644,21 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> * fs contexts (including its own) due to self-controlled RO
> * accesses/contexts and no side-effect changes that need to
> * context save & restore so it can reuse the current thread
> - * context. However, it still needs to bump `s_stack_depth` to
> - * avoid kernel stack overflow from nested filesystems.
> + * context.
> + * However, we still need to prevent kernel stack overflow due
> + * to filesystem nesting: just ensure that s_stack_depth is 0
> + * to disallow mounting EROFS on stacked filesystems.
> + * Note: s_stack_depth is not incremented here for now, since
> + * EROFS is the only fs supporting file-backed mounts for now.
> + * It MUST change if another fs plans to support them, which
> + * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH.
> */
> if (erofs_is_fileio_mode(sbi)) {
> - sb->s_stack_depth =
> - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
> - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
> - erofs_err(sb, "maximum fs stacking depth exceeded");
> + inode = file_inode(sbi->dif0.file);
> + if ((inode->i_sb->s_op == &erofs_sops &&
> + !inode->i_sb->s_bdev) ||
> + inode->i_sb->s_stack_depth) {
> + erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
> return -ENOTBLK;
> }
> }
Hi Sheng,
On 2026/1/8 17:14, Sheng Yong wrote:
> On 1/8/26 11:07, Gao Xiang wrote:
>> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
>> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
>> stack overflow when stacking an unlimited number of EROFS on top of
>> each other.
>>
>> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
>> (and such setups are already used in production for quite a long time).
>>
>> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
>> from 2 to 3, but proving that this is safe in general is a high bar.
>>
>> After a long discussion on GitHub issues [1] about possible solutions,
>> one conclusion is that there is no need to support nesting file-backed
>> EROFS mounts on stacked filesystems, because there is always the option
>> to use loopback devices as a fallback.
>>
>> As a quick fix for the composefs regression for this cycle, instead of
>> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
>> nesting file-backed EROFS over EROFS and over filesystems with
>> `s_stack_depth` > 0.
>>
>> This works for all known file-backed mount use cases (composefs,
>> containerd, and Android APEX for some Android vendors), and the fix is
>> self-contained.
>>
>> Essentially, we are allowing one extra unaccounted fs stacking level of
>> EROFS below stacking filesystems, but EROFS can only be used in the read
>> path (i.e. overlayfs lower layers), which typically has much lower stack
>> usage than the write path.
>>
>> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
>> stack usage analysis or using alternative approaches, such as splitting
>> the `s_stack_depth` limitation according to different combinations of
>> stacking.
>>
>> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
>> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
>> Reported-by: Timothée Ravier <tim@siosm.fr>
>> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
>> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
>> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
>> Acked-by: Amir Goldstein <amir73il@gmail.com>
>> Acked-by: Alexander Larsson <alexl@redhat.com>
>> Cc: Christian Brauner <brauner@kernel.org>
>> Cc: Miklos Szeredi <mszeredi@redhat.com>
>> Cc: Sheng Yong <shengyong1@xiaomi.com>
>> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
>> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
>
> Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
>
> I tested the APEX scenario on an Android phone. APEX images are
> file-backed mounted correctly.
> And for a stacked APEX testcase, it reports an error as expected.
Just to make sure it's an invalid case (should not be used on
Android), yes? If so, thanks for the test on the APEX side.
Thanks,
Gao Xiang
>
> thanks,
> shengyong
On 1/8/26 17:25, Gao Xiang wrote:
> Hi Sheng,
>
> On 2026/1/8 17:14, Sheng Yong wrote:
>> On 1/8/26 11:07, Gao Xiang wrote:
>>> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
>>> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
>>> stack overflow when stacking an unlimited number of EROFS on top of
>>> each other.
>>>
>>> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
>>> (and such setups are already used in production for quite a long time).
>>>
>>> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
>>> from 2 to 3, but proving that this is safe in general is a high bar.
>>>
>>> After a long discussion on GitHub issues [1] about possible solutions,
>>> one conclusion is that there is no need to support nesting file-backed
>>> EROFS mounts on stacked filesystems, because there is always the option
>>> to use loopback devices as a fallback.
>>>
>>> As a quick fix for the composefs regression for this cycle, instead of
>>> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
>>> nesting file-backed EROFS over EROFS and over filesystems with
>>> `s_stack_depth` > 0.
>>>
>>> This works for all known file-backed mount use cases (composefs,
>>> containerd, and Android APEX for some Android vendors), and the fix is
>>> self-contained.
>>>
>>> Essentially, we are allowing one extra unaccounted fs stacking level of
>>> EROFS below stacking filesystems, but EROFS can only be used in the read
>>> path (i.e. overlayfs lower layers), which typically has much lower stack
>>> usage than the write path.
>>>
>>> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
>>> stack usage analysis or using alternative approaches, such as splitting
>>> the `s_stack_depth` limitation according to different combinations of
>>> stacking.
>>>
>>> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
>>> Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
>>> Reported-by: Timothée Ravier <tim@siosm.fr>
>>> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
>>> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
>>> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
>>> Acked-by: Amir Goldstein <amir73il@gmail.com>
>>> Acked-by: Alexander Larsson <alexl@redhat.com>
>>> Cc: Christian Brauner <brauner@kernel.org>
>>> Cc: Miklos Szeredi <mszeredi@redhat.com>
>>> Cc: Sheng Yong <shengyong1@xiaomi.com>
>>> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
>>> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
>>
>> Reviewed-and-tested-by: Sheng Yong <shengyong1@xiaomi.com>
>>
>> I tested the APEX scenario on an Android phone. APEX images are
>> filebacked-mounted correctly.
>
>
>> And for a stacked APEX testcase, it reports error as expected.
>
Hi, Xiang,
> Just to make sure it's an invalid case (should not be used on
> Android), yes? If so, thanks for the test on the APEX side.
No, it's not a real use case, just an invalid case, and only
used to test the error handling path.
thanks,
shengyong
>
> Thanks,
> Gao Xiang
>
>>
>> thanks,
>> shengyong
Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
stack overflow when stacking an unlimited number of EROFS instances on
top of each other.

That change breaks composefs mounts, which sometimes need EROFS+ovl^2
(and such setups have already been used in production for quite a long
time).

One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
from 2 to 3, but proving that this is safe in general is a high bar.

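The arithmetic behind the EROFS+ovl^2 regression, worked under two assumptions: the backing file lives on a non-stacked filesystem such as ext4 (depth 0), and, as overlayfs does, each overlayfs mount takes one more than the deepest filesystem beneath it. Under the old rule:

```latex
\begin{aligned}
d_{\text{ext4}}  &= 0,\\
d_{\text{EROFS}} &= d_{\text{ext4}} + 1 = 1,\\
d_{\text{ovl}_1} &= d_{\text{EROFS}} + 1 = 2,\\
d_{\text{ovl}_2} &= d_{\text{ovl}_1} + 1 = 3 > 2 = \texttt{FILESYSTEM\_MAX\_STACK\_DEPTH},
\end{aligned}
```

so the second overlayfs mount is refused, whereas the same chain worked before d53cd891f0e4 because EROFS was not counted at all.
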
After a long discussion on the GitHub issue [1] about possible solutions,
one conclusion is that there is no need to support nesting file-backed
EROFS mounts on stacked filesystems, because loopback devices are always
available as a fallback.

As a quick fix for the composefs regression in this cycle, instead of
bumping `s_stack_depth` for file-backed EROFS mounts, we disallow
nesting file-backed EROFS over EROFS and over filesystems with
`s_stack_depth` > 0.

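The rule above, together with the v3 refinement that a bdev-backed EROFS counts as a terminal filesystem, can be modelled as a predicate on the superblock that holds the backing file. This is a userspace sketch, not the kernel code: `struct lower_sb_model`, its fields, and `fileio_mount_allowed()` are hypothetical; in the patch the inputs come from `file_inode(sbi->dif0.file)->i_sb`. The bdev test is applied to that lower superblock, which is how we read the v3 note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the superblock that holds the backing file. */
struct lower_sb_model {
	bool is_erofs;     /* i_sb->s_op == &erofs_sops in the kernel */
	bool has_bdev;     /* i_sb->s_bdev != NULL */
	int s_stack_depth; /* > 0 for stacked fses such as overlayfs */
};

/*
 * A file-backed EROFS mount is refused on top of a file-backed EROFS
 * (whose own depth stays 0, so it needs an explicit check) and on top
 * of any filesystem that is itself stacked.
 */
static bool fileio_mount_allowed(const struct lower_sb_model *lower)
{
	assert(lower != NULL);
	if (lower->is_erofs && !lower->has_bdev)
		return false;
	return lower->s_stack_depth == 0;
}
```

A bdev-backed EROFS partition has depth 0 and a block device, so file-backed images stored on it (the APEX case) pass both checks.
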
This works for all known file-backed mount use cases (composefs,
containerd, and Android APEX for some Android vendors), and the fix is
self-contained.

Essentially, we are allowing one extra, unaccounted fs stacking level of
EROFS below stacking filesystems, but EROFS can only be used in the read
path (i.e. overlayfs lower layers), which typically has much lower stack
usage than the write path.

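To make the "one extra, unaccounted level" concrete, here is a userspace model of a bottom-up mount chain under this patch. Assumptions: each overlayfs has a single lower (overlayfs really takes the maximum over all its layers, plus one); a bdev-backed EROFS would be treated like `HOST_FS`; `enum layer_kind` and `top_depth()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Limit stated in the commit message. */
#define FILESYSTEM_MAX_STACK_DEPTH 2

/* Hypothetical layer kinds for a mount chain, listed bottom-up. */
enum layer_kind { HOST_FS, FILEIO_EROFS, OVERLAYFS };

/*
 * Walk a single-lower chain bottom-up and return the depth of the top
 * mount, or -1 if some mount would be refused.  File-backed EROFS keeps
 * depth 0 and is only allowed on a depth-0, non-EROFS filesystem;
 * overlayfs adds one level and is checked against the limit.
 */
static int top_depth(const enum layer_kind *chain, size_t n)
{
	int depth = 0;
	enum layer_kind below = HOST_FS;

	assert(n > 0 && chain[0] == HOST_FS);
	for (size_t i = 1; i < n; i++) {
		switch (chain[i]) {
		case FILEIO_EROFS:
			if (below == FILEIO_EROFS || depth > 0)
				return -1;
			/* depth stays 0: the extra, unaccounted level */
			break;
		case OVERLAYFS:
			if (++depth > FILESYSTEM_MAX_STACK_DEPTH)
				return -1;
			break;
		case HOST_FS:
			/* this model only allows a host fs at the bottom */
			return -1;
		}
		below = chain[i];
	}
	return depth;
}
```

With this accounting the composefs chain (host fs, file-backed EROFS, overlayfs, overlayfs) ends at depth 2, while a third overlayfs, nested file-backed EROFS, or file-backed EROFS on overlayfs is refused.
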
We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
stack usage analysis, or using alternative approaches, such as splitting
the `s_stack_depth` limitation according to different combinations of
stacking.

Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
Reported-and-tested-by: Dusty Mabe <dusty@dustymabe.com>
Reported-by: Timothée Ravier <tim@siosm.fr>
Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Alexander Larsson <alexl@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Sheng Yong <shengyong1@xiaomi.com>
Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
v3:
- Exclude bdev-backed EROFS mounts, since they are real terminal
  filesystems, as pointed out by Sheng Yong (APEX will rely on this);
- Preserve the previous "Acked-by:" and "Tested-by:" tags, since the
  change is trivial.

 fs/erofs/super.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 937a215f626c..e93264034b5d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -644,14 +644,20 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	 * fs contexts (including its own) due to self-controlled RO
 	 * accesses/contexts and no side-effect changes that need to
 	 * context save & restore so it can reuse the current thread
-	 * context. However, it still needs to bump `s_stack_depth` to
-	 * avoid kernel stack overflow from nested filesystems.
+	 * context.
+	 * However, we still need to prevent kernel stack overflow due
+	 * to filesystem nesting: just ensure that s_stack_depth is 0
+	 * to disallow mounting EROFS on stacked filesystems.
+	 * Note: s_stack_depth is not incremented here for now, since
+	 * EROFS is the only fs supporting file-backed mounts for now.
+	 * It MUST change if another fs plans to support them, which
+	 * may also require adjusting FILESYSTEM_MAX_STACK_DEPTH.
 	 */
 	if (erofs_is_fileio_mode(sbi)) {
-		sb->s_stack_depth =
-			file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1;
-		if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
-			erofs_err(sb, "maximum fs stacking depth exceeded");
+		inode = file_inode(sbi->dif0.file);
+		if ((inode->i_sb->s_op == &erofs_sops && !inode->i_sb->s_bdev) ||
+		    inode->i_sb->s_stack_depth) {
+			erofs_err(sb, "file-backed mounts cannot be applied to stacked fses");
 			return -ENOTBLK;
 		}
 	}
--
2.43.5
On 1/6/26 12:05 PM, Gao Xiang wrote:
> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
> stack overflow when stacking an unlimited number of EROFS on top of
> each other.
>
> This fix breaks composefs mounts, which need EROFS+ovl^2 sometimes
> (and such setups are already used in production for quite a long time).
>
> One way to fix this regression is to bump FILESYSTEM_MAX_STACK_DEPTH
> from 2 to 3, but proving that this is safe in general is a high bar.
>
> After a long discussion on GitHub issues [1] about possible solutions,
> one conclusion is that there is no need to support nesting file-backed
> EROFS mounts on stacked filesystems, because there is always the option
> to use loopback devices as a fallback.
>
> As a quick fix for the composefs regression for this cycle, instead of
> bumping `s_stack_depth` for file backed EROFS mounts, we disallow
> nesting file-backed EROFS over EROFS and over filesystems with
> `s_stack_depth` > 0.
>
> This works for all known file-backed mount use cases (composefs,
> containerd, and Android APEX for some Android vendors), and the fix is
> self-contained.
>
> Essentially, we are allowing one extra unaccounted fs stacking level of
> EROFS below stacking filesystems, but EROFS can only be used in the read
> path (i.e. overlayfs lower layers), which typically has much lower stack
> usage than the write path.
>
> We can consider increasing FILESYSTEM_MAX_STACK_DEPTH later, after more
> stack usage analysis or using alternative approaches, such as splitting
> the `s_stack_depth` limitation according to different combinations of
> stacking.
>
> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
> Reported-by: Dusty Mabe <dusty@dustymabe.com>
> Reported-by: Timothée Ravier <tim@siosm.fr>
> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
> Reported-by: "Alekséi Naidénov" <an@digitaltide.io>
> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> Cc: Alexander Larsson <alexl@redhat.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Miklos Szeredi <mszeredi@redhat.com>
> Cc: Sheng Yong <shengyong1@xiaomi.com>
> Cc: Zhiguo Niu <niuzhiguo84@gmail.com>
> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Tested-by: Dusty Mabe <dusty@dustymabe.com>

I tested this, and it fixed the problem we observed in our Fedora CoreOS CI,
documented in https://github.com/coreos/fedora-coreos-tracker/issues/2087