[RFC PATCH] dm ioctl: fix erroneous EINVAL when signaled

Khazhismel Kumykov posted 1 patch 1 year, 5 months ago
Posted by Khazhismel Kumykov 1 year, 5 months ago
do_resume when loading a new map first calls dm_suspend, which could
silently fail. When we proceeded to dm_swap_table, we would bail out
with EINVAL. Instead, restore new_map and return ERESTARTSYS when
signaled.

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 drivers/md/dm-ioctl.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)


RFC as I am rather unfamiliar with the locking semantics here - whether
we do need to re-grab hash_lock to write to an hc we previously grabbed,
and whether the device becoming unhashed while we're in this function is
really something that needs to be checked. However, this patch does fix
the issue we were seeing - we'd get EINVAL when thread in ioctl was
signaled.
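
A minimal userspace sketch of the failure mode described above, for
illustration only. Every name in it (fake_dev, fake_suspend,
fake_swap_table, and the two resume_* helpers) is a hypothetical stand-in,
not kernel code. It assumes the swap step refuses to run on a device that
is not suspended, which is what turns an interrupted suspend into EINVAL.
The EINTR to ERESTARTSYS conversion is left out because ERESTARTSYS is not
visible to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct fake_table { int id; };

struct fake_dev {
	bool suspended;              /* models dm_suspended_md() */
	bool signal_pending;         /* makes the next suspend fail with -EINTR */
	struct fake_table *live;     /* currently active table */
	struct fake_table *new_map;  /* models hc->new_map, the preloaded table */
};

/* Models dm_suspend(): fails with -EINTR if signaled while waiting. */
static int fake_suspend(struct fake_dev *d)
{
	if (d->signal_pending)
		return -EINTR;
	d->suspended = true;
	return 0;
}

/* Models dm_swap_table(): refuses to swap unless the device is suspended. */
static int fake_swap_table(struct fake_dev *d, struct fake_table *t)
{
	if (!d->suspended)
		return -EINVAL;
	d->live = t;
	return 0;
}

/* Old flow: the suspend result is dropped, so the caller sees -EINVAL. */
static int resume_ignoring_suspend_error(struct fake_dev *d)
{
	struct fake_table *t = d->new_map;

	d->new_map = NULL;
	if (!d->suspended)
		(void)fake_suspend(d);
	return fake_swap_table(d, t);
}

/* Patched flow: report the suspend error and put the table back. */
static int resume_checking_suspend_error(struct fake_dev *d)
{
	struct fake_table *t = d->new_map;
	int r;

	d->new_map = NULL;
	if (!d->suspended) {
		r = fake_suspend(d);
		if (r) {
			d->new_map = t; /* keep the preloaded table for a retry */
			return r;
		}
	}
	r = fake_swap_table(d, t);
	if (!r)
		d->suspended = false; /* resume with the new table */
	return r;
}
```

With signal_pending set, the first helper returns -EINVAL and loses the
preloaded table, while the second returns -EINTR and leaves new_map in
place so that a later call can succeed.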


diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index c2c07bfa6471..b81650c6d096 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1181,8 +1181,22 @@ static int do_resume(struct dm_ioctl *param)
 			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
 		if (param->flags & DM_NOFLUSH_FLAG)
 			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
-		if (!dm_suspended_md(md))
-			dm_suspend(md, suspend_flags);
+		if (!dm_suspended_md(md)) {
+			r = dm_suspend(md, suspend_flags);
+			if (r == -EINTR)
+				r = -ERESTARTSYS;
+			if (r) {
+				down_write(&_hash_lock);
+				hc = dm_get_mdptr(md);
+				if (!hc)
+					r = -ENXIO;
+				else
+					hc->new_map = new_map;
+				up_write(&_hash_lock);
+				dm_put(md);
+				return r;
+			}
+		}
 
 		old_size = dm_get_size(md);
 		old_map = dm_swap_table(md, new_map);
-- 
2.45.2.993.g49e7a77208-goog
Re: [RFC PATCH] dm ioctl: fix erroneous EINVAL when signaled
Posted by Mikulas Patocka 1 year, 5 months ago
Hi

I am wondering why do_resume needs to call dm_suspend at all. Does 
anyone here remember why this code path is needed?

Mikulas



On Wed, 17 Jul 2024, Khazhismel Kumykov wrote:

> do_resume when loading a new map first calls dm_suspend, which could
> silently fail. When we proceeded to dm_swap_table, we would bail out
> with EINVAL. Instead, restore new_map and return ERESTARTSYS when
> signaled.
> 
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> ---
>  drivers/md/dm-ioctl.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> 
> RFC as I am rather unfamiliar with the locking semantics here - whether
> we do need to re-grab hash_lock to write to an hc we previously grabbed,
> and whether the device becoming unhashed while we're in this function is
> really something that needs to be checked. However, this patch does fix
> the issue we were seeing - we'd get EINVAL when thread in ioctl was
> signaled.
> 
> 
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index c2c07bfa6471..b81650c6d096 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1181,8 +1181,22 @@ static int do_resume(struct dm_ioctl *param)
>  			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
>  		if (param->flags & DM_NOFLUSH_FLAG)
>  			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
> -		if (!dm_suspended_md(md))
> -			dm_suspend(md, suspend_flags);
> +		if (!dm_suspended_md(md)) {
> +			r = dm_suspend(md, suspend_flags);
> +			if (r == -EINTR)
> +				r = -ERESTARTSYS;
> +			if (r) {
> +				down_write(&_hash_lock);
> +				hc = dm_get_mdptr(md);
> +				if (!hc)
> +					r = -ENXIO;
> +				else
> +					hc->new_map = new_map;
> +				up_write(&_hash_lock);
> +				dm_put(md);
> +				return r;
> +			}
> +		}
>  
>  		old_size = dm_get_size(md);
>  		old_map = dm_swap_table(md, new_map);
> -- 
> 2.45.2.993.g49e7a77208-goog
>
Re: [RFC PATCH] dm ioctl: fix erroneous EINVAL when signaled
Posted by Khazhy Kumykov 1 year, 5 months ago
On Wed, Jul 17, 2024 at 12:45 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> I am wondering why do_resume needs to call dm_suspend at all. Does
> anyone here remember why this code path is needed?

In our case, we have a sequence with load_table followed by a resume,
with no suspend first. The resume path suspends if needed, swaps
tables, then resumes. Removing the suspend here would break existing
userspace, I'd imagine. It seems like minimizing the suspended time
would also be a nice benefit.

>
> Mikulas
>
>
>
> On Wed, 17 Jul 2024, Khazhismel Kumykov wrote:
>
> > do_resume when loading a new map first calls dm_suspend, which could
> > silently fail. When we proceeded to dm_swap_table, we would bail out
> > with EINVAL. Instead, restore new_map and return ERESTARTSYS when
> > signaled.
> >
> > Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> > ---
> >  drivers/md/dm-ioctl.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> >
> >
> > RFC as I am rather unfamiliar with the locking semantics here - whether
> > we do need to re-grab hash_lock to write to an hc we previously grabbed,
> > and whether the device becoming unhashed while we're in this function is
> > really something that needs to be checked. However, this patch does fix
> > the issue we were seeing - we'd get EINVAL when thread in ioctl was
> > signaled.
> >
> >
> > diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> > index c2c07bfa6471..b81650c6d096 100644
> > --- a/drivers/md/dm-ioctl.c
> > +++ b/drivers/md/dm-ioctl.c
> > @@ -1181,8 +1181,22 @@ static int do_resume(struct dm_ioctl *param)
> >                       suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
> >               if (param->flags & DM_NOFLUSH_FLAG)
> >                       suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
> > -             if (!dm_suspended_md(md))
> > -                     dm_suspend(md, suspend_flags);
> > +             if (!dm_suspended_md(md)) {
> > +                     r = dm_suspend(md, suspend_flags);
> > +                     if (r == -EINTR)
> > +                             r = -ERESTARTSYS;
> > +                     if (r) {
> > +                             down_write(&_hash_lock);
> > +                             hc = dm_get_mdptr(md);
> > +                             if (!hc)
> > +                                     r = -ENXIO;
> > +                             else
> > +                                     hc->new_map = new_map;
Oh - I probably want to check whether hc->new_map has become non-NULL in
the meantime and, if so, pick a winner and put the loser. Presumably
the newest map should win if that happens (assuming it is possible at
all), although the concept seems suspect to me.
> > +                             up_write(&_hash_lock);
> > +                             dm_put(md);
> > +                             return r;
> > +                     }
> > +             }
> >
> >               old_size = dm_get_size(md);
> >               old_map = dm_swap_table(md, new_map);
> > --
> > 2.45.2.993.g49e7a77208-goog
> >
>
Re: [RFC PATCH] dm ioctl: fix erroneous EINVAL when signaled
Posted by Zdenek Kabelac 1 year, 4 months ago
Dne 17. 07. 24 v 21:52 Khazhy Kumykov napsal(a):
> On Wed, Jul 17, 2024 at 12:45 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>> Hi
>>
>> I am wondering why do_resume needs to call dm_suspend at all. Does
>> anyone here remember why this code path is needed?
> In our case, we have a sequence with load_table followed by a resume,
> with no suspend first. The resume path suspends if needed, swaps
> tables, then resumes. Removing the suspend here would break existing
> userspace, I'd imagine. It seems like minimizing the suspended time
> would also be a nice benefit.


lvm2 maintainer POV:

The automatic 'suspend' on resume is a kernel 'feature' that should not 
normally be used from userspace. Userspace is supposed to call 'suspend', 
handle the error cases, and if necessary drop the preloaded table and resume 
the existing table, which should still work.

If userspace uses ONLY 'resume' without calling suspend upfront, there 
are some unsolvable error cases.


So no, 'minimizing' suspend time is NOT the main reason here. The only 
valid reason to use it is basically when you are an admin and need to reload 
the table of a device you are running from. In that case, calling 'dmsetup 
suspend' might leave your system in a 'blocked' state, since your rootfs would 
be 'frozen/suspended' and you would have no chance to call 'dmsetup resume'.

The lvm2 app locks itself in RAM during this critical section, so it can 
proceed with the regular sequence: 'write metadata - preload DM - suspend DM - 
commit metadata - resume DM', which basically all userland apps should be using.
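
The sequence above, including the rollback on a failed suspend, could be
sketched roughly as follows. This is a toy model rather than lvm2 or
libdevmapper code: struct vol and every vol_* function are hypothetical,
failures are injected through a flag, and rolling back the written
metadata is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one device; not an lvm2 or libdevmapper type. */
struct vol {
	bool fail_suspend;  /* inject a suspend failure */
	bool preloaded;     /* a new table is loaded but not yet active */
	bool suspended;
	bool committed;     /* metadata commit happened */
	int active_table;   /* 0 = old table, 1 = new table */
};

static int vol_write_metadata(struct vol *v) { (void)v; return 0; }
static int vol_preload(struct vol *v) { v->preloaded = true; return 0; }
static int vol_commit_metadata(struct vol *v) { v->committed = true; return 0; }
static void vol_drop_preloaded(struct vol *v) { v->preloaded = false; }

static int vol_suspend(struct vol *v)
{
	if (v->fail_suspend)
		return -1;
	v->suspended = true;
	return 0;
}

/* Resume activates the preloaded table if there is one. */
static void vol_resume(struct vol *v)
{
	if (v->preloaded) {
		v->active_table = 1;
		v->preloaded = false;
	}
	v->suspended = false;
}

/* write metadata - preload - suspend - commit metadata - resume */
static int vol_reload(struct vol *v)
{
	if (vol_write_metadata(v))
		return -1;
	if (vol_preload(v))
		return -1;
	if (vol_suspend(v) || vol_commit_metadata(v)) {
		/* Error path: drop the new table, keep running the old one. */
		vol_drop_preloaded(v);
		vol_resume(v);
		return -1;
	}
	vol_resume(v);
	return 0;
}
```

The point of the explicit suspend step is that the caller sees its failure
directly and can fall back to the existing table before anything has been
committed.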

Regards


Zdenek

[RFC PATCH v2] dm ioctl: fix erroneous EINVAL when signaled
Posted by Khazhismel Kumykov 1 year, 5 months ago
do_resume when loading a new map first calls dm_suspend, which could
silently fail. When we proceeded to dm_swap_table, we would bail out
with EINVAL. Instead, attempt to restore new_map and return ERESTARTSYS
when signaled.

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 drivers/md/dm-ioctl.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

v2: don't leak new_map if we can't assign it back to hc.
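
As a rough illustration of the v2 change, the hand-back decision can be
modeled in userspace like this. The types and give_back_table() are
hypothetical, locking (_hash_lock) and dm_sync_table() are omitted, and
free() stands in for dm_table_destroy(). A NULL cell models a device that
was unhashed in the meantime.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's dm_table or hash_cell. */
struct fake_table { int id; };
struct fake_cell { struct fake_table *new_map; };

enum handoff { HANDOFF_RESTORED, HANDOFF_DESTROYED };

/*
 * Give a table back after a failed suspend.
 * - cell gone:      rewrite *err to -ENXIO and destroy the table
 * - slot empty:     restore the table so a later resume can use it
 * - slot occupied:  a newer table was loaded meanwhile; destroy ours
 */
static enum handoff give_back_table(struct fake_cell *hc,
				    struct fake_table *t, int *err)
{
	if (!hc)
		*err = -ENXIO;
	if (hc && !hc->new_map) {
		hc->new_map = t;
		return HANDOFF_RESTORED;
	}
	free(t); /* never leak the table we could not hand back */
	return HANDOFF_DESTROYED;
}
```

In every branch the table ends up either owned by the cell again or freed,
which is the leak that v1 could hit when hc was NULL.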

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index c2c07bfa6471..0591455ad63c 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1181,8 +1181,27 @@ static int do_resume(struct dm_ioctl *param)
 			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
 		if (param->flags & DM_NOFLUSH_FLAG)
 			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
-		if (!dm_suspended_md(md))
-			dm_suspend(md, suspend_flags);
+		if (!dm_suspended_md(md)) {
+			r = dm_suspend(md, suspend_flags);
+			if (r == -EINTR)
+				r = -ERESTARTSYS;
+			if (r) {
+				down_write(&_hash_lock);
+				hc = dm_get_mdptr(md);
+				if (!hc)
+					r = -ENXIO;
+				if (hc && !hc->new_map) {
+					hc->new_map = new_map;
+					up_write(&_hash_lock);
+				} else {
+					up_write(&_hash_lock);
+					dm_sync_table(md);
+					dm_table_destroy(new_map);
+				}
+				dm_put(md);
+				return r;
+			}
+		}
 
 		old_size = dm_get_size(md);
 		old_map = dm_swap_table(md, new_map);
-- 
2.45.2.993.g49e7a77208-goog
Re: [RFC PATCH v2] dm ioctl: fix erroneous EINVAL when signaled
Posted by Mikulas Patocka 1 year, 4 months ago

On Wed, 17 Jul 2024, Khazhismel Kumykov wrote:

> do_resume when loading a new map first calls dm_suspend, which could
> silently fail. When we proceeded to dm_swap_table, we would bail out
> with EINVAL. Instead, attempt to restore new_map and return ERESTARTSYS
> when signaled.
> 
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> ---
>  drivers/md/dm-ioctl.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> v2: don't leak new_map if we can't assign it back to hc.
> 
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index c2c07bfa6471..0591455ad63c 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1181,8 +1181,27 @@ static int do_resume(struct dm_ioctl *param)
>  			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
>  		if (param->flags & DM_NOFLUSH_FLAG)
>  			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
> -		if (!dm_suspended_md(md))
> -			dm_suspend(md, suspend_flags);
> +		if (!dm_suspended_md(md)) {
> +			r = dm_suspend(md, suspend_flags);
> +			if (r == -EINTR)
> +				r = -ERESTARTSYS;

I'd like to ask why the "EINTR -> ERESTARTSYS" conversion is here and why 
it isn't in dm_suspend?

What do libdevmapper+lvm maintainers think about it? Does lvm handle EINTR 
by restarting the ioctl syscall? Should we return ERESTARTSYS when suspend 
is interrupted?

Mikulas
Re: [RFC PATCH v2] dm ioctl: fix erroneous EINVAL when signaled
Posted by Zdenek Kabelac 1 year, 4 months ago
Dne 23. 07. 24 v 14:51 Mikulas Patocka napsal(a):
> 
> 
> On Wed, 17 Jul 2024, Khazhismel Kumykov wrote:
> 
>> do_resume when loading a new map first calls dm_suspend, which could
>> silently fail. When we proceeded to dm_swap_table, we would bail out
>> with EINVAL. Instead, attempt to restore new_map and return ERESTARTSYS
>> when signaled.
>>
>> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
>> ---
>>   drivers/md/dm-ioctl.c | 23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> v2: don't leak new_map if we can't assign it back to hc.
>>
>> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
>> index c2c07bfa6471..0591455ad63c 100644
>> --- a/drivers/md/dm-ioctl.c
>> +++ b/drivers/md/dm-ioctl.c
>> @@ -1181,8 +1181,27 @@ static int do_resume(struct dm_ioctl *param)
>>   			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
>>   		if (param->flags & DM_NOFLUSH_FLAG)
>>   			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
>> -		if (!dm_suspended_md(md))
>> -			dm_suspend(md, suspend_flags);
>> +		if (!dm_suspended_md(md)) {
>> +			r = dm_suspend(md, suspend_flags);
>> +			if (r == -EINTR)
>> +				r = -ERESTARTSYS;
> 
> I'd like to ask why the "EINTR -> ERESTARTSYS" conversion is here and why
> it isn't in dm_suspend?
> 
> What do libdevmapper+lvm maintainers think about it? Does lvm handle EINTR
> by restarting the ioctl syscall? Should we return ERESTARTSYS when suspend
> is interrupted?

In general, on a suspend failure we just stop the whole operation and 
restore the previous state, so the user can run the operation again.

There is no special check for the exact reason for the ioctl failure.

Regards

Zdenek
Re: [RFC PATCH v2] dm ioctl: fix erroneous EINVAL when signaled
Posted by Khazhy Kumykov 1 year, 4 months ago
On Tue, Jul 23, 2024 at 6:11 AM Zdenek Kabelac <zdenek.kabelac@gmail.com> wrote:
>
> Dne 23. 07. 24 v 14:51 Mikulas Patocka napsal(a):
> >
> >
> > On Wed, 17 Jul 2024, Khazhismel Kumykov wrote:
> >
> >> do_resume when loading a new map first calls dm_suspend, which could
> >> silently fail. When we proceeded to dm_swap_table, we would bail out
> >> with EINVAL. Instead, attempt to restore new_map and return ERESTARTSYS
> >> when signaled.
> >>
> >> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> >> ---
> >>   drivers/md/dm-ioctl.c | 23 +++++++++++++++++++++--
> >>   1 file changed, 21 insertions(+), 2 deletions(-)
> >>
> >> v2: don't leak new_map if we can't assign it back to hc.
> >>
> >> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> >> index c2c07bfa6471..0591455ad63c 100644
> >> --- a/drivers/md/dm-ioctl.c
> >> +++ b/drivers/md/dm-ioctl.c
> >> @@ -1181,8 +1181,27 @@ static int do_resume(struct dm_ioctl *param)
> >>                      suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
> >>              if (param->flags & DM_NOFLUSH_FLAG)
> >>                      suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
> >> -            if (!dm_suspended_md(md))
> >> -                    dm_suspend(md, suspend_flags);
> >> +            if (!dm_suspended_md(md)) {
> >> +                    r = dm_suspend(md, suspend_flags);
> >> +                    if (r == -EINTR)
> >> +                            r = -ERESTARTSYS;
> >
> > I'd like to ask why the "EINTR -> ERESTARTSYS" conversion is here and why
> > it isn't in dm_suspend?
I proposed ERESTARTSYS here since the act of waiting for the device to
suspend successfully seems "restartable" - I think the same reasoning
would apply to do_suspend.
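
For context on what userspace would observe: ERESTARTSYS never reaches the
caller. If the signal handler was installed with SA_RESTART the kernel
restarts the syscall itself, otherwise the caller sees EINTR. A caller that
wants to retry in the EINTR case could use a loop along these lines. This
is only a sketch: flaky_ioctl() is a hypothetical stand-in for the real
ioctl wrapper, and the retry limit is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an ioctl wrapper that can be interrupted. */
struct flaky_call {
	int interruptions_left; /* how many times to fail with EINTR */
	int calls;              /* how many times it was invoked */
};

static int flaky_ioctl(struct flaky_call *c)
{
	c->calls++;
	if (c->interruptions_left > 0) {
		c->interruptions_left--;
		errno = EINTR;
		return -1;
	}
	return 0;
}

/* Retry while the call is interrupted, up to max_tries attempts. */
static int retry_on_eintr(struct flaky_call *c, int max_tries)
{
	int r;

	do {
		r = flaky_ioctl(c);
	} while (r == -1 && errno == EINTR && --max_tries > 0);
	return r;
}
```

Any error other than EINTR is returned to the caller unchanged on the first
attempt.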
> >
> > What do libdevmapper+lvm maintainers think about it? Does lvm handle EINTR
> > by restarting the ioctl syscall? Should we return ERESTARTSYS when suspend
> > is interrupted?
>
> In general, on a suspend failure we just stop the whole operation and
> restore the previous state, so the user can run the operation again.
>
> There is no special check for the exact reason for the ioctl failure.
>
> Regards
>
> Zdenek
>
Re: [RFC PATCH v2] dm ioctl: fix erroneous EINVAL when signaled
Posted by Mike Snitzer 1 year, 5 months ago
On Wed, Jul 17, 2024 at 04:18:33PM -0700, Khazhismel Kumykov wrote:
> do_resume when loading a new map first calls dm_suspend, which could
> silently fail. When we proceeded to dm_swap_table, we would bail out
> with EINVAL. Instead, attempt to restore new_map and return ERESTARTSYS
> when signaled.
> 
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> ---
>  drivers/md/dm-ioctl.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> v2: don't leak new_map if we can't assign it back to hc.
> 
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index c2c07bfa6471..0591455ad63c 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1181,8 +1181,27 @@ static int do_resume(struct dm_ioctl *param)
>  			suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
>  		if (param->flags & DM_NOFLUSH_FLAG)
>  			suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
> -		if (!dm_suspended_md(md))
> -			dm_suspend(md, suspend_flags);
> +		if (!dm_suspended_md(md)) {
> +			r = dm_suspend(md, suspend_flags);
> +			if (r == -EINTR)
> +				r = -ERESTARTSYS;
> +			if (r) {
> +				down_write(&_hash_lock);
> +				hc = dm_get_mdptr(md);
> +				if (!hc)
> +					r = -ENXIO;
> +				if (hc && !hc->new_map) {
> +					hc->new_map = new_map;
> +					up_write(&_hash_lock);
> +				} else {
> +					up_write(&_hash_lock);
> +					dm_sync_table(md);
> +					dm_table_destroy(new_map);
> +				}
> +				dm_put(md);
> +				return r;
> +			}
> +		}
>  
>  		old_size = dm_get_size(md);
>  		old_map = dm_swap_table(md, new_map);
> -- 
> 2.45.2.993.g49e7a77208-goog
> 
> 

Thanks for the patch.  The header could use more context for how this
issue has caused problems in practice (you touched on that in reply to
Mikulas for v1).

But I will review this closely starting the week of July 29.  This is
a very fundamental codepath for DM, so it needs extended review.

Mike