[PATCH v1 01/34] virDevMapperGetTargetsImpl: Close /dev/mapper/control in the end

Michal Privoznik posted 34 patches 5 years, 6 months ago
[PATCH v1 01/34] virDevMapperGetTargetsImpl: Close /dev/mapper/control in the end
Posted by Michal Privoznik 5 years, 6 months ago
When building domain's private /dev in a namespace, libdevmapper
is consulted for getting full dependency tree of domain's disks.
The reason is that for a multipath devices all dependent devices
must be created in the namespace and allowed in CGroups.

However, this approach is very fragile as building of namespace
happens in the forked off child process, after mass close of FDs
and just before dropping privileges and execing QEMU. And it so
happens that when calling libdevmapper APIs, one of them opens
/dev/mapper/control and saves the FD into a global variable. The
FD is kept open until the lib is unlinked or dm_lib_release() is
called explicitly. We are doing neither.

This is not a problem when calling the function from libvirtd
(when setting up CGroups), but it is a problem when called from
the pre-exec hook because we leak the FD into QEMU.

Fixes: a30078cb832646177defd256e77c632905f1e6d0
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1858260

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/util/virdevmapper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c
index 40a82285f9..1c216fb6c1 100644
--- a/src/util/virdevmapper.c
+++ b/src/util/virdevmapper.c
@@ -156,6 +156,7 @@ virDevMapperGetTargetsImpl(const char *path,
     virStringListFree(recursiveDevPaths);
     virStringListFree(devPaths);
     dm_task_destroy(dmt);
+    dm_lib_release();
     return ret;
 }
 
-- 
2.26.2

Re: [PATCH v1 01/34] virDevMapperGetTargetsImpl: Close /dev/mapper/control in the end
Posted by Daniel P. Berrangé 5 years, 6 months ago
On Wed, Jul 22, 2020 at 11:39:55AM +0200, Michal Privoznik wrote:
> When building domain's private /dev in a namespace, libdevmapper
> is consulted for getting full dependency tree of domain's disks.
> The reason is that for a multipath devices all dependent devices
> must be created in the namespace and allowed in CGroups.
> 
> However, this approach is very fragile as building of namespace
> happens in the forked off child process, after mass close of FDs
> and just before dropping privileges and execing QEMU. And it so
> happens that when calling libdevmapper APIs, one of them opens
> /dev/mapper/control and saves the FD into a global variable. The
> FD is kept open until the lib is unlinked or dm_lib_release() is
> called explicitly. We are doing neither.
> 
> This is not a problem when calling the function from libvirtd
> (when setting up CGroups), but it is a problem when called from
> the pre-exec hook because we leak the FD into QEMU.
> 
> Fixes: a30078cb832646177defd256e77c632905f1e6d0
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1858260
> 
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/util/virdevmapper.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c
> index 40a82285f9..1c216fb6c1 100644
> --- a/src/util/virdevmapper.c
> +++ b/src/util/virdevmapper.c
> @@ -156,6 +156,7 @@ virDevMapperGetTargetsImpl(const char *path,
>      virStringListFree(recursiveDevPaths);
>      virStringListFree(devPaths);
>      dm_task_destroy(dmt);
> +    dm_lib_release();
>      return ret;
>  }

Hmm, this function isn't threadsafe though, so I'm kind of worried about
us breaking parallel callers.

For that matter dm_task_run isn't thread safe when it opens the
FD in the first place either AFAICT.

This libdevice-mapper.so looks like a bit of a disaster in general, as
I can't see mutexes acquired anywhere in the public APIs and it has a
load of static global variables including this FD :-(

We could put mutex locking around our own virDevMapperGetTargetsImpl
if we blindly assume nothing else we call may secretly be using
libdevice-mapper too.

Makes me wonder though if there's any way we can obtain the info we need
without touching libdevice-mapper.so at all.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v1 01/34] virDevMapperGetTargetsImpl: Close /dev/mapper/control in the end
Posted by Michal Privoznik 5 years, 6 months ago
On 7/22/20 12:46 PM, Daniel P. Berrangé wrote:
> On Wed, Jul 22, 2020 at 11:39:55AM +0200, Michal Privoznik wrote:
>> When building domain's private /dev in a namespace, libdevmapper
>> is consulted for getting full dependency tree of domain's disks.
>> The reason is that for a multipath devices all dependent devices
>> must be created in the namespace and allowed in CGroups.
>>
>> However, this approach is very fragile as building of namespace
>> happens in the forked off child process, after mass close of FDs
>> and just before dropping privileges and execing QEMU. And it so
>> happens that when calling libdevmapper APIs, one of them opens
>> /dev/mapper/control and saves the FD into a global variable. The
>> FD is kept open until the lib is unlinked or dm_lib_release() is
>> called explicitly. We are doing neither.
>>
>> This is not a problem when calling the function from libvirtd
>> (when setting up CGroups), but it is a problem when called from
>> the pre-exec hook because we leak the FD into QEMU.
>>
>> Fixes: a30078cb832646177defd256e77c632905f1e6d0
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1858260
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> ---
>>   src/util/virdevmapper.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c
>> index 40a82285f9..1c216fb6c1 100644
>> --- a/src/util/virdevmapper.c
>> +++ b/src/util/virdevmapper.c
>> @@ -156,6 +156,7 @@ virDevMapperGetTargetsImpl(const char *path,
>>       virStringListFree(recursiveDevPaths);
>>       virStringListFree(devPaths);
>>       dm_task_destroy(dmt);
>> +    dm_lib_release();
>>       return ret;
>>   }
> 
> Hmm, this function isn't threadsafe though, so I'm kind of worried about
> us breaking parallel callers.
> 
> For that matter dm_task_run isn't thread safe when it opens the
> FD in the first place either AFAICT.
> 
> This libdevice-mapper.so looks like a bit of a disaster in general, as
> I can't see mutexes acquired anywhere in the public APIs and it has a
> load of static global variables including this FD :-(
> 
> We could put mutex locking around our own virDevMapperGetTargetsImpl
> if we blindly assume nothing else we call may secretly be using
> libdevice-mapper too.
> 
> Makes me wonder though if there's any way we can obtain the info we need
> without touching libdevice-mapper.so at all.

This is sort of resolved in the rest of the series. If we drop this 
patch then after patch 17 which moves populating /dev with domain disks 
into libvirtd rather than forked child, we don't need to call 
dm_lib_release(). But opening will still be unsafe, yes.

Well, looks like devmapper is doing a single ioctl (if I don't count 
VERSION):

ioctl(3</dev/mapper/control>, DM_VERSION, {version=4.0.0, 
data_size=16384, flags=DM_EXISTS_FLAG} => {version=4.42.0, 
data_size=16384, flags=DM_EXISTS_FLAG}) = 0
ioctl(3</dev/mapper/control>, DM_TABLE_DEPS, {version=4.0.0, 
data_size=16384, data_start=312, name="myusb", flags=DM_EXISTS_FLAG} => 
{version=4.42.0, data_size=328, data_start=312, dev=makedev(0xfd, 0x1), 
name="myusb", uuid="CRYPT-LUKS1-12fa3b2b646d43eb82ffd8f671515f93-myusb", 
target_count=1, open_count=0, event_nr=0, 
flags=DM_EXISTS_FLAG|DM_ACTIVE_PRESENT_FLAG, ...}) = 0

so maybe we can use ioctl() ourselves and drop libdevmapper completely?

Michal

Re: [PATCH v1 01/34] virDevMapperGetTargetsImpl: Close /dev/mapper/control in the end
Posted by Daniel P. Berrangé 5 years, 6 months ago
On Wed, Jul 22, 2020 at 02:14:38PM +0200, Michal Privoznik wrote:
> On 7/22/20 12:46 PM, Daniel P. Berrangé wrote:
> > On Wed, Jul 22, 2020 at 11:39:55AM +0200, Michal Privoznik wrote:
> > > When building domain's private /dev in a namespace, libdevmapper
> > > is consulted for getting full dependency tree of domain's disks.
> > > The reason is that for a multipath devices all dependent devices
> > > must be created in the namespace and allowed in CGroups.
> > > 
> > > However, this approach is very fragile as building of namespace
> > > happens in the forked off child process, after mass close of FDs
> > > and just before dropping privileges and execing QEMU. And it so
> > > happens that when calling libdevmapper APIs, one of them opens
> > > /dev/mapper/control and saves the FD into a global variable. The
> > > FD is kept open until the lib is unlinked or dm_lib_release() is
> > > called explicitly. We are doing neither.
> > > 
> > > This is not a problem when calling the function from libvirtd
> > > (when setting up CGroups), but it is a problem when called from
> > > the pre-exec hook because we leak the FD into QEMU.
> > > 
> > > Fixes: a30078cb832646177defd256e77c632905f1e6d0
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1858260
> > > 
> > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> > > ---
> > >   src/util/virdevmapper.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c
> > > index 40a82285f9..1c216fb6c1 100644
> > > --- a/src/util/virdevmapper.c
> > > +++ b/src/util/virdevmapper.c
> > > @@ -156,6 +156,7 @@ virDevMapperGetTargetsImpl(const char *path,
> > >       virStringListFree(recursiveDevPaths);
> > >       virStringListFree(devPaths);
> > >       dm_task_destroy(dmt);
> > > +    dm_lib_release();
> > >       return ret;
> > >   }
> > 
> > Hmm, this function isn't threadsafe though, so I'm kind of worried about
> > us breaking parallel callers.
> > 
> > For that matter dm_task_run isn't thread safe when it opens the
> > FD in the first place either AFAICT.
> > 
> > This libdevice-mapper.so looks like a bit of a disaster in general, as
> > I can't see mutexes acquired anywhere in the public APIs and it has a
> > load of static global variables including this FD :-(
> > 
> > We could put mutex locking around our own virDevMapperGetTargetsImpl
> > if we blindly assume nothing else we call may secretly be using
> > libdevice-mapper too.
> > 
> > Makes me wonder though if there's any way we can obtain the info we need
> > without touching libdevice-mapper.so at all.
> 
> This is sort of resolved in the rest of the series. If we drop this patch
> then after patch 17 which moves populating /dev with domain disks into
> libvirtd rather than forked child, we don't need to call dm_lib_release().
> But opening will still be unsafe, yes.

Yeah, I'm a little concerned that the full patch series is too big
for downstreams to be able to backport, so most likely they'll do a
targetted fix of just this patch.

> Well, looks like devmapper is doing a single ioctl (if I don't count
> VERSION):
> 
> ioctl(3</dev/mapper/control>, DM_VERSION, {version=4.0.0, data_size=16384,
> flags=DM_EXISTS_FLAG} => {version=4.42.0, data_size=16384,
> flags=DM_EXISTS_FLAG}) = 0
> ioctl(3</dev/mapper/control>, DM_TABLE_DEPS, {version=4.0.0,
> data_size=16384, data_start=312, name="myusb", flags=DM_EXISTS_FLAG} =>
> {version=4.42.0, data_size=328, data_start=312, dev=makedev(0xfd, 0x1),
> name="myusb", uuid="CRYPT-LUKS1-12fa3b2b646d43eb82ffd8f671515f93-myusb",
> target_count=1, open_count=0, event_nr=0,
> flags=DM_EXISTS_FLAG|DM_ACTIVE_PRESENT_FLAG, ...}) = 0
> 
> so maybe we can use ioctl() ourselves and drop libdevmapper completely?

Given that the kernel promises stable ABI, this should be OK, since we're
only after one very small piece of information.

The dm_task_run method seems quite simple if we exclude the bits that
are not relevant for DM_TABLE_DEPS.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|