[libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions

Laine Stump posted 5 patches 3 years, 1 month ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20210212055407.685776-1-laine@redhat.com
src/conf/device_conf.c   | 15 +++-----
src/conf/device_conf.h   |  2 +-
src/conf/domain_conf.c   | 81 ++++++++++++++++++++--------------------
src/conf/domain_conf.h   |  1 -
src/libvirt_private.syms |  1 -
5 files changed, 47 insertions(+), 53 deletions(-)
[libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Laine Stump 3 years, 1 month ago
I've looked at a few of these, and one thing I've found is that very
often we have a function called somethingSomethingClear(), and:

1) The only places it is ever called will immediately free the memory
of the object as soon as they clear it.

and very possibly

2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call

    bobLoblawFree(def->bobloblaw)

and then don't actually set def->bobloblaw to NULL - so the functions
aren't actually "Clearing", they're "Freeing and then clearing a few
things, but not everything".

So I'm wondering if it is worthwhile to

A) audit all the *Clear() functions and rename the functions that
don't actually need to clear the contents to be, e.g.
bobLobLawFreeContents(), while also replacing VIR_FREE with g_free().
(this is what I've done in these 5 patches)

or if we should

B) just do the wholesale approach and (as danpb suggested last week)
change all VIR_FREE in *Clear() functions to g_free(), and put a
"memset(obj, 0, sizeof(*obj))" at the end of each function, ignoring
whether or not we actually need that.

(B) would obviously be much faster to get done, and simpler for
everyone to keep track of what's happening, but of course it is less
efficient. Very likely this efficiency is completely meaningless in
the large scheme (even in the medium or small scheme).

(I actually like the idea of 0'ing out *everything*[*] when we're done
with it, extra cycles be damned! I think of the two choices above,
after going through this exercise, I'd say (B) is a more reasonable
choice, but since I tried this, I thought I'd send it and see if
anyone else has an opinion (or different idea)

[*](including all those places I've replaced VIR_FREE with g_free in
the name of "progress?")

Laine Stump (5):
  conf: rename and narrow scope of virDomainHostdevDefClear()
  conf: rename virDomainHostdevSubsysSCSIClear
  conf: replace pointless VIR_FREEs with g_free in
    virDomainVideoDefClear()
  conf: don't call virDomainDeviceInfoClear from
    virDomainDeviceInfoParseXML
  conf: replace virDomainDeviceInfoClear with
    virDomainDeviceInfoFreeContents

 src/conf/device_conf.c   | 15 +++-----
 src/conf/device_conf.h   |  2 +-
 src/conf/domain_conf.c   | 81 ++++++++++++++++++++--------------------
 src/conf/domain_conf.h   |  1 -
 src/libvirt_private.syms |  1 -
 5 files changed, 47 insertions(+), 53 deletions(-)

-- 
2.29.2

Re: [libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Michal Privoznik 3 years, 1 month ago
On 2/12/21 6:54 AM, Laine Stump wrote:
> I've looked at a few of these, and one thing I've found is that very
> often we have a function called somethingSomethingClear(), and:
> 
> 1) The only places it is ever called will immediately free the memory
> of the object as soon as they clear it.
> 
> and very possibly
> 
> 2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call
> 
>      bobLoblawFree(def->bobloblaw)
> 
> and then don't actually set def->bobloblaw to NULL - so the functions
> aren't actually "Clearing", they're "Freeing and then clearing a few
> things, but not everything".
> 
> So I'm wondering if it is worthwhile to
> 
> A) audit all the *Clear() functions and rename the functions that
> don't actually need to clear the contents to be, e.g.
> bobLobLawFreeContents(), while also replacing VIR_FREE with g_free().
> (this is what I've done in these 5 patches)
> 
> or if we should
> 
> B) just do the wholesale approach and (as danpb suggested last week)
> change all VIR_FREE in *Clear() functions to g_free(), and put a
> "memset(obj, 0, sizeof(*obj))" at the end of each function, ignoring
> whether or not we actually need that.
> 
> (B) would obviously be much faster to get done, and simpler for
> everyone to keep track of what's happening, but of course it is less
> efficient. Very likely this efficiency is completely meaningless in
> the large scheme (even in the medium or small scheme).
> 
> (I actually like the idea of 0'ing out *everything*[*] when we're done
> with it, extra cycles be damned! I think of the two choices above,
> after going through this exercise, I'd say (B) is a more reasonable
> choice, but since I tried this, I thought I'd send it and see if
> anyone else has an opinion (or different idea)
> 
> [*](including all those places I've replaced VIR_FREE with g_free in
> the name of "progress?")
> 
> Laine Stump (5):
>    conf: rename and narrow scope of virDomainHostdevDefClear()
>    conf: rename virDomainHostdevSubsysSCSIClear
>    conf: replace pointless VIR_FREEs with g_free in
>      virDomainVideoDefClear()
>    conf: don't call virDomainDeviceInfoClear from
>      virDomainDeviceInfoParseXML
>    conf: replace virDomainDeviceInfoClear with
>      virDomainDeviceInfoFreeContents
> 
>   src/conf/device_conf.c   | 15 +++-----
>   src/conf/device_conf.h   |  2 +-
>   src/conf/domain_conf.c   | 81 ++++++++++++++++++++--------------------
>   src/conf/domain_conf.h   |  1 -
>   src/libvirt_private.syms |  1 -
>   5 files changed, 47 insertions(+), 53 deletions(-)
> 

I don't like change and thus prefer keeping *Clear() with explicit 
memset(0) - if needed, but don't want to stop progress.

Michal

Re: [libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Martin Kletzander 3 years, 1 month ago
On Fri, Feb 12, 2021 at 12:54:02AM -0500, Laine Stump wrote:
>I've looked at a few of these, and one thing I've found is that very
>often we have a function called somethingSomethingClear(), and:
>
>1) The only places it is ever called will immediately free the memory
>of the object as soon as they clear it.
>
>and very possibly
>
>2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call
>
>    bobLoblawFree(def->bobloblaw)
>
>and then don't actually set def->bobloblaw to NULL - so the functions
>aren't actually "Clearing", they're "Freeing and then clearing a few
>things, but not everything".
>

One thing I am wondering is whether this is really only used where it makes
sense.  As far as I understand, and please correct me if I am way off, the
purpose of the Clear functions is to:

  a) provide a way to remove everything from a structure that the current
     function cannot recreate (there is a pointer to it somewhere else which
     would not be updated) and

  b) provide a way to reset a structure so that it can be filled again without
     needless reallocation.

I think (b) is obviously pointless, especially lately, so the only reasonable
usage would be for the scenario (a).  However, I think I remember this also
being used in places where it would be perfectly fine to free the variable and
recreate it.  Maybe it could ease up the decision, at least by eliminating some
of the code, if my hunch is correct.

In my quick search I only found virDomainVideoDefClear to be used in this manner
and I am not convinced that it is worth having this extra function with extra
memset().  Just food for thought.

>So I'm wondering if it is worthwhile to
>
>A) audit all the *Clear() functions and rename the functions that
>don't actually need to clear the contents to be, e.g.
>bobLobLawFreeContents(), while also replacing VIR_FREE with g_free().
>(this is what I've done in these 5 patches)
>
>or if we should
>
>B) just do the wholesale approach and (as danpb suggested last week)
>change all VIR_FREE in *Clear() functions to g_free(), and put a
>"memset(obj, 0, sizeof(*obj))" at the end of each function, ignoring
>whether or not we actually need that.
>
>(B) would obviously be much faster to get done, and simpler for
>everyone to keep track of what's happening, but of course it is less
>efficient. Very likely this efficiency is completely meaningless in
>the large scheme (even in the medium or small scheme).
>
>(I actually like the idea of 0'ing out *everything*[*] when we're done
>with it, extra cycles be damned! I think of the two choices above,
>after going through this exercise, I'd say (B) is a more reasonable
>choice, but since I tried this, I thought I'd send it and see if
>anyone else has an opinion (or different idea)
>
>[*](including all those places I've replaced VIR_FREE with g_free in
>the name of "progress?")
>
>Laine Stump (5):
>  conf: rename and narrow scope of virDomainHostdevDefClear()
>  conf: rename virDomainHostdevSubsysSCSIClear
>  conf: replace pointless VIR_FREEs with g_free in
>    virDomainVideoDefClear()
>  conf: don't call virDomainDeviceInfoClear from
>    virDomainDeviceInfoParseXML
>  conf: replace virDomainDeviceInfoClear with
>    virDomainDeviceInfoFreeContents
>
> src/conf/device_conf.c   | 15 +++-----
> src/conf/device_conf.h   |  2 +-
> src/conf/domain_conf.c   | 81 ++++++++++++++++++++--------------------
> src/conf/domain_conf.h   |  1 -
> src/libvirt_private.syms |  1 -
> 5 files changed, 47 insertions(+), 53 deletions(-)
>
>-- 
>2.29.2
>
Re: [libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Erik Skultety 3 years, 1 month ago
On Fri, Feb 12, 2021 at 10:43:56AM +0100, Martin Kletzander wrote:
> On Fri, Feb 12, 2021 at 12:54:02AM -0500, Laine Stump wrote:
> > I've looked at a few of these, and one thing I've found is that very
> > often we have a function called somethingSomethingClear(), and:
> > 
> > 1) The only places it is ever called will immediately free the memory
> > of the object as soon as they clear it.
> > 
> > and very possibly
> > 
> > 2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call
> > 
> >    bobLoblawFree(def->bobloblaw)
> > 
> > and then don't actually set def->bobloblaw to NULL - so the functions
> > aren't actually "Clearing", they're "Freeing and then clearing a few
> > things, but not everything".
> > 
> 
> One thing I am wondering is whether this is really only used where it makes
> sense.  As far as I understand, and please correct me if I am way off, the
> purpose of the Clear functions is to:
> 
>  a) provide a way to remove everything from a structure that the current
>     function cannot recreate (there is a pointer to it somewhere else which
>     would not be updated) and
> 
>  b) provide a way to reset a structure so that it can be filled again without
>     needless reallocation.
> 
> I think (b) is obviously pointless, especially lately, so the only reasonable
> usage would be for the scenario (a).  However, I think I remember this also
> being used in places where it would be perfectly fine to free the variable and
> recreate it.  Maybe it could ease up the decision, at least by eliminating some
> of the code, if my hunch is correct.
> 
> In my quick search I only found virDomainVideoDefClear to be used in this manner
> and I am not convinced that it is worth having this extra function with extra

You could always memset it explicitly, someone might find the code more
readable then. IMO I'd vote for an explicit memset just for the sake of better
security practice (since we'll have to wait a little while for something like
SGX to be convenient to deploy and develop with...). Anyhow, I'm not sure how
many cycles exactly would be wasted, but IIRC a recent discussion memset can be
optimized away (correct me if I don't remember it well!), so Dan P.B.
suggested to gradually convert to some platform-specific ways on how to
sanitize the memory safely - with that in mind, I'd say we use an explicit
memset in all the functions in question and convert them later?

Erik

Re: [libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Michal Privoznik 3 years, 1 month ago
On 2/12/21 11:07 AM, Erik Skultety wrote:
> On Fri, Feb 12, 2021 at 10:43:56AM +0100, Martin Kletzander wrote:
>> On Fri, Feb 12, 2021 at 12:54:02AM -0500, Laine Stump wrote:
>>> I've looked at a few of these, and one thing I've found is that very
>>> often we have a function called somethingSomethingClear(), and:
>>>
>>> 1) The only places it is ever called will immediately free the memory
>>> of the object as soon as they clear it.
>>>
>>> and very possibly
>>>
>>> 2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call
>>>
>>>     bobLoblawFree(def->bobloblaw)
>>>
>>> and then don't actually set def->bobloblaw to NULL - so the functions
>>> aren't actually "Clearing", they're "Freeing and then clearing a few
>>> things, but not everything".
>>>
>>
>> One thing I am wondering is whether this is really only used where it makes
>> sense.  As far as I understand, and please correct me if I am way off, the
>> purpose of the Clear functions is to:
>>
>>   a) provide a way to remove everything from a structure that the current
>>      function cannot recreate (there is a pointer to it somewhere else which
>>      would not be updated) and
>>
>>   b) provide a way to reset a structure so that it can be filled again without
>>      needless reallocation.
>>
>> I think (b) is obviously pointless, especially lately, so the only reasonable
>> usage would be for the scenario (a).  However, I think I remember this also
>> being used in places where it would be perfectly fine to free the variable and
>> recreate it.  Maybe it could ease up the decision, at least by eliminating some
>> of the code, if my hunch is correct.
>>
>> In my quick search I only found virDomainVideoDefClear to be used in this manner
>> and I am not convinced that it is worth having this extra function with extra
> 
> You could always memset it explicitly, someone might find the code more
> readable then. IMO I'd vote for an explicit memset just for the sake of better
> security practice (since we'll have to wait a little while for something like
> SGX to be convenient to deploy and develop with...). Anyhow, I'm not sure how
> many cycles exactly would be wasted, but IIRC a recent discussion memset can be
> optimized away (correct me if I don't remember it well!), so Dan P.B.
> suggested to gradually convert to some platform-specific ways on how to
> sanitize the memory safely - with that in mind, I'd say we use an explicit
> memset in all the functions in question and convert them later?

I think one can argue that if there's a memset() called inside a 
function that is supposed to clear out a member of a struct and later 
the member is tested against NULL, but compiler decides to "optimize" 
memset out - it's a compiler bug.

Michal

Re: [libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Daniel P. Berrangé 3 years, 1 month ago
On Fri, Feb 12, 2021 at 11:07:21AM +0100, Erik Skultety wrote:
> On Fri, Feb 12, 2021 at 10:43:56AM +0100, Martin Kletzander wrote:
> > On Fri, Feb 12, 2021 at 12:54:02AM -0500, Laine Stump wrote:
> > > I've looked at a few of these, and one thing I've found is that very
> > > often we have a function called somethingSomethingClear(), and:
> > > 
> > > 1) The only places it is ever called will immediately free the memory
> > > of the object as soon as they clear it.
> > > 
> > > and very possibly
> > > 
> > > 2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call
> > > 
> > >    bobLoblawFree(def->bobloblaw)
> > > 
> > > and then don't actually set def->bobloblaw to NULL - so the functions
> > > aren't actually "Clearing", they're "Freeing and then clearing a few
> > > things, but not everything".
> > > 
> > 
> > One thing I am wondering is whether this is really only used where it makes
> > sense.  As far as I understand, and please correct me if I am way off, the
> > purpose of the Clear functions is to:
> > 
> >  a) provide a way to remove everything from a structure that the current
> >     function cannot recreate (there is a pointer to it somewhere else which
> >     would not be updated) and
> > 
> >  b) provide a way to reset a structure so that it can be filled again without
> >     needless reallocation.
> > 
> > I think (b) is obviously pointless, especially lately, so the only reasonable
> > usage would be for the scenario (a).  However, I think I remember this also
> > being used in places where it would be perfectly fine to free the variable and
> > recreate it.  Maybe it could ease up the decision, at least by eliminating some
> > of the code, if my hunch is correct.
> > 
> > In my quick search I only found virDomainVideoDefClear to be used in this manner
> > and I am not convinced that it is worth having this extra function with extra
> 
> You could always memset it explicitly, someone might find the code more
> readable then. IMO I'd vote for an explicit memset just for the sake of better
> security practice (since we'll have to wait a little while for something like
> SGX to be convenient to deploy and develop with...). Anyhow, I'm not sure how
> many cycles exactly would be wasted, but IIRC a recent discussion memset can be
> optimized away (correct me if I don't remember it well!), so Dan P.B.
> suggested to gradually convert to some platform-specific ways on how to
> sanitize the memory safely - with that in mind, I'd say we use an explicit
> memset in all the functions in question and convert them later?

I only suggest that for places where security is required. ie to scrub
passwords.

If the compiler wants to cull memsets in places unrelated to security
that's fine by me, or at least, not our problem to worry about.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt RFC PATCH 0/5] eliminating VIR_FREE in the *Clear() functions
Posted by Laine Stump 3 years, 1 month ago
On 2/12/21 5:25 AM, Daniel P. Berrangé wrote:
> On Fri, Feb 12, 2021 at 11:07:21AM +0100, Erik Skultety wrote:
>> On Fri, Feb 12, 2021 at 10:43:56AM +0100, Martin Kletzander wrote:
>>> On Fri, Feb 12, 2021 at 12:54:02AM -0500, Laine Stump wrote:
>>>> I've looked at a few of these, and one thing I've found is that very
>>>> often we have a function called somethingSomethingClear(), and:
>>>>
>>>> 1) The only places it is ever called will immediately free the memory
>>>> of the object as soon as they clear it.
>>>>
>>>> and very possibly
>>>>
>>>> 2) It doesn't actually *clear* everything. Some items are cleared via VIR_FREE(), but then some of the other pointers call
>>>>
>>>>     bobLoblawFree(def->bobloblaw)
>>>>
>>>> and then don't actually set def->bobloblaw to NULL - so the functions
>>>> aren't actually "Clearing", they're "Freeing and then clearing a few
>>>> things, but not everything".
>>>>
>>>
>>> One thing I am wondering is whether this is really only used where it makes
>>> sense.  As far as I understand, and please correct me if I am way off, the
>>> purpose of the Clear functions is to:
>>>
>>>   a) provide a way to remove everything from a structure that the current
>>>      function cannot recreate (there is a pointer to it somewhere else which
>>>      would not be updated) and
>>>
>>>   b) provide a way to reset a structure so that it can be filled again without
>>>      needless reallocation.
>>>
>>> I think (b) is obviously pointless, especially lately, so the only reasonable
>>> usage would be for the scenario (a).  However, I think I remember this also
>>> being used in places where it would be perfectly fine to free the variable and
>>> recreate it.  Maybe it could ease up the decision, at least by eliminating some
>>> of the code, if my hunch is correct.
>>>
>>> In my quick search I only found virDomainVideoDefClear to be used in this manner
>>> and I am not convinced that it is worth having this extra function with extra
>>
>> You could always memset it explicitly, someone might find the code more
>> readable then. IMO I'd vote for an explicit memset just for the sake of better
>> security practice (since we'll have to wait a little while for something like
>> SGX to be convenient to deploy and develop with...). Anyhow, I'm not sure how
>> many cycles exactly would be wasted, but IIRC a recent discussion memset can be
>> optimized away (correct me if I don't remember it well!), so Dan P.B.
>> suggested to gradually convert to some platform-specific ways on how to
>> sanitize the memory safely - with that in mind, I'd say we use an explicit
>> memset in all the functions in question and convert them later?
> 
> I only suggest that for places where security is required. ie to scrub
> passwords.

Yeah, I'm not planning to touch anything that is clearing out passwords 
and such. Only the *Clear() functions that currently have the dual 
purposes of:

1) freeing memory pointed to by the object in question (and any sub-objects)

2) clearing out the object so that it can be re-used with no side 
effects (e.g., pointers NULLed so that subsequent uses believe 
(correctly) that nothing is being pointed at, setting counters to 0, 
types to ..._NONE, etc.

> If the compiler wants to cull memsets in places unrelated to security
> that's fine by me, or at least, not our problem to worry about.

I would hope that the compiler would be smart enough to not optimize it 
out if it can't determine 100% that it will never make a difference. 
This would mean that, for example, unless a *Clear() function is defined 
static, it couldn't optimize out a memset() at the end (because it can't 
know what would be done with the object after return).

But if it's going to optimize out a memset, it would likely also 
optimize out the "loblaw = NULL;" in the VIR_FREE invocation, so...

(My mind keeps going back to 1994, when I turned on the 80386 "invalid 
address faults" bit (forget the exact name) on our router product that 
was running 8086 realmode *BSD, and suddenly so many stupid pointer bugs 
were immediately revealed )by a segfault) instead of the code just 
silently going off into the weeds. And when we started NULLing out 
pointers as things were freed we found so many more; the sources of 
mysterious problems that customers had been reporting for months were 
suddenly obvious. So my subconscious tells me that NULLing out freed 
pointers (and the memory they point to) is just "safer", and we're 
spending all this time removing that safety; kind of like going through 
all the cars in the world to remove their seatbelts because they make 
driving less convenient, and airbags offer a similar type of protection...)