util: hash: Append to hash buckets when adding new entries

[libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Peter Krempa 5 years ago

In cases when the hash function for a name collides with other entry
already in the hash we prepend to the bucket. This creates a 'stack
effect' on the buckets if we then iterate through the hash. Normally
this is not a problem, but in tests we want deterministic results.

Since it does not matter where we add the entry and it's usually more
probable that a different entry will be accessed next change it to
append to the end of the bucket. Luckily we already iterate throught the
bucket once thus we can easily find the last entry and just connect the
new entry after it.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/util/virhash.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/util/virhash.c b/src/util/virhash.c
index 4907c1124f..0e30106041 100644
--- a/src/util/virhash.c
+++ b/src/util/virhash.c
@@ -316,6 +316,7 @@ virHashAddOrUpdateEntry(virHashTablePtr table, const void *name,
 {
     size_t key, len = 0;
     virHashEntryPtr entry;
+    virHashEntryPtr last = NULL;
     void *new_name;

     if ((table == NULL) || (name == NULL))
@@ -337,6 +338,7 @@ virHashAddOrUpdateEntry(virHashTablePtr table, const void *name,
                 return -1;
             }
         }
+        last = entry;
         len++;
     }

@@ -347,8 +349,11 @@ virHashAddOrUpdateEntry(virHashTablePtr table, const void *name,

     entry->name = new_name;
     entry->payload = userdata;
-    entry->next = table->table[key];
-    table->table[key] = entry;
+
+    if (last)
+        last->next = entry;
+    else
+        table->table[key] = entry;

     table->nbElems++;

-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Daniel P. Berrangé 5 years ago

On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
> In cases when the hash function for a name collides with other entry
> already in the hash we prepend to the bucket. This creates a 'stack
> effect' on the buckets if we then iterate through the hash. Normally
> this is not a problem, but in tests we want deterministic results.
> 
> Since it does not matter where we add the entry and it's usually more
> probable that a different entry will be accessed next change it to
> append to the end of the bucket. Luckily we already iterate throught the
> bucket once thus we can easily find the last entry and just connect the
> new entry after it.

I'm not understanding how this change makes the tests any more
deterministic than before.

The determinism for hash bucket placement is already ensured by
having a non-random impl forvirHashCodeGen in
tests/virdeterministichashmock.c

This change simply reverses the order in which we iterate over values
when many keys hash to the same bucket.  If this mattered for tests
then surely, this change would have to update the test suite too to
take account of the new reversed ordering ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Peter Krempa 5 years ago

On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
> On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
> > In cases when the hash function for a name collides with other entry
> > already in the hash we prepend to the bucket. This creates a 'stack
> > effect' on the buckets if we then iterate through the hash. Normally
> > this is not a problem, but in tests we want deterministic results.
> > 
> > Since it does not matter where we add the entry and it's usually more
> > probable that a different entry will be accessed next change it to
> > append to the end of the bucket. Luckily we already iterate throught the
> > bucket once thus we can easily find the last entry and just connect the
> > new entry after it.
> 
> I'm not understanding how this change makes the tests any more
> deterministic than before.
> 
> The determinism for hash bucket placement is already ensured by
> having a non-random impl forvirHashCodeGen in
> tests/virdeterministichashmock.c
> 
> This change simply reverses the order in which we iterate over values
> when many keys hash to the same bucket.  If this mattered for tests
> then surely, this change would have to update the test suite too to
> take account of the new reversed ordering ?

If you have a hash collision when adding into the hash and then use
virHashForeach the items are formatted in opposite order.

Thus if you use this to e.g. store things parsed from XML and then
format it back, every single iteration reverses the order. This is not a
problem per-se since we can use a different output file, but it's
inconvenient since we can't use the same output file.

Currently there aren't any collisions which would cause problems, but I
have one on my private branch.

Given that this change is rather simple I figured we could just stop
inverting them.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Daniel P. Berrangé 5 years ago

On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
> On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
> > On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
> > > In cases when the hash function for a name collides with other entry
> > > already in the hash we prepend to the bucket. This creates a 'stack
> > > effect' on the buckets if we then iterate through the hash. Normally
> > > this is not a problem, but in tests we want deterministic results.
> > > 
> > > Since it does not matter where we add the entry and it's usually more
> > > probable that a different entry will be accessed next change it to
> > > append to the end of the bucket. Luckily we already iterate throught the
> > > bucket once thus we can easily find the last entry and just connect the
> > > new entry after it.
> > 
> > I'm not understanding how this change makes the tests any more
> > deterministic than before.
> > 
> > The determinism for hash bucket placement is already ensured by
> > having a non-random impl forvirHashCodeGen in
> > tests/virdeterministichashmock.c
> > 
> > This change simply reverses the order in which we iterate over values
> > when many keys hash to the same bucket.  If this mattered for tests
> > then surely, this change would have to update the test suite too to
> > take account of the new reversed ordering ?
> 
> If you have a hash collision when adding into the hash and then use
> virHashForeach the items are formatted in opposite order.
> 
> Thus if you use this to e.g. store things parsed from XML and then
> format it back, every single iteration reverses the order. This is not a
> problem per-se since we can use a different output file, but it's
> inconvenient since we can't use the same output file.
> 
> Currently there aren't any collisions which would cause problems, but I
> have one on my private branch.
> 
> Given that this change is rather simple I figured we could just stop
> inverting them.

Ok, that does make a little more sense, though generally my view would
be that if we're doing something that is order sensitive like XML
parsing & formatting, then we shouldn't be using a hash table to
represent the data - at least not as the primary record. 

Which bit of code is using the hash in this case ? Is it something
purely inside the test suite, or in the real XML handling code ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Peter Krempa 5 years ago

On Tue, Apr 16, 2019 at 15:32:31 +0100, Daniel Berrange wrote:
> On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
> > On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
> > > On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:

[...]

> > Currently there aren't any collisions which would cause problems, but I
> > have one on my private branch.
> > 
> > Given that this change is rather simple I figured we could just stop
> > inverting them.
> 
> Ok, that does make a little more sense, though generally my view would
> be that if we're doing something that is order sensitive like XML
> parsing & formatting, then we shouldn't be using a hash table to
> represent the data - at least not as the primary record. 
> 
> Which bit of code is using the hash in this case ? Is it something
> purely inside the test suite, or in the real XML handling code ?

I'm adding some private data for blockjobs which are formatted into the
status XML. I'm storing them in a hash table as the order isn't really
important here and the user should never see this.

I was just bothered by the fact that I'd need to have an output file (or
change the name of the blockjob) and figured that the insertion into the
hash table is very simple and does not really matter how we do it.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Daniel P. Berrangé 5 years ago

On Tue, Apr 16, 2019 at 04:41:57PM +0200, Peter Krempa wrote:
> On Tue, Apr 16, 2019 at 15:32:31 +0100, Daniel Berrange wrote:
> > On Tue, Apr 16, 2019 at 04:26:49PM +0200, Peter Krempa wrote:
> > > On Tue, Apr 16, 2019 at 15:17:31 +0100, Daniel Berrange wrote:
> > > > On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
> 
> [...]
> 
> > > Currently there aren't any collisions which would cause problems, but I
> > > have one on my private branch.
> > > 
> > > Given that this change is rather simple I figured we could just stop
> > > inverting them.
> > 
> > Ok, that does make a little more sense, though generally my view would
> > be that if we're doing something that is order sensitive like XML
> > parsing & formatting, then we shouldn't be using a hash table to
> > represent the data - at least not as the primary record. 
> > 
> > Which bit of code is using the hash in this case ? Is it something
> > purely inside the test suite, or in the real XML handling code ?
> 
> I'm adding some private data for blockjobs which are formatted into the
> status XML. I'm storing them in a hash table as the order isn't really
> important here and the user should never see this.
> 
> I was just bothered by the fact that I'd need to have an output file (or
> change the name of the blockjob) and figured that the insertion into the
> hash table is very simple and does not really matter how we do it.

ok, that makes more sense now, thanks for explaining.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] util: hash: Append to hash buckets when adding new entries

Posted by Daniel P. Berrangé 5 years ago

On Tue, Apr 16, 2019 at 04:09:54PM +0200, Peter Krempa wrote:
> In cases when the hash function for a name collides with other entry
> already in the hash we prepend to the bucket. This creates a 'stack
> effect' on the buckets if we then iterate through the hash. Normally
> this is not a problem, but in tests we want deterministic results.
> 
> Since it does not matter where we add the entry and it's usually more
> probable that a different entry will be accessed next change it to
> append to the end of the bucket. Luckily we already iterate throught the
> bucket once thus we can easily find the last entry and just connect the
> new entry after it.
> 
> Signed-off-by: Peter Krempa <pkrempa@redhat.com>
> ---
>  src/util/virhash.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list