[v1] RE: [PATCH 0/1] Rosebush, a new hash table

RE: [PATCH 0/1] Rosebush, a new hash table

Posted by David Laight 1 year, 11 months ago

From: Herbert Xu
> Sent: 24 February 2024 00:21
> 
> On Thu, Feb 22, 2024 at 08:37:23PM +0000, Matthew Wilcox (Oracle) wrote:
> >
> > Where I expect rosebush to shine is on dependent cache misses.
> > I've assumed an average chain length of 10 for rhashtable in the above
> > memory calculations.  That means on average a lookup would take five cache
> > misses that can't be speculated.  Rosebush does a linear walk of 4-byte
> 
> Normally an rhashtable gets resized when it reaches 75% capacity
> so the average chain length should always be one.

The average length of non-empty hash chains is more interesting.
You don't usually search for items in empty chains.
The only way you'll get all the chains of length one is if you've
carefully picked the data so that it hashed that way.

I remember playing around with the elf symbol table for a browser
and all its shared libraries.
While the hash function is pretty trivial, it really didn't matter
whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
chains were always long.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Kent Overstreet 1 year, 11 months ago

On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> From: Herbert Xu
> > Sent: 24 February 2024 00:21
> > 
> > On Thu, Feb 22, 2024 at 08:37:23PM +0000, Matthew Wilcox (Oracle) wrote:
> > >
> > > Where I expect rosebush to shine is on dependent cache misses.
> > > I've assumed an average chain length of 10 for rhashtable in the above
> > > memory calculations.  That means on average a lookup would take five cache
> > > misses that can't be speculated.  Rosebush does a linear walk of 4-byte
> > 
> > Normally an rhashtable gets resized when it reaches 75% capacity
> > so the average chain length should always be one.
> 
> The average length of non-empty hash chains is more interesting.
> You don't usually search for items in empty chains.
> The only way you'll get all the chains of length one is if you've
> carefully picked the data so that it hashed that way.
> 
> I remember playing around with the elf symbol table for a browser
> and all its shared libraries.
> While the hash function is pretty trivial, it really didn't matter
> whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> chains were always long.

that's a pretty bad hash, even golden ratio hash would be better, but
still bad; you really should be using at least jhash.

you really want a good avalanche effect, because in real world usage
your entropy is often only in a relatively few bits.

when I implemented cuckoo (which is more obviously sensitive to a weak
hash function), I had to go with siphash, even jhash wasn't giving me
great reslts. and looking at the code it's not hard to see why, it's all
adds, and the rotates are byte aligned... you want mixed adds and xors
and the rotates to be more prime-ish.

right idea, just old...

what would be ideal is something more like siphash, but with fewer
rounds, so same number of instructions as jhash. xxhash might fit the
bill, I haven't looked at the code yet...

RE: [PATCH 0/1] Rosebush, a new hash table

Posted by David Laight 1 year, 11 months ago

From: Kent Overstreet
> Sent: 25 February 2024 03:19
..
> when I implemented cuckoo (which is more obviously sensitive to a weak
> hash function), I had to go with siphash, even jhash wasn't giving me
> great reslts. and looking at the code it's not hard to see why, it's all
> adds, and the rotates are byte aligned... you want mixed adds and xors
> and the rotates to be more prime-ish.
> 
> right idea, just old...
> 
> what would be ideal is something more like siphash, but with fewer
> rounds, so same number of instructions as jhash. xxhash might fit the
> bill, I haven't looked at the code yet...

There is likely to be a point where scanning a list of values
for the right hash value is faster than executing a hash function
that is good enough to separate them to separate buckets.

You don't want to scan a linked list because they have horrid
cache footprints.
The locking is equally horrid - especially for remove.
Arrays of pointers ar ethe way forward :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Kent Overstreet 1 year, 11 months ago

On Sun, Feb 25, 2024 at 02:47:45PM +0000, David Laight wrote:
> From: Kent Overstreet
> > Sent: 25 February 2024 03:19
> ..
> > when I implemented cuckoo (which is more obviously sensitive to a weak
> > hash function), I had to go with siphash, even jhash wasn't giving me
> > great reslts. and looking at the code it's not hard to see why, it's all
> > adds, and the rotates are byte aligned... you want mixed adds and xors
> > and the rotates to be more prime-ish.
> > 
> > right idea, just old...
> > 
> > what would be ideal is something more like siphash, but with fewer
> > rounds, so same number of instructions as jhash. xxhash might fit the
> > bill, I haven't looked at the code yet...
> 
> There is likely to be a point where scanning a list of values
> for the right hash value is faster than executing a hash function
> that is good enough to separate them to separate buckets.

Executing a bunch of alu ops that parallelize nicely is _fast_
(it's all xors/adds/rotates, chacha20 is a good example of how
the modern stuff works).

Also, for 2-way cuckoo there's an xor trick so you don't need to compute
a second hash.

But the key thing about cuckoo is that the loads on lookup _aren't
dependent_ - they can run in parallel. Every cache miss that goes all
the way to DRAM is stupidly expensive, remember - hundreds of
instructions, had you been able to keep the pipeline fed.

> You don't want to scan a linked list because they have horrid
> cache footprints.
> The locking is equally horrid - especially for remove.
> Arrays of pointers ar ethe way forward :-)

Well, maybe; I'm waiting to see fill factor numbers and benchmarks. Fill
factor was my concern when I was playing around with the concept; I used
it in a place where the hash table didn't need to be that big, and the
point was to avoid having to separately allocate the entries (and it
avoids the hassle of tombstone entries with linear/quadratic probing).

The fact that it requires a second dependent load, because buckets are
separately allocated, also looks like a big negative to me. I still
think a good lockless cuckoo hashing implementation ought to beat it.

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Matthew Wilcox 1 year, 11 months ago

On Sat, Feb 24, 2024 at 10:18:31PM -0500, Kent Overstreet wrote:
> On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> > I remember playing around with the elf symbol table for a browser
> > and all its shared libraries.
> > While the hash function is pretty trivial, it really didn't matter
> > whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> > chains were always long.
> 
> that's a pretty bad hash, even golden ratio hash would be better, but
> still bad; you really should be using at least jhash.

There's a "fun" effect; essentially the "biased observer" effect which
leads students to erroneously conclude that the majority of classes are
oversubscribed.  As somebody observed in this thread, for some usecases
you only look up hashes which actually exist.

Task a trivial example where you have four entries unevenly distributed
between two buckets, three in one bucket and one in the other.  Now 3/4
of your lookups hit in one bucket and 1/4 in the other bucket.
Obviously it's not as pronounced if you have 1000 buckets with 1000
entries randomly distributed between the buckets.  But that distribution
is not nearly as even as you might expect:

$ ./distrib
0: 362
1: 371
2: 193
3: 57
4: 13
5: 4

That's using lrand48() to decide which bucket to use, so not even a
"quality of hash" problem, just a "your mathematical intuition may not
be right here".

To put this data another way, 371 entries are in a bucket with a single
entry, 384 are in a bucket with two entries, 171 are in a 3-entry
bucket, 52 are in a 4-entry bucket and 20 are in a 5-entry bucket.

$ cat distrib.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int bucket[1000];
int freq[10];

int main(int argc, char **argv)
{
	int i;

	for (i = 0; i < 1000; i++)
		bucket[lrand48() % 1000]++;

	for (i = 0; i < 1000; i++)
		freq[bucket[i]]++;

	for (i = 0; i < 10; i++)
		printf("%d: %d\n", i, freq[i]);

	return 0;
}

(ok, quibble about "well, 1000 doesn't divide INT_MAX evenly so your
random number generation is biased", but i maintain that will not
materially affect these results due to it affecting only 0.00003% of
numbers generated)

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Kent Overstreet 1 year, 11 months ago

On Sun, Feb 25, 2024 at 05:01:19AM +0000, Matthew Wilcox wrote:
> On Sat, Feb 24, 2024 at 10:18:31PM -0500, Kent Overstreet wrote:
> > On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> > > I remember playing around with the elf symbol table for a browser
> > > and all its shared libraries.
> > > While the hash function is pretty trivial, it really didn't matter
> > > whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> > > chains were always long.
> > 
> > that's a pretty bad hash, even golden ratio hash would be better, but
> > still bad; you really should be using at least jhash.
> 
> There's a "fun" effect; essentially the "biased observer" effect which
> leads students to erroneously conclude that the majority of classes are
> oversubscribed.  As somebody observed in this thread, for some usecases
> you only look up hashes which actually exist.
> 
> Task a trivial example where you have four entries unevenly distributed
> between two buckets, three in one bucket and one in the other.  Now 3/4
> of your lookups hit in one bucket and 1/4 in the other bucket.
> Obviously it's not as pronounced if you have 1000 buckets with 1000
> entries randomly distributed between the buckets.  But that distribution
> is not nearly as even as you might expect:
> 
> $ ./distrib
> 0: 362
> 1: 371
> 2: 193
> 3: 57
> 4: 13
> 5: 4
> 
> That's using lrand48() to decide which bucket to use, so not even a
> "quality of hash" problem, just a "your mathematical intuition may not
> be right here".

well, golden ratio hash - hash_32(i, 32)
0: 368
1: 264
2: 368
3: 0

but your distribution actually is accurate in general, golden ratio hash
is relly nice for sequential integers. the actual problem with your test
is that you're testing 100% occupancy - no one does that.

75% occupancy, siphash:
0: 933
1: 60
2: 6
3: 1
4: 0

that looks about right to me.

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Herbert Xu 1 year, 11 months ago

On Sun, Feb 25, 2024 at 12:51:06AM -0500, Kent Overstreet wrote:
>
> but your distribution actually is accurate in general, golden ratio hash
> is relly nice for sequential integers. the actual problem with your test
> is that you're testing 100% occupancy - no one does that.
> 
> 75% occupancy, siphash:
> 0: 933
> 1: 60
> 2: 6
> 3: 1
> 4: 0
> 
> that looks about right to me.

The point is that the worst-case length grows with the size of
the table so it won't always be 3.  You need to take into account
the largest table size that you will support.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Kent Overstreet 1 year, 11 months ago

On Sun, Feb 25, 2024 at 01:53:35PM +0800, Herbert Xu wrote:
> On Sun, Feb 25, 2024 at 12:51:06AM -0500, Kent Overstreet wrote:
> >
> > but your distribution actually is accurate in general, golden ratio hash
> > is relly nice for sequential integers. the actual problem with your test
> > is that you're testing 100% occupancy - no one does that.
> > 
> > 75% occupancy, siphash:
> > 0: 933
> > 1: 60
> > 2: 6
> > 3: 1
> > 4: 0
> > 
> > that looks about right to me.
> 
> The point is that the worst-case length grows with the size of
> the table so it won't always be 3.  You need to take into account
> the largest table size that you will support.

ok, but - one million entries, siphash, 75% fill factor

0: 472053
1: 354786
2: 132663
3: 33267
4: 6218
5: 884
6: 110
7: 17
8: 2
9: 0

100 million:

0: 51342703
1: 34224025
2: 11413241
3: 2534946
4: 421816
5: 56271
6: 6346
7: 593
8: 56
9: 3
10: 0

it's a log curve - chain length of 16 means you picked a bad hash
function.

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Herbert Xu 1 year, 11 months ago

On Sun, Feb 25, 2024 at 01:14:39AM -0500, Kent Overstreet wrote:
>
> it's a log curve -

Yes it's O(log N/log log N).

> chain length of 16 means you picked a bad hash
> function.

Or that someone is trying to attack you.  The number 16 is the
cut-off where we decide that someone has discovered our hash
secret and can cause a particular to chain to grow without bound.

At this point rhashtable will force a rehash with a different
secret.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Herbert Xu 1 year, 11 months ago

On Sun, Feb 25, 2024 at 05:01:19AM +0000, Matthew Wilcox wrote:
> 
> Task a trivial example where you have four entries unevenly distributed
> between two buckets, three in one bucket and one in the other.  Now 3/4
> of your lookups hit in one bucket and 1/4 in the other bucket.
> Obviously it's not as pronounced if you have 1000 buckets with 1000
> entries randomly distributed between the buckets.  But that distribution
> is not nearly as even as you might expect:
> 
> $ ./distrib
> 0: 362
> 1: 371
> 2: 193
> 3: 57
> 4: 13
> 5: 4

Indeed, that's why rhashtable only triggers a forced rehash at
a chain length of 16 even though we expect the average chain length
to be just 1.

The theoretical worst-case value is expected to be O(lg n/lg lg n).
However, I think 16 was picked because it was sufficient even for a
hash table that filled all memory.  Of course if anyone can provide
some calculation showing that this is insufficient I'm happy to raise
the limit.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Herbert Xu 1 year, 11 months ago

On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
>
> > Normally an rhashtable gets resized when it reaches 75% capacity
> > so the average chain length should always be one.
> 
> The average length of non-empty hash chains is more interesting.
> You don't usually search for items in empty chains.
> The only way you'll get all the chains of length one is if you've
> carefully picked the data so that it hashed that way.

Sure.  But given the 75% capacity, you'd need a really bad hash
function to get an *average* (not worst-case) chain length of
10.

> I remember playing around with the elf symbol table for a browser
> and all its shared libraries.
> While the hash function is pretty trivial, it really didn't matter
> whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> chains were always long.

Even in the unlikely event of bad luck and everything bunches up
together, we change theh hash function (through hash_rnd) every
time we resize so you would expect things to even out after the
resize event.

A rehash is also automatically triggered if the worst-case chain
length exceeds 16.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/1] Rosebush, a new hash table

Posted by Kent Overstreet 1 year, 11 months ago

On Sun, Feb 25, 2024 at 08:50:36AM +0800, Herbert Xu wrote:
> On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> >
> > > Normally an rhashtable gets resized when it reaches 75% capacity
> > > so the average chain length should always be one.
> > 
> > The average length of non-empty hash chains is more interesting.
> > You don't usually search for items in empty chains.
> > The only way you'll get all the chains of length one is if you've
> > carefully picked the data so that it hashed that way.
> 
> Sure.  But given the 75% capacity, you'd need a really bad hash
> function to get an *average* (not worst-case) chain length of
> 10.
> 
> > I remember playing around with the elf symbol table for a browser
> > and all its shared libraries.
> > While the hash function is pretty trivial, it really didn't matter
> > whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> > chains were always long.
> 
> Even in the unlikely event of bad luck and everything bunches up
> together, we change theh hash function (through hash_rnd) every
> time we resize so you would expect things to even out after the
> resize event.
> 
> A rehash is also automatically triggered if the worst-case chain
> length exceeds 16.

16!? that's crap, use a decent hash function and 3-5 should be your
worst upper bound.