This provides an implementation of mul_u64_u64_div_u64() that always
produces exact results.
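
For reference, "exact" can be read as the floor of a * b / c computed
with a full 128-bit intermediate product. The following is only a
userspace sketch of that reading, not the code in this series: the name
ref_mul_u64_u64_div_u64() is hypothetical, it relies on the GCC/Clang
"unsigned __int128" extension, and it assumes the caller guarantees
c != 0 and a quotient that fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical userspace reference, not the kernel implementation.
 * Assumes c != 0 and that a * b / c fits in 64 bits; neither is checked.
 */
static uint64_t ref_mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	/* Widen before multiplying so the product cannot overflow. */
	unsigned __int128 prod = (unsigned __int128)a * b;

	/* Integer division truncates, which is floor for unsigned values. */
	return (uint64_t)(prod / c);
}
```

A reference of this kind is mainly useful for comparing against another
implementation in a test, on compilers that offer a 128-bit integer type.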

Changes from v2 (https://lore.kernel.org/lkml/20240703033552.906852-2-nico@fluxnic.net/T/):

- Dispense with the fancy union. This makes the source smaller and
  avoids the #ifdef for endian ordering that somehow trips the kernel
  test robot.

Changes from v1 (https://lkml.org/lkml/2024/6/28/1130):

- Use the already available u128 type instead of "unsigned __int128".
- Add a test module.