FDT properties are aligned by 4 bytes, not 8 bytes.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/ppc/vof.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index e3b430a81f4f..b5b6514d79fc 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
if (sc == 2) {
- mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
+ mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
} else {
mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
}
--
2.45.2
On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > FDT properties are aligned by 4 bytes, not 8 bytes. > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > --- > hw/ppc/vof.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > index e3b430a81f4f..b5b6514d79fc 100644 > --- a/hw/ppc/vof.c > +++ b/hw/ppc/vof.c > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > if (sc == 2) { > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > } else { > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > } I did wonder if there was a better way to do what this is doing, but neither we (in system/device_tree.c) nor libfdt seem to provide one. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> thanks -- PMM
On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > --- > > hw/ppc/vof.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > index e3b430a81f4f..b5b6514d79fc 100644 > > --- a/hw/ppc/vof.c > > +++ b/hw/ppc/vof.c > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > if (sc == 2) { > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > } else { > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > } > > I did wonder if there was a better way to do what this is doing, > but neither we (in system/device_tree.c) nor libfdt seem to > provide one. libfdt does provide unaligned access helpers (fdt32_ld() etc.), but not an automatic aligned-or-unaligned helper. Maybe we should add that? -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > --- > > > hw/ppc/vof.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > --- a/hw/ppc/vof.c > > > +++ b/hw/ppc/vof.c > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > if (sc == 2) { > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > } else { > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > } > > > > I did wonder if there was a better way to do what this is doing, > > but neither we (in system/device_tree.c) nor libfdt seem to > > provide one. > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > not an automatic aligned-or-unaligned helper. Maybe we should add that? fdt32_ld() and friends only do the "load from this bit of memory" part, which we already have QEMU utility functions for (and which are this patch uses). This particular bit of code is dealing with an fdt property ("memory") that is an array of (address, size) tuples where address and size can independently be either 32 or 64 bits, and it wants the size value of tuple 0. So the missing functionality is something at a higher level than fdt32_ld() which would let you say "give me tuple N field X" with some way to specify the tuple layout. (Which is an awkward kind of API to write in C.) Slightly less general, but for this case we could perhaps have something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): uint64_t value_array[2]; qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, ac, sc); /* * fills in value_array[0] with address, value_array[1] with size, * probably barfs if the varargs-list of cell-sizes doesn't * cover the whole property, similar to the current assert on * proplen. */ mem0_end = value_array[0]; thanks -- PMM
On Thu, Jul 04, 2024 at 01:15:57PM +0100, Peter Maydell wrote: > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > --- > > > > hw/ppc/vof.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > --- a/hw/ppc/vof.c > > > > +++ b/hw/ppc/vof.c > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > if (sc == 2) { > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > } else { > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > } > > > > > > I did wonder if there was a better way to do what this is doing, > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > provide one. > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > fdt32_ld() and friends only do the "load from this bit of memory" > part, which we already have QEMU utility functions for (and which > are this patch uses). > > This particular bit of code is dealing with an fdt property ("memory") > that is an array of (address, size) tuples where address and size > can independently be either 32 or 64 bits, and it wants the > size value of tuple 0. So the missing functionality is something at > a higher level than fdt32_ld() which would let you say "give me > tuple N field X" with some way to specify the tuple layout. (Which > is an awkward kind of API to write in C.) Ah, right. Yeah.. that's a pretty awkward API in C. > Slightly less general, but for this case we could perhaps have > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > uint64_t value_array[2]; > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > ac, sc); > /* > * fills in value_array[0] with address, value_array[1] with size, > * probably barfs if the varargs-list of cell-sizes doesn't > * cover the whole property, similar to the current assert on > * proplen. > */ > mem0_end = value_array[0]; Seems reasonable to me. The only other thought I had was something like Python's struct.unpack() [0]. But your suggestion is probably more natural in C. [0] https://docs.python.org/3/library/struct.html#struct.unpack -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > --- > > > > hw/ppc/vof.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > --- a/hw/ppc/vof.c > > > > +++ b/hw/ppc/vof.c > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > if (sc == 2) { > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > } else { > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > } > > > > > > I did wonder if there was a better way to do what this is doing, > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > provide one. > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > fdt32_ld() and friends only do the "load from this bit of memory" > part, which we already have QEMU utility functions for (and which > are this patch uses). > > This particular bit of code is dealing with an fdt property ("memory") > that is an array of (address, size) tuples where address and size > can independently be either 32 or 64 bits, and it wants the > size value of tuple 0. So the missing functionality is something at > a higher level than fdt32_ld() which would let you say "give me > tuple N field X" with some way to specify the tuple layout. (Which > is an awkward kind of API to write in C.) > > Slightly less general, but for this case we could perhaps have > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > uint64_t value_array[2]; > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > ac, sc); > /* > * fills in value_array[0] with address, value_array[1] with size, > * probably barfs if the varargs-list of cell-sizes doesn't > * cover the whole property, similar to the current assert on > * proplen. > */ > mem0_end = value_array[0]; Since 4/8 byte cells are most common and size is probably normally known, what about something simpler to start with? Thanks, Nick --- diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h index 0677fea..c4b6355 100644 --- a/libfdt/libfdt.h +++ b/libfdt/libfdt.h @@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p) | bp[3]; } +/* + * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned + * so can use a single load. + */ +static inline uint32_t fdt32_ld_prop(const fdt32_t *p) +{ + return fdt32_to_cpu(*p); +} + static inline void fdt32_st(void *property, uint32_t value) { uint8_t *bp = (uint8_t *)property; @@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p) | bp[7]; } +/* + * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned + * so can use two loads. + */ +static inline uint64_t fdt64_ld_prop(const fdt64_t *p) +{ + const fdt64_t *_p = p; + + return ((uint64_t)fdt32_to_cpu(_p[0]) << 32) + | fdt32_to_cpu(_p[1]); +} + static inline void fdt64_st(void *property, uint64_t value) { uint8_t *bp = (uint8_t *)property;
On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > --- > > > > > hw/ppc/vof.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > --- a/hw/ppc/vof.c > > > > > +++ b/hw/ppc/vof.c > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > if (sc == 2) { > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > } else { > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > } > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > provide one. > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > part, which we already have QEMU utility functions for (and which > > are this patch uses). > > > > This particular bit of code is dealing with an fdt property ("memory") > > that is an array of (address, size) tuples where address and size > > can independently be either 32 or 64 bits, and it wants the > > size value of tuple 0. So the missing functionality is something at > > a higher level than fdt32_ld() which would let you say "give me > > tuple N field X" with some way to specify the tuple layout. (Which > > is an awkward kind of API to write in C.) > > > > Slightly less general, but for this case we could perhaps have > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > uint64_t value_array[2]; > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > ac, sc); > > /* > > * fills in value_array[0] with address, value_array[1] with size, > > * probably barfs if the varargs-list of cell-sizes doesn't > > * cover the whole property, similar to the current assert on > > * proplen. > > */ > > mem0_end = value_array[0]; > > Since 4/8 byte cells are most common and size is probably > normally known, what about something simpler to start with? Hrm, I don't think this helps much. As Peter points out the actual load isn't really the issue, it's locating the right spot for it. > > Thanks, > Nick > > --- > diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h > index 0677fea..c4b6355 100644 > --- a/libfdt/libfdt.h > +++ b/libfdt/libfdt.h > @@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p) > | bp[3]; > } > > +/* > + * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned > + * so can use a single load. > + */ > +static inline uint32_t fdt32_ld_prop(const fdt32_t *p) > +{ > + return fdt32_to_cpu(*p); > +} > + > static inline void fdt32_st(void *property, uint32_t value) > { > uint8_t *bp = (uint8_t *)property; > @@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p) > | bp[7]; > } > > +/* > + * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned > + * so can use two loads. > + */ > +static inline uint64_t fdt64_ld_prop(const fdt64_t *p) > +{ > + const fdt64_t *_p = p; > + > + return ((uint64_t)fdt32_to_cpu(_p[0]) << 32) > + | fdt32_to_cpu(_p[1]); > +} > + > static inline void fdt64_st(void *property, uint64_t value) > { > uint8_t *bp = (uint8_t *)property; > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > --- > > > > > > hw/ppc/vof.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > --- a/hw/ppc/vof.c > > > > > > +++ b/hw/ppc/vof.c > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > if (sc == 2) { > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > } else { > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > } > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > provide one. > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > part, which we already have QEMU utility functions for (and which > > > are this patch uses). > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > that is an array of (address, size) tuples where address and size > > > can independently be either 32 or 64 bits, and it wants the > > > size value of tuple 0. So the missing functionality is something at > > > a higher level than fdt32_ld() which would let you say "give me > > > tuple N field X" with some way to specify the tuple layout. (Which > > > is an awkward kind of API to write in C.) > > > > > > Slightly less general, but for this case we could perhaps have > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > uint64_t value_array[2]; > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > ac, sc); > > > /* > > > * fills in value_array[0] with address, value_array[1] with size, > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > * cover the whole property, similar to the current assert on > > > * proplen. > > > */ > > > mem0_end = value_array[0]; > > > > Since 4/8 byte cells are most common and size is probably > > normally known, what about something simpler to start with? > > Hrm, I don't think this helps much. As Peter points out the actual > load isn't really the issue, it's locating the right spot for it. I don't really see why that's a problem, it's just a pointer addition - base + fdt_address_cells * 4. The problem was in the memory access (yes it's fixed with the patch but you could add a general libfdt way to do it). Some fancy function like above could be used, But is it really worth implementing such a thing for this? Thanks, Nick
On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > > --- > > > > > > > hw/ppc/vof.c | 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > > --- a/hw/ppc/vof.c > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > > if (sc == 2) { > > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > > } else { > > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > } > > > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > > provide one. > > > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > > part, which we already have QEMU utility functions for (and which > > > > are this patch uses). > > > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > > that is an array of (address, size) tuples where address and size > > > > can independently be either 32 or 64 bits, and it wants the > > > > size value of tuple 0. So the missing functionality is something at > > > > a higher level than fdt32_ld() which would let you say "give me > > > > tuple N field X" with some way to specify the tuple layout. (Which > > > > is an awkward kind of API to write in C.) > > > > > > > > Slightly less general, but for this case we could perhaps have > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > > > uint64_t value_array[2]; > > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > > ac, sc); > > > > /* > > > > * fills in value_array[0] with address, value_array[1] with size, > > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > > * cover the whole property, similar to the current assert on > > > > * proplen. > > > > */ > > > > mem0_end = value_array[0]; > > > > > > Since 4/8 byte cells are most common and size is probably > > > normally known, what about something simpler to start with? > > > > Hrm, I don't think this helps much. As Peter points out the actual > > load isn't really the issue, it's locating the right spot for it. > > I don't really see why that's a problem, it's just a pointer > addition - base + fdt_address_cells * 4. The problem was in This is harder if #address-cells and #size-cells are different, or if you're parsing ranges and #address-cells is different between parent and child node. > the memory access (yes it's fixed with the patch but you could > add a general libfdt way to do it). Huh.. well I'm getting different impressions of what the problem actually is from what I initially read versus Peter Maydell's comments, so I don't really know what to think. If it's just the load then fdt32_ld() etc. already exist. Or is it really such a hot path that unconditionally handling unaligned accesses isn't tenable? > Some fancy function like above could be used, But is it really > worth implementing such a thing for this? > > Thanks, > Nick > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote: > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > > > --- > > > > > > > > hw/ppc/vof.c | 2 +- > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > > > --- a/hw/ppc/vof.c > > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > > > if (sc == 2) { > > > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > > > } else { > > > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > } > > > > > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > > > provide one. > > > > > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > > > part, which we already have QEMU utility functions for (and which > > > > > are this patch uses). > > > > > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > > > that is an array of (address, size) tuples where address and size > > > > > can independently be either 32 or 64 bits, and it wants the > > > > > size value of tuple 0. So the missing functionality is something at > > > > > a higher level than fdt32_ld() which would let you say "give me > > > > > tuple N field X" with some way to specify the tuple layout. (Which > > > > > is an awkward kind of API to write in C.) > > > > > > > > > > Slightly less general, but for this case we could perhaps have > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > > > > > uint64_t value_array[2]; > > > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > > > ac, sc); > > > > > /* > > > > > * fills in value_array[0] with address, value_array[1] with size, > > > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > > > * cover the whole property, similar to the current assert on > > > > > * proplen. > > > > > */ > > > > > mem0_end = value_array[0]; > > > > > > > > Since 4/8 byte cells are most common and size is probably > > > > normally known, what about something simpler to start with? > > > > > > Hrm, I don't think this helps much. As Peter points out the actual > > > load isn't really the issue, it's locating the right spot for it. > > > > I don't really see why that's a problem, it's just a pointer > > addition - base + fdt_address_cells * 4. The problem was in > > This is harder if #address-cells and #size-cells are different, or if > you're parsing ranges and #address-cells is different between parent > and child node. > > > the memory access (yes it's fixed with the patch but you could > > add a general libfdt way to do it). > > Huh.. well I'm getting different impressions of what the problem > actually is from what I initially read versus Peter Maydell's > comments, so I don't really know what to think. > > If it's just the load then fdt32_ld() etc. already exist. Or is it > really such a hot path that unconditionally handling unaligned > accesses isn't tenable? The specific problem here is that the code as written tries to cast a not-aligned-enough pointer to uint64_t* to do the load, which is UB. The patch submitted fixes that, and personally I think it would be entirely fine to say that's all we need to do here. *If* we want to look at the broader question of "why is this code that's reading something out of an fdt having to do the pretty low-level action of getting the start address of the fdt property and then doing pointer arithmetic and then fishing a value out of it as a 64-bit unaligned load?" then we get into "do we want to provide a helper function that lets the caller say 'give me element X from tuple Y'?". But that seems like a lot of effort for what's basically a single callsite we would be tidying up... thanks -- PMM
On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote: > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: > > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > > > > --- > > > > > > > > > hw/ppc/vof.c | 2 +- > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > > > > --- a/hw/ppc/vof.c > > > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > > > > if (sc == 2) { > > > > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > > > > } else { > > > > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > > } > > > > > > > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > > > > provide one. > > > > > > > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > > > > part, which we already have QEMU utility functions for (and which > > > > > > are this patch uses). > > > > > > > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > > > > that is an array of (address, size) tuples where address and size > > > > > > can independently be either 32 or 64 bits, and it wants the > > > > > > size value of tuple 0. So the missing functionality is something at > > > > > > a higher level than fdt32_ld() which would let you say "give me > > > > > > tuple N field X" with some way to specify the tuple layout. (Which > > > > > > is an awkward kind of API to write in C.) > > > > > > > > > > > > Slightly less general, but for this case we could perhaps have > > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > > > > > > > uint64_t value_array[2]; > > > > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > > > > ac, sc); > > > > > > /* > > > > > > * fills in value_array[0] with address, value_array[1] with size, > > > > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > > > > * cover the whole property, similar to the current assert on > > > > > > * proplen. > > > > > > */ > > > > > > mem0_end = value_array[0]; > > > > > > > > > > Since 4/8 byte cells are most common and size is probably > > > > > normally known, what about something simpler to start with? > > > > > > > > Hrm, I don't think this helps much. As Peter points out the actual > > > > load isn't really the issue, it's locating the right spot for it. > > > > > > I don't really see why that's a problem, it's just a pointer > > > addition - base + fdt_address_cells * 4. The problem was in > > > > This is harder if #address-cells and #size-cells are different, or if > > you're parsing ranges and #address-cells is different between parent > > and child node. > > > > > the memory access (yes it's fixed with the patch but you could > > > add a general libfdt way to do it). > > > > Huh.. well I'm getting different impressions of what the problem > > actually is from what I initially read versus Peter Maydell's > > comments, so I don't really know what to think. > > > > If it's just the load then fdt32_ld() etc. already exist. Or is it > > really such a hot path that unconditionally handling unaligned > > accesses isn't tenable? > > The specific problem here is that the code as written tries to > cast a not-aligned-enough pointer to uint64_t* to do the load, > which is UB. Ah... and I'm assuming it's the cast itself which triggers the UB, not just dereferencing it. Which makes the interface of fdt32_ld() etc. unusable for their intended purpose. Well.. damn. Now... how do I fix it without breaking compatibility any more than I have to. > The patch submitted fixes that, and personally I > think it would be entirely fine to say that's all we need to do here. > > *If* we want to look at the broader question of "why is this > code that's reading something out of an fdt having to do the > pretty low-level action of getting the start address of the > fdt property and then doing pointer arithmetic and then fishing > a value out of it as a 64-bit unaligned load?" then we get into > "do we want to provide a helper function that lets the caller > say 'give me element X from tuple Y'?". But that seems like a > lot of effort for what's basically a single callsite we would > be tidying up... I tend to agree. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote: > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote: > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: > > > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > > > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > > > > > --- > > > > > > > > > > hw/ppc/vof.c | 2 +- > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > > > > > --- a/hw/ppc/vof.c > > > > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > > > > > if (sc == 2) { > > > > > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > > > > > } else { > > > > > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > > > } > > > > > > > > > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > > > > > provide one. > > > > > > > > > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > > > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > > > > > part, which we already have QEMU utility functions for (and which > > > > > > > are this patch uses). > > > > > > > > > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > > > > > that is an array of (address, size) tuples where address and size > > > > > > > can independently be either 32 or 64 bits, and it wants the > > > > > > > size value of tuple 0. So the missing functionality is something at > > > > > > > a higher level than fdt32_ld() which would let you say "give me > > > > > > > tuple N field X" with some way to specify the tuple layout. (Which > > > > > > > is an awkward kind of API to write in C.) > > > > > > > > > > > > > > Slightly less general, but for this case we could perhaps have > > > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > > > > > > > > > uint64_t value_array[2]; > > > > > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > > > > > ac, sc); > > > > > > > /* > > > > > > > * fills in value_array[0] with address, value_array[1] with size, > > > > > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > > > > > * cover the whole property, similar to the current assert on > > > > > > > * proplen. > > > > > > > */ > > > > > > > mem0_end = value_array[0]; > > > > > > > > > > > > Since 4/8 byte cells are most common and size is probably > > > > > > normally known, what about something simpler to start with? > > > > > > > > > > Hrm, I don't think this helps much. As Peter points out the actual > > > > > load isn't really the issue, it's locating the right spot for it. > > > > > > > > I don't really see why that's a problem, it's just a pointer > > > > addition - base + fdt_address_cells * 4. The problem was in > > > > > > This is harder if #address-cells and #size-cells are different, or if > > > you're parsing ranges and #address-cells is different between parent > > > and child node. > > > > > > > the memory access (yes it's fixed with the patch but you could > > > > add a general libfdt way to do it). > > > > > > Huh.. well I'm getting different impressions of what the problem > > > actually is from what I initially read versus Peter Maydell's > > > comments, so I don't really know what to think. > > > > > > If it's just the load then fdt32_ld() etc. already exist. Or is it > > > really such a hot path that unconditionally handling unaligned > > > accesses isn't tenable? > > > > The specific problem here is that the code as written tries to > > cast a not-aligned-enough pointer to uint64_t* to do the load, > > which is UB. > > Ah... and I'm assuming it's the cast itself which triggers the UB, not > just dereferencing it. Oh it's just the cast itself that is UB? Looks like that's true. Interesting gcc and clang don't flag it, I guess they care about warning on practical breakage first. > Which makes the interface of fdt32_ld() > etc. unusable for their intended purpose. Well.. damn. Now... how do > I fix it without breaking compatibility any more than I have to. Why not just make them take a void * ptr? I don't think that would break anything but existing code that was forced to add the cast may be at risk of the UB. Could also add fdt64_unaligned_t types with aligned(1) attribute for new code. Those can just be dereferenced directly and the caller the compiler can choose the appropriate access supported by the host. (Actually gcc can recognise that load unaligned and byteswap pattern and do the same anyway, but clang at least can not yet). Thanks, Nick
On Mon, Jul 08, 2024 at 05:49:32PM +1000, Nicholas Piggin wrote: > On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote: > > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote: > > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: > > > > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > > > > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > > > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > > > > > > --- > > > > > > > > > > > hw/ppc/vof.c | 2 +- > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > > > > > > --- a/hw/ppc/vof.c > > > > > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > > > > > > if (sc == 2) { > > > > > > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > > > > > > } else { > > > > > > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > > > > > > provide one. > > > > > > > > > > > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > > > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > > > > > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > > > > > > part, which we already have QEMU utility functions for (and which > > > > > > > > are this patch uses). > > > > > > > > > > > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > > > > > > that is an array of (address, size) tuples where address and size > > > > > > > > can independently be either 32 or 64 bits, and it wants the > > > > > > > > size value of tuple 0. So the missing functionality is something at > > > > > > > > a higher level than fdt32_ld() which would let you say "give me > > > > > > > > tuple N field X" with some way to specify the tuple layout. (Which > > > > > > > > is an awkward kind of API to write in C.) > > > > > > > > > > > > > > > > Slightly less general, but for this case we could perhaps have > > > > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > > > > > > > > > > > uint64_t value_array[2]; > > > > > > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > > > > > > ac, sc); > > > > > > > > /* > > > > > > > > * fills in value_array[0] with address, value_array[1] with size, > > > > > > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > > > > > > * cover the whole property, similar to the current assert on > > > > > > > > * proplen. > > > > > > > > */ > > > > > > > > mem0_end = value_array[0]; > > > > > > > > > > > > > > Since 4/8 byte cells are most common and size is probably > > > > > > > normally known, what about something simpler to start with? > > > > > > > > > > > > Hrm, I don't think this helps much. As Peter points out the actual > > > > > > load isn't really the issue, it's locating the right spot for it. > > > > > > > > > > I don't really see why that's a problem, it's just a pointer > > > > > addition - base + fdt_address_cells * 4. The problem was in > > > > > > > > This is harder if #address-cells and #size-cells are different, or if > > > > you're parsing ranges and #address-cells is different between parent > > > > and child node. > > > > > > > > > the memory access (yes it's fixed with the patch but you could > > > > > add a general libfdt way to do it). > > > > > > > > Huh.. well I'm getting different impressions of what the problem > > > > actually is from what I initially read versus Peter Maydell's > > > > comments, so I don't really know what to think. > > > > > > > > If it's just the load then fdt32_ld() etc. already exist. Or is it > > > > really such a hot path that unconditionally handling unaligned > > > > accesses isn't tenable? > > > > > > The specific problem here is that the code as written tries to > > > cast a not-aligned-enough pointer to uint64_t* to do the load, > > > which is UB. > > > > Ah... and I'm assuming it's the cast itself which triggers the UB, not > > just dereferencing it. > > Oh it's just the cast itself that is UB? Looks like that's true. > Interesting gcc and clang don't flag it, I guess they care about > warning on practical breakage first. > > > Which makes the interface of fdt32_ld() > > etc. unusable for their intended purpose. Well.. damn. Now... how do > > I fix it without breaking compatibility any more than I have to. > > Why not just make them take a void * ptr? I don't think that would > break anything but existing code that was forced to add the cast > may be at risk of the UB. That works if recompiling, but I believe it's an ABI change. Typically library users would get these functions as inlines, but I believe it's not impossible something could have taken function pointers to them and get versions from the shared library. Or... maybe it's not possible, since these aren't listed our version.lds. Ugh. Given the current state is something basically impossible to use without UB, wearing an ABI breakage might be the lesser evil. > Could also add fdt64_unaligned_t types with aligned(1) attribute > for new code. Those can just be dereferenced directly and the > caller the compiler can choose the appropriate access supported by > the host. (Actually gcc can recognise that load unaligned and > byteswap pattern and do the same anyway, but clang at least can > not yet). That might be a nice idea, but doesn't solve the immediate versioning problem. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Mon, 8 Jul 2024 at 08:49, Nicholas Piggin <npiggin@gmail.com> wrote: > > On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote: > > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote: > > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > Huh.. well I'm getting different impressions of what the problem > > > > actually is from what I initially read versus Peter Maydell's > > > > comments, so I don't really know what to think. > > > > > > > > If it's just the load then fdt32_ld() etc. already exist. Or is it > > > > really such a hot path that unconditionally handling unaligned > > > > accesses isn't tenable? > > > > > > The specific problem here is that the code as written tries to > > > cast a not-aligned-enough pointer to uint64_t* to do the load, > > > which is UB. > > > > Ah... and I'm assuming it's the cast itself which triggers the UB, not > > just dereferencing it. > > Oh it's just the cast itself that is UB? Looks like that's true. > Interesting gcc and clang don't flag it, I guess they care about > warning on practical breakage first. Er, I was speaking a bit vaguely there, don't take my word for it without going and looking at the text of the C standard. What I *meant* was that the practical problem here is that we really do dereference a pointer for a 64-bit load when the pointer isn't necessarily 64-bit-aligned. As it happens, C99 says that it is the cast that is UB: section 6.3.2.3 para 7 says: "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer." Presumably this is envisaging the possibility of a pointer cast being a destructive operation somehow, such that e.g. a uint64_t* can only represent 64-bit-aligned values. But I bet QEMU does a lot of casting pointers around that might fall foul of this rule, so I'm not particularly worried about trying to clean up that kind of thing (until/unless analysers start warning about it, in which case we have a specific set of things to clean up). What I care about from the point of view of this patch is that we fix the actually-broken-on-some-real-hardware problem of doing the load as a misaligned access. My vote would be for "take Akihiko's patch as-is, rather than gating fixing the bug on deciding on an improvement/change to the fdt API or our wrappers of it". thanks -- PMM
On Mon, Jul 08, 2024 at 04:59:30PM +0100, Peter Maydell wrote: > On Mon, 8 Jul 2024 at 08:49, Nicholas Piggin <npiggin@gmail.com> wrote: > > > > On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote: > > > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote: > > > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > Huh.. well I'm getting different impressions of what the problem > > > > > actually is from what I initially read versus Peter Maydell's > > > > > comments, so I don't really know what to think. > > > > > > > > > > If it's just the load then fdt32_ld() etc. already exist. Or is it > > > > > really such a hot path that unconditionally handling unaligned > > > > > accesses isn't tenable? > > > > > > > > The specific problem here is that the code as written tries to > > > > cast a not-aligned-enough pointer to uint64_t* to do the load, > > > > which is UB. > > > > > > Ah... and I'm assuming it's the cast itself which triggers the UB, not > > > just dereferencing it. > > > > Oh it's just the cast itself that is UB? Looks like that's true. > > Interesting gcc and clang don't flag it, I guess they care about > > warning on practical breakage first. > > Er, I was speaking a bit vaguely there, don't take my word for > it without going and looking at the text of the C standard. Sure. > What I *meant* was that the practical problem here is that we > really do dereference a pointer for a 64-bit load when the > pointer isn't necessarily 64-bit-aligned. From the qemu point of view, yes. And theoretically, the fix is easy, since libfdt provides fdt32_ld() etc. for exactly this use case. But.. > As it happens, C99 says that it is the cast that is UB: > section 6.3.2.3 para 7 says: > "A pointer to an object or incomplete type may be converted to > a pointer to a different object or incomplete type. If the > resulting pointer is not correctly aligned for the pointed-to > type, the behavior is undefined. Otherwise, when converted back > again, the result shall compare equal to the original pointer." .. this makes fdt32_ld() etc. unusable by design. > Presumably this is envisaging the possibility of a pointer cast > being a destructive operation somehow, such that e.g. a uint64_t* > can only represent 64-bit-aligned values. But I bet QEMU does > a lot of casting pointers around that might fall foul of this > rule, so I'm not particularly worried about trying to clean up > that kind of thing (until/unless analysers start warning about > it, in which case we have a specific set of things to clean up). Fair enough from the qemu point of view. However, this unusable by design interface was written by me as part of a library I maintain, so it certainly worries *me*. > What I care about from the point of view of this patch > is that we fix the actually-broken-on-some-real-hardware problem > of doing the load as a misaligned access. My vote would be for > "take Akihiko's patch as-is, rather than gating fixing the bug > on deciding on an improvement/change to the fdt API or our > wrappers of it". > > thanks > -- PMM > -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri Jul 5, 2024 at 3:12 PM AEST, David Gibson wrote: > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > > > > > > --- > > > > > > > > hw/ppc/vof.c | 2 +- > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > > > > > > --- a/hw/ppc/vof.c > > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > > > > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > > > > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > > > > > > if (sc == 2) { > > > > > > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > > > > > > } else { > > > > > > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > > > > > > } > > > > > > > > > > > > > > I did wonder if there was a better way to do what this is doing, > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to > > > > > > > provide one. > > > > > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > > > > > > not an automatic aligned-or-unaligned helper. Maybe we should add that? > > > > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory" > > > > > part, which we already have QEMU utility functions for (and which > > > > > are this patch uses). > > > > > > > > > > This particular bit of code is dealing with an fdt property ("memory") > > > > > that is an array of (address, size) tuples where address and size > > > > > can independently be either 32 or 64 bits, and it wants the > > > > > size value of tuple 0. So the missing functionality is something at > > > > > a higher level than fdt32_ld() which would let you say "give me > > > > > tuple N field X" with some way to specify the tuple layout. (Which > > > > > is an awkward kind of API to write in C.) > > > > > > > > > > Slightly less general, but for this case we could perhaps have > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): > > > > > > > > > > uint64_t value_array[2]; > > > > > qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, > > > > > ac, sc); > > > > > /* > > > > > * fills in value_array[0] with address, value_array[1] with size, > > > > > * probably barfs if the varargs-list of cell-sizes doesn't > > > > > * cover the whole property, similar to the current assert on > > > > > * proplen. > > > > > */ > > > > > mem0_end = value_array[0]; > > > > > > > > Since 4/8 byte cells are most common and size is probably > > > > normally known, what about something simpler to start with? > > > > > > Hrm, I don't think this helps much. As Peter points out the actual > > > load isn't really the issue, it's locating the right spot for it. > > > > I don't really see why that's a problem, it's just a pointer > > addition - base + fdt_address_cells * 4. The problem was in > > This is harder if #address-cells and #size-cells are different, or if > you're parsing ranges and #address-cells is different between parent > and child node. > > > the memory access (yes it's fixed with the patch but you could > > add a general libfdt way to do it). > > Huh.. well I'm getting different impressions of what the problem > actually is from what I initially read versus Peter Maydell's > comments, so I don't really know what to think. If I'm not mistaken, the sanitizer caught an unaligned 64-bit load which is the bug. The tuple address calculation itself I think is not buggy. I suppose Peter was thinking of an accessor that takes care of addressing and alignment. I don't think we're at the point it warrants it here, but could be convinced (maybe a bunch of other code would use it). I think the API is a little dangerous for overflows though, hard to static check. sscanf() style could be checked by the compiler but seems overkill to implement. > If it's just the load then fdt32_ld() etc. already exist. Or is it > really such a hot path that unconditionally handling unaligned > accesses isn't tenable? Yeah that's true, hardly any point to adding the faster variant. It could just be fixed like this then? The original patch is a fix too, but I do prefer using the same style for both, and I think using the fdt accessor is nicer to read. Thanks, Nick --- diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c index e3b430a81f..a666a133d7 100644 --- a/hw/ppc/vof.c +++ b/hw/ppc/vof.c @@ -646,9 +646,9 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); if (sc == 2) { - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); + mem0_end = fdt64_ld((fdt64_t *)(mem0_reg + sizeof(uint32_t) * ac)); } else { - mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); + mem0_end = fdt32_ld((fdt32_t *)(mem0_reg + sizeof(uint32_t) * ac)); } g_array_sort(claimed, of_claimed_compare_func);
On 2024/07/05 16:50, Nicholas Piggin wrote: > On Fri Jul 5, 2024 at 3:12 PM AEST, David Gibson wrote: >> On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote: >>> On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote: >>>> On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote: >>>>> On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote: >>>>>> On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote: >>>>>>> >>>>>>> On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: >>>>>>>> On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: >>>>>>>>> >>>>>>>>> FDT properties are aligned by 4 bytes, not 8 bytes. >>>>>>>>> >>>>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> >>>>>>>>> --- >>>>>>>>> hw/ppc/vof.c | 2 +- >>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >>>>>>>>> index e3b430a81f4f..b5b6514d79fc 100644 >>>>>>>>> --- a/hw/ppc/vof.c >>>>>>>>> +++ b/hw/ppc/vof.c >>>>>>>>> @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) >>>>>>>>> mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); >>>>>>>>> g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); >>>>>>>>> if (sc == 2) { >>>>>>>>> - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); >>>>>>>>> + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); >>>>>>>>> } else { >>>>>>>>> mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); >>>>>>>>> } >>>>>>>> >>>>>>>> I did wonder if there was a better way to do what this is doing, >>>>>>>> but neither we (in system/device_tree.c) nor libfdt seem to >>>>>>>> provide one. >>>>>>> >>>>>>> libfdt does provide unaligned access helpers (fdt32_ld() etc.), but >>>>>>> not an automatic aligned-or-unaligned helper. Maybe we should add that? >>>>>> >>>>>> fdt32_ld() and friends only do the "load from this bit of memory" >>>>>> part, which we already have QEMU utility functions for (and which >>>>>> are this patch uses). >>>>>> >>>>>> This particular bit of code is dealing with an fdt property ("memory") >>>>>> that is an array of (address, size) tuples where address and size >>>>>> can independently be either 32 or 64 bits, and it wants the >>>>>> size value of tuple 0. So the missing functionality is something at >>>>>> a higher level than fdt32_ld() which would let you say "give me >>>>>> tuple N field X" with some way to specify the tuple layout. (Which >>>>>> is an awkward kind of API to write in C.) >>>>>> >>>>>> Slightly less general, but for this case we could perhaps have >>>>>> something like the getprop equivalent of qemu_fdt_setprop_sized_cells(): >>>>>> >>>>>> uint64_t value_array[2]; >>>>>> qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array, >>>>>> ac, sc); >>>>>> /* >>>>>> * fills in value_array[0] with address, value_array[1] with size, >>>>>> * probably barfs if the varargs-list of cell-sizes doesn't >>>>>> * cover the whole property, similar to the current assert on >>>>>> * proplen. >>>>>> */ >>>>>> mem0_end = value_array[0]; >>>>> >>>>> Since 4/8 byte cells are most common and size is probably >>>>> normally known, what about something simpler to start with? >>>> >>>> Hrm, I don't think this helps much. As Peter points out the actual >>>> load isn't really the issue, it's locating the right spot for it. >>> >>> I don't really see why that's a problem, it's just a pointer >>> addition - base + fdt_address_cells * 4. The problem was in >> >> This is harder if #address-cells and #size-cells are different, or if >> you're parsing ranges and #address-cells is different between parent >> and child node. >> >>> the memory access (yes it's fixed with the patch but you could >>> add a general libfdt way to do it). >> >> Huh.. well I'm getting different impressions of what the problem >> actually is from what I initially read versus Peter Maydell's >> comments, so I don't really know what to think. > > If I'm not mistaken, the sanitizer caught an unaligned 64-bit > load which is the bug. > > The tuple address calculation itself I think is not buggy. I suppose > Peter was thinking of an accessor that takes care of addressing and > alignment. I don't think we're at the point it warrants it here, but > could be convinced (maybe a bunch of other code would use it). > > I think the API is a little dangerous for overflows though, hard to > static check. sscanf() style could be checked by the compiler but > seems overkill to implement. > >> If it's just the load then fdt32_ld() etc. already exist. Or is it >> really such a hot path that unconditionally handling unaligned >> accesses isn't tenable? > > Yeah that's true, hardly any point to adding the faster variant. > > It could just be fixed like this then? The original patch is a > fix too, but I do prefer using the same style for both, and > I think using the fdt accessor is nicer to read. > > Thanks, > Nick > > --- > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > index e3b430a81f..a666a133d7 100644 > --- a/hw/ppc/vof.c > +++ b/hw/ppc/vof.c > @@ -646,9 +646,9 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > if (sc == 2) { > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > + mem0_end = fdt64_ld((fdt64_t *)(mem0_reg + sizeof(uint32_t) * ac)); I don't like the extra cast to fdt64_t. Strictly speaking, casting into uint64_t * is undefined in the standard and the compiler is free to optimize it as an aligned access after the cast. clang and gcc does not perform such an optimization, but it is better to avoid such a construct if possible.* It is unfortunate that libfdt requires it. Nevertheless, I won't object to use fdt64_ld() and fdt32_ld(). That's what the upstream provides anyway. Regards, Akihiko Odaki * By the way, I had a related discussion with sanitizer developers: https://github.com/llvm/llvm-project/issues/83710
On Sat Jun 29, 2024 at 1:16 PM AEST, David Gibson wrote: > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote: > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote: > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes. > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > > --- > > > hw/ppc/vof.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > index e3b430a81f4f..b5b6514d79fc 100644 > > > --- a/hw/ppc/vof.c > > > +++ b/hw/ppc/vof.c > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base) > > > mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen); > > > g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc)); > > > if (sc == 2) { > > > - mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > + mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac); > > > } else { > > > mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac)); > > > } > > > > I did wonder if there was a better way to do what this is doing, > > but neither we (in system/device_tree.c) nor libfdt seem to > > provide one. > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but > not an automatic aligned-or-unaligned helper. Maybe we should add that? Runtime test if the pointer is aligned? What about just fdt_prop32_ld() and fdt_prop64_ld() where you know it's 4 byte aligned. Then just do 2 x 4 byte loads for the 64-bit, I don't think performance would matter so much to try get a single load. Thanks, Nick
© 2016 - 2024 Red Hat, Inc.