[PATCH] hw/net/e1000: Use address_space_stb for Rx descriptor status write on aarch64

Liu Gang posted 1 patch 3 days, 7 hours ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260527091711.3901-1-liugang24219@sangfor.com.cn
Maintainers: Jason Wang <jasowang@redhat.com>
hw/net/e1000.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
[PATCH] hw/net/e1000: Use address_space_stb for Rx descriptor status write on aarch64
Posted by Liu Gang 3 days, 7 hours ago
From: LiuGang <liugang24219@sangfor.com.cn>

In x86 environments, writing the Rx descriptor status byte with a single
pci_dma_write() works correctly. However, on aarch64 with glibc 2.24+,
the memcpy/memmove implementation uses a branchless sequence that copies
the same byte three times when count == 1. This results in three
consecutive STRB operations.
Ref: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/memcpy.S;hb=refs/heads/release/2.24/master#l124

When the guest uses an e1000 NIC and DPDK, this behavior can cause
packet reception to stop or packets to be dropped repeatedly. Under
normal operation, DPDK clears the DD flag after processing a packet and
then updates RDT. But due to the triple STRB from memcpy/memmove, the
DD flag may be cleared and then immediately set again. This causes DPDK
to consume the same packet twice, making RDT be incremented by one extra.
As a result, RDH == RDT, and QEMU mistakenly considers the receive
queue full, stopping packet reception.

The issue can be reproduced on aarch64 using the adapted test program:
https://github.com/cdkey/e1000_poc

Replace pci_dma_write() with address_space_stb() to write the DD status
byte directly, avoiding the problematic memcpy/memmove sequence and
resolving the issue.

Fixes: 034d00d48581("e1000: set RX descriptor status in a separate
operation")
Signed-off-by: Liu Gang <liugang24219@sangfor.com.cn>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
---
 hw/net/e1000.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 202ad40401..f7be035e0f 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -869,6 +869,14 @@ e1000_receiver_overrun(E1000State *s, size_t size)
     set_ics(s, 0, E1000_ICS_RXO);
 }
 
+static inline void
+e1000_dma_write_byte(PCIDevice *d, dma_addr_t addr, uint8_t val)
+{
+    AddressSpace *as = pci_get_address_space(d);
+    dma_barrier(as, DMA_DIRECTION_FROM_DEVICE);
+    address_space_stb(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
+}
+
 static ssize_t
 e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 {
@@ -980,8 +988,8 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
         }
         pci_dma_write(d, base, &desc, sizeof(desc));
         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
-        pci_dma_write(d, base + offsetof(struct e1000_rx_desc, status),
-                      &desc.status, sizeof(desc.status));
+        e1000_dma_write_byte(d, base + offsetof(struct e1000_rx_desc, status),
+                             desc.status);
 
         if (++s->mac_reg[RDH] * sizeof(desc) >= s->mac_reg[RDLEN])
             s->mac_reg[RDH] = 0;
-- 
2.36.0.windows.1
Re: [PATCH] hw/net/e1000: Use address_space_stb for Rx descriptor status write on aarch64
Posted by Peter Maydell 3 days, 3 hours ago
On Wed, 27 May 2026 at 14:23, Liu Gang <liugang24219@sangfor.com.cn> wrote:
>
> From: LiuGang <liugang24219@sangfor.com.cn>
>
> In x86 environments, writing the Rx descriptor status byte with a single
> pci_dma_write() works correctly. However, on aarch64 with glibc 2.24+,
> the memcpy/memmove implementation uses a branchless sequence that copies
> the same byte three times when count == 1. This results in three
> consecutive STRB operations.
> Ref: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/memcpy.S;hb=refs/heads/release/2.24/master#l124
>
> When the guest uses an e1000 NIC and DPDK, this behavior can cause
> packet reception to stop or packets to be dropped repeatedly. Under
> normal operation, DPDK clears the DD flag after processing a packet and
> then updates RDT. But due to the triple STRB from memcpy/memmove, the
> DD flag may be cleared and then immediately set again. This causes DPDK
> to consume the same packet twice, making RDT be incremented by one extra.
> As a result, RDH == RDT, and QEMU mistakenly considers the receive
> queue full, stopping packet reception.
>
> The issue can be reproduced on aarch64 using the adapted test program:
> https://github.com/cdkey/e1000_poc
>
> Replace pci_dma_write() with address_space_stb() to write the DD status
> byte directly, avoiding the problematic memcpy/memmove sequence and
> resolving the issue.
>
> Fixes: 034d00d48581("e1000: set RX descriptor status in a separate
> operation")
> Signed-off-by: Liu Gang <liugang24219@sangfor.com.cn>
> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
> ---
>  hw/net/e1000.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 202ad40401..f7be035e0f 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -869,6 +869,14 @@ e1000_receiver_overrun(E1000State *s, size_t size)
>      set_ics(s, 0, E1000_ICS_RXO);
>  }
>
> +static inline void
> +e1000_dma_write_byte(PCIDevice *d, dma_addr_t addr, uint8_t val)
> +{
> +    AddressSpace *as = pci_get_address_space(d);
> +    dma_barrier(as, DMA_DIRECTION_FROM_DEVICE);
> +    address_space_stb(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);

I don't think there's any guarantee that we won't write
multiple times to the memory in this case either. This
eventually boils down to calling address_space_stm_internal
which for dispatch to RAM calls stm_p(), which for a byte
write will call stb_p(), which is "*(uint8_t *)ptr = v;"
and there's no guarantee in the C standard about that doing
only a single byte store to memory.

I think this change only gives different behaviour by accident,
not by design.

-- PMM
Re: [PATCH] hw/net/e1000: Use address_space_stb for Rx descriptor status write on aarch64
Posted by Liu Gang 2 days, 5 hours ago
在 2026/5/27 21:45, Peter Maydell 写道:
> On Wed, 27 May 2026 at 14:23, Liu Gang <liugang24219@sangfor.com.cn> wrote:
>>
>> From: LiuGang <liugang24219@sangfor.com.cn>
>>
>> In x86 environments, writing the Rx descriptor status byte with a single
>> pci_dma_write() works correctly. However, on aarch64 with glibc 2.24+,
>> the memcpy/memmove implementation uses a branchless sequence that copies
>> the same byte three times when count == 1. This results in three
>> consecutive STRB operations.
>> Ref: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/memcpy.S;hb=refs/heads/release/2.24/master#l124
>>
>> When the guest uses an e1000 NIC and DPDK, this behavior can cause
>> packet reception to stop or packets to be dropped repeatedly. Under
>> normal operation, DPDK clears the DD flag after processing a packet and
>> then updates RDT. But due to the triple STRB from memcpy/memmove, the
>> DD flag may be cleared and then immediately set again. This causes DPDK
>> to consume the same packet twice, making RDT be incremented by one extra.
>> As a result, RDH == RDT, and QEMU mistakenly considers the receive
>> queue full, stopping packet reception.
>>
>> The issue can be reproduced on aarch64 using the adapted test program:
>> https://github.com/cdkey/e1000_poc
>>
>> Replace pci_dma_write() with address_space_stb() to write the DD status
>> byte directly, avoiding the problematic memcpy/memmove sequence and
>> resolving the issue.
>>
>> Fixes: 034d00d48581("e1000: set RX descriptor status in a separate
>> operation")
>> Signed-off-by: Liu Gang <liugang24219@sangfor.com.cn>
>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
>> ---
>>   hw/net/e1000.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index 202ad40401..f7be035e0f 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -869,6 +869,14 @@ e1000_receiver_overrun(E1000State *s, size_t size)
>>       set_ics(s, 0, E1000_ICS_RXO);
>>   }
>>
>> +static inline void
>> +e1000_dma_write_byte(PCIDevice *d, dma_addr_t addr, uint8_t val)
>> +{
>> +    AddressSpace *as = pci_get_address_space(d);
>> +    dma_barrier(as, DMA_DIRECTION_FROM_DEVICE);
>> +    address_space_stb(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);
> 
> I don't think there's any guarantee that we won't write
> multiple times to the memory in this case either. This
> eventually boils down to calling address_space_stm_internal
> which for dispatch to RAM calls stm_p(), which for a byte
> write will call stb_p(), which is "*(uint8_t *)ptr = v;"
> and there's no guarantee in the C standard about that doing
> only a single byte store to memory.
> 
> I think this change only gives different behaviour by accident,
> not by design.
> 
> -- PMM
> 
> 
Thank you for the thoughtful review. You raise a very valid point — the 
C standard indeed does not mandate that a non-atomic assignment like 
*(uint8_t *)ptr = v; will be compiled to a single byte store.

I have prepared a minimal test example on Compiler Explorer to verify 
that the direct byte storage method (`*(uint8_t *)ptr = v;`) does not 
result in multiple storage operations under different architectures and 
compilers: https://godbolt.org/z/q3vfPcjME

This example confirms that, at a typical -O2 optimization level, the 
generated code is a single instruction across different architectures 
and compilers. While this is not an absolute guarantee, it reflects what 
commonly happens in practice with common toolchains.

Looking forward to your guidance.