drivers/usb/host/xhci-ring.c | 3 ++- drivers/usb/host/xhci.c | 14 +++++++++++--- include/linux/usb.h | 1 + 3 files changed, 14 insertions(+), 4 deletions(-)
---
I was checking memory allocation behaviors (via memory profiling[1]),
when I notice a high frequent memory allocation in xhci_urb_enqueue, about
250/s when using a USB webcam. If those alloced buffer could be kept and
reused, lots of memory allocations could be avoid over time.
This patch is just a POC, about 0/s memory allocation in xhci with this
patch, when I use my USB devices, webcam/keyboard/mouse.
A dynamic cached memory would be better: URB keep host controller's
private data, if larger size buffer needed for private data, old buffer
released and a larger buffer alloced.
I did not observe any nagative impact with xhci's 250/s allocations
when using my system, hence no measurement of how useful this changes
can make to user. Just want to collect feedbacks before putting more
effort.
[1] https://lore.kernel.org/all/20240221194052.927623-1-surenb@google.com/
---
xhci keeps allocing new memory when enque a urb for private data,
and enque frequency could be high, about 250/s when using a usb
webcam, about 30/s for high pace USB keyboard/mouse usage.
Using a cache/buffer for those private data could avoid
lots memory allocations.
Signed-off-by: David Wang <00107082@163.com>
---
drivers/usb/host/xhci-ring.c | 3 ++-
drivers/usb/host/xhci.c | 14 +++++++++++---
include/linux/usb.h | 1 +
3 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 423bf3649570..bc350b307758 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -831,7 +831,8 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
usb_amd_quirk_pll_enable();
}
}
- xhci_urb_free_priv(urb_priv);
+ if (urb_priv != (void *)urb->hcpriv_buffer)
+ xhci_urb_free_priv(urb_priv);
usb_hcd_unlink_urb_from_ep(hcd, urb);
trace_xhci_urb_giveback(urb);
usb_hcd_giveback_urb(hcd, urb, status);
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..85aa5fe526c8 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1539,6 +1539,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
unsigned int *ep_state;
struct urb_priv *urb_priv;
int num_tds;
+ size_t private_size;
ep_index = xhci_get_endpoint_index(&urb->ep->desc);
@@ -1552,7 +1553,13 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
else
num_tds = 1;
- urb_priv = kzalloc(struct_size(urb_priv, td, num_tds), mem_flags);
+ private_size = struct_size(urb_priv, td, num_tds);
+ if (private_size <= sizeof(urb->hcpriv_buffer)) {
+ memset(urb->hcpriv_buffer, 0, sizeof(urb->hcpriv_buffer));
+ urb_priv = (struct urb_priv *)urb->hcpriv_buffer;
+ } else {
+ urb_priv = kzalloc(private_size, mem_flags);
+ }
if (!urb_priv)
return -ENOMEM;
@@ -1626,7 +1633,8 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
if (ret) {
free_priv:
- xhci_urb_free_priv(urb_priv);
+ if (urb_priv != (void *)urb->hcpriv_buffer)
+ xhci_urb_free_priv(urb_priv);
urb->hcpriv = NULL;
}
spin_unlock_irqrestore(&xhci->lock, flags);
@@ -1789,7 +1797,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
return ret;
err_giveback:
- if (urb_priv)
+ if (urb_priv && urb_priv != (void *)urb->hcpriv_buffer)
xhci_urb_free_priv(urb_priv);
usb_hcd_unlink_urb_from_ep(hcd, urb);
spin_unlock_irqrestore(&xhci->lock, flags);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index b46738701f8d..4f82bb69081c 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1602,6 +1602,7 @@ struct urb {
struct kref kref; /* reference count of the URB */
int unlinked; /* unlink error code */
void *hcpriv; /* private data for host controller */
+ u8 hcpriv_buffer[4096]; /* small buffer if private data can fit */
atomic_t use_count; /* concurrent submissions counter */
atomic_t reject; /* submissions will fail */
--
2.39.2
Hi David On 12.5.2025 18.07, David Wang wrote: > --- > I was checking memory allocation behaviors (via memory profiling[1]), > when I notice a high frequent memory allocation in xhci_urb_enqueue, about > 250/s when using a USB webcam. If those alloced buffer could be kept and > reused, lots of memory allocations could be avoid over time. > > This patch is just a POC, about 0/s memory allocation in xhci with this > patch, when I use my USB devices, webcam/keyboard/mouse. Thanks for looking at this, this is something that obviously needs more attention > > A dynamic cached memory would be better: URB keep host controller's > private data, if larger size buffer needed for private data, old buffer > released and a larger buffer alloced. > > I did not observe any nagative impact with xhci's 250/s allocations > when using my system, hence no measurement of how useful this changes > can make to user. Just want to collect feedbacks before putting more > effort. > I think we can make a xhci internal solution that doesn't affect other hosts or usb core. How about adding a list of struct urb_priv nodes for every usb device, or maybe even per endpoint? i.e. to struct xhci_virt_device or struct xhci_virt_ep. Add size and list_head entries to struct urb_priv. In xhci_urb_enqueue() pick the first urb_priv node from list if it exists and is large enough, otherwise just allocate a new one and set the size. When giving back the URB zero everything in the struct urb_priv except the size, and add it to the list. When the device is freed there shouldn't be any nodes left in the list, but if there are then warn and free them. Isoc transfers are the ones with odd urb_priv sizes. If we have a per device or per endpoint urb_priv list then sizes will match better. Thanks Mathias
At 2025-05-13 17:27:34, "Mathias Nyman" <mathias.nyman@intel.com> wrote: >Hi David > >On 12.5.2025 18.07, David Wang wrote: >> --- >> I was checking memory allocation behaviors (via memory profiling[1]), >> when I notice a high frequent memory allocation in xhci_urb_enqueue, about >> 250/s when using a USB webcam. If those alloced buffer could be kept and >> reused, lots of memory allocations could be avoid over time. >> >> This patch is just a POC, about 0/s memory allocation in xhci with this >> patch, when I use my USB devices, webcam/keyboard/mouse. > >Thanks for looking at this, this is something that obviously needs more >attention > >> >> A dynamic cached memory would be better: URB keep host controller's >> private data, if larger size buffer needed for private data, old buffer >> released and a larger buffer alloced. >> >> I did not observe any nagative impact with xhci's 250/s allocations >> when using my system, hence no measurement of how useful this changes >> can make to user. Just want to collect feedbacks before putting more >> effort. >> > >I think we can make a xhci internal solution that doesn't affect other hosts >or usb core. Yes, my latest patches separates xhci changes from usbcore . > >How about adding a list of struct urb_priv nodes for every usb device, or maybe >even per endpoint? i.e. to struct xhci_virt_device or struct xhci_virt_ep. > >Add size and list_head entries to struct urb_priv. > device level mempool could be very complicated, I think a single memory pool hold by URB would be much simpler. (Could you help take a look at this patch: https://lore.kernel.org/lkml/20250513055447.5696-1-00107082@163.com/) >In xhci_urb_enqueue() pick the first urb_priv node from list if it exists and is >large enough, otherwise just allocate a new one and set the size. > >When giving back the URB zero everything in the struct urb_priv except the size, >and add it to the list. > >When the device is freed there shouldn't be any nodes left in the list, but if >there are then warn and free them. > >Isoc transfers are the ones with odd urb_priv sizes. If we have a per device or >per endpoint urb_priv list then sizes will match better. If the mempool is in URB, then only largest size matters, since no two HC can own the URB at the same time. > >Thanks >Mathias > Thanks for the comments and suggestions. David
On Mon, May 12, 2025 at 11:07:24PM +0800, David Wang wrote: > --- > I was checking memory allocation behaviors (via memory profiling[1]), > when I notice a high frequent memory allocation in xhci_urb_enqueue, about > 250/s when using a USB webcam. If those alloced buffer could be kept and > reused, lots of memory allocations could be avoid over time. > > This patch is just a POC, about 0/s memory allocation in xhci with this > patch, when I use my USB devices, webcam/keyboard/mouse. > > A dynamic cached memory would be better: URB keep host controller's > private data, if larger size buffer needed for private data, old buffer > released and a larger buffer alloced. This sounds like a better approach; you should try it. Allocations and dellocations from a private memory pool can be really quick. And it wouldn't waste space on buffers for URBs that don't need them (for example, URBs used with other host controller drivers). Alan Stern
At 2025-05-12 23:34:41, "Alan Stern" <stern@rowland.harvard.edu> wrote: >On Mon, May 12, 2025 at 11:07:24PM +0800, David Wang wrote: >> --- >> I was checking memory allocation behaviors (via memory profiling[1]), >> when I notice a high frequent memory allocation in xhci_urb_enqueue, about >> 250/s when using a USB webcam. If those alloced buffer could be kept and >> reused, lots of memory allocations could be avoid over time. >> >> This patch is just a POC, about 0/s memory allocation in xhci with this >> patch, when I use my USB devices, webcam/keyboard/mouse. >> >> A dynamic cached memory would be better: URB keep host controller's >> private data, if larger size buffer needed for private data, old buffer >> released and a larger buffer alloced. > >This sounds like a better approach; you should try it. Allocations and >dellocations from a private memory pool can be really quick. And it >wouldn't waste space on buffers for URBs that don't need them (for >example, URBs used with other host controller drivers). > >Alan Stern Thanks for the quick feedback~ I will work on it and update later~ David
---
Changes:
1. Use caller's mem_flags if a larger memory is needed.
Thanks to Oliver Neukum <oneukum@suse.com>'s review.
---
URB objects have long lifecycle, an urb can be reused between
enqueue-dequeue loops; The private data needed by some host controller
has very short lifecycle, the memory is alloced when enqueue, and
released when dequeue. For example, on a system with xhci, several
minutes of usage of webcam/keyboard/mouse have memory alloc counts:
drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
Memory allocation frequency for host-controller private data can reach
~1k/s and above.
High frequent allocations for host-controller private data can be
avoided if urb take over the ownership of memory, the memory then shares
the longer lifecycle with urb objects.
Add a mempool to urb for hcpriv usage, the mempool only holds one block
of memory and grows when larger size is requested.
Signed-off-by: David Wang <00107082@163.com>
---
drivers/usb/core/urb.c | 23 +++++++++++++++++++++++
include/linux/usb.h | 3 +++
2 files changed, 26 insertions(+)
diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 5e52a35486af..51bf25125aeb 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
if (urb->transfer_flags & URB_FREE_BUFFER)
kfree(urb->transfer_buffer);
+ kfree(urb->hcpriv_mempool);
kfree(urb);
}
@@ -1037,3 +1038,25 @@ int usb_anchor_empty(struct usb_anchor *anchor)
EXPORT_SYMBOL_GPL(usb_anchor_empty);
+/**
+ * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
+ * @urb: pointer to URB being used
+ * @size: memory size requested by current host controller
+ * @mem_flags: the type of memory to allocate
+ *
+ * Return: NULL if out of memory, otherwise memory are zeroed
+ */
+void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
+{
+ if (urb->hcpriv_mempool_size < size) {
+ kfree(urb->hcpriv_mempool);
+ urb->hcpriv_mempool_size = size;
+ urb->hcpriv_mempool = kmalloc(size, mem_flags);
+ }
+ if (urb->hcpriv_mempool)
+ memset(urb->hcpriv_mempool, 0, size);
+ else
+ urb->hcpriv_mempool_size = 0;
+ return urb->hcpriv_mempool;
+}
+EXPORT_SYMBOL_GPL(urb_hcpriv_mempool_zalloc);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index b46738701f8d..4400e41271bc 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1602,6 +1602,8 @@ struct urb {
struct kref kref; /* reference count of the URB */
int unlinked; /* unlink error code */
void *hcpriv; /* private data for host controller */
+ void *hcpriv_mempool; /* a single slot of cache for HCD's private data */
+ size_t hcpriv_mempool_size; /* current size of the memory pool */
atomic_t use_count; /* concurrent submissions counter */
atomic_t reject; /* submissions will fail */
@@ -1790,6 +1792,7 @@ extern int usb_wait_anchor_empty_timeout(struct usb_anchor *anchor,
extern struct urb *usb_get_from_anchor(struct usb_anchor *anchor);
extern void usb_scuttle_anchored_urbs(struct usb_anchor *anchor);
extern int usb_anchor_empty(struct usb_anchor *anchor);
+extern void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags);
#define usb_unblock_urb usb_unpoison_urb
--
2.39.2
On 13.05.25 13:38, David Wang wrote:
> ---
Hi,
still an issue after a second review.
I should have noticed earlier.
> --- a/drivers/usb/core/urb.c
> +++ b/drivers/usb/core/urb.c
> @@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
>
> if (urb->transfer_flags & URB_FREE_BUFFER)
> kfree(urb->transfer_buffer);
> + kfree(urb->hcpriv_mempool);
What if somebody uses usb_init_urb()?
> kfree(urb);
> }
> @@ -1037,3 +1038,25 @@ int usb_anchor_empty(struct usb_anchor *anchor)
>
> EXPORT_SYMBOL_GPL(usb_anchor_empty);
>
> +/**
> + * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
> + * @urb: pointer to URB being used
> + * @size: memory size requested by current host controller
> + * @mem_flags: the type of memory to allocate
> + *
> + * Return: NULL if out of memory, otherwise memory are zeroed
> + */
> +void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
> +{
> + if (urb->hcpriv_mempool_size < size) {
> + kfree(urb->hcpriv_mempool);
> + urb->hcpriv_mempool_size = size;
> + urb->hcpriv_mempool = kmalloc(size, mem_flags);
That could use kzalloc().
Regards
Oliver
At 2025-05-14 19:23:02, "Oliver Neukum" <oneukum@suse.com> wrote:
>
>
>On 13.05.25 13:38, David Wang wrote:
>> ---
>
>Hi,
>
>still an issue after a second review.
>I should have noticed earlier.
>
>> --- a/drivers/usb/core/urb.c
>> +++ b/drivers/usb/core/urb.c
>> @@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
>>
>> if (urb->transfer_flags & URB_FREE_BUFFER)
>> kfree(urb->transfer_buffer);
>> + kfree(urb->hcpriv_mempool);
>
>What if somebody uses usb_init_urb()?
I am not quite sure about the concern here, do you mean somebody create a urb,
and then usb_init_urb() here, and never use urb_destroy to release it?
That would cause memory leak if urb_destroy is not called......But is this really possible?.
>
>> kfree(urb);
>> }
>> @@ -1037,3 +1038,25 @@ int usb_anchor_empty(struct usb_anchor *anchor)
>>
>> EXPORT_SYMBOL_GPL(usb_anchor_empty);
>>
>> +/**
>> + * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
>> + * @urb: pointer to URB being used
>> + * @size: memory size requested by current host controller
>> + * @mem_flags: the type of memory to allocate
>> + *
>> + * Return: NULL if out of memory, otherwise memory are zeroed
>> + */
>> +void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
>> +{
>> + if (urb->hcpriv_mempool_size < size) {
>> + kfree(urb->hcpriv_mempool);
>> + urb->hcpriv_mempool_size = size;
>> + urb->hcpriv_mempool = kmalloc(size, mem_flags);
>
>That could use kzalloc().
The memory would be set to 0 before returning to user, via memset, no matter whether the memory is
newly alloced or just reused. I think using kmalloc is ok here.
Thanks
David
>
> Regards
> Oliver
On 14.05.25 13:51, David Wang wrote: > I am not quite sure about the concern here, do you mean somebody create a urb, > and then usb_init_urb() here, and never use urb_destroy to release it? Yes. > > That would cause memory leak if urb_destroy is not called......But is this really possible?. I think a few drivers under drivers/media do so. Regards Oliver
At 2025-05-14 20:03:02, "Oliver Neukum" <oneukum@suse.com> wrote:
>
>
>On 14.05.25 13:51, David Wang wrote:
>
>> I am not quite sure about the concern here, do you mean somebody create a urb,
>> and then usb_init_urb() here, and never use urb_destroy to release it?
>
>Yes.
>
>>
>> That would cause memory leak if urb_destroy is not called......But is this really possible?.
>
>I think a few drivers under drivers/media do so.
I search through codes, some drivers will use usb_free_urb which would invoke urb_destroy;
But there are indeed several drivers use urb as a struct member, which is not directly kmalloced and
should not be kfreed via usb_free_urb..... It would involve lots of changes.....
On the bright side, when I made the code check, I notice something off:
in drivers/net/wireless/realtek/rtl8xxxu/core.c
5168 usb_free_urb(&tx_urb->urb);
5868 usb_free_urb(&rx_urb->urb);
5890 usb_free_urb(&rx_urb->urb);
5938 usb_free_urb(&rx_urb->urb);
usb_free_urb would kfree the urb pointer, which would be wrong here since rx_urb and tx_urb defined
in drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
1944 struct rtl8xxxu_rx_urb {
1945 struct urb urb;
1946 struct ieee80211_hw *hw;
1947 struct list_head list;
1948 };
1949
1950 struct rtl8xxxu_tx_urb {
1951 struct urb urb;
1952 struct ieee80211_hw *hw;
1953 struct list_head list;
1954 };
Hi, Jes
Would you take a look? I feel usb_free_urb needs a pointer which is allokedd directly, but I would be wrong.....
Thanks
David
>
> Regards
> Oliver
At 2025-05-14 20:03:02, "Oliver Neukum" <oneukum@suse.com> wrote: > > >On 14.05.25 13:51, David Wang wrote: > >> I am not quite sure about the concern here, do you mean somebody create a urb, >> and then usb_init_urb() here, and never use urb_destroy to release it? > >Yes. > >> >> That would cause memory leak if urb_destroy is not called......But is this really possible?. > >I think a few drivers under drivers/media do so. That would cause real problem here.... I will keep this in mind... (It is really a bad pattern to have only init function, it should be paired with corresponding "release" to cancel side-effects......) Thanks~ David > > Regards > Oliver
Hi, Update memory footprints after hours of USB devices usage on my system: (I have webcam/mic/keyboard/mouse/harddisk connected via USB, a full picture of memory footprints is attached below) +----------------------+----------------+-------------------------------------------+-----------------------+ | active memory(bytes) | active objects | alloc location | total objects created | +----------------------+----------------+-------------------------------------------+-----------------------+ | 22912 | 24 | core/urb.c:1054:urb_hcpriv_mempool_zalloc | 10523 | | 11776 | 31 | core/urb.c:76:usb_alloc_urb | 11027 | +----------------------+----------------+-------------------------------------------+-----------------------+ The count for active URB objects remain at low level, its peak is about 12KB when I copied 10G file to my harddisk. The memory pool in this patch takes about 22KB, its peak is 23KB. The patch meant to reuse memory via a mempool, the memory kept in pool is indeed the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.) How balance the tradeoff is depends on how well the mempool is managed. This patch takes a easy approach: put faith in URB objects management and put a single slot of mempool in URB on demands. And the changes, by counting lines in this patch, are very simple. Base on the profiling, the number of active URB objects are kept at a very low scale, only several could have a very long lifecycle. I think URB is a good candidate for caching those memory needed for private data. But I could be very wrong, due simply to the lack of knowledge. And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk would yield high rate of memory allocation for priviate data in xhci_urb_enqueue: +----------------------+----------------+-----------------------------------+-----------------------+ | active memory(bytes) | active objects | alloc location | total objects created | +----------------------+----------------+-----------------------------------+-----------------------+ | 22784 | 23 | host/xhci.c:1555:xhci_urb_enqueue | 894281 << grow|ing very quick | 10880 | 31 | core/urb.c:75:usb_alloc_urb | 4028 | +----------------------+----------------+-----------------------------------+-----------------------+ I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue when I was copying 10G file, and had my webcam opened at the same time. And again, to be honest, I did not observe any observable performance improvement from an enduser's point of view with this patch. The only significant improvement is memory footprint _numbers_. I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of "the less allocation the better". Full drivers/usb memory footprint on my system with this patch: (The data was collected via memory profiling: https://docs.kernel.org/mm/allocation-profiling.html) +----------------------+----------------+------------------------------------------------------+-----------------------+ | active memory(bytes) | active objects | alloc location | total objects created | +----------------------+----------------+------------------------------------------------------+-----------------------+ | 40960 | 5 | host/xhci-mem.c:951:xhci_alloc_virt_device | 10 | | 26624 | 26 | core/port.c:742:usb_hub_create_port_device | 26 | | 24576 | 2 | host/xhci-mem.c:2170:xhci_setup_port_arrays | 3 | | 22912 | 24 | core/urb.c:1054:urb_hcpriv_mempool_zalloc | 10523 | | 21504 | 21 | core/endpoint.c:157:usb_create_ep_devs | 31 | | 18432 | 9 | core/usb.c:650:usb_alloc_dev | 10 | | 16384 | 4 | core/hcd.c:2553:__usb_create_hcd | 4 | | 13312 | 13 | core/message.c:2037:usb_set_configuration | 14 | | 11776 | 31 | core/urb.c:76:usb_alloc_urb | 11027 | | 9216 | 9 | core/config.c:930:usb_get_configuration | 10 | | 5120 | 5 | core/hub.c:1938:hub_probe | 5 | | 5120 | 2 | host/xhci-mem.c:2156:xhci_setup_port_arrays | 3 | | 2560 | 5 | host/xhci-debugfs.c:578:xhci_debugfs_create_slot | 10 | | 2416 | 16 | host/xhci-mem.c:49:xhci_segment_alloc | 44 | | 2176 | 17 | host/xhci-mem.c:377:xhci_ring_alloc | 38 | | 2112 | 22 | core/port.c:746:usb_hub_create_port_device | 26 | | 2048 | 32 | host/xhci-mem.c:38:xhci_segment_alloc | 73 | | 1728 | 18 | host/xhci-debugfs.c:90:xhci_debugfs_alloc_regset | 26 | | 1632 | 17 | core/config.c:619:usb_parse_interface | 18 | | 1504 | 9 | core/config.c:967:usb_get_configuration | 10 | | 1312 | 13 | core/config.c:820:usb_parse_configuration | 14 | | 1024 | 1 | host/xhci-mem.c:812:xhci_alloc_tt_info | 2 | | 704 | 24 | core/message.c:1032:usb_cache_string | 27 | | 512 | 8 | host/xhci-debugfs.c:438:xhci_debugfs_create_endpoint | 22 | | 320 | 10 | host/xhci-mem.c:461:xhci_alloc_container_ctx | 30 | | 272 | 2 | host/xhci-mem.c:1635:scratchpad_alloc | 3 | | 224 | 5 | core/hub.c:1502:hub_configure | 5 | | 192 | 4 | host/xhci-mem.c:2121:xhci_create_rhub_port_array | 6 | | 192 | 2 | host/xhci-mem.c:2263:xhci_alloc_interrupter | 3 | | 160 | 2 | host/xhci-mem.c:2198:xhci_setup_port_arrays | 3 | | 128 | 2 | host/xhci-mem.c:2506:xhci_mem_init | 3 | | 128 | 2 | core/config.c:1063:usb_get_bos_descriptor | 3 | | 128 | 2 | core/config.c:1058:usb_get_bos_descriptor | 3 | | 80 | 5 | core/hub.c:1454:hub_configure | 5 | | 72 | 9 | core/config.c:935:usb_get_configuration | 10 | | 64 | 2 | host/xhci-mem.c:1624:scratchpad_alloc | 3 | | 64 | 2 | core/hcd.c:2565:__usb_create_hcd | 2 | | 64 | 2 | core/hcd.c:2557:__usb_create_hcd | 2 | | 40 | 5 | host/xhci-mem.c:2039:xhci_add_in_port | 6 | | 40 | 5 | core/hub.c:1447:hub_configure | 5 | | 40 | 5 | core/hub.c:1441:hub_configure | 5 | | 0 | 0 | core/message.c:529:usb_sg_init | 10137 | | 0 | 0 | core/message.c:144:usb_control_msg | 793 | | 0 | 0 | core/hcd.c:491:rh_call_control | 449 | | 0 | 0 | host/xhci-mem.c:1705:xhci_alloc_command | 234 | | 0 | 0 | host/xhci-mem.c:1711:xhci_alloc_command | 97 | | 0 | 0 | core/message.c:980:usb_string | 31 | | 0 | 0 | core/message.c:1028:usb_cache_string | 27 | | 0 | 0 | core/message.c:1145:usb_get_status | 19 | | 0 | 0 | core/message.c:1061:usb_get_device_descriptor | 14 | | 0 | 0 | core/message.c:2031:usb_set_configuration | 10 | | 0 | 0 | core/hub.c:4883:hub_port_init | 10 | | 0 | 0 | core/config.c:939:usb_get_configuration | 10 | | 0 | 0 | core/hub.c:5311:descriptors_changed | 4 | | 0 | 0 | core/config.c:1037:usb_get_bos_descriptor | 3 | | 0 | 0 | storage/usb.c:540:associate_dev | 1 | +----------------------+----------------+------------------------------------------------------+-----------------------+ FYI David
On Wed, May 14, 2025 at 02:44:55PM +0800, David Wang wrote: > Hi, > > Update memory footprints after hours of USB devices usage > on my system: > (I have webcam/mic/keyboard/mouse/harddisk connected via USB, > a full picture of memory footprints is attached below) > +----------------------+----------------+-------------------------------------------+-----------------------+ > | active memory(bytes) | active objects | alloc location | total objects created | > +----------------------+----------------+-------------------------------------------+-----------------------+ > | 22912 | 24 | core/urb.c:1054:urb_hcpriv_mempool_zalloc | 10523 | > | 11776 | 31 | core/urb.c:76:usb_alloc_urb | 11027 | > +----------------------+----------------+-------------------------------------------+-----------------------+ > > The count for active URB objects remain at low level, > its peak is about 12KB when I copied 10G file to my harddisk. > The memory pool in this patch takes about 22KB, its peak is 23KB. > The patch meant to reuse memory via a mempool, the memory kept in pool is indeed > the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.) > How balance the tradeoff is depends on how well the mempool is managed. > This patch takes a easy approach: put faith in URB objects management and put > a single slot of mempool in URB on demands. And the changes, by counting lines > in this patch, are very simple. > Base on the profiling, the number of active URB objects are kept at a very low scale, > only several could have a very long lifecycle. > I think URB is a good candidate for caching those memory needed for private data. > But I could be very wrong, due simply to the lack of knowledge. > > And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk > would yield high rate of memory allocation for priviate data in xhci_urb_enqueue: > +----------------------+----------------+-----------------------------------+-----------------------+ > | active memory(bytes) | active objects | alloc location | total objects created | > +----------------------+----------------+-----------------------------------+-----------------------+ > | 22784 | 23 | host/xhci.c:1555:xhci_urb_enqueue | 894281 << grow|ing very quick > | 10880 | 31 | core/urb.c:75:usb_alloc_urb | 4028 | > +----------------------+----------------+-----------------------------------+-----------------------+ > I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue > when I was copying 10G file, and had my webcam opened at the same time. > > And again, to be honest, I did not observe any observable performance improvement from > an enduser's point of view with this patch. The only significant improvement is memory footprint > _numbers_. > I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of > "the less allocation the better". No, this isn't necessarily true at all. Allocations are fast, and if we free/allocate things quickly, it's even faster. USB is limited by the hardware throughput, which is _very_ slow compared to memory accesses of the allocator. So unless you can show that we are using less CPU time, or something else "real" that is measurable in a real way in userspace, that would justify the extra complexity, it's going to be hard to get me to agree that this is something that needs to be addressed at all. Also, I'm totally confused as to what the "latest" version of this patchset is... thanks, greg k-h
On 14.05.25 09:29, Greg KH wrote: > No, this isn't necessarily true at all. Allocations are fast, and if we > free/allocate things quickly, it's even faster. USB is limited by the > hardware throughput, which is _very_ slow compared to memory accesses of > the allocator. If and only if we do not trigger disk IO. If you really want to give this patch a good performance testing you'd have to do it under memory pressure. Regards Oliver
At 2025-05-14 17:34:21, "Oliver Neukum" <oneukum@suse.com> wrote: >On 14.05.25 09:29, Greg KH wrote: > >> No, this isn't necessarily true at all. Allocations are fast, and if we >> free/allocate things quickly, it's even faster. USB is limited by the >> hardware throughput, which is _very_ slow compared to memory accesses of >> the allocator. > >If and only if we do not trigger disk IO. If you really want to give this patch >a good performance testing you'd have to do it under memory pressure. > > Regards > Oliver Hi, I made some test: Using FPS for webcam and bitrate for audio mic for measurement. When system is under no memory pressure, no significant difference could be observed w/o this patch. When system is under heavy memory pressure, bitrate would drop from ~760.3kbits/s to ~524.3kbits/s, but this patch dose not make any significant difference, bitrate drops are almost the same w/o this. When under heavy memory pressure, my whole system gets slow.... But I think, in between no memory pressure and heavy memory pressure, there would be a point where an extra 1k/s would kick start a chain-of-effect landing a very bad performance, it is just very hard to pinpoint. Using my webcam would have ~250/s memory allocation rate, and my mic ~1k/s. I am imaging a system with several usb webcam/mic connected. There would be x*1k/s allocation if those devices are used at the same time. (Not sure whether all allctation could be avoided under heavy usage of usb devices, but I think good part of the allocations can be reused.) Still think this change benefits even without a solid evidence yet. (I have send out another version addressing Oliver's comments about urb managed by drivers) Thanks David
At 2025-05-14 15:29:42, "Greg KH" <gregkh@linuxfoundation.org> wrote: >On Wed, May 14, 2025 at 02:44:55PM +0800, David Wang wrote: >> Hi, >> >> Update memory footprints after hours of USB devices usage >> on my system: >> (I have webcam/mic/keyboard/mouse/harddisk connected via USB, >> a full picture of memory footprints is attached below) >> +----------------------+----------------+-------------------------------------------+-----------------------+ >> | active memory(bytes) | active objects | alloc location | total objects created | >> +----------------------+----------------+-------------------------------------------+-----------------------+ >> | 22912 | 24 | core/urb.c:1054:urb_hcpriv_mempool_zalloc | 10523 | >> | 11776 | 31 | core/urb.c:76:usb_alloc_urb | 11027 | >> +----------------------+----------------+-------------------------------------------+-----------------------+ >> >> The count for active URB objects remain at low level, >> its peak is about 12KB when I copied 10G file to my harddisk. >> The memory pool in this patch takes about 22KB, its peak is 23KB. >> The patch meant to reuse memory via a mempool, the memory kept in pool is indeed >> the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.) >> How balance the tradeoff is depends on how well the mempool is managed. >> This patch takes a easy approach: put faith in URB objects management and put >> a single slot of mempool in URB on demands. And the changes, by counting lines >> in this patch, are very simple. >> Base on the profiling, the number of active URB objects are kept at a very low scale, >> only several could have a very long lifecycle. >> I think URB is a good candidate for caching those memory needed for private data. >> But I could be very wrong, due simply to the lack of knowledge. >> >> And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk >> would yield high rate of memory allocation for priviate data in xhci_urb_enqueue: >> +----------------------+----------------+-----------------------------------+-----------------------+ >> | active memory(bytes) | active objects | alloc location | total objects created | >> +----------------------+----------------+-----------------------------------+-----------------------+ >> | 22784 | 23 | host/xhci.c:1555:xhci_urb_enqueue | 894281 << grow|ing very quick >> | 10880 | 31 | core/urb.c:75:usb_alloc_urb | 4028 | >> +----------------------+----------------+-----------------------------------+-----------------------+ >> I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue >> when I was copying 10G file, and had my webcam opened at the same time. >> >> And again, to be honest, I did not observe any observable performance improvement from >> an enduser's point of view with this patch. The only significant improvement is memory footprint >> _numbers_. >> I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of >> "the less allocation the better". > >No, this isn't necessarily true at all. Allocations are fast, and if we >free/allocate things quickly, it's even faster. USB is limited by the >hardware throughput, which is _very_ slow compared to memory accesses of >the allocator. > >So unless you can show that we are using less CPU time, or something >else "real" that is measurable in a real way in userspace, that would >justify the extra complexity, it's going to be hard to get me to agree >that this is something that needs to be addressed at all. Thanks for feedbacks~! That's very reasonable to me, and I have been pondering on how to profile a USB performance, but still no clue. I will keep thinking about it, hopefully this 1k+/s allocation would show up somewhere, or conclude that it really has no significant impact at all. Thanks David > >Also, I'm totally confused as to what the "latest" version of this >patchset is... > sorry, I think I mess up the mails when I add "reply-to" header to newer patches >thanks, > >greg k-h
On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote: > --- > Changes: > 1. Use caller's mem_flags if a larger memory is needed. > Thanks to Oliver Neukum <oneukum@suse.com>'s review. > --- > URB objects have long lifecycle, an urb can be reused between > enqueue-dequeue loops; The private data needed by some host controller > has very short lifecycle, the memory is alloced when enqueue, and > released when dequeue. For example, on a system with xhci, several > minutes of usage of webcam/keyboard/mouse have memory alloc counts: > drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 > drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 > Memory allocation frequency for host-controller private data can reach > ~1k/s and above. > > High frequent allocations for host-controller private data can be > avoided if urb take over the ownership of memory, the memory then shares > the longer lifecycle with urb objects. > > Add a mempool to urb for hcpriv usage, the mempool only holds one block > of memory and grows when larger size is requested. > > Signed-off-by: David Wang <00107082@163.com> It should be possible to do what you want without touching the USB core code at all, changing only xhci-hcd. That's what Mathias is suggesting. Instead of having an URB keep ownership of its extra memory after it completes, xhci-hcd can put the memory area onto a free list. Then memory areas on the free list can be reused with almost no overhead when URBs are enqueued later on. This idea can become more elaborate if you maintain separate free lists for different devices, or even for different endpoints, or sort the free list by the size of the memory areas. But the basic idea is always the same: Don't change usbcore (including struct urb), and make getting and releasing the extra memory areas have extremely low overhead. Alan Stern
On 13.05.25 16:25, Alan Stern wrote: > Instead of having an URB keep ownership of its extra memory after it > completes, xhci-hcd can put the memory area onto a free list. Then > memory areas on the free list can be reused with almost no overhead when > URBs are enqueued later on. The number of URBs in flight is basically unlimited. When would you shrink the number of cached privates? That would seem to be the basic question if you do not link this with URBs. Regards Oliver
At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote: >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote: >> --- >> Changes: >> 1. Use caller's mem_flags if a larger memory is needed. >> Thanks to Oliver Neukum <oneukum@suse.com>'s review. >> --- >> URB objects have long lifecycle, an urb can be reused between >> enqueue-dequeue loops; The private data needed by some host controller >> has very short lifecycle, the memory is alloced when enqueue, and >> released when dequeue. For example, on a system with xhci, several >> minutes of usage of webcam/keyboard/mouse have memory alloc counts: >> drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 >> drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 >> Memory allocation frequency for host-controller private data can reach >> ~1k/s and above. >> >> High frequent allocations for host-controller private data can be >> avoided if urb take over the ownership of memory, the memory then shares >> the longer lifecycle with urb objects. >> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block >> of memory and grows when larger size is requested. >> >> Signed-off-by: David Wang <00107082@163.com> > >It should be possible to do what you want without touching the USB core >code at all, changing only xhci-hcd. That's what Mathias is suggesting. > >Instead of having an URB keep ownership of its extra memory after it >completes, xhci-hcd can put the memory area onto a free list. Then >memory areas on the free list can be reused with almost no overhead when >URBs are enqueued later on. I have to disagree, your suggestion has no much difference from requesting memory from system, locks and memory pool managements, all the same are needed, why bother? The reason I choose URB is that URB life cycle contains the private data's lifecycle, and no two HCD can take over the same URB as the same time. I think the memory pool here is not actually a pool, or I should say the mempool consists of pool of URBs, and each URB have only "single one" slot of mem pool in it. > >This idea can become more elaborate if you maintain separate free lists >for different devices, or even for different endpoints, or sort the free >list by the size of the memory areas. But the basic idea is always the >same: Don't change usbcore (including struct urb), and make getting and >releasing the extra memory areas have extremely low overhead. Why implements a device level memory pool would have extremely low overhead? Why making changes to usb core is bad? The idea in this thread is meant for all kinds of host controllers, xhci is what I have in my system; i think similar changes would benefit other HCs > >Alan Stern
On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote: > > > At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote: > >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote: > >> --- > >> Changes: > >> 1. Use caller's mem_flags if a larger memory is needed. > >> Thanks to Oliver Neukum <oneukum@suse.com>'s review. > >> --- > >> URB objects have long lifecycle, an urb can be reused between > >> enqueue-dequeue loops; The private data needed by some host controller > >> has very short lifecycle, the memory is alloced when enqueue, and > >> released when dequeue. For example, on a system with xhci, several > >> minutes of usage of webcam/keyboard/mouse have memory alloc counts: > >> drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 > >> drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 > >> Memory allocation frequency for host-controller private data can reach > >> ~1k/s and above. > >> > >> High frequent allocations for host-controller private data can be > >> avoided if urb take over the ownership of memory, the memory then shares > >> the longer lifecycle with urb objects. > >> > >> Add a mempool to urb for hcpriv usage, the mempool only holds one block > >> of memory and grows when larger size is requested. > >> > >> Signed-off-by: David Wang <00107082@163.com> > > > >It should be possible to do what you want without touching the USB core > >code at all, changing only xhci-hcd. That's what Mathias is suggesting. > > > >Instead of having an URB keep ownership of its extra memory after it > >completes, xhci-hcd can put the memory area onto a free list. Then > >memory areas on the free list can be reused with almost no overhead when > >URBs are enqueued later on. > > I have to disagree, your suggestion has no much difference from requesting memory from > system, locks and memory pool managements, all the same are needed, why bother? There are two differences. First, xhci-hcd already has its own lock that it acquires when enqueuing or dequeuing URBs, so no additional locks would be needed. Second, complicated memory pool management isn't necessary; the management can be extremely simple. (For example, Mathias suggested just reusing the most recently released memory area unless it is too small.) > The reason I choose URB is that URB life cycle contains the private data's lifecycle, > and no two HCD can take over the same URB as the same time. > > I think the memory pool here is not actually a pool, or I should say the mempool consists of > pool of URBs, and each URB have only "single one" slot of mem pool in it. > > > > > >This idea can become more elaborate if you maintain separate free lists > >for different devices, or even for different endpoints, or sort the free > >list by the size of the memory areas. But the basic idea is always the > >same: Don't change usbcore (including struct urb), and make getting and > >releasing the extra memory areas have extremely low overhead. > > Why implements a device level memory pool would have extremely low overhead? Because then getting or releasing memory areas from the pool could be carried out just by manipulating a couple of pointers. Some fraction of the time, URBs are created on demand and destroyed upon completion. Your approach would not save any time for these URBs, whereas my suggested approach would. (This fraction probably isn't very large, although I don't know how big it is.) > Why making changes to usb core is bad? The idea in this thread is meant for all kinds of > host controllers, xhci is what I have in my system; i think similar changes would benefit other > HCs Those other HC drivers would still require changes. They could be changed to support my scheme as easily as your scheme. If I were redesigning Linux's entire USB stack from the beginning, I would change it so that URBs would be dedicated to particular host controllers and endpoint types -- maybe even to particular endpoints. They would contain all the additional memory required for the HCD to use them, all pre-allocated, so that dynamic allocation would not be needed during normal use. (The gadget subsystem behaves this way already.) Such a drastic change isn't feasible at this point, although what you are suggesting is a step in that direction. In the end it comes down to a time/space tradeoff, and it's very difficult to know what the best balance is. I'm not saying that your approach is bad or wrong, just that there are other possibilities to consider. Alan Stern Alan Stern
At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote: >On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote: >> >> >> At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote: >> >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote: >> >> --- >> >> Changes: >> >> 1. Use caller's mem_flags if a larger memory is needed. >> >> Thanks to Oliver Neukum <oneukum@suse.com>'s review. >> >> --- >> >> URB objects have long lifecycle, an urb can be reused between >> >> enqueue-dequeue loops; The private data needed by some host controller >> >> has very short lifecycle, the memory is alloced when enqueue, and >> >> released when dequeue. For example, on a system with xhci, several >> >> minutes of usage of webcam/keyboard/mouse have memory alloc counts: >> >> drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 >> >> drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 >> >> Memory allocation frequency for host-controller private data can reach >> >> ~1k/s and above. >> >> >> >> High frequent allocations for host-controller private data can be >> >> avoided if urb take over the ownership of memory, the memory then shares >> >> the longer lifecycle with urb objects. >> >> >> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block >> >> of memory and grows when larger size is requested. >> >> >> >> Signed-off-by: David Wang <00107082@163.com> >> > >> >It should be possible to do what you want without touching the USB core >> >code at all, changing only xhci-hcd. That's what Mathias is suggesting. >> > >> >Instead of having an URB keep ownership of its extra memory after it >> >completes, xhci-hcd can put the memory area onto a free list. Then >> >memory areas on the free list can be reused with almost no overhead when >> >URBs are enqueued later on. >> >> I have to disagree, your suggestion has no much difference from requesting memory from >> system, locks and memory pool managements, all the same are needed, why bother? > >There are two differences. First, xhci-hcd already has its own lock >that it acquires when enqueuing or dequeuing URBs, so no additional >locks would be needed. Second, complicated memory pool management isn't >necessary; the management can be extremely simple. (For example, >Mathias suggested just reusing the most recently released memory area >unless it is too small.) My patch also just reuse memory, in a simpler way I would argue.... > >> The reason I choose URB is that URB life cycle contains the private data's lifecycle, >> and no two HCD can take over the same URB as the same time. >> >> I think the memory pool here is not actually a pool, or I should say the mempool consists of >> pool of URBs, and each URB have only "single one" slot of mem pool in it. >> >> >> > >> >This idea can become more elaborate if you maintain separate free lists >> >for different devices, or even for different endpoints, or sort the free >> >list by the size of the memory areas. But the basic idea is always the >> >same: Don't change usbcore (including struct urb), and make getting and >> >releasing the extra memory areas have extremely low overhead. >> >> Why implements a device level memory pool would have extremely low overhead? > >Because then getting or releasing memory areas from the pool could be >carried out just by manipulating a couple of pointers. A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB? I doubt that. There would be lots of details needs to consider, detail is devil and that why we prefer simpler solution, I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it, or it is just that in my mind, make changes to "usb core" is a big no-no! > >Some fraction of the time, URBs are created on demand and destroyed upon >completion. Your approach would not save any time for these URBs, >whereas my suggested approach would. (This fraction probably isn't very >large, although I don't know how big it is.) I am aiming to void extra tons of alloctions for "private data", not URB allocation. When I use my USB webcam, I would observer 1k+ allocation per seconds for private data, while urb allocation's frequency is already very low, people have already done the optimization I guess. The memory profiling feature introduced in 6.12 is a very good start for identifying where improvement could be made for memory behavior. > >> Why making changes to usb core is bad? The idea in this thread is meant for all kinds of >> host controllers, xhci is what I have in my system; i think similar changes would benefit other >> HCs > >Those other HC drivers would still require changes. They could be >changed to support my scheme as easily as your scheme. > >If I were redesigning Linux's entire USB stack from the beginning, I >would change it so that URBs would be dedicated to particular host >controllers and endpoint types -- maybe even to particular endpoints. >They would contain all the additional memory required for the HCD to use >them, all pre-allocated, so that dynamic allocation would not be needed >during normal use. (The gadget subsystem behaves this way already.) > >Such a drastic change isn't feasible at this point, although what you >are suggesting is a step in that direction. In the end it comes down >to a time/space tradeoff, and it's very difficult to know what the best >balance is. Drastic change? Do you mean make change to USB core? Because counting by lines of changes in this patch, I feel my patch is quite simple and efficient. I also don't get your time/space tradeoff here, are you talking about my patch? or you were just talking a solution in your mind.... This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace... > >I'm not saying that your approach is bad or wrong, just that there are >other possibilities to consider. > >Alan Stern > >Alan Stern David
On Wed, May 14, 2025 at 12:35:29AM +0800, David Wang wrote: > > At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote: > >> I have to disagree, your suggestion has no much difference from requesting memory from > >> system, locks and memory pool managements, all the same are needed, why bother? > > > >There are two differences. First, xhci-hcd already has its own lock > >that it acquires when enqueuing or dequeuing URBs, so no additional > >locks would be needed. Second, complicated memory pool management isn't > >necessary; the management can be extremely simple. (For example, > >Mathias suggested just reusing the most recently released memory area > >unless it is too small.) > > My patch also just reuse memory, in a simpler way I would argue.... I didn't say your approach wasn't simple. I said that my approach has very low overhead, a lot lower than the existing code, whereas you said my approach "has no much difference" from the existing code. > >> >This idea can become more elaborate if you maintain separate free lists > >> >for different devices, or even for different endpoints, or sort the free > >> >list by the size of the memory areas. But the basic idea is always the > >> >same: Don't change usbcore (including struct urb), and make getting and > >> >releasing the extra memory areas have extremely low overhead. > >> > >> Why implements a device level memory pool would have extremely low overhead? > > > >Because then getting or releasing memory areas from the pool could be > >carried out just by manipulating a couple of pointers. > > A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB? > I doubt that. I didn't say pointer manipulation was simpler than a reusable buffer. I said that it was very low overhead, in order to answer your question: "Why implements a device level memory pool would have extremely low overhead?" > There would be lots of details needs to consider, detail is devil and that why we prefer simpler solution, > I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it, > or it is just that in my mind, make changes to "usb core" is a big no-no! Neither one. However, you have forgotten about one thing: With your patch, each URB maintains ownership of a memory area even when the URB is not in use. With the existing code, that memory is freed when the URB is not in use, and with my approach the memory is shared among URBs. In this way, your patch will use more memory than the existing code or my approach. The question to answer is which is better: Using more memory (your patch) or using more time (the allocations and deallocations in the existing code or my approach)? > >If I were redesigning Linux's entire USB stack from the beginning, I > >would change it so that URBs would be dedicated to particular host > >controllers and endpoint types -- maybe even to particular endpoints. > >They would contain all the additional memory required for the HCD to use > >them, all pre-allocated, so that dynamic allocation would not be needed > >during normal use. (The gadget subsystem behaves this way already.) > > > >Such a drastic change isn't feasible at this point, although what you > >are suggesting is a step in that direction. In the end it comes down > >to a time/space tradeoff, and it's very difficult to know what the best > >balance is. > Drastic change? Do you mean make change to USB core? No, I meant redesigning the entire USB stack. It would require changing all the USB drivers to allocate URBs differently from the way they do now. And changing every HC driver to preallocate the memory it needs to perform transfers. I think you can agree that would be a pretty drastic change. > Because counting by > lines of changes in this patch, I feel my patch is quite simple and efficient. > I also don't get your time/space tradeoff here, are you talking about my patch? > or you were just talking a solution in your mind.... Both. See my comment above. > This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace... It also needs an extra memory area that is allocated the first time the URB is used, and is not deallocated until the URB is destroyed. You seem to be ignoring this fact. Alan Stern
At 2025-05-14 02:21:15, "Alan Stern" <stern@rowland.harvard.edu> wrote: >On Wed, May 14, 2025 at 12:35:29AM +0800, David Wang wrote: >> >> At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote: >> >> I have to disagree, your suggestion has no much difference from requesting memory from >> >> system, locks and memory pool managements, all the same are needed, why bother? >> > >> >There are two differences. First, xhci-hcd already has its own lock >> >that it acquires when enqueuing or dequeuing URBs, so no additional >> >locks would be needed. Second, complicated memory pool management isn't >> >necessary; the management can be extremely simple. (For example, >> >Mathias suggested just reusing the most recently released memory area >> >unless it is too small.) >> >> My patch also just reuse memory, in a simpler way I would argue.... > >I didn't say your approach wasn't simple. I said that my approach has >very low overhead, a lot lower than the existing code, whereas you said >my approach "has no much difference" from the existing code. > >> >> >This idea can become more elaborate if you maintain separate free lists >> >> >for different devices, or even for different endpoints, or sort the free >> >> >list by the size of the memory areas. But the basic idea is always the >> >> >same: Don't change usbcore (including struct urb), and make getting and >> >> >releasing the extra memory areas have extremely low overhead. >> >> >> >> Why implements a device level memory pool would have extremely low overhead? >> > >> >Because then getting or releasing memory areas from the pool could be >> >carried out just by manipulating a couple of pointers. >> >> A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB? >> I doubt that. > >I didn't say pointer manipulation was simpler than a reusable buffer. I >said that it was very low overhead, in order to answer your question: >"Why implements a device level memory pool would have extremely low >overhead?" Now I feels it becomes a wording game ..... > >> There would be lots of details needs to consider, detail is devil and that why we prefer simpler solution, >> I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it, >> or it is just that in my mind, make changes to "usb core" is a big no-no! > >Neither one. > >However, you have forgotten about one thing: With your patch, each URB >maintains ownership of a memory area even when the URB is not in use. >With the existing code, that memory is freed when the URB is not in use, >and with my approach the memory is shared among URBs. > >In this way, your patch will use more memory than the existing code or >my approach. The question to answer is which is better: Using more >memory (your patch) or using more time (the allocations and >deallocations in the existing code or my approach)? > >> >If I were redesigning Linux's entire USB stack from the beginning, I >> >would change it so that URBs would be dedicated to particular host >> >controllers and endpoint types -- maybe even to particular endpoints. >> >They would contain all the additional memory required for the HCD to use >> >them, all pre-allocated, so that dynamic allocation would not be needed >> >during normal use. (The gadget subsystem behaves this way already.) >> > >> >Such a drastic change isn't feasible at this point, although what you >> >are suggesting is a step in that direction. In the end it comes down >> >to a time/space tradeoff, and it's very difficult to know what the best >> >balance is. >> Drastic change? Do you mean make change to USB core? > >No, I meant redesigning the entire USB stack. It would require changing >all the USB drivers to allocate URBs differently from the way they do >now. And changing every HC driver to preallocate the memory it needs to >perform transfers. I think you can agree that would be a pretty drastic >change. > >> Because counting by >> lines of changes in this patch, I feel my patch is quite simple and efficient. >> I also don't get your time/space tradeoff here, are you talking about my patch? >> or you were just talking a solution in your mind.... > >Both. See my comment above. > >> This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace... > >It also needs an extra memory area that is allocated the first time the >URB is used, and is not deallocated until the URB is destroyed. You >seem to be ignoring this fact. > >Alan Stern It is not an "extra" memory area, the memory is needed by HC anyway, the memory pool just cache it. And about not freeing memory until URB released, you seems forgot that we are talking about "memory pool" . A URB only used once could be considered a memory pool never used. If your memory pool approach would not "waste" memory, I would rather happy to learn. I want to mention the purpose of this patch again: A lot of "private data" allocation could be avoided if we use a "mempool" to cache and reuse those memory. And use URB as the holder is a very simple way to implement this,. And to add , base on my memory profiling, URB usage is very efficient. I think it is a very good candidate to hold private data cache for HCs. David
On Wed, May 14, 2025 at 02:48:21AM +0800, David Wang wrote: > It is not an "extra" memory area, the memory is needed by HC anyway, the memory pool just cache it. > And about not freeing memory until URB released, you seems forgot that we are talking > about "memory pool" . A URB only used once could be considered a memory pool never used. > > If your memory pool approach would not "waste" memory, I would rather happy to learn. Here's a simple example to illustrate the point. Suppose a driver uses two URBs, call them A and B, but it never has more than one URB active at a time. Thus, when A completes B is submitted, and when B completes A is submitted. With your approach A and B each have their own memory area. With my approach, a single memory area is shared between A and B. Therefore my approach uses less total memory. Now, I admit this pattern is probably not at all common. Usually if a driver is going to reuse an URB, it resubmits the URB as soon as the URB completes rather than waiting for some other URB to complete. Drivers generally don't keep many unused URBs just sitting around -- although there may be exceptions, like a driver for a media device when the device isn't running. > I want to mention the purpose of this patch again: > A lot of "private data" allocation could be avoided if we use a "mempool" to cache and reuse those memory. > And use URB as the holder is a very simple way to implement this,. > > And to add , base on my memory profiling, URB usage is very efficient. I think it is a very good candidate to hold > private data cache for HCs. All right. I withdraw any objection to your patches. Alan Stern
URB objects have long lifecycle, an urb can be reused between
enqueue-dequeue loops; The private data needed by some host controller
has very short lifecycle, the memory is alloced when enqueue, and
released when dequeue. For example, on a system with xhci, several
minutes of usage of webcam/keyboard/mouse have memory alloc counts:
drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
Memory allocation frequency for host-controller private data can reach
~1k/s.
High frequent allocations for host-controller private data can be
avoided if urb take over the ownership of memory, the memory then shares
the longer lifecycle with urb objects.
Add a mempool to urb for hcpriv usage, the memory inherits the memory
flags from urb object itself, and will grow if larger size is requested.
Signed-off-by: David Wang <00107082@163.com>
---
drivers/usb/core/urb.c | 23 +++++++++++++++++++++++
include/linux/usb.h | 4 ++++
2 files changed, 27 insertions(+)
diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 5e52a35486af..7c13b971b435 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
if (urb->transfer_flags & URB_FREE_BUFFER)
kfree(urb->transfer_buffer);
+ kfree(urb->hcpriv_mempool);
kfree(urb);
}
@@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags)
if (!urb)
return NULL;
usb_init_urb(urb);
+ urb->hcpriv_mempool_flags = mem_flags;
return urb;
}
EXPORT_SYMBOL_GPL(usb_alloc_urb);
@@ -1037,3 +1039,24 @@ int usb_anchor_empty(struct usb_anchor *anchor)
EXPORT_SYMBOL_GPL(usb_anchor_empty);
+/**
+ * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
+ * @urb: pointer to URB being used
+ * @size: memory size requested by current host controller
+ *
+ * Return: NULL if out of memory, otherwise memory are zeroed
+ */
+void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size)
+{
+ if (urb->hcpriv_mempool_size < size) {
+ kfree(urb->hcpriv_mempool);
+ urb->hcpriv_mempool_size = size;
+ urb->hcpriv_mempool = kmalloc(size, urb->hcpriv_mempool_flags);
+ }
+ if (urb->hcpriv_mempool)
+ memset(urb->hcpriv_mempool, 0, size);
+ else
+ urb->hcpriv_mempool_size = 0;
+ return urb->hcpriv_mempool;
+}
+EXPORT_SYMBOL_GPL(urb_hcpriv_mempool_zalloc);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index b46738701f8d..a31535f3b2bc 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1602,6 +1602,9 @@ struct urb {
struct kref kref; /* reference count of the URB */
int unlinked; /* unlink error code */
void *hcpriv; /* private data for host controller */
+ void *hcpriv_mempool; /* cache for host controller's private data */
+ size_t hcpriv_mempool_size; /* current size of the memory pool */
+ gfp_t hcpriv_mempool_flags; /* mem flags for memory pool */
atomic_t use_count; /* concurrent submissions counter */
atomic_t reject; /* submissions will fail */
@@ -1790,6 +1793,7 @@ extern int usb_wait_anchor_empty_timeout(struct usb_anchor *anchor,
extern struct urb *usb_get_from_anchor(struct usb_anchor *anchor);
extern void usb_scuttle_anchored_urbs(struct usb_anchor *anchor);
extern int usb_anchor_empty(struct usb_anchor *anchor);
+extern void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size);
#define usb_unblock_urb usb_unpoison_urb
--
2.39.2
Hi, On 13.05.25 07:54, David Wang wrote: > URB objects have long lifecycle, an urb can be reused between > enqueue-dequeue loops; The private data needed by some host controller > has very short lifecycle, the memory is alloced when enqueue, and > released when dequeue. For example, on a system with xhci, several > minutes of usage of webcam/keyboard/mouse have memory alloc counts: > drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 > drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 > Memory allocation frequency for host-controller private data can reach > ~1k/s. First of all, thank you for trying to tackle this long running issue. > @@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags) > if (!urb) > return NULL; > usb_init_urb(urb); > + urb->hcpriv_mempool_flags = mem_flags; No. You cannot do this. The flags you pass to usb_alloc_urb() depend on the context you call it in. For example, if you are allocating it while holding a spinlock, ou need to use GFP_ATOMIC But that may or may not be the same context you submit the URB in. Recording mem_flags here makes no sense. Regards Oliver
At 2025-05-13 16:11:20, "Oliver Neukum" <oneukum@suse.com> wrote: >Hi, > >On 13.05.25 07:54, David Wang wrote: >> URB objects have long lifecycle, an urb can be reused between >> enqueue-dequeue loops; The private data needed by some host controller >> has very short lifecycle, the memory is alloced when enqueue, and >> released when dequeue. For example, on a system with xhci, several >> minutes of usage of webcam/keyboard/mouse have memory alloc counts: >> drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 >> drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 >> Memory allocation frequency for host-controller private data can reach >> ~1k/s. > >First of all, thank you for trying to tackle this long running issue. > >> @@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags) >> if (!urb) >> return NULL; >> usb_init_urb(urb); >> + urb->hcpriv_mempool_flags = mem_flags; > >No. You cannot do this. The flags you pass to usb_alloc_urb() >depend on the context you call it in. For example, if you are >allocating it while holding a spinlock, ou need to use GFP_ATOMIC > >But that may or may not be the same context you submit the URB in. >Recording mem_flags here makes no sense. > > Regards > Oliver Thanks for reviewing this. The memory flag thing do raise concern. I think I can make adjustment: realloc the memory if flag changed. Thanks David
On 13.05.25 10:23, David Wang wrote: Hi, > Thanks for reviewing this. The memory flag thing do raise concern. > I think I can make adjustment: realloc the memory if flag changed. I am sorry. I have been unclear. Here comes a detailed explanation: What we call "gfp_t" is a combination of flags. They describe A - the type of memory (always valid) B - the way the memory can be allocated (valid only at a specific time) The URB is a generic data structure to be processed by the CPU, _not_ the HC. It is always generic kernel memory. Flags of type A make no sense to pass. In fact you may not know for which device an URB will be used when you allocate it. The only valid mem_flags you can pass to usb_alloc_urb() are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC. If you need to reallocate memory for private data you _must_ use the flags passed with usb_submit_urb(). A HCD can modify them by adding flags of type A, but you cannot change flags of type B. For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB, but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save and reuse the GFP_KERNEL. HTH Oliver
At 2025-05-13 16:46:39, "Oliver Neukum" <oneukum@suse.com> wrote: >On 13.05.25 10:23, David Wang wrote: > >Hi, > > Thanks for reviewing this. The memory flag thing do raise concern. >> I think I can make adjustment: realloc the memory if flag changed. > >I am sorry. I have been unclear. Here comes a detailed explanation: > >What we call "gfp_t" is a combination of flags. They describe > >A - the type of memory (always valid) >B - the way the memory can be allocated (valid only at a specific time) > >The URB is a generic data structure to be processed by the CPU, _not_ >the HC. It is always generic kernel memory. Flags of type A make no sense >to pass. >In fact you may not know for which device an URB will be used when you >allocate it. The only valid mem_flags you can pass to usb_alloc_urb() >are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC. > >If you need to reallocate memory for private data you _must_ use >the flags passed with usb_submit_urb(). A HCD can modify them by adding >flags of type A, but you cannot change flags of type B. >For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB, >but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save >and reuse the GFP_KERNEL. > > HTH > Oliver > Hi, I have one question about mem flags. If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with flags B and its size is big enough, is it safe to return this memory to usb_submit_urb which is in the context of flags A? Thanks David
On 13.05.25 11:49, David Wang wrote: > Hi, I have one question about mem flags. > If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with flags B and its size > is big enough, is it safe to return this memory to usb_submit_urb which is in the context of flags A? Yes, that is perfectly safe. HTH Oliver
At 2025-05-13 19:02:42, "Oliver Neukum" <oneukum@suse.com> wrote: > > >On 13.05.25 11:49, David Wang wrote: > >> Hi, I have one question about mem flags. >> If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with flags B and its size >> is big enough, is it safe to return this memory to usb_submit_urb which is in the context of flags A? > >Yes, that is perfectly safe. > > HTH > Oliver Copy that~ Thanks David
At 2025-05-13 16:46:39, "Oliver Neukum" <oneukum@suse.com> wrote: >On 13.05.25 10:23, David Wang wrote: > >Hi, > > Thanks for reviewing this. The memory flag thing do raise concern. >> I think I can make adjustment: realloc the memory if flag changed. > >I am sorry. I have been unclear. Here comes a detailed explanation: > >What we call "gfp_t" is a combination of flags. They describe > >A - the type of memory (always valid) >B - the way the memory can be allocated (valid only at a specific time) > >The URB is a generic data structure to be processed by the CPU, _not_ >the HC. It is always generic kernel memory. Flags of type A make no sense >to pass. >In fact you may not know for which device an URB will be used when you >allocate it. The only valid mem_flags you can pass to usb_alloc_urb() >are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC. > >If you need to reallocate memory for private data you _must_ use >the flags passed with usb_submit_urb(). A HCD can modify them by adding >flags of type A, but you cannot change flags of type B. >For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB, >but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save >and reuse the GFP_KERNEL. > > HTH > Oliver > Thanks for explaining this. I will adjust code to handle mem_flags changes, and update soon after I made some tests. David
---
Changes:
1. remove a unused variable declaration in xhci-ring.c
---
xhci keeps alloc/free private data for each enqueue/dequeue cycles,
when using a USB webcam, the memory allocation frequency could reach
about 1k/s and above.
URB objects have longer lifecycle than private data, hand over ownership
of private data to urb can save lots of memory allocations over time.
Signed-off-by: David Wang <00107082@163.com>
---
drivers/usb/host/xhci-ring.c | 2 --
drivers/usb/host/xhci.c | 5 +----
2 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 423bf3649570..f082215f140b 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -821,7 +821,6 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
struct xhci_td *cur_td, int status)
{
struct urb *urb = cur_td->urb;
- struct urb_priv *urb_priv = urb->hcpriv;
struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS) {
@@ -831,7 +830,6 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
usb_amd_quirk_pll_enable();
}
}
- xhci_urb_free_priv(urb_priv);
usb_hcd_unlink_urb_from_ep(hcd, urb);
trace_xhci_urb_giveback(urb);
usb_hcd_giveback_urb(hcd, urb, status);
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..f88385e0488e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1552,7 +1552,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
else
num_tds = 1;
- urb_priv = kzalloc(struct_size(urb_priv, td, num_tds), mem_flags);
+ urb_priv = urb_hcpriv_mempool_zalloc(urb, struct_size(urb_priv, td, num_tds), mem_flags);
if (!urb_priv)
return -ENOMEM;
@@ -1626,7 +1626,6 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
if (ret) {
free_priv:
- xhci_urb_free_priv(urb_priv);
urb->hcpriv = NULL;
}
spin_unlock_irqrestore(&xhci->lock, flags);
@@ -1789,8 +1788,6 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
return ret;
err_giveback:
- if (urb_priv)
- xhci_urb_free_priv(urb_priv);
usb_hcd_unlink_urb_from_ep(hcd, urb);
spin_unlock_irqrestore(&xhci->lock, flags);
usb_hcd_giveback_urb(hcd, urb, -ESHUTDOWN);
--
2.39.2
xhci keeps alloc/free private data for each enqueue/dequeue cycles,
when using a USB webcam, the memory allocation frequency could reach
about 1k/s.
URB objects have longer lifecycle than private data, hand over ownership
of private data to urb can save lots of memory allocations over time.
Signed-off-by: David Wang <00107082@163.com>
---
drivers/usb/host/xhci-ring.c | 1 -
drivers/usb/host/xhci.c | 5 +----
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 423bf3649570..b98e5211693d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -831,7 +831,6 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
usb_amd_quirk_pll_enable();
}
}
- xhci_urb_free_priv(urb_priv);
usb_hcd_unlink_urb_from_ep(hcd, urb);
trace_xhci_urb_giveback(urb);
usb_hcd_giveback_urb(hcd, urb, status);
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..94ff5fa9dcf2 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1552,7 +1552,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
else
num_tds = 1;
- urb_priv = kzalloc(struct_size(urb_priv, td, num_tds), mem_flags);
+ urb_priv = urb_hcpriv_mempool_zalloc(urb, struct_size(urb_priv, td, num_tds));
if (!urb_priv)
return -ENOMEM;
@@ -1626,7 +1626,6 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
if (ret) {
free_priv:
- xhci_urb_free_priv(urb_priv);
urb->hcpriv = NULL;
}
spin_unlock_irqrestore(&xhci->lock, flags);
@@ -1789,8 +1788,6 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
return ret;
err_giveback:
- if (urb_priv)
- xhci_urb_free_priv(urb_priv);
usb_hcd_unlink_urb_from_ep(hcd, urb);
spin_unlock_irqrestore(&xhci->lock, flags);
usb_hcd_giveback_urb(hcd, urb, -ESHUTDOWN);
--
2.39.2
Hi, On 13.05.25 07:55, David Wang wrote: > xhci keeps alloc/free private data for each enqueue/dequeue cycles, > when using a USB webcam, the memory allocation frequency could reach > about 1k/s. > > URB objects have longer lifecycle than private data, hand over ownership > of private data to urb can save lots of memory allocations over time. I am afraid I need to make a comment about a principal issue. This patch set overlooks a fundamental issue. You cannot guarantee that an URB is reused by the same HC. For example you cannot rule out that the next time your URB is resubmitted, it will land with XHCI again. That means you cannot touch just how one HCD handles private data. Or you need to record which HC the URB was last used for. Regards Oliver
At 2025-05-13 16:21:06, "Oliver Neukum" <oneukum@suse.com> wrote: >Hi, > >On 13.05.25 07:55, David Wang wrote: >> xhci keeps alloc/free private data for each enqueue/dequeue cycles, >> when using a USB webcam, the memory allocation frequency could reach >> about 1k/s. >> >> URB objects have longer lifecycle than private data, hand over ownership >> of private data to urb can save lots of memory allocations over time. > >I am afraid I need to make a comment about a principal issue. > >This patch set overlooks a fundamental issue. You cannot guarantee >that an URB is reused by the same HC. For example you cannot rule >out that the next time your URB is resubmitted, it will land with >XHCI again. > >That means you cannot touch just how one HCD handles private data. >Or you need to record which HC the URB was last used for. No, I don't think which HC the URB was last used for concerns here. A URB cannot be used by two HC at the same time, as long as this hold, I don't see reason worrying about who use this private data before. This patch for xhci dose not assume which HC use the private data mempool previously, nor should other HCs. Or, do I misunderstand your comments? David > > Regards > Oliver
Hi, On 13.05.25 10:31, David Wang wrote: > No, I don't think which HC the URB was last used for concerns here. > A URB cannot be used by two HC at the same time, as long as this hold, I don't see reason > worrying about who use this private data before. > > This patch for xhci dose not assume which HC use the private data mempool previously, nor > should other HCs. OK, I need to correct myself. Though I have to ask in the long run what happens if two HCDs need to allocate the mempool differently. But for now it will work. > Or, do I misunderstand your comments? No. The problem is with the first patch in the series. Basically you need to pass mem_flags from usb_submit_urb() to urb_hcpriv_mempool_zalloc() and use it. Regards Oliver
© 2016 - 2026 Red Hat, Inc.