[PATCH net-next 0/2] GCPS Spec Compliance Patch Set

kalavakunta.hari.prasad@gmail.com posted 2 patches 10 months, 1 week ago
net/ncsi/internal.h | 21 +++++-----
net/ncsi/ncsi-pkt.h | 95 +++++++++++++++++++++++++--------------------
net/ncsi/ncsi-rsp.c | 31 +++++++++------
3 files changed, 82 insertions(+), 65 deletions(-)
[PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by kalavakunta.hari.prasad@gmail.com 10 months, 1 week ago
From: Hari Kalavakunta <kalavakunta.hari.prasad@gmail.com>

Make Get Controller Packet Statistics (GCPS) 64-bit member variables
spec-compliant, as per DSP0222 v1.0.0 and forward specs

Hari Kalavakunta (2):
  net: ncsi: Format structure for longer names
  net: ncsi: Fix GCPS 64-bit member variables

 net/ncsi/internal.h | 21 +++++-----
 net/ncsi/ncsi-pkt.h | 95 +++++++++++++++++++++++++--------------------
 net/ncsi/ncsi-rsp.c | 31 +++++++++------
 3 files changed, 82 insertions(+), 65 deletions(-)

-- 
2.47.1
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Sam Mendoza-Jonas 10 months ago
On 8/04/2025 4:19 am, kalavakunta.hari.prasad@gmail.com wrote:
> From: Hari Kalavakunta <kalavakunta.hari.prasad@gmail.com>
>
> Make Get Controller Packet Statistics (GCPS) 64-bit member variables
> spec-compliant, as per DSP0222 v1.0.0 and forward specs
>
> Hari Kalavakunta (2):
>    net: ncsi: Format structure for longer names
>    net: ncsi: Fix GCPS 64-bit member variables
>
>   net/ncsi/internal.h | 21 +++++-----
>   net/ncsi/ncsi-pkt.h | 95 +++++++++++++++++++++++++--------------------
>   net/ncsi/ncsi-rsp.c | 31 +++++++++------
>   3 files changed, 82 insertions(+), 65 deletions(-)

Looking at e.g. DSP0222 1.2.0a, you're right about the field widths, but 
it's not particularly explicit about whether the full 64 bits is used. 
I'd assume so, but do you see the upper bits of e.g. the packet counters 
return expected data? Otherwise looks good.

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Hari Kalavakunta 10 months ago
On 4/7/2025 2:44 PM, Sam Mendoza-Jonas wrote:
> On 8/04/2025 4:19 am, kalavakunta.hari.prasad@gmail.com wrote:

> 
> Looking at e.g. DSP0222 1.2.0a, you're right about the field widths, but 
> it's not particularly explicit about whether the full 64 bits is used. 
> I'd assume so, but do you see the upper bits of e.g. the packet counters 
> return expected data? Otherwise looks good.
> 
It is possible that these statistics have not been previously explored 
or utilized, which may explain why they went unnoticed. As you pointed 
out, the checksum offset within the struct is not currently being 
checked, and similarly, the returned packet sizes are also not being 
verified.
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Paul Fertser 10 months ago
Hello Hari,

On Tue, Apr 08, 2025 at 10:01:03AM -0700, Hari Kalavakunta wrote:
> On 4/7/2025 2:44 PM, Sam Mendoza-Jonas wrote:
> > On 8/04/2025 4:19 am, kalavakunta.hari.prasad@gmail.com wrote:
> > 
> > Looking at e.g. DSP0222 1.2.0a, you're right about the field widths, but
> > it's not particularly explicit about whether the full 64 bits is used.
> > I'd assume so, but do you see the upper bits of e.g. the packet counters
> > return expected data? Otherwise looks good.
> > 
> It is possible that these statistics have not been previously explored or
> utilized, which may explain why they went unnoticed. As you pointed out, the
> checksum offset within the struct is not currently being checked, and
> similarly, the returned packet sizes are also not being verified.

Can you please add the checks so that we are sure that hardware,
software and the specification all match after your fixes?

Also, please do provide the example counter values read from real
hardware (even if they're not yet exposed properly you can still
obtain them with some hack; don't forget to mention what network card
they were read from).
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Hari Kalavakunta 10 months ago
On 4/8/2025 11:26 AM, Paul Fertser wrote:
> Can you please add the checks so that we are sure that hardware,
> software and the specification all match after your fixes?
Sure, I can give a try. Could you please provide an example or guideline 
that I can use as a reference for proper alignment?
> 
> Also, please do provide the example counter values read from real
> hardware (even if they're not yet exposed properly you can still
> obtain them with some hack; don't forget to mention what network card
> they were read from).
Our verification process is centered on confirming that the counter 
values are accurately populated within the ncsi_channel_stats struct, 
specifically in the ncsi_rsp_handler_gcps function. This verification is 
conducted using synthesized statistics, rather than actual data from a 
network card. Sure, I will provide NCSI packet capture showing the 
synthesized data used for testing by end of the day.
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Paul Fertser 10 months ago
On Tue, Apr 08, 2025 at 11:53:47AM -0700, Hari Kalavakunta wrote:
> On 4/8/2025 11:26 AM, Paul Fertser wrote:
> > Can you please add the checks so that we are sure that hardware,
> > software and the specification all match after your fixes?
>
> Sure, I can give a try. Could you please provide an example or guideline
> that I can use as a reference for proper alignment?

Well, there's ncsi_validate_rsp_pkt() and also some handlers use
netdev_warn() or netdev_err() as appropriate and in any case they do
not try to use the returned data if it didn't pass validation and
return an error instead.

> > Also, please do provide the example counter values read from real
> > hardware (even if they're not yet exposed properly you can still
> > obtain them with some hack; don't forget to mention what network card
> > they were read from).
>
> Our verification process is centered on confirming that the counter values
> are accurately populated within the ncsi_channel_stats struct, specifically
> in the ncsi_rsp_handler_gcps function. This verification is conducted using
> synthesized statistics, rather than actual data from a network card. Sure, I
> will provide NCSI packet capture showing the synthesized data used for
> testing by end of the day.

In other words, you're testing your code only with simulated data so
there's no way to guarantee it's going to work on any real life
hardware (as we know hardware doesn't always exactly match the specs)?
That's unsettling. Please do mention it in the commit log, it's an
essential point. Better yet, consider going a bit off-centre after the
regular verification and do a control run on real hardware.

After all, that's what the code is for so if it all possible it's
better to know if it does the actual job before merging (to avoid
noise from follow-up patches like yours which fix something that never
worked because it was never tested).
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Hari Kalavakunta 10 months ago
On 4/8/2025 12:19 PM, Paul Fertser wrote:

> In other words, you're testing your code only with simulated data so
> there's no way to guarantee it's going to work on any real life
> hardware (as we know hardware doesn't always exactly match the specs)?
> That's unsettling. Please do mention it in the commit log, it's an
> essential point. Better yet, consider going a bit off-centre after the
> regular verification and do a control run on real hardware.
> 
> After all, that's what the code is for so if it all possible it's
> better to know if it does the actual job before merging (to avoid
> noise from follow-up patches like yours which fix something that never
> worked because it was never tested).

I would like to request a week's time to integrate a real hardware 
interface, which will enable me to test and demonstrate end-to-end 
results. This will also allow me to identify and address any additional 
issues that may arise during the testing process. Thank you for the 
feedback.
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Paul Fertser 10 months ago
On Tue, Apr 08, 2025 at 03:02:14PM -0700, Hari Kalavakunta wrote:
> On 4/8/2025 12:19 PM, Paul Fertser wrote:
> 
> > In other words, you're testing your code only with simulated data so
> > there's no way to guarantee it's going to work on any real life
> > hardware (as we know hardware doesn't always exactly match the specs)?
> > That's unsettling. Please do mention it in the commit log, it's an
> > essential point. Better yet, consider going a bit off-centre after the
> > regular verification and do a control run on real hardware.
> > 
> > After all, that's what the code is for so if it all possible it's
> > better to know if it does the actual job before merging (to avoid
> > noise from follow-up patches like yours which fix something that never
> > worked because it was never tested).
> 
> I would like to request a week's time to integrate a real hardware
> interface, which will enable me to test and demonstrate end-to-end results.
> This will also allow me to identify and address any additional issues that
> may arise during the testing process. Thank you for the feedback.

Thank you for doing the right thing! Looking forward to your updated
patch (please do not forget to consider __be64 for the fields).
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Hari Kalavakunta 10 months ago
On 4/8/2025 3:35 PM, Paul Fertser wrote:
> On Tue, Apr 08, 2025 at 03:02:14PM -0700, Hari Kalavakunta wrote:
>> On 4/8/2025 12:19 PM, Paul Fertser wrote:
>>
>>> In other words, you're testing your code only with simulated data so
>>> there's no way to guarantee it's going to work on any real life
>>> hardware (as we know hardware doesn't always exactly match the specs)?
>>> That's unsettling. Please do mention it in the commit log, it's an
>>> essential point. Better yet, consider going a bit off-centre after the
>>> regular verification and do a control run on real hardware.
>>>
>>> After all, that's what the code is for so if it all possible it's
>>> better to know if it does the actual job before merging (to avoid
>>> noise from follow-up patches like yours which fix something that never
>>> worked because it was never tested).
>>
>> I would like to request a week's time to integrate a real hardware
>> interface, which will enable me to test and demonstrate end-to-end results.
>> This will also allow me to identify and address any additional issues that
>> may arise during the testing process. Thank you for the feedback.
> 
> Thank you for doing the right thing! Looking forward to your updated
> patch (please do not forget to consider __be64 for the fields).

I had not previously considered using __be64 for the struct 
ncsi_rsp_gcps_pkt, as it is an interface structure. I would like to seek 
your input on whether it is a good idea to use __be64 for interface 
messages. In my experience, I haven't come across implementations that 
utilize __be64. I am unsure about the portability of this approach, 
particularly with regards to the Management Controller (MC).
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Paul Fertser 10 months ago
On Tue, Apr 08, 2025 at 04:23:43PM -0700, Hari Kalavakunta wrote:
> On 4/8/2025 3:35 PM, Paul Fertser wrote:
> > Thank you for doing the right thing! Looking forward to your updated
> > patch (please do not forget to consider __be64 for the fields).
> 
> I had not previously considered using __be64 for the struct
> ncsi_rsp_gcps_pkt, as it is an interface structure. I would like to seek
> your input on whether it is a good idea to use __be64 for interface
> messages. In my experience, I haven't come across implementations that
> utilize __be64. I am unsure about the portability of this approach,
> particularly with regards to the Management Controller (MC).

I do not see why not[0][1]. What makes MC special, do you imply it
doesn't have be64_to_cpu() (be64_to_cpup() for unaligned data) or
what? If the values you get from hardware are indeed 64-bit BE clearly
open-coding conversions from them is suboptimal.

[0] https://elixir.bootlin.com/linux/v6.13.7/A/ident/__be64
[1] https://elixir.bootlin.com/linux/v6.13.7/source/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h#L155
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Hari Kalavakunta 10 months ago
On 4/9/2025 2:08 AM, Paul Fertser wrote:
> On Tue, Apr 08, 2025 at 04:23:43PM -0700, Hari Kalavakunta wrote:
>> On 4/8/2025 3:35 PM, Paul Fertser wrote:
>>> Thank you for doing the right thing! Looking forward to your updated
>>> patch (please do not forget to consider __be64 for the fields).
>>
>> I had not previously considered using __be64 for the struct
>> ncsi_rsp_gcps_pkt, as it is an interface structure. I would like to seek
>> your input on whether it is a good idea to use __be64 for interface
>> messages. In my experience, I haven't come across implementations that
>> utilize __be64. I am unsure about the portability of this approach,
>> particularly with regards to the Management Controller (MC).
> 
> I do not see why not[0][1]. What makes MC special, do you imply it
> doesn't have be64_to_cpu() (be64_to_cpup() for unaligned data) or
> what? If the values you get from hardware are indeed 64-bit BE clearly
> open-coding conversions from them is suboptimal.
> 
> [0] https://elixir.bootlin.com/linux/v6.13.7/A/ident/__be64
> [1] https://elixir.bootlin.com/linux/v6.13.7/source/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h#L155
Thank you for providing the references. I will proceed with running a 
test on real hardware with be64_to_cpu() to verify that stat mediation 
works correctly all the way into the local driver structure. If 
everything looks correct, I will submit a revised patch.
Re: [PATCH net-next 0/2] GCPS Spec Compliance Patch Set
Posted by Hari Kalavakunta 10 months ago
On 4/8/2025 4:23 PM, Hari Kalavakunta wrote:
> On 4/8/2025 3:35 PM, Paul Fertser wrote:
>> On Tue, Apr 08, 2025 at 03:02:14PM -0700, Hari Kalavakunta wrote:
>>> On 4/8/2025 12:19 PM, Paul Fertser wrote:
>>>
>>>> In other words, you're testing your code only with simulated data so
>>>> there's no way to guarantee it's going to work on any real life
>>>> hardware (as we know hardware doesn't always exactly match the specs)?
>>>> That's unsettling. Please do mention it in the commit log, it's an
>>>> essential point. Better yet, consider going a bit off-centre after the
>>>> regular verification and do a control run on real hardware.
>>>>
>>>> After all, that's what the code is for so if it all possible it's
>>>> better to know if it does the actual job before merging (to avoid
>>>> noise from follow-up patches like yours which fix something that never
>>>> worked because it was never tested).
>>>
>>> I would like to request a week's time to integrate a real hardware
>>> interface, which will enable me to test and demonstrate end-to-end 
>>> results.
>>> This will also allow me to identify and address any additional issues 
>>> that
>>> may arise during the testing process. Thank you for the feedback.
>>
>> Thank you for doing the right thing! Looking forward to your updated
>> patch (please do not forget to consider __be64 for the fields).
> 
> I had not previously considered using __be64 for the struct 
> ncsi_rsp_gcps_pkt, as it is an interface structure. I would like to seek 
> your input on whether it is a good idea to use __be64 for interface 
> messages. In my experience, I haven't come across implementations that 
> utilize __be64. I am unsure about the portability of this approach, 
> particularly with regards to the Management Controller (MC).

Here are the results from a real hardware test, which validates the patch.


a. Initiate GCPS command to the NIC-2
root@bmc:~# ./ncsi-cmd -i 3 -p 0 -c 0 raw 0x18
<7> Command: type 0x18, payload 0 bytes:
<7> Send Command, CHANNEL : 0x0 , PACKAGE : 0x0, INTERFACE: 0x008cd598
<7> Response 228 bytes: 00 01 00 3b 98 00 00 cc 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2b e5 fc e0 c0 00 00 00 00 
65 70 d5 54 00 00 00 00 09 50 66 56 00 00 00 00 03 08 a8 79 00 00 00 00 
00 3d 5c 44 00 00 00 00 00 79 38 23 00 00 00 00 00 02 fe 06 00 00 00 00 
00 00 02 be 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 b8 21 5f 03 8f da af 00 d8 6d 36 
00 73 a3 e7 00 17 20 70 00 00 00 00 00 00 d8 8a 00 00 0e 9c 00 56 4f 83 
00 10 17 e2 00 01 5b 76 00 13 6e a3 00 00 f8 cd 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2c 20 55 f5

b. tcpdump capture of the GCPS
GCPS Command Capture:
         0x0000:  0001 003b 1800 0000 0000 0000 0000 0000
         0x0010:  ffff e7c4 0000 0000 0000 0000 0000 0000
         0x0020:  0000 0000 0000 0000 0000 0000 0000

GCPS Response
         0x0000:  0001 003b 9800 00cc 0000 0000 0000 0000
         0x0010:  0000 0000 0000 0000 0000 0000 0000 002b
         0x0020:  e5fc e0c0 0000 0000 6570 d554 0000 0000
         0x0030:  0950 6656 0000 0000 0308 a879 0000 0000
         0x0040:  003d 5c44 0000 0000 0079 3823 0000 0000
         0x0050:  0002 fe06 0000 0000 0000 02be 0000 0000
         0x0060:  0000 0000 0000 0000 0000 0000 0000 0000
         0x0070:  0000 0000 0000 0000 0000 0000 0000 0000
         0x0080:  0000 0000 0000 0000 0000 0000 0000 0000
         0x0090:  0000 0000 00b8 215f 038f daaf 00d8 6d36
         0x00a0:  0073 a3e7 0017 2070 0000 0000 0000 d88a
         0x00b0:  0000 0e9c 0056 4f83 0010 17e2 0001 5b76
         0x00c0:  0013 6ea3 0000 f8cd 0000 0000 0000 0000
         0x00d0:  0000 0000 0000 0000 0000 0000 0000 0000
         0x00e0:  2c20 55f5

c. dmesg log (Debug purpose only) to demonstrate correct value read.
[ 1350.369741] ncsi_rsp_handler_gcps ENTRY
[ 1350.369768] ncs->hnc_rx_bytes = 188542148800 (0x2be5fce0c0)


Let us examine the "Total Bytes Received" statistic. Specifically, we'll 
look at offset 28..35 (0x1c..0x24), bytes read - '0000 002b e5fc e0c0'. 
NCSI now correctly collects this value into it's internal structure.


I just noticed a warning regarding line length in the patch submission, 
I will address this issue in version 2 of the patch. Let me know if we 
need additional information.