[PATCH] docs: iostats: Update introduction with flush fields

David Reaver posted 1 patch 10 months, 1 week ago
[PATCH] docs: iostats: Update introduction with flush fields
Posted by David Reaver 10 months, 1 week ago
Counters for flush requests were added to the kernel in
b6866318657 ("block: add iostat counters for flush requests") [1]. While
iostats.rst was updated with descriptions for the new fields, the
introduction still mentions 15 fields instead of 17.

Correct the introduction to state that there are 17 fields instead of 15.
Also, replace the 2.4 vs. 2.6+ comparison with a distinction between
/proc/diskstats and the sysfs stat file.

Link: https://lore.kernel.org/lkml/157433282607.7928.5202409984272248322.stgit@buzz/T/ [1]

Signed-off-by: David Reaver <me@davidreaver.com>
---

I noticed this small discrepancy while writing an observability tool
that uses /proc/diskstats. I did a double take because I noticed the
extra fields in my own system's /proc/diskstats while I was reading this
doc, but _before_ I got to the descriptions for fields 16 and 17.
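
For illustration, here is a minimal sketch of how a tool might split one
/proc/diskstats line into the device identifiers and the statistics
fields. The names `DiskStatsLine` and `parse_diskstats_line` are
hypothetical, and the sample line is the 5.5+ example from the patch
below; this is not taken from any existing tool.

```python
from typing import NamedTuple, Tuple


class DiskStatsLine(NamedTuple):
    """One parsed /proc/diskstats line (hypothetical helper type)."""
    major: int
    minor: int
    name: str
    stats: Tuple[int, ...]  # statistics fields, in file order


def parse_diskstats_line(line: str) -> DiskStatsLine:
    """Split a line into major, minor, device name, and the stat fields."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"not a diskstats line: {line!r}")
    major, minor, name = int(parts[0]), int(parts[1]), parts[2]
    # Everything after the device name is a statistics field. The count
    # is not hard-coded, so older kernels with fewer fields still parse.
    stats = tuple(int(p) for p in parts[3:])
    return DiskStatsLine(major, minor, name, stats)


# Sample line copied from the 5.5+ example in the patch.
SAMPLE = (
    "   3    0   hda 446216 784926 9550688 4382310 424847 312726 "
    "5922052 19310380 0 3376340 23705160 0 0 0 0 0 0"
)

parsed = parse_diskstats_line(SAMPLE)
print(parsed.name, len(parsed.stats), parsed.stats[0])
```

Not hard-coding the field count is a deliberate choice in this sketch:
the number of fields has grown over kernel versions, so a consumer that
checks the tuple length can tell whether fields 16 and 17 are present.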

I think the discussion of historical formats for 2.4, 2.6, and 4.18 in
this document is confusing and not very useful. If you'd like, I'm happy
to make a patch that rewrites the intro to simplify it and remove
discussion of the historical formats.

 Documentation/admin-guide/iostats.rst | 33 +++++++++++++++------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
index 609a3201fd4e..1df7961bdc89 100644
--- a/Documentation/admin-guide/iostats.rst
+++ b/Documentation/admin-guide/iostats.rst
@@ -34,6 +34,9 @@ Here are examples of these different formats::
    4.18+ diskstats:
       3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
 
+   5.5+ diskstats:
+      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0 0 0
+
 On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
 a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
 
@@ -43,21 +46,21 @@ be a better choice if you are watching a large number of disks because
 you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
 each snapshot of your disk statistics.
 
-In 2.4, the statistics fields are those after the device name. In
-the above example, the first field of statistics would be 446216.
-By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
-find just the 15 fields, beginning with 446216.  If you look at
-``/proc/diskstats``, the 15 fields will be preceded by the major and
-minor device numbers, and device name.  Each of these formats provides
-15 fields of statistics, each meaning exactly the same things.
-All fields except field 9 are cumulative since boot.  Field 9 should
-go to zero as I/Os complete; all others only increase (unless they
-overflow and wrap). Wrapping might eventually occur on a very busy
-or long-lived system; so applications should be prepared to deal with
-it. Regarding wrapping, the types of the fields are either unsigned
-int (32 bit) or unsigned long (32-bit or 64-bit, depending on your
-machine) as noted per-field below. Unless your observations are very
-spread in time, these fields should not wrap twice before you notice it.
+In ``/proc/diskstats``, the statistics fields are those after the device
+name. In the above example, the first field of statistics would
+be 446216. By contrast, in ``/sys/block/hda/stat`` you'll find just the
+17 fields, beginning with 446216. If you look at ``/proc/diskstats``,
+the 17 fields will be preceded by the major and minor device numbers,
+and device name. Each of these formats provides 17 fields of statistics,
+each meaning exactly the same things. All fields except field 9 are
+cumulative since boot. Field 9 should go to zero as I/Os complete; all
+others only increase (unless they overflow and wrap). Wrapping might
+eventually occur on a very busy or long-lived system; so applications
+should be prepared to deal with it. Regarding wrapping, the types of the
+fields are either unsigned int (32 bit) or unsigned long (32-bit or
+64-bit, depending on your machine) as noted per-field below. Unless your
+observations are very spread in time, these fields should not wrap twice
+before you notice it.
 
 Each set of stats only applies to the indicated device; if you want
 system-wide stats you'll have to find all the devices and sum them all up.

base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
Re: [PATCH] docs: iostats: Update introduction with flush fields
Posted by Randy Dunlap 10 months, 1 week ago
Hi,

On 2/13/25 5:39 PM, David Reaver wrote:
> Counters for flush requests were added to the kernel in
> b6866318657 ("block: add iostat counters for flush requests") [1]. While
> iostats.rst was updated with descriptions for the new fields, the
> introduction still mentions 15 fields instead of 17.
> 
> Correct the introduction to state that there are 17 fields instead of 15.
> Also, replace the 2.4 vs. 2.6+ comparison with a distinction between
> /proc/diskstats and the sysfs stat file.
> 
> Link: https://lore.kernel.org/lkml/157433282607.7928.5202409984272248322.stgit@buzz/T/ [1]
> 
> Signed-off-by: David Reaver <me@davidreaver.com>
> ---
> 
> I noticed this small discrepancy while writing an observability tool
> that uses /proc/diskstats. I did a double take because I noticed the
> extra fields in my own system's /proc/diskstats while I was reading this
> doc, but _before_ I got to the descriptions for fields 16 and 17.
> 
> I think the discussion of historical formats for 2.4, 2.6, and 4.18 in
> this document is confusing and not very useful. If you'd like, I'm happy
> to make a patch that rewrites the intro to simplify it and remove
> discussion of the historical formats.

Please do IMO.

>  Documentation/admin-guide/iostats.rst | 33 +++++++++++++++------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
> index 609a3201fd4e..1df7961bdc89 100644
> --- a/Documentation/admin-guide/iostats.rst
> +++ b/Documentation/admin-guide/iostats.rst
> @@ -34,6 +34,9 @@ Here are examples of these different formats::
>     4.18+ diskstats:
>        3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
>  
> +   5.5+ diskstats:
> +      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0 0 0
> +
>  On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
>  a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
>  
> @@ -43,21 +46,21 @@ be a better choice if you are watching a large number of disks because
>  you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
>  each snapshot of your disk statistics.
>  
> -In 2.4, the statistics fields are those after the device name. In
> -the above example, the first field of statistics would be 446216.
> -By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
> -find just the 15 fields, beginning with 446216.  If you look at
> -``/proc/diskstats``, the 15 fields will be preceded by the major and
> -minor device numbers, and device name.  Each of these formats provides
> -15 fields of statistics, each meaning exactly the same things.
> -All fields except field 9 are cumulative since boot.  Field 9 should
> -go to zero as I/Os complete; all others only increase (unless they
> -overflow and wrap). Wrapping might eventually occur on a very busy
> -or long-lived system; so applications should be prepared to deal with
> -it. Regarding wrapping, the types of the fields are either unsigned
> -int (32 bit) or unsigned long (32-bit or 64-bit, depending on your
> -machine) as noted per-field below. Unless your observations are very
> -spread in time, these fields should not wrap twice before you notice it.
> +In ``/proc/diskstats``, the statistics fields are those after the device
> +name. In the above example, the first field of statistics would
> +be 446216. By contrast, in ``/sys/block/hda/stat`` you'll find just the
> +17 fields, beginning with 446216. If you look at ``/proc/diskstats``,
> +the 17 fields will be preceded by the major and minor device numbers,
> +and device name. Each of these formats provides 17 fields of statistics,
> +each meaning exactly the same things. All fields except field 9 are
> +cumulative since boot. Field 9 should go to zero as I/Os complete; all
> +others only increase (unless they overflow and wrap). Wrapping might
> +eventually occur on a very busy or long-lived system; so applications

I prefer a comma instead of semi-colon above. Yes, I know, it was already
like this.

> +should be prepared to deal with it. Regarding wrapping, the types of the
> +fields are either unsigned int (32 bit) or unsigned long (32-bit or
> +64-bit, depending on your machine) as noted per-field below. Unless your
> +observations are very spread in time, these fields should not wrap twice
> +before you notice it.
>  
>  Each set of stats only applies to the indicated device; if you want
>  system-wide stats you'll have to find all the devices and sum them all up.
> 
> base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
> 

LGTM. Thanks.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

-- 
~Randy
[PATCH] docs: iostats: Rewrite intro, remove outdated formats
Posted by David Reaver 10 months, 1 week ago
The discussion of file formats for very old kernel versions obscured the
key information in this document. Additionally, the introduction was
missing a discussion of flush fields added in b6866318657 ("block: add
iostat counters for flush requests") [1].

Rewrite the introduction to discuss only the current kernel's disk I/O stat
file formats. Also, clean up wording to be more concise.

Link: https://lore.kernel.org/lkml/157433282607.7928.5202409984272248322.stgit@buzz/T/ [1]

Signed-off-by: David Reaver <me@davidreaver.com>
---

Thanks for the encouragement Randy. Here is a rewrite of the intro.

This patch is mutually exclusive with the original patch I started this
thread with. Let me know if I should submit it as a standalone thread.
(I'm fairly new to contributing to the kernel.)

 Documentation/admin-guide/iostats.rst | 92 +++++++++++----------------
 1 file changed, 36 insertions(+), 56 deletions(-)

diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
index 609a3201fd4e..8e205c8afd80 100644
--- a/Documentation/admin-guide/iostats.rst
+++ b/Documentation/admin-guide/iostats.rst
@@ -2,62 +2,42 @@
 I/O statistics fields
 =====================
 
-Since 2.4.20 (and some versions before, with patches), and 2.5.45,
-more extensive disk statistics have been introduced to help measure disk
-activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
-the work for you, but in case you are interested in creating your own
-tools, the fields are explained here.
-
-In 2.4 now, the information is found as additional fields in
-``/proc/partitions``.  In 2.6 and upper, the same information is found in two
-places: one is in the file ``/proc/diskstats``, and the other is within
-the sysfs file system, which must be mounted in order to obtain
-the information. Throughout this document we'll assume that sysfs
-is mounted on ``/sys``, although of course it may be mounted anywhere.
-Both ``/proc/diskstats`` and sysfs use the same source for the information
-and so should not differ.
-
-Here are examples of these different formats::
-
-   2.4:
-      3     0   39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
-      3     1    9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
-
-   2.6+ sysfs:
-      446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
-      35486    38030    38030    38030
-
-   2.6+ diskstats:
-      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
-      3    1   hda1 35486 38030 38030 38030
-
-   4.18+ diskstats:
-      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
-
-On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
-a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
-
-The advantage of one over the other is that the sysfs choice works well
-if you are watching a known, small set of disks.  ``/proc/diskstats`` may
-be a better choice if you are watching a large number of disks because
-you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
-each snapshot of your disk statistics.
-
-In 2.4, the statistics fields are those after the device name. In
-the above example, the first field of statistics would be 446216.
-By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
-find just the 15 fields, beginning with 446216.  If you look at
-``/proc/diskstats``, the 15 fields will be preceded by the major and
-minor device numbers, and device name.  Each of these formats provides
-15 fields of statistics, each meaning exactly the same things.
-All fields except field 9 are cumulative since boot.  Field 9 should
-go to zero as I/Os complete; all others only increase (unless they
-overflow and wrap). Wrapping might eventually occur on a very busy
-or long-lived system; so applications should be prepared to deal with
-it. Regarding wrapping, the types of the fields are either unsigned
-int (32 bit) or unsigned long (32-bit or 64-bit, depending on your
-machine) as noted per-field below. Unless your observations are very
-spread in time, these fields should not wrap twice before you notice it.
+The kernel exposes disk statistics via ``/proc/diskstats`` and
+``/sys/block/<device>/stat``. These stats are usually accessed via tools
+such as ``sar`` and ``iostat``.
+
+Here are examples using a disk with two partitions::
+
+   /proc/diskstats:
+     259       0 nvme0n1 255999 814 12369153 47919 996852 81 36123024 425995 0 301795 580470 0 0 0 0 60602 106555
+     259       1 nvme0n1p1 492 813 17572 96 848 81 108288 210 0 76 307 0 0 0 0 0 0
+     259       2 nvme0n1p2 255401 1 12343477 47799 996004 0 36014736 425784 0 344336 473584 0 0 0 0 0 0
+
+   /sys/block/nvme0n1/stat:
+     255999 814 12369153 47919 996858 81 36123056 426009 0 301809 580491 0 0 0 0 60605 106562
+
+   /sys/block/nvme0n1/nvme0n1p1/stat:
+     492 813 17572 96 848 81 108288 210 0 76 307 0 0 0 0 0 0
+
+Both files contain the same 17 statistics. ``/sys/block/<device>/stat``
+contains the fields for ``<device>``. In ``/proc/diskstats`` the fields
+are prefixed with the major and minor device numbers and the device
+name. In the example above, the first stat value for ``nvme0n1`` is
+255999 in both files.
+
+The sysfs ``stat`` file is efficient for monitoring a small, known set
+of disks. If you're tracking a large number of devices,
+``/proc/diskstats`` is often the better choice since it avoids the
+overhead of opening and closing multiple files for each snapshot.
+
+All fields are cumulative, monotonic counters that start at zero at
+boot, except for field 9, which resets to zero as I/Os complete. Other
+fields only increase unless they overflow and wrap. Wrapping may occur
+on long-running or high-load systems, so applications should handle this
+properly. Field types are either 32-bit unsigned integers or unsigned
+longs, which may be 32-bit or 64-bit depending on the architecture. As
+long as observations are taken at reasonable intervals, wraparounds
+should be rare.
 
 Each set of stats only applies to the indicated device; if you want
 system-wide stats you'll have to find all the devices and sum them all up.

base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
Re: [PATCH] docs: iostats: Rewrite intro, remove outdated formats
Posted by Jonathan Corbet 10 months, 1 week ago
David Reaver <me@davidreaver.com> writes:

> The discussion of file formats for very old kernel versions obscured the
> key information in this document. Additionally, the introduction was
> missing a discussion of flush fields added in b6866318657 ("block: add
> iostat counters for flush requests") [1].
>
> Rewrite the introduction to discuss only the current kernel's disk I/O stat
> file formats. Also, clean up wording to be more concise.
>
> Link: https://lore.kernel.org/lkml/157433282607.7928.5202409984272248322.stgit@buzz/T/ [1]
>
> Signed-off-by: David Reaver <me@davidreaver.com>
> ---
>
> Thanks for the encouragement Randy. Here is a rewrite of the intro.
>
> This patch is mutually exclusive with the original patch I started this
> thread with. Let me know if I should submit it as a standalone thread.
> (I'm fairly new to contributing to the kernel.)

Sending it as a separate thread is generally better; no need to resend,
though, if there are no other changes.

But ... I'm not quite sure what "mutually exclusive" means here.  That
they don't conflict, or that they cannot both be applied...?

Thanks,

jon
Re: [PATCH] docs: iostats: Rewrite intro, remove outdated formats
Posted by David Reaver 10 months, 1 week ago
Jonathan Corbet <corbet@lwn.net> writes:

>
> As a separate thread is generally better; no need to resend, though, if
> there are no other changes.
>

Understood! Thanks.

> But ... I'm not quite sure what "mutually exclusive" means here.  That
> they don't conflict, or that they cannot both be applied...?

Sorry, bad wording :) This patch conflicts with the original patch since
I rewrote that whole paragraph, so please ignore the first patch I sent.

Thanks,
David Reaver
Re: [PATCH] docs: iostats: Rewrite intro, remove outdated formats
Posted by Bagas Sanjaya 10 months, 1 week ago
On Thu, Feb 13, 2025 at 09:14:30PM -0800, David Reaver wrote:
> The discussion of file formats for very old kernel versions obscured the
> key information in this document. Additionally, the introduction was
> missing a discussion of flush fields added in b6866318657 ("block: add
> iostat counters for flush requests") [1].
> 
> Rewrite the introduction to discuss only the current kernel's disk I/O stat
> file formats. Also, clean up wording to be more concise.
> 
> Link: https://lore.kernel.org/lkml/157433282607.7928.5202409984272248322.stgit@buzz/T/ [1]
> 
> Signed-off-by: David Reaver <me@davidreaver.com>
> ---
> 
> Thanks for the encouragement Randy. Here is a rewrite of the intro.
> 
> This patch is mutually exclusive with the original patch I started this
> thread with. Let me know if I should submit it as a standalone thread.
> (I'm fairly new to contributing to the kernel.)

This is [PATCH v2] so the next version should be [PATCH v3] (sent as
separate thread).

> +All fields are cumulative, monotonic counters that start at zero at
> +boot, except for field 9, which resets to zero as I/Os complete. Other
> +fields only increase unless they overflow and wrap. Wrapping may occur
> +on long-running or high-load systems, so applications should handle this
> +properly. Field types are either 32-bit unsigned integers or unsigned
> +longs, which may be 32-bit or 64-bit depending on the architecture. As
> +long as observations are taken at reasonable intervals, wraparounds
> +should be rare.

So on x86_64 the field type is 32-bit-sized (u32) instead of u64, right?

Confused...
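
Whatever width a given field turns out to have, the wrap handling that
the quoted paragraph asks applications to do could look roughly like the
sketch below. It assumes the consumer already knows each counter's width
(32 or 64 bits) from the per-field descriptions and that at most one
wrap happened between two samples; `counter_delta` is a hypothetical
helper name, not an existing API.

```python
def counter_delta(prev: int, cur: int, bits: int) -> int:
    """Difference between two samples of a wrapping unsigned counter.

    Assumes the counter wrapped at most once between the two samples.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    # Modular subtraction: if cur < prev the counter wrapped, and the
    # modulo adds back 2**bits to recover the true increment.
    return (cur - prev) % (1 << bits)


# No wrap: plain difference.
print(counter_delta(100, 250, 32))
# 32-bit counter wrapped between samples.
print(counter_delta(4294967290, 5, 32))
```

This only applies to the cumulative fields. Field 9 is not a cumulative
counter, so it should be read as-is rather than differenced this way.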

-- 
An old man doll... just what I always wanted! - Clara