initial documentation

2025-09-10 13:11:32 +00:00 · 2012-01-25 16:11:09 -05:00
parent ad552756b2
commit 5acec0d38e
1 changed file with 170 additions and 96 deletions
--- a/bin/pt-diskstats
+++ b/bin/pt-diskstats
@@ -3193,15 +3193,6 @@ sub help {
   ------------------- Press any key to continue -----------------------
 HELP
   print $help;
 =begin IGNORE
   my $lines = $help =~ tr/\n//;
   while ( $lines-- ) {
      $Diskstats::printed_lines--;
      print_header(%args) unless $Diskstats::printed_lines;
   }
 =cut
   pause(%args);
   return;
 }
@@ -3421,14 +3412,14 @@ if ( !caller ) { exit main(@ARGV); }
 =head1 NAME
-pt-diskstats - Aggregate and summarize F</proc/diskstats>.
+pt-diskstats - An interactive I/O monitoring tool for GNU/Linux.
 =head1 SYNOPSIS
 Usage: pt-diskstats [OPTION...] [FILES]
-pt-diskstats reads F</proc/diskstats> periodically, or files with the
-contents of F</proc/diskstats>, aggregates the data, and prints it nicely.
+pt-diskstats prints disk I/O statistics for GNU/Linux.  It is somewhat similar
+to iostat, but it is interactive and more detailed.
 =head1 RISKS
@@ -3437,7 +3428,7 @@ whether known or unknown, of using this tool.  The two main categories of risks
 are those created by the nature of the tool (e.g. read-only tools vs. read-write
 tools) and those created by bugs.
-pt-diskstats is a read-only tool.  It should be very low-risk.
+pt-diskstats simply reads /proc/diskstats.  It should be very low-risk.
 At the time of this release, we know of no bugs that could cause serious harm
 to users.
@@ -3451,87 +3442,133 @@ See also L<"BUGS"> for more information on filing bugs and getting help.
 =head1 DESCRIPTION
-pt-diskstats tool is similar to iostat, but has some advantages. It separates
-reads and writes, for example, and computes some things that iostat does in
-either incorrect or confusing ways.  It is also menu-driven and interactive
-with several different ways to aggregate the data, and integrates well with
-the L<pt-collect> tool. These properties make it very convenient for quickly
-drilling down into I/O performance at the desired level of granularity.
+The pt-diskstats tool is similar to iostat, but has some advantages. It prints
+read and write statistics separately, and has more columns. It is menu-driven
+and interactive, with several different ways to aggregate the data. It
+integrates well with the L<pt-stalk> tool. It also does the "right thing" by
+default, such as hiding disks that are idle.  These properties make it very
+convenient for quickly drilling down into I/O performance and inspecting disk
 behavior.
-This program works in two main modes. One way is to process a file with saved
-disk statistics, which you specify on the command line.  The other way is to
-start a background process gathering samples at intervals and saving them into
-a file, and process this file in the foreground.  In both cases, the tool is
-interactively controlled by keystrokes, so you can redisplay and slice the
+This program works in two modes. The default is to collect samples of
+/proc/diskstats and print out the formatted statistics at intervals. The other
+mode is to process a file that contains saved samples of /proc/diskstats; there
+is a shell script later in this documentation that shows how to collect such a
+file.
 data flexibly and easily.  If the tool is not attached to a terminal, it
 doesn't run interactively; it just processes and prints its output, then exits.
 Otherwise it loops until you exit with the 'q' key.
-If you press the '?' key, you will bring up the interactive help menu that
-shows which keys control the program.
+In both cases, the tool is interactively controlled by keystrokes, so you can
+redisplay and slice the data flexibly and easily.  It loops forever, until you
 exit with the 'q' key.  If you press the '?' key, you will bring up the
 interactive help menu that shows which keys control the program.
-Files should have this format:
+When the program is gathering samples of /proc/diskstats and refreshing its
 display, it prints information about the newest sample each time it refreshes.
 When it is operating on a file of saved samples, it redraws the entire file's
 contents every time you change an option.
-   TS <timestamp>  <-- must start with a TS line.
-   <contents of /proc/diskstats>
-   TS <timestamp>
+The program doesn't print information about every disk device on the system. It
+hides devices that it has never observed to have any activity.  You can enable
+and disable this by pressing the 'i' key.
   <contents of /proc/diskstats>
   ... et cetera
 Note that previously the format was backwards -- It would put the timestamp
 at the bottom of each sample, not the top. This was doubly troublesome:
 It was inconsistent with how the rest of the Toolkit deals with timestamps,
 and allowed malformed data to sit in the bottom of the file and give incorrect
 results.
 See L<http://aspersa.googlecode.com/svn/html/diskstats.html> for a detailed
 example of using the tool.
 =head1 OUTPUT
 The program's output looks like the following sample, which is too wide for this
 manual page, so we have formatted it as several samples with continuations:
  #ts device rd_s rd_avkb rd_mb_s rd_io_s rd_mrg rd_cnc rd_rt ...
 {10} sda     0.5     4.0     0.0     0.1     0%    0.0  15.6 ...
 {10} sdb     0.0     0.0     0.0     0.0     0%    0.0   0.0 ...
 {10} dm-0    0.0     0.0     0.0     0.0     0%    0.0   0.0 ...
 {10} dm-1    0.5     4.0     0.0     0.1     0%    0.0  15.6 ...
  #ts device ... wr_s wr_avkb wr_mb_s wr_io_s wr_mrg wr_cnc wr_rt ...
 {10} sda    ... 30.6     6.7     0.2     6.5    40%    0.7  22.8 ...
 {10} sdb    ...  1.7    17.8     0.0     0.0    77%    0.0   0.8 ...
 {10} dm-0   ...  2.5     4.0     0.0     0.1     0%    0.0   2.6 ...
 {10} dm-1   ... 38.2     4.0     0.1     7.6     0%    0.8  21.2 ...
  #ts device ... busy in_prg io_s qtime stime
 {10} sda    ...   2%      0  6.6   0.0   0.0
 {10} sdb    ...   0%      0  0.0   0.0   0.0
 {10} dm-0   ...   0%      0  0.1   0.0   0.0
 {10} dm-1   ...   2%      0  7.7   0.0   0.0
 The columns are as follows:
 =over
 =item #ts
-The number of seconds of samples in the line.  If there is only one, then
-the timestamp itself is shown, without the {curly braces}.
+This column's contents vary depending on the tool's aggregation mode.  In the
+default mode, when each line contains information about a single disk but
 possibly aggregates across several samples from that disk, this column shows the
 number of samples that were included into the line of output, in {curly braces}.
 In the example shown, each line of output aggregates {10} samples of
 /proc/diskstats.
 In the "all" group-by mode, this column shows timestamp offsets, relative to the
 time the tool began aggregating or the timestamp of the previous lines printed,
 depending on the mode.  The output can be confusing to explain, but it's rather
 intuitive when you see the lines appearing on your screen periodically.
 Similarly, in "sample" group-by mode, the number indicates the total time span
 that is grouped into each sample.
 =item device
 The device name.  If there is more than one device, then instead the number
 of devices aggregated into the line is shown, in {curly braces}.
 =item rd_s
 The average number of reads per second.  This is the number of I/O requests that
 were sent to the block device. However, the requests may be merged by the I/O
 scheduler, so they might be sent to the physical device differently.
 =item rd_avkb
 The average size of the reads, in kilobytes.
 =item rd_mb_s
 The average number of megabytes read per second.
 =item rd_io_s
-The number of IO reads per second, average, during the sampled interval.
+The average number of IO reads per second.  This is the number that is actually
 sent to the physical device after merging adjacent requests and any other
 processing in the queue.
 =item rd_mrg
 The percentage of read requests that were merged together in the disk
 scheduler before reaching the physical device.
 =item rd_cnc
-The average concurrency of the read operations, as computed by Little's Law
-(a.k.a. queueing theory).
+The average concurrency of the read operations, as computed by Little's Law.
+This is the end-to-end concurrency, including time spent in the queue.
 =item rd_rt
-The average response time of the read operations, in milliseconds.
+The average response time of the read operations, in milliseconds.  This is the
 end-to-end response time, including time spent in the queue.  It is the response
 time that the application making I/O requests sees.
-=item wr_mb_s
-IO writes per second, average.
-
+=item wr_s, wr_avkb, wr_mb_s, wr_io_s, wr_mrg, wr_cnc, wr_rt
+These columns show write activity, and they match the corresponding columns for
+read activity.
 =item wr_cnc
 Write concurrency, similar to read concurrency.
 =item wr_rt
 Write response time, similar to read response time.
 =item busy
 The fraction of time that the device had at least one request in progress;
-this is what iostat calls %util (which is a misleading name).
+this is what iostat calls %util.  It cannot exceed 100% unless there is a
 rounding error, but it is a common mistake to think that a device that's busy
 all the time is saturated.  A device such as a RAID volume should support
 concurrency higher than 1, and solid-state drives can support very high
 concurrency.  Concurrency can grow without bound, and is a more reliable
 indicator of how loaded the device really is.
 =item in_prg
@@ -3540,38 +3577,58 @@ concurrencies, which are averages that are generated from reliable numbers,
 this number is an instantaneous sample, and you can see that it might
 represent a spike of requests, rather than the true long-term average.
-=back
-In addition to the above columns, there are a few columns that are hidden by
-default. If you press the 'c' key, and then press Enter, you will blank out
-the regular expression pattern that selects columns to display, and you will
+=item ios_s
+The average throughput of the physical device, in I/O operations per second.
+This column can be used to help you understand how much activity the underlying
+device is actually doing.
 then see the extra columns:
-=over
-=item rd_s
+=item qtime
+The average queue time; that is, time a request spends in the device scheduler
 queue before being sent to the physical device.  This is an average over reads
 and writes.
-The number of reads per second.
-=item rd_avkb
+=item stime
+The average service time; that is, the time elapsed while the physical device
 processes the request, after the request leaves the queue.  This is an average
 over reads and writes.
-The average size of the reads, in kilobytes.
-
-=item rd_mrg
-
-The percentage of read requests that were merged together in the disk
-scheduler before reaching the device.
-
+You can compare the stime and qtime columns to see whether the response time for
+reads and writes is spent in the queue or on the physical device.  However, you
+cannot see the difference between reads and writes.  Changing the block device
+scheduler algorithm might improve queue time greatly.  The default algorithm,
+cfq, is very bad for servers, and should only be used on laptops and
+workstations that perform tasks such as working with spreadsheets and surfing
+the Internet.
 =item rd_mb_s
 The number of megabytes read per second, average, during the sampled interval.
 =item wr_s, wr_avgkb, and wr_mrg, wr_mb_s
 These are analogous to their C<rd_*> cousins.
 =back
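As a rough illustration of the concurrency columns described above: Little's Law says that average concurrency equals throughput multiplied by response time. The sketch below applies that relation to the write figures in the sample output. The function name little_cnc is hypothetical, and this is not necessarily how pt-diskstats computes the column internally.

```shell
#!/bin/sh
# little_cnc is a hypothetical helper, not part of pt-diskstats.
# Little's Law: concurrency = requests per second * response time in seconds.
# $1 = requests per second (e.g. wr_s), $2 = response time in ms (e.g. wr_rt).
little_cnc() {
   awk -v rate="$1" -v rt_ms="$2" 'BEGIN { printf "%.1f\n", rate * rt_ms / 1000 }'
}

# sda in the sample output: wr_s = 30.6, wr_rt = 22.8 ms
little_cnc 30.6 22.8
# dm-1 in the sample output: wr_s = 38.2, wr_rt = 21.2 ms
little_cnc 38.2 21.2
```

The two results round to 0.7 and 0.8, which agree with the wr_cnc values shown for sda and dm-1 in the sample output.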
 =head1 COLLECTING DATA
 It is straightforward to gather a sample of data for this tool.  Files should
 have this format:
   TS <timestamp>  <-- must start with a TS line.
   <contents of /proc/diskstats>
   TS <timestamp>
   <contents of /proc/diskstats>
   ... et cetera
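A quick sanity check of that layout might look like the sketch below. The function name check_samples and the file name are hypothetical. It only verifies that the file begins with a TS line, since that is the one rule stated here.

```shell
#!/bin/sh
# check_samples is a hypothetical helper, not part of pt-diskstats.
# Succeeds only if the first line of the file starts with "TS ".
check_samples() {
   head -n 1 "$1" | grep -q '^TS '
}

# Example usage with an assumed file name:
if check_samples diskstats-samples.txt 2>/dev/null; then
   echo "first line is a TS line"
else
   echo "file is missing or does not start with a TS line"
fi
```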
 You can simply use pt-diskstats with L<"--save-samples"> to collect this data
 for you.  If you wish to capture samples as part of some other tool, and use
 pt-diskstats to analyze them, you can include a snippet of shell script such as
 the following:
   INTERVAL=1
   while true; do
      sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
      sleep $sleep
      date +"TS %s.%N %F %T" >> diskstats-samples.txt
      cat /proc/diskstats >> diskstats-samples.txt
   done
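The sleep calculation in that loop waits until the next multiple of INTERVAL, so that samples land on interval boundaries. The sketch below repeats the same arithmetic with fixed timestamps instead of the clock. The name next_sleep is hypothetical and used only for illustration.

```shell
#!/bin/sh
# next_sleep is a hypothetical helper that mirrors the awk expression above.
# $1 = current time in seconds (may be fractional), $2 = interval in seconds.
# Prints how long to sleep to reach the next multiple of the interval.
next_sleep() {
   awk -v now="$1" -v interval="$2" 'BEGIN { print interval - (now % interval) }'
}

next_sleep 1327525869.25 1   # 0.75 seconds to the next whole second
next_sleep 1327525867.5 5    # 2.5 seconds to the next multiple of 5
```

Note that when the current time is already an exact multiple of the interval, this expression yields the full interval rather than zero.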
 =head1 OPTIONS
 This tool accepts additional command-line arguments.  Refer to the
@@ -3588,31 +3645,30 @@ first option on the command line.
 =item --columns-regex
-type: string; default: cnc|rt|busy|prg|time|io_s
-Perl regex of which columns to include.
+type: string; default: .
+Print columns that match this Perl regex.
 =item --devices-regex
 type: string
-Perl regex of which devices to include.
+Print devices that match this Perl regex.
 =item --group-by
 type: string; default: disk
-Group-by mode (default disk); specify one of the following:
-
-   disk   - Each line of output shows one disk device.
-   sample - Each line of output shows one sample of statistics.
+Group-by mode: disk, sample, or all.  In B<disk> mode, each line of output shows
+one disk device.  In B<sample> mode, each line of output shows one sample of
+statistics.  In B<all> mode, each line of output shows one sample and one disk
+device.
   all    - Each line of output shows one sample and one disk device.
 =item --sample-time
 type: int; default: 1
-In --group-by sample mode, include INTERVAL seconds of samples per group.
+In --group-by sample mode, include N seconds of samples per group.
 =item --save-samples
@@ -3624,7 +3680,7 @@ File to save diskstats samples in; these can be used for later analysis.
 type: int
-When in interactive mode, stop after N samples.
+When in interactive mode, stop after N samples.  Run forever by default.
 =item --refresh-interval
@@ -3640,7 +3696,8 @@ Show inactive devices.
 default: yes
-Print the headers as often as needed to prevent it from scrolling out of view.
+Print the headers as often as needed to prevent them from scrolling out of view.
 You can press the space bar to reprint headers at will.
 =item --help
@@ -3722,4 +3779,21 @@ This program is copyright 2010-2011 Baron Schwartz, 2011 Percona Inc.
 Feedback and improvements are welcome.
 THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
-WARRANTIES, INCLUDING, WITHOUT LIMITATION, TH
+WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation, version 2; OR the Perl Artistic License.  On UNIX and similar
 systems, you can issue `man perlgpl' or `man perlartistic' to read these
 licenses.
 You should have received a copy of the GNU General Public License along with
 this program; if not, write to the Free Software Foundation, Inc., 59 Temple
 Place, Suite 330, Boston, MA  02111-1307  USA.
 =head1 VERSION
 pt-diskstats 1.0.1
 =cut