mirror of
https://github.com/percona/percona-toolkit.git
synced 2026-04-14 01:00:23 +08:00
Update pt-stalk docs more.
73
bin/pt-stalk
@@ -1441,11 +1441,11 @@ pt-stalk - Collect forensic data about MySQL when problems occur.
 
 Usage: pt-stalk [OPTIONS] [-- MYSQL OPTIONS]
 
-pt-stalk watches for a trigger condition to occur, then collects data
+pt-stalk waits for a trigger condition to occur, then collects data
 to help diagnose problems. The tool is designed to run as a daemon with root
 privileges, so that you can diagnose intermittent problems that you cannot
 observe directly. You can also use it to execute a custom command, or to
-collect data on demand without waiting for the stalk trigger to occur.
+collect data on demand without waiting for the trigger to occur.
 
 =head1 RISKS
 
@@ -1476,16 +1476,20 @@ chance to see the system when it happens. How do you solve intermittent MySQL
 problems when you can't observe them? That's why pt-stalk exists. In addition to
 using it when there's a known problem on your servers, it is a good idea to run
 pt-stalk all the time, even when you think nothing is wrong. You will
-appreciate the data it gathers when a problem occurs, because problems such as
-MySQL lockups or spikes of activity typically leave no evidence to use in root
+appreciate the data it collects when a problem occurs, because problems such as
+MySQL lockups or spikes in activity typically leave no evidence to use in root
 cause analysis.
 
-This tool does two things: it watches a server (typically MySQL) for a trigger
-to occur, and it gathers diagnostic data. To use it effectively, you need to
-define a good trigger condition. A good trigger is sensitive enough to fire
-reliably when a problem occurs, so that you don't miss a chance to solve
-problems. On the other hand, a good trigger isn't prone to false positives, so
-you don't gather information when the server is functioning normally.
+pt-stalk does two things: it watches a MySQL server and waits for a trigger
+condition to occur, and it collects diagnostic data when that trigger occurs.
+To avoid false-positives caused by short-lived problems, the trigger condition
+must be true at least L<"--cycles"> times before a L<"--collect"> is triggered.
+
+To use pt-stalk effectively, you need to define a good trigger. A good trigger
+is sensitive enough to fire reliably when a problem occurs, so that you don't
+miss a chance to solve problems. On the other hand, a good trigger isn't
+prone to false positives, so you don't gather information when the server
+is functioning normally.
 
 The most reliable triggers for MySQL tend to be the number of connections to the
 server, and the number of queries running concurrently. These are available in
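Reviewer note: the new --cycles paragraph in the hunk above says the trigger condition must be true several times before a collect starts. A minimal shell sketch of that counting rule follows. The function name first_trigger and the sample values are hypothetical, and resetting the counter when a sample falls to or below the threshold is an assumption about the tool's behavior, not something this diff states.

```shell
#!/bin/sh
# Hypothetical helper, not part of pt-stalk.
# Usage: first_trigger THRESHOLD CYCLES SAMPLE...
# Prints the 1-based index of the sample at which a collect would start,
# or 0 if the condition never holds for CYCLES samples in a row.
first_trigger() {
    threshold=$1
    cycles=$2
    shift 2
    cycles_true=0
    i=0
    for value in "$@"; do
        i=$((i + 1))
        if [ "$value" -gt "$threshold" ]; then
            cycles_true=$((cycles_true + 1))
        else
            # Assumption: a sample at or below the threshold resets the count.
            cycles_true=0
        fi
        if [ "$cycles_true" -ge "$cycles" ]; then
            echo "$i"
            return 0
        fi
    done
    echo 0
}

# Threshold 20, 3 cycles, five samples of the watched variable.
first_trigger 20 3 25 10 30 31 32
```

With a threshold of 20 and 3 cycles, the samples 25 10 30 31 32 would start a collect on the fifth sample, because the dip to 10 resets the count. A short-lived spike (the single 25) is ignored, which is the false-positive protection the paragraph describes.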
@@ -1495,14 +1499,15 @@ Threads_running usually is. Your job, as the tool's user, is to define an
 appropriate trigger condition for the tool. Choose carefully, because the
 quality of your results will depend on the trigger you choose.
 
-You can define the trigger with the L<"--function">, L<"--variable">, and
-L<"--threshold"> options, among others. Please read the documentation for
-L<"--function"> to learn how to do this.
+You define the trigger with the L<"--function">, L<"--variable">,
+L<"--threshold">, and L<"--cycles"> options. The default values
+for these options define a reasonable trigger, but you should adjust
+or change them to suit your particular system and needs.
 
-The pt-stalk tool, by default, simply watches MySQL repeatedly until the trigger
-becomes true. It then gathers diagnostics for a while, and sleeps afterwards for
-some time to prevent repeatedly gathering data if the condition remains true.
-In crude pseudocode, omitting some subtleties,
+By default, pt-stalk watches MySQL forever until the trigger occurs,
+then it collects diagnostic data for a while, and sleeps afterwards to avoid
+repeatedly collecting data if the trigger remains true. The general order of
+operations is:
 
    while true; do
       if --variable from --function > --threshold; then
@@ -1539,15 +1544,15 @@ In crude pseudocode, omitting some subtleties,
 
 The diagnostic data is written to files whose names begin with a timestamp, so
 you can distinguish samples from each other in case the tool collects data
-multiple times. The pt-sift tool is designed to help you browse and analyze the
-resulting samples of data.
+multiple times. The pt-sift tool is designed to help you browse and analyze
+the resulting data samples.
 
 Although this sounds simple enough, in practice there are a number of
 subtleties, such as detecting when the disk is beginning to fill up so that the
 tool doesn't cause the server to run out of disk space. This tool handles these
 types of potential problems, so it's a good idea to use this tool instead of
 writing something from scratch and possibly experiencing some of the hazards
-this tool is designed to prevent.
+this tool is designed to avoid.
 
 =head1 CONFIGURING
 
@@ -1555,15 +1560,15 @@ You can use standard Percona Toolkit configuration files to set command line
 options.
 
 You will probably want to run the tool as a daemon and customize at least the
-diagnostic threshold. Here's a sample configuration file for triggering when
+L<"--threshold">. Here's a sample configuration file for triggering when
 there are more than 20 queries running at once:
 
    daemonize
    threshold=20
 
-If you're not running the tool as it's designed (as a root user, daemonized)
-then you'll need to set several options, such as L<"--dest">, to locations that
-are writable by non-root users.
+If you don't run the tool as root, then you will need to specify several options,
+such as L<"--pid">, L<"--log">, and L<"--dest">, else the tool will probably
+fail to start.
 
 =head1 OPTIONS
 
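Reviewer note: the CONFIGURING hunk above says a non-root user needs to set --pid, --log, and --dest. As a sketch only, the following shell function writes a configuration file that points all three at one directory the user can write to. The function name write_stalk_config, the file name pt-stalk.conf, and the directory layout are assumptions for illustration; the option names and the daemonize/threshold lines come from the text above.

```shell
#!/bin/sh
# Hypothetical helper, not part of pt-stalk.
# Usage: write_stalk_config BASE_DIR
# Writes BASE_DIR/pt-stalk.conf and prints its path.
write_stalk_config() {
    base=$1
    # The data directory name "data" is an assumption for this sketch.
    mkdir -p "$base/data" || return 1
    cat > "$base/pt-stalk.conf" <<EOF
daemonize
threshold=20
pid=$base/pt-stalk.pid
log=$base/pt-stalk.log
dest=$base/data
EOF
    echo "$base/pt-stalk.conf"
}

# Demo: build the file in a scratch directory and show it.
demo_dir=$(mktemp -d)
conf=$(write_stalk_config "$demo_dir")
cat "$conf"
```

Keeping the PID file, the log, and the collected data under one user-owned directory means a single permission check covers everything the tool writes. The sketch only creates the file; how it is passed to the tool is left to the standard Percona Toolkit configuration file mechanism.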
@@ -1573,8 +1578,8 @@ are writable by non-root users.
 
 default: yes; negatable: yes
 
-Collect diagnostic data when the L<"--stalk"> trigger occurs. Specify
-C<--no-collect> to make the tool watch the system but not collect data.
+Collect diagnostic data when the trigger occurs. Specify C<--no-collect>
+to make the tool watch the system but not collect data.
 
 See also L<"--stalk">.
 
@@ -1673,23 +1678,23 @@ margins are satisfied.
 
 type: string; default: status
 
-What to watch for L<"--stalk"> trigger. The default value watches
+What to watch for the trigger. The default value watches
 C<SHOW GLOBAL STATUS>, but you can also watch C<SHOW PROCESSLIST> and specify
 a file with your own custom code. This function supplies the value of
 L<"--variable">, which is then compared against L<"--threshold"> to see if the
-L<"--stalk"> trigger condition is met. Additional options may be required as
+the trigger condition is met. Additional options may be required as
 well; see below. Possible values are:
 
 =over
 
 =item * status
 
-Watch C<SHOW GLOBAL STATUS> for the L<"--stalk"> trigger. The value of
+Watch C<SHOW GLOBAL STATUS> for the trigger. The value of
 L<"--variable"> then defines which status counter is the trigger.
 
 =item * processlist
 
-Watch C<SHOW FULL PROCESSLIST> for the L<"--stalk"> trigger. The trigger
+Watch C<SHOW FULL PROCESSLIST> for the trigger. The trigger
 value is the count of processes whose L<"--variable"> column matches the
 L<"--match"> option. For example, to trigger L<"--collect"> when more than
 10 processes are in the "statistics" state, specify:
@@ -1733,14 +1738,14 @@ Print help and exit.
 
 type: int; default: 1
 
-How often to check the L<"--stalk"> trigger, in seconds.
+How often to check if the trigger is true, in seconds.
 
 =item --iterations
 
 type: int
 
 How many times to L<"--collect"> diagnostic data. By default, the tool
-runs forever and collects data every time the L<"--stalk"> trigger occurs.
+runs forever and collects data every time the trigger occurs.
 Specify L<"--iterations"> to collect data a limited number of times.
 This option is also useful with C<--no-stalk> to collect data once and
 exit, for example.
@@ -1791,7 +1796,7 @@ Called before stalking.
 
 =item before_collect
 
-Called when the L<"--stalk"> trigger occurs, before running a L<"--collect">
+Called when the trigger occurs, before running a L<"--collect">
 subprocesses in the background.
 
 =item after_collect
@@ -1857,7 +1862,7 @@ purged.
 
 type: int; default: 30
 
-How long to L<"--collect"> diagnostic data when the L<"--stalk"> trigger occurs.
+How long to L<"--collect"> diagnostic data when the trigger occurs.
 The value is in seconds and should not be longer than L<"--sleep">. It is
 usually not necessary to change this; if the default 30 seconds doesn't
 collect enough data, running longer is not likely to help because the system