Update documentation

Author: baron@percona.com
Date:   2012-03-30 20:47:56 -04:00
parent a18c08f16a
commit dacc36994e


@@ -5864,12 +5864,12 @@ sub _d {
 # ###########################################################################
 # ###########################################################################
-# MysqldumpParser package 7500
+# MysqldumpParser package
 # This package is a copy without comments from the original. The original
-# with comments and its test file can be found in the SVN repository at,
-#   trunk/common/MysqldumpParser.pm
-#   trunk/common/t/MysqldumpParser.t
-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
+# with comments and its test file can be found in the Bazaar repository at,
+#   lib/MysqldumpParser.pm
+#   t/lib/MysqldumpParser.t
+# See https://launchpad.net/percona-toolkit for more information.
 # ###########################################################################
 package MysqldumpParser;
@@ -5968,12 +5968,12 @@ sub _d {
 # ###########################################################################
 # ###########################################################################
-# SchemaQualifier package 7499
+# SchemaQualifier package
 # This package is a copy without comments from the original. The original
-# with comments and its test file can be found in the SVN repository at,
-#   trunk/common/SchemaQualifier.pm
-#   trunk/common/t/SchemaQualifier.t
-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
+# with comments and its test file can be found in the Bazaar repository at,
+#   lib/SchemaQualifier.pm
+#   t/lib/SchemaQualifier.t
+# See https://launchpad.net/percona-toolkit for more information.
 # ###########################################################################
 package SchemaQualifier;
@@ -6680,24 +6680,22 @@ if ( !caller ) { exit main(@ARGV); }
 =head1 NAME

-pt-table-usage - Read queries from a log and analyze how they use tables.
+pt-table-usage - Analyze how queries use tables.

 =head1 SYNOPSIS

   Usage: pt-table-usage [OPTIONS] [FILES]

-pt-table-usage reads queries from slow query logs and analyzes how they use
-tables. If no FILE is specified, STDIN is read. Table usage for every query
-is printed to STDOUT.
+pt-table-usage reads queries from a log and analyzes how they use tables. If no
+FILE is specified, it reads STDIN. It prints a report for each query.

 =head1 RISKS

-pt-table-use is very low risk because it only reads and examines queries from
-a log and executes C<EXPLAIN EXTENDED> if the L<"--explain-extended"> option
-is specified.
+pt-table-usage is very low risk. By default, it simply reads queries from a log.
+It executes C<EXPLAIN EXTENDED> if you specify the L<"--explain-extended">
+option.

-At the time of this release, there are no known bugs that could cause serious
-harm to users.
+At the time of this release, we know of no bugs that could harm users.

 The authoritative source for updated information is always the online issue
 tracking system. Issues that affect this tool will be marked as such. You can
@@ -6708,40 +6706,32 @@ See also L<"BUGS"> for more information on filing bugs and getting help.
 =head1 DESCRIPTION

-pt-table-usage reads queries from slow query logs and analyzes how they use
-tables. Table usage indicates more than just which tables are read from or
-written to by the query, it also indicates data flow: data in and data out.
-Data flow is determined by the contexts in which tables are used by the query.
-A single table can be used in several different contexts in the same query.
-The reported table usage for each query lists every context for every table.
-This CONTEXT-TABLE list tells how and where data flows, i.e. the query's table
-usage. The L<"OUTPUT"> section lists the possible contexts and describes how
-to read a table usage report.
+pt-table-usage reads queries from a log and analyzes how they use tables. The
+log should be in MySQL's slow query log format.

-Since this tool analyzes table usage, it's important that queries use
-table-qualified columns. If a query uses only one table, then all columns
-must be from that table and there's no problem. But if a query uses
-multiple tables and the columns are not table-qualified, then that creates a
-problem that can only be solved by knowing the query's database and specifying
-L<"--explain-extended">. If the slow log does not specify the database
-used by the query, then you can specify a default database with L<"--database">.
-There is no other way to know or guess the database, so the query will be
-skipped. Secondly, if the database is known, then specifying
-L<"--explain-extended"> causes pt-table-usage to do C<EXPLAIN EXTENDED ...>
-C<SHOW WARNINGS> to get the fully qualified query as reported by MySQL
-(i.e. all identifiers are fully database- and/or table-qualified). For
-best results, you should specify L<"--explain-extended"> and
-L<"--database"> if you know that all queries use the same database.
+Table usage is more than simply an indication of which tables the query reads or
+writes. It also indicates data flow: data in and data out. The tool determines
+the data flow by the contexts in which tables appear. A single query can use a
+table in several different contexts simultaneously. The tool's output lists
+every context for every table. This CONTEXT-TABLE list indicates how data flows
+between tables. The L<"OUTPUT"> section lists the possible contexts and
+describes how to read a table usage report.

-Each query is identified in the output by either an MD5 hex checksum
-of the query's fingerprint or the query's value for the specified
-L<"--id-attribute">. The query ID is for parsing and storing the table
-usage reports in a table that is keyed on the query ID. See L<"OUTPUT">
-for more information.
+The tool analyzes data flow down to the level of individual columns, so it is
+helpful if columns are identified unambiguously in the query. If a query uses
+only one table, then all columns must be from that table, and there's no
+difficulty. But if a query uses multiple tables and the column names are not
+table-qualified, then it is necessary to use C<EXPLAIN EXTENDED>, followed by
+C<SHOW WARNINGS>, to determine to which tables the columns belong.

+If the tool does not know the query's default database, which can occur when the
+database is not printed in the log, then C<EXPLAIN EXTENDED> can fail. In this
+case, you can specify a default database with L<"--database">. You can also use
+the L<"--create-table-definitions"> option to help resolve ambiguities.

 =head1 OUTPUT

-The table usage report that is printed for each query looks similar to the
+The tool prints a usage report for each table in every query, similar to the
 following:

   Query_id: 0x1CD27577D202A339.1
@@ -6758,45 +6748,43 @@ following:
   JOIN t2
   WHERE t1

-Usage reports are separated by blank lines. The first line is always the
-query ID: a unique ID that can be used to parse the output and store the
-usage reports in a table keyed on this ID. The query ID has two parts
-separated by a period: the query ID and the target table number.
+The first line contains the query ID, which by default is the same as those
+shown in pt-query-digest reports. It is an MD5 checksum of the query's
+"fingerprint," which is what remains after removing literals, collapsing white
+space, and a variety of other transformations. The query ID has two parts
+separated by a period: the query ID and the table number. If you wish to use a
+different value to identify the query, you can specify the L<"--id-attribute">
+option.

-If L<"--id-attribute"> is not specified, then query IDs are automatically
-created by making an MD5 hex checksum of the query's fingerprint
-(as shown above, e.g. C<0x1CD27577D202A339>); otherwise, the query ID is the
-query's value for the given attribute.

-The target table number starts at 1 and increments by 1 for each table that
-the query affects. Only multi-table UPDATE queries can affect
-multiple tables with a single query, so this number is 1 for all other types
-of queries. (Multi-table DELETE queries are not supported.)

-The example output above is from this query:
+The previous example shows two paragraphs for a single query, not two queries.
+Note that the query ID is identical for the two, but the table number differs.
+The table number increments by 1 for each table that the query updates. Only
+multi-table UPDATE queries can update multiple tables with a single query, so
+the table number is 1 for all other types of queries. (The tool does not
+support multi-table DELETE queries.) The example output above is from this
+query:

   UPDATE t1 AS a JOIN t2 AS b USING (id)
   SET a.foo="bar", b.foo="bat"
   WHERE a.id=1;

-The C<SET> clause indicates that two tables are updated: C<a> aliased as C<t1>,
-and C<b> aliased as C<t2>. So two usage reports are printed, one for each
-table, and this is indicated in the output by their common query ID but
-incrementing target table number.
+The C<SET> clause indicates that the query updates two tables: C<a> aliased as
+C<t1>, and C<b> aliased as C<t2>.

-After the first line is a variable number of CONTEXT-TABLE lines. Possible
-contexts are:
+After the first line, the tool prints a variable number of CONTEXT-TABLE lines.
+Possible contexts are as follows:

 =over

 =item * SELECT

-SELECT means that data is taken out of the table for one of two reasons:
-to be returned to the user as part of a result set, or to be put into another
-table as part of an INSERT or UPDATE. In the first case, since only SELECT
-queries return result sets, a SELECT context is always listed for SELECT
-queries. In the second case, data from one table is used to insert or
-update rows in another table. For example, the UPDATE query in the example
-above has the usage:
+SELECT means that the query retrieves data from the table for one of two
+reasons. The first is to be returned to the user as part of a result set. Only
+SELECT queries return result sets, so the report always shows a SELECT context
+for SELECT queries.
+
+The second case is when data flows to another table as part of an INSERT or
+UPDATE. For example, the UPDATE query in the example above has the usage:

   SELECT DUAL
@@ -6804,9 +6792,9 @@ This refers to:
   SET a.foo="bar", b.foo="bat"

-DUAL is used for any values that does not originate in a table, in this case the
-literal values "bar" and "bat". If that C<SET> clause were C<SET a.foo=b.foo>
-instead, then the complete usage would be:
+The tool uses DUAL for any values that do not originate in a table, in this case
+the literal values "bar" and "bat". If that C<SET> clause were C<SET
+a.foo=b.foo> instead, then the complete usage would be:

   Query_id: 0x1CD27577D202A339.1
   UPDATE t1
@@ -6820,20 +6808,15 @@ INSERT, indicates where the UPDATE or INSERT retrieves its data. The example
 immediately above reflects an UPDATE query that updates rows in table C<t1>
 with data from table C<t2>.

-=item * Any other query type
+=item * Any other verb

-Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a context.
-All these types indicate that the table is written or altered in some way.
-If a SELECT context follows one of these types, then data is read from the
-SELECT table and written to this table. This happens, for example, with
-INSERT..SELECT or UPDATE queries that set column values using values from
-tables instead of constant values.
+Any other verb, such as INSERT, UPDATE, DELETE, etc. may be a context. These
+verbs indicate that the query modifies data in some way. If a SELECT context
+follows one of these verbs, then the query reads data from the SELECT table and
+writes it to this table. This happens, for example, with INSERT..SELECT or
+UPDATE queries that use values from tables instead of constant values.

-These query types are not supported:
-
-  SET
-  LOAD
-  multi-table DELETE
+These query types are not supported: SET, LOAD, and multi-table DELETE.

 =item * JOIN
@@ -6853,14 +6836,14 @@ Results in:
   WHERE t1
   WHERE t2

-Only unique tables are listed; that is why table C<t1> is listed only once.
+The tool lists only distinct tables; that is why table C<t1> is listed only
+once.

 =item * TLIST

-The TLIST context lists tables that are accessed by the query but do not
-appear in any other context. These tables are usually an implicit
-full cartesian join, so they should be avoided. For example, the query
-C<SELECT * FROM t1, t2> results in:
+The TLIST context lists tables that the query accesses, but which do not appear
+in any other context. These tables are usually an implicit cartesian join. For
+example, the query C<SELECT * FROM t1, t2> results in:

   Query_id: 0xBDDEB6EDA41897A8.1
   SELECT t1
@@ -6871,7 +6854,7 @@ C<SELECT * FROM t1, t2> results in:
 First of all, there are two SELECT contexts, because C<SELECT *> selects
 rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables
 are implicitly joined, but without any kind of join condition, which results
-in a full cartesian join as indicated by the TLIST context for each.
+in a cartesian join as indicated by the TLIST context for each.
=back
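To see why a TLIST context usually signals trouble, the cartesian effect of an implicit join with no join condition can be demonstrated outside MySQL. This is an illustrative sketch using Python's built-in sqlite3 module, not output from the tool:

```python
import sqlite3

# Two small tables joined with no join condition: the result is the
# full cartesian product, which is what the TLIST context flags.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(10,), (20,)])

rows = conn.execute("SELECT * FROM t1, t2").fetchall()
print(len(rows))  # 3 rows x 2 rows = 6 combinations
```

The result grows as the product of the table sizes, which is why such joins tend to be accidents rather than intentions.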
@@ -6911,24 +6894,23 @@ first option on the command line.
 type: string; default: DUAL

-Value to print for constant data. Constant data means all data not
-from tables (or subqueries since subqueries are not supported). For example,
-real constant values like strings ("foo") and numbers (42), and data from
-functions like C<NOW()>. For example, in the query
-C<INSERT INTO t (c) VALUES ('a')>, the string 'a' is constant data, so the
-table usage report is:
+Table to print as the source for constant data (literals). This is any data not
+retrieved from tables (or subqueries, because subqueries are not supported).
+This includes literal values such as strings ("foo") and numbers (42), or
+functions such as C<NOW()>. For example, in the query C<INSERT INTO t (c)
+VALUES ('a')>, the string 'a' is constant data, so the table usage report is:

   INSERT t
   SELECT DUAL

-The first line indicates that data is inserted into table C<t> and the second
-line indicates that that data comes from some constant value.
+The first line indicates that the query inserts data into table C<t>, and the
+second line indicates that the inserted data comes from some constant value.

 =item --[no]continue-on-error

 default: yes

-Continue parsing even if there is an error.
+Continue to work even if there is an error.
=item --create-table-definitions
@@ -6939,9 +6921,9 @@ If you cannot use L<"--explain-extended"> to fully qualify table and column
 names, you can save the output of C<mysqldump --no-data> to one or more files
 and specify those files with this option. The tool will parse all
 C<CREATE TABLE> definitions from the files and use this information to
-qualify table and column names. If a column name is used in multiple tables,
-or table name is used in multiple databases, these duplicates cannot be
-qualified.
+qualify table and column names. If a column name appears in multiple tables,
+or a table name appears in multiple databases, the ambiguities cannot be
+resolved.
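The way C<CREATE TABLE> definitions can resolve, or fail to resolve, column ownership can be sketched as follows. This is an illustrative Python sketch over hand-written, simplified mysqldump output; the tool's actual parser is Perl and far more robust:

```python
import re

# Simplified mysqldump --no-data output (hand-written for illustration).
dump = """
CREATE TABLE `t1` (
  `id` int NOT NULL,
  `foo` varchar(10)
);
CREATE TABLE `t2` (
  `id` int NOT NULL,
  `bar` varchar(10)
);
"""

# Map each column name to the set of tables that define it.
columns = {}
for table, body in re.findall(r"CREATE TABLE `(\w+)` \((.*?)\);", dump, re.S):
    for col in re.findall(r"^\s*`(\w+)`", body, re.M):
        columns.setdefault(col, set()).add(table)

print(columns["foo"])  # unambiguous: only t1 defines it
print(columns["id"])   # ambiguous: defined in both t1 and t2, cannot be resolved
```

A column that maps to exactly one table can be qualified; a column that maps to several tables stays ambiguous, which mirrors the limitation described above.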
=item --daemonize
@@ -6964,7 +6946,8 @@ Only read mysql options from the given file. You must give an absolute pathname
 type: DSN

-EXPLAIN EXTENDED queries on this host to fully qualify table and column names.
+A server to execute EXPLAIN EXTENDED queries. This may be necessary to resolve
+ambiguous (unqualified) column and table names.
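What C<EXPLAIN EXTENDED> buys is that the follow-up C<SHOW WARNINGS> returns the query rewritten with fully qualified identifiers. The sketch below is a hypothetical Python illustration of pulling database.table.column triples out of such a rewritten query; the warning text is hand-written in MySQL's style, not captured from a server:

```python
import re

# Hand-written example of the rewritten query that SHOW WARNINGS reports
# after EXPLAIN EXTENDED: every identifier is database- and table-qualified.
warning = ("/* select#1 */ select `db`.`t1`.`c` AS `c` "
           "from `db`.`t1` join `db`.`t2`")

# Extract (database, table, column) triples from the qualified form.
qualified = re.findall(r"`(\w+)`\.`(\w+)`\.`(\w+)`", warning)
print(qualified)  # [('db', 't1', 'c')] -> column c belongs to db.t1
```

Once columns are tied to tables this way, the ambiguity described above disappears, which is the point of specifying this option.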
=item --filter
@@ -6972,89 +6955,13 @@ type: string
 Discard events for which this Perl code doesn't return true.

-This option is a string of Perl code or a file containing Perl code that gets
-compiled into a subroutine with one argument: $event. This is a hashref.
-If the given value is a readable file, then pt-table-usage reads the entire
-file and uses its contents as the code. The file should not contain
-a shebang (#!/usr/bin/perl) line.
-
-If the code returns true, the chain of callbacks continues; otherwise it ends.
-The code is the last statement in the subroutine other than C<return $event>.
-The subroutine template is:
-
-  sub { $event = shift; filter && return $event; }
-
-Filters given on the command line are wrapped inside parentheses like like
-C<( filter )>. For complex, multi-line filters, you must put the code inside
-a file so it will not be wrapped inside parentheses. Either way, the filter
-must produce syntactically valid code given the template. For example, an
-if-else branch given on the command line would not be valid:
-
-  --filter 'if () { } else { }' # WRONG
-
-Since it's given on the command line, the if-else branch would be wrapped inside
-parentheses which is not syntactically valid. So to accomplish something more
-complex like this would require putting the code in a file, for example
-filter.txt:
-
-  my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
-
-Then specify C<--filter filter.txt> to read the code from filter.txt.
-
-If the filter code won't compile, pt-table-usage will die with an error.
-If the filter code does compile, an error may still occur at runtime if the
-code tries to do something wrong (like pattern match an undefined value).
-pt-table-usage does not provide any safeguards so code carefully!
-
-An example filter that discards everything but SELECT statements:
-
-  --filter '$event->{arg} =~ m/^select/i'
-
-This is compiled into a subroutine like the following:
-
-  sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
-
-It is permissible for the code to have side effects (to alter C<$event>).
-
-You can find an explanation of the structure of $event at
-L<http://code.google.com/p/maatkit/wiki/EventAttributes>.
-
-Here are more examples of filter code:
-
-=over
-
-=item Host/IP matches domain.com
-
-  --filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'
-
-Sometimes MySQL logs the host where the IP is expected. Therefore, we
-check both.
-
-=item User matches john
-
-  --filter '($event->{user} || "") =~ m/john/'
-
-=item More than 1 warning
-
-  --filter '($event->{Warning_count} || 0) > 1'
-
-=item Query does full table scan or full join
-
-  --filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'
-
-=item Query was not served from query cache
-
-  --filter '($event->{QC_Hit} || "") eq "No"'
-
-=item Query is 1 MB or larger
-
-  --filter '$event->{bytes} >= 1_048_576'
-
-=back
-
-Since L<"--filter"> allows you to alter C<$event>, you can use it to do other
-things, like create new attributes.
+This option is a string of Perl code or a file containing Perl code that is
+compiled into a subroutine with one argument: $event. If the given value is a
+readable file, then pt-table-usage reads the entire file and uses its contents
+as the code.
+
+Filters are implemented in the same fashion as in the pt-query-digest tool, so
+please refer to its documentation for more information.
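The compile-a-string-into-a-one-argument-subroutine mechanism the filters use can be sketched in Python. The real implementation is Perl and wraps the filter text in the template C<sub { $event = shift; ( filter ) && return $event; }>; this is only an analogy, with names of my own choosing:

```python
def compile_filter(code):
    """Build a callable from filter source text. The template mirrors
    the Perl one: the filter expression is evaluated against one event,
    and the event is returned only if the expression is true.
    Assumes trusted input, as the real option does."""
    return eval("lambda event: (%s) and event" % code)

# Keep only SELECT statements, analogous to a --filter expression.
keep_selects = compile_filter('event["arg"].lower().startswith("select")')

events = [{"arg": "SELECT 1"}, {"arg": "UPDATE t SET c=1"}]
kept = [e for e in events if keep_selects(e)]
print(kept)  # only the SELECT event survives
```

As with the Perl version, a filter that fails to compile raises immediately, while a filter that compiles can still fail at run time on unexpected event contents.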
=item --help
@@ -7070,9 +6977,8 @@ Connect to host.
 type: string

-Identify each event using this attribute. If not ID attribute is given, then
-events are identified with the query's checksum: an MD5 hex checksum of the
-query's fingerprint.
+Identify each event using this attribute. The default is to use a query ID,
+which is an MD5 checksum of the query's fingerprint.
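The default query-ID scheme can be approximated as follows. This Python sketch is a deliberate simplification of pt-query-digest's fingerprinting (the real rules also handle comments, IN-lists, and many other transformations, and the tool itself is Perl):

```python
import hashlib
import re

def fingerprint(query):
    """Simplified fingerprint: lowercase, replace literals with
    placeholders, and collapse whitespace."""
    f = query.lower()
    f = re.sub(r"'[^']*'", "?", f)      # quoted strings -> ?
    f = re.sub(r"\b\d+\b", "?", f)      # bare numbers -> ?
    return re.sub(r"\s+", " ", f).strip()

def query_id(query):
    """Hypothetical query ID: leading hex digits of the MD5 of the
    fingerprint, formatted like the 0x... IDs shown in the reports."""
    digest = hashlib.md5(fingerprint(query).encode()).hexdigest()
    return "0x" + digest[:16].upper()

print(fingerprint("SELECT * FROM t WHERE id=42"))
# select * from t where id=?
print(query_id("SELECT * FROM t WHERE id=42"))
```

Because literals are stripped before hashing, queries that differ only in their constants share one fingerprint and therefore one ID, which is what lets reports for "the same" query be grouped.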
=item --log
@@ -7115,10 +7021,7 @@ number of iterations.
 type: string

-Analyze only this given query. If you want to analyze the table usage of
-one simple query by providing on the command line instead of reading it
-from a slow log file, then specify that query with this option. The default
-L<"--id-attribute"> will be used which is the query's checksum.
+Analyze the specified query instead of reading a log file.
=item --read-timeout
@@ -7127,7 +7030,7 @@ type: time; default: 0
 Wait this long for an event from the input; 0 to wait forever.

 This option sets the maximum time to wait for an event from the input. If an
-event is not received after the specified time, the script stops reading the
+event is not received after the specified time, the tool stops reading the
 input and prints its reports.

 This option requires the Perl POSIX module.
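The timeout behavior can be sketched with POSIX alarm signals, which is also why the option needs the POSIX module. An illustrative Python version follows; the function and exception names are mine, not the tool's, and like the original this only works where SIGALRM exists (i.e. not on Windows):

```python
import signal

class ReadTimeout(Exception):
    """Raised internally when the alarm fires before input arrives."""
    pass

def read_event(stream, timeout):
    """Return the next line from stream, or None if `timeout` seconds
    pass without input. A timeout of 0 means wait forever."""
    if timeout <= 0:
        return stream.readline() or None
    def on_alarm(signum, frame):
        raise ReadTimeout
    old = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(timeout)             # arm the timer
    try:
        return stream.readline() or None
    except ReadTimeout:
        return None                   # no event in time: give up reading
    finally:
        signal.alarm(0)               # disarm and restore the old handler
        signal.signal(signal.SIGALRM, old)
```

When the read returns None, a caller would stop consuming input and print its accumulated reports, matching the behavior described above.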