Add --retries. Increase ROWS col by 1 char. Update POD. Add --max-load='' in tests until fixed.

Daniel Nichter
2011-10-20 10:55:21 -06:00
parent ac9373bbff
commit e3bc2496c2
20 changed files with 184 additions and 176 deletions

@@ -6449,7 +6449,7 @@ sub exec_nibble {
my $chunk_index = $nibble_iter->nibble_index();
return $retry->retry(
tries => 2,
tries => $o->get('retries'),
wait => sub { return; },
try => sub {
# ###################################################################
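Note on the hunk above: the hard-coded attempt limit is replaced by the value of the new `--retries` option. The following is a minimal sketch of that idea, not the tool's actual Retry module. The helper `retry_sketch` and its exact behaviour (stop at the first success, die after the last failed attempt) are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a retry helper: call $try up to $tries times
# and return the result of the first attempt that does not die.
sub retry_sketch {
    my (%args) = @_;
    my ($tries, $try, $wait) = @args{qw(tries try wait)};
    my $last_error = '';
    for my $tryno ( 1 .. $tries ) {
        my $result;
        my $ok = eval { $result = $try->(tryno => $tryno); 1 };
        return $result if $ok;
        $last_error = $@;
        # Pause between attempts, but not after the final one.
        $wait->() if $wait && $tryno < $tries;
    }
    die "Gave up after $tries tries: $last_error";
}

# The limit comes from a variable, as an option value would, instead of a
# literal 2. The value 3 is an assumed example.
my $retries  = 3;
my $attempts = 0;
my $result   = retry_sketch(
    tries => $retries,
    wait  => sub { return; },
    try   => sub {
        $attempts++;
        die "transient failure\n" if $attempts < 3;
        return 'ok';
    },
);
print "$result after $attempts attempts\n";
```

With a limit of 2 the same `try` callback would exhaust its attempts and the helper would die, which is the kind of case a configurable limit is meant to address.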
@@ -6581,7 +6581,7 @@ sub exec_nibble {
}
{
my $line_fmt = "%14s %6s %6s %7s %7s %7s %7s %-s\n";
my $line_fmt = "%14s %6s %6s %8s %7s %7s %7s %-s\n";
my @headers = qw(TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE);
sub print_checksum_results {
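Note on the hunk above: widening the ROWS field from `%7s` to `%8s` matters once a row count reaches eight digits. The sketch below prints a header and one row with each format string so the effect can be seen side by side. The row values are invented for illustration.

```perl
use strict;
use warnings;

my $old_fmt = "%14s %6s %6s %7s %7s %7s %7s %-s\n";
my $new_fmt = "%14s %6s %6s %8s %7s %7s %7s %-s\n";
my @headers = qw(TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE);

# Hypothetical result row with an eight-digit row count.
my @row = ('10-20T08:36:50', 0, 0, 12345678, 40, 0, '9.500', 'db.big_table');

for my $fmt ( $old_fmt, $new_fmt ) {
    printf($fmt, @headers);
    printf($fmt, @row);
}
```

With the old format the eight-digit value overflows its seven-character field and pushes every later column one position to the right of its header. With the new format the columns stay aligned.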
@@ -6983,13 +6983,27 @@ Usage: pt-table-checksum [OPTION...] [DSN]
pt-table-checksum performs an online replication consistency check by executing
checksum queries on the master. The checksum queries replicate and re-execute
on replicas, where they will produce different results if the replicas have
different data from the master. The C<DSN>, if specified, must be the master
on replicas, where they produce different results if the replicas have
different data from the master. The C<DSN>, if specified, must be the master
host. The tool exits non-zero if any differences are found, or if any warnings
or errors occur. To execute the tool:
or errors occur.
Connect to master on localhost, checksum every table, and check every replica:
pt-table-checksum
Connect to master on host1 and checksum only tables in the C<widgets> database:
pt-table-checksum h=host1 --databases widgets
Do not checksum, just check all replicas for differences:
pt-table-checksum --no-recheck
Only checksum, do not check replicas for differences:
pt-table-checksum --no-replicate-check
=head1 RISKS
The following section is included to inform users about the potential risks,
@@ -7013,53 +7027,73 @@ See also L<"BUGS"> for more information on filing bugs and getting help.
=head1 DESCRIPTION
pt-table-checksum generates table checksums for MySQL tables, typically
useful for verifying your slaves are in sync with the master. The checksums
are generated by a query on the server, and there is very little network
traffic as a result.
pt-table-checksum verifies that data on replicas is identical to data on the
master. The tool executes checksum queries on the master which replicate to
and re-execute on replicas. Then a simple query is executed on each replica
to compare its checksum results to the master's. Different results indicate
different data.
Checksums typically take about twice as long as COUNT(*) on very large InnoDB
tables. For smaller tables, COUNT(*) is a good bit faster than the checksums.
Finding data differences efficiently and safely is the singular function
of pt-table-checksum. If differences are found, the easiest way to isolate
or sync them is with pt-table-sync.
pt-table-checksum is designed to be simple and automatically do the right
thing in almost every case. You should familiarize yourself with its
L<"OPTIONS"> to see how its internal processes can be altered if necessary.
When in doubt, use L<"--explain"> to see how the tool will checksum a table.
=head1 OUTPUT
The tool prints tabular output to indicate the results as it goes, such as
The tool prints tabular results, a header and one table per line, such as:
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
10-13T16:41:32      0      0       0       1       0   0.475 mysql.columns_priv
10-13T16:41:33      0      0       2       1       0   0.389 mysql.db
10-13T16:41:33      0      0       0       1       0   0.318 mysql.event
10-13T16:41:33      0      0       0       1       0   0.197 mysql.func
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-20T08:36:50      0      0      200       1       0   0.005 sakila.actor
10-20T08:36:50      0      0      603       7       0   0.035 sakila.address
10-20T08:36:50      0      0       16       1       0   0.003 sakila.category
10-20T08:36:50      0      0      600       6       0   0.024 sakila.city
Each table's results are printed when the tool finishes checksumming the table.
The columns are as follows:
=over
=item TS
The timestamp at which the line was printed.
The timestamp (without the year) when the tool finished checksumming the table.
=item ERRORS
The number of errors encountered during checksumming the table.
The number of errors and warnings which occurred while checksumming the table.
Errors and warnings are printed to C<STDERR> before the table results to which
the errors and warnings refer.
=item DIFFS
The number of chunks in the table that are different on one or more replicas
than they are on the master.
The number of chunks in the table that differ on one or more replicas
from the master. If C<--no-replicate-check> is specified, this column
will always have zero values. If C<--no-recheck> is specified, then
only tables with diffs are printed.
=item ROWS
The number of rows selected and checksummed from the table. This value
should equal the real total number of rows in the table unless L<"--where">
is specified.
=item CHUNKS
The number of chunks into which the table was divided.
The number of chunks into which the table was divided. This value may
vary between runs due to L<"--chunk-time">.
=item SKIPPED
The number of chunks that were skipped due to an error or warning, or because
The number of chunks that were skipped due to errors or warnings, or because
they were oversized.
=item TIME
The number of seconds elapsed to checksum the table.
The number of seconds elapsed to checksum the table, including all time
waiting for L<"--max-lag">, L<"--max-load">, and L<"--[no]replicate-check">.
=item TABLE
@@ -7067,11 +7101,26 @@ The database and table that was checksummed.
=back
Errors, warnings, and L<"--progress"> reports are printed to C<STDERR>.
See also L<"--quiet">.
=head1 EXIT STATUS
A non-zero exit status indicates one or more errors, warnings, or checksum
differences.
=head1 FINDING REPLICAS
pt-table-checksum automatically finds all replicas connected to the master
(depth 1), and all replicas of those replicas (depth 2), etc. L<"--recurse">
and L<"--recursion-method"> control finding replicas because sometimes
automatic discovery fails.
Beware that specifying L<"--defaults-file"> or C<F> (see L<"DSN OPTIONS">)
for the master DSN which defines a MySQL socket will probably break automatic
replica discovery.
=head1 QUERIES
If you are using innotop (see L<http://code.google.com/p/innotop>),
@@ -7111,10 +7160,11 @@ Sleep time between checks for L<"--max-lag">.
default: yes; group: Safety
Do not L<"--replicate"> if any replication filters are set. When
--replicate is specified, pt-table-checksum tries to detect slaves and look
for options that filter replication, such as binlog_ignore_db and
replicate_do_db. If it finds any such filters, it aborts with an error.
Do not checksum if any replication filters are set on any replicas.
pt-table-checksum looks for options that filter replication, such as
binlog_ignore_db and replicate_do_db. If it finds any such filters,
it aborts with an error.
Replication filtering makes it impossible to be sure that the checksum
queries won't break replication or simply fail to replicate. If you are sure
that it's OK to run the checksum queries, you can negate this option to
@@ -7142,28 +7192,19 @@ use when you are checksumming only a single table, not an entire server.
type: size; default: 1000
Approximate number of rows or size of data to checksum at a time. Allowable
suffixes are k, M, G.
Number of rows to select for each checksum query. Allowable suffixes are
k, M, G.
If you specify a chunk size, pt-table-checksum will try to find an index that
will let it split the table into ranges of approximately L<"--chunk-size">
rows, based on the table's index statistics. Currently only numeric and date
types can be chunked.
The chunk size is automatically adjusted to satisfy L<"--chunk-time"> when
that option is not zero (and it's not by default).
If the table is chunkable, pt-table-checksum will checksum each range separately
with parameters in the checksum query's WHERE clause. If pt-table-checksum
cannot find a suitable index, it will do the entire table in one chunk as though
you had not specified L<"--chunk-size"> at all. Each table is handled
individually, so some tables may be chunked and others not.
In general, the chunk size limits how many rows the tool selects for
each checksum query. If a table's rows are large, this prevents overloading
MySQL with trying to checksum too much data.
The chunks will be approximately sized, and depending on the distribution of
values in the indexed column, some chunks may be larger than the value you
specify.
If you specify a suffix (one of k, M or G), the parameter is treated as a data
size rather than a number of rows. The output of SHOW TABLE STATUS is then used
to estimate the amount of data the table contains, and convert that to a number
of rows.
If a table does not have any unique indexes, the chunk size may be
inaccurate, in which case L<"--chunk-size-limit"> prevents overloading
MySQL.
=item --chunk-size-limit
@@ -7171,17 +7212,12 @@ type: float; default: 2.0; group: Safety
Do not checksum chunks with this many times more rows than L<"--chunk-size">.
When L<"--chunk-size"> is given it specifies an ideal size for each chunk
of a chunkable table (in rows; size values are converted to rows). Before
checksumming each chunk, pt-table-checksum checks how many rows are in the
chunk with EXPLAIN. If the number of rows reported by EXPLAIN is this many
times greater than L<"--chunk-size">, then the chunk is skipped and C<OVERSIZE>
is printed for the C<COUNT> column of the L<"OUTPUT">.
For example, if you specify L<"--chunk-size"> 100 and a chunk has 150 rows,
then it is checksummed with the default L<"--chunk-size-limit"> value 2.0
because 150 is less than 100 * 2.0. But if the chunk has 205 rows, then it
is not checksummed because 205 is greater than 100 * 2.0.
When a table has no unique indexes, chunking may result in inaccurate
chunk sizes. This option specifies an upper limit to the inaccuracy.
C<EXPLAIN> is used to get an estimate of how many rows are in the chunk.
If that estimate exceeds the limit, the chunk is skipped. Since
L<"--chunk-size"> is adjusted automatically (unless L<"--chunk-time"> is zero),
the limit varies.
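
The oversize rule described here can be sketched as a single comparison. This is an illustration under stated assumptions, not the tool's code: the helper name `chunk_is_oversized` and its argument names are hypothetical, and `explain_rows` stands for the row estimate taken from C<EXPLAIN>. The numbers reuse the 100 rows and 2.0 limit from the earlier wording of this option.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the estimated row count exceeds
# chunk_size * limit. A limit of 0 disables the check.
sub chunk_is_oversized {
    my (%args) = @_;
    my ($explain_rows, $chunk_size, $limit)
        = @args{qw(explain_rows chunk_size limit)};
    return 0 if !$limit;
    return $explain_rows > $chunk_size * $limit ? 1 : 0;
}

for my $rows ( 150, 205 ) {
    my $skip = chunk_is_oversized(
        explain_rows => $rows,
        chunk_size   => 100,
        limit        => 2.0,
    );
    printf("%d rows: %s\n", $rows, $skip ? 'skipped' : 'checksummed');
}
```

Because the chunk size itself changes between queries when chunk-time adjustment is active, the product on the right-hand side of the comparison changes too, which is why the limit is said to vary.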
The minimum value for this option is 1 which means that no chunk can be any
larger than L<"--chunk-size">. You probably don't want to specify 1 because
@@ -7195,7 +7231,15 @@ You can disable oversize chunk checking by specifying L<"--chunk-size-limit"> 0.
type: float; default: 0.5
Target time for each chunk. Set to 0 to disable.
Adjust L<"--chunk-size"> so each checksum query takes this long to execute.
The tool tracks the checksum rate (rows/second) for all tables and each
table individually. These rates are used to adjust L<"--chunk-size">
after each checksum query so that the next checksum query takes this amount
of time (in seconds) to execute.
If this option is set to zero, L<"--chunk-size"> is not adjusted automatically,
so query checksum times will vary, but query checksum sizes will not.
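
The adjustment described above amounts to multiplying an observed rate (rows/second) by the target time. The sketch below is a simplification under stated assumptions: it uses only the most recent query's rate, whereas the text says the tool tracks rates for all tables and for each table individually, and the helper name `next_chunk_size` and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the next chunk size so that, at the observed
# rate, the next query should take about $chunk_time seconds.
sub next_chunk_size {
    my (%args) = @_;
    my ($rows, $seconds, $chunk_time, $chunk_size)
        = @args{qw(rows seconds chunk_time chunk_size)};
    return $chunk_size if !$chunk_time;          # target of 0: no adjustment
    return $chunk_size if !$rows || !$seconds;   # no usable rate yet
    my $rate = $rows / $seconds;                 # rows per second
    my $next = int($rate * $chunk_time);
    return $next < 1 ? 1 : $next;
}

# 1000 rows in 0.25 s is 4000 rows/s, so a 0.5 s target gives 2000 rows.
print next_chunk_size(
    rows => 1000, seconds => 0.25, chunk_time => 0.5, chunk_size => 1000,
), "\n";
```

A query that ran slower than the target shrinks the next chunk in the same way: 1000 rows in 2 seconds is 500 rows/s, so the next chunk would be 250 rows.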
=item --columns
@@ -7214,17 +7258,9 @@ first option on the command line.
default: yes
Create the replicate table given by L<"--replicate"> if it does not exist.
Normally, if the replicate table given by L<"--replicate"> does not exist,
C<pt-table-checksum> will die. With this option, however, C<pt-table-checksum>
will create the replicate table for you, using the database.table name given to
L<"--replicate">.
Create the L<"--replicate"> database and table if they do not exist.
The structure of the replicate table is the same as the suggested table
mentioned in L<"--replicate">. Note that since ENGINE is not specified, the
replicate table will use the server's default storage engine. If you want to
use a different engine, you need to create the table yourself.
mentioned in L<"--replicate">.
=item --databases
@@ -7347,6 +7383,8 @@ Ignore this comma-separated list of tables.
Table names may be qualified with the database name.
The L<"--replicate"> table is always automatically ignored.
=item --ignore-tables-regex
type: string; group: Filter
@@ -7439,7 +7477,7 @@ Re-checksum chunks that L<"--[no]replicate-check"> found to be different.
type: int
Number of levels to recurse in the hierarchy when discovering slaves.
Number of levels to recurse in the hierarchy when discovering replicas.
Default is infinite.
See L<"--recursion-method">.
@@ -7448,7 +7486,7 @@ See L<"--recursion-method">.
type: string
Preferred recursion method for discovering slaves.
Preferred recursion method for discovering replicas.
Possible methods are:
@@ -7458,10 +7496,14 @@ Possible methods are:
hosts SHOW SLAVE HOSTS
dsn=DSN DSNs from a table
The C<processlist> method is preferred because SHOW SLAVE HOSTS is not reliable.
However, the C<hosts> method is required if the server uses a non-standard
port (not 3306). Usually the tool does the right thing and finds all slaves,
but you may give a preferred method and it will be used first.
The C<processlist> method is preferred because C<SHOW SLAVE HOSTS> is not
reliable. However, the C<hosts> method is required if the server uses a
non-standard port (not 3306). Usually the tool does the right thing and
finds all replicas, but you may give a preferred method and it will be used
first.
The C<hosts> method requires replicas to be configured with C<report-host>,
C<report-port>, etc.
The C<dsn> method is special: it specifies a DSN from which other DSN strings
are read. The specified DSN must specify a D and t, or a database-qualified
@@ -7481,28 +7523,8 @@ ordered by C<id>, but C<id> and C<parent_id> are otherwise ignored.
type: string; default: percona.checksums
Replicate checksums to slaves.
This option enables a completely different checksum strategy for a consistent,
lock-free checksum across a master and its slaves. Instead of running the
checksum queries on each server, you run them only on the master. You specify a
table, fully qualified in db.table format, to insert the results into. The
checksum queries will insert directly into the table, so they will be replicated
through the binlog to the slaves.
When the queries are finished replicating, you can run a simple query on each
slave to see which tables have differences from the master. With the
L<"--[no]replicate-check"> option, pt-table-checksum can run the query for
you to make it even easier.
If you find tables that have differences, you can use the chunk boundaries in a
WHERE clause with L<pt-table-sync> to help repair them more efficiently. See
L<pt-table-sync> for details.
The table must have at least these columns: db, tbl, chunk, boundaries,
this_crc, master_crc, this_cnt, master_cnt. The table may be named anything you
wish. Here is a suggested table structure, which is automatically used for
L<"--create-replicate-table"> (MAGIC_create_replicate):
Write checksum results to this table. The replicate table must have this
structure (MAGIC_create_replicate):
CREATE TABLE checksums (
db char(64) NOT NULL,
@@ -7521,35 +7543,21 @@ L<"--create-replicate-table"> (MAGIC_create_replicate):
INDEX (ts)
) ENGINE=InnoDB;
Be sure to choose an appropriate storage engine for the checksum table. If you
are checksumming InnoDB tables, for instance, a deadlock will break replication
if the checksum table is non-transactional, because the transaction will still
be written to the binlog. It will then replay without a deadlock on the
slave and break replication with "different error on master and slave." This
is not a problem with pt-table-checksum, it's a problem with MySQL
replication, and you can read more about it in the MySQL manual.
By default, L<"--[no]create-replicate-table"> is true, so the database and
the table specified by this option are created automatically if they do not
exist.
This works only with statement-based replication (pt-table-checksum will switch
the binlog format to STATEMENT for the duration of the session if your server
uses row-based replication).
Be sure to choose an appropriate storage engine for the replicate table.
If you are checksumming InnoDB tables, for instance, a deadlock will break
replication if the replicate table is non-transactional because the transaction
will still be written to the binlog. It will then replay without a deadlock
on the replicas and break replication with "different error on master and
slave." This is not a problem with pt-table-checksum; it's a problem with
MySQL replication, and you can read more about it in the MySQL manual.
In contrast to running the tool against multiple servers at once, using this
option eliminates the complexities of synchronizing checksum queries across
multiple servers, which normally requires locking and unlocking, waiting for
master binlog positions, and so on.
The checksum queries actually do a REPLACE into this table, so existing rows
need not be removed before running. However, you may wish to do this anyway to
remove rows related to tables that don't exist anymore. The
L<"--[no]empty-replicate-table"> option does this for you.
Since the table must be qualified with a database (e.g. C<db.checksums>),
pt-table-checksum will only USE this database. This may be important if any
replication options are set because it could affect whether or not changes
to the table are replicated.
If the slaves have any --replicate-do-X or --replicate-ignore-X options, you
should be careful not to checksum any databases or tables that exist on the
If the slaves have any C<--replicate-do-X> or C<--replicate-ignore-X> options,
you should be careful not to checksum any databases or tables that exist on the
master and not the slaves. Changes to such tables may not normally be executed
on the slaves because of the --replicate options, but the checksum queries
modify the contents of the table that stores the checksums, not the tables whose
@@ -7558,44 +7566,20 @@ slave, and if the table or database you're checksumming does not exist, the
queries will cause replication to fail. For more information on replication
rules, see L<http://dev.mysql.com/doc/en/replication-rules.html>.
The table specified by L<"--replicate"> will never be checksummed itself.
The replicate table is never checksummed itself.
=item --[no]replicate-check
default: yes
Check results in L<"--replicate"> table, to the specified depth. You must use
this after you run the tool normally; it skips the checksum step and only checks
results.
Check replicas for data differences.
It recursively finds differences recorded in the table given by
L<"--replicate">. It recurses to the depth you specify: 0 is no recursion
(check only the server you specify), 1 is check the server and its slaves, 2 is
check the slaves of its slaves, and so on.
Differences are found by recursing to replicas, to the depth specified
by L<"--recurse">, and executing a simple C<SELECT> statement to compare
the replica's checksum results to the master's checksum results. Any
differences are reported in the C<DIFFS> column of the L<"OUTPUT">.
It finds differences by running the query shown in L<"CONSISTENT CHECKSUMS">,
and prints results, then exits after printing. This is just a convenient way of
running the query so you don't have to do it manually.
The output is one informational line per slave host, followed by the results
of the query, if any. If
there are no differences between the master and any slave, there is no output.
If any slave has chunks that differ from the master, pt-table-checksum's
exit status is 1; otherwise it is 0.
This option makes C<pt-table-checksum> look for slaves by running C<SHOW
PROCESSLIST>. If it finds connections that appear to be from slaves, it derives
connection information for each slave with the same default-and-override method
described in L<"SPECIFYING HOSTS">.
If C<SHOW PROCESSLIST> doesn't return any rows, C<pt-table-checksum> looks at
C<SHOW SLAVE HOSTS> instead. The host and port, and user and password if
available, from C<SHOW SLAVE HOSTS> are combined into a DSN and used as the
argument. This requires slaves to be configured with C<report-host>,
C<report-port> and so on.
This requires the @@SERVER_ID system variable, so it works only on MySQL
3.23.26 or newer.
See also L<"FINDING REPLICAS">.
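
The comparison behind the C<DIFFS> column can be sketched in memory. This is an illustration only: the actual C<SELECT> statement is not shown in this change, so the rule used here (a chunk differs when its count or its checksum does not match the master's) is an assumption, as are the helper name `count_diffs` and the sample rows. The hash keys follow the replicate table columns named in the earlier text (db, tbl, chunk, this_crc, master_crc, this_cnt, master_cnt).

```perl
use strict;
use warnings;

# Hypothetical helper: count differing chunks per table from rows as they
# might be read from the replicate table on one replica.
sub count_diffs {
    my (@rows) = @_;
    my %diffs;
    for my $row (@rows) {
        my $differs = $row->{this_cnt} != $row->{master_cnt}
                   || $row->{this_crc} ne $row->{master_crc};
        $diffs{"$row->{db}.$row->{tbl}"}++ if $differs;
    }
    return \%diffs;
}

# Invented sample data: one chunk of sakila.city has a different checksum.
my @rows = (
    { db => 'sakila', tbl => 'actor', chunk => 1,
      this_crc => 'a1b2', master_crc => 'a1b2', this_cnt => 200, master_cnt => 200 },
    { db => 'sakila', tbl => 'city',  chunk => 1,
      this_crc => 'c3d4', master_crc => 'c3d4', this_cnt => 100, master_cnt => 100 },
    { db => 'sakila', tbl => 'city',  chunk => 2,
      this_crc => 'ffff', master_crc => 'e5f6', this_cnt => 100, master_cnt => 100 },
);

my $diffs = count_diffs(@rows);
print "$_: $diffs->{$_} differing chunk(s)\n" for sort keys %$diffs;
```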
=item --replicate-database
@@ -7622,6 +7606,12 @@ be resumed from the last successful checksum by specifying this option.
This option disables L<"--[no]empty-replicate-table"> because if previous
checksums are deleted then there is nothing to resume.
=item --retries
type: int; default: 2
Number of checksum query retries for non-fatal failures and warnings.
=item --separator
type: string; default: #
@@ -7861,6 +7851,6 @@ Place, Suite 330, Boston, MA 02111-1307 USA.
=head1 VERSION
pt-table-checksum 1.0.1
pt-table-checksum 2.0
=cut