mirror of
https://github.com/percona/percona-toolkit.git
synced 2025-09-27 16:12:04 +00:00
Add --retries. Increase ROWS col by 1 char. Update POD. Add --max-load='' in tests until fixed.
@@ -6449,7 +6449,7 @@ sub exec_nibble {
my $chunk_index = $nibble_iter->nibble_index();

return $retry->retry(
tries => 2,
tries => $o->get('retries'),
wait => sub { return; },
try => sub {
# ###################################################################
@@ -6581,7 +6581,7 @@ sub exec_nibble {
}

{
my $line_fmt = "%14s %6s %6s %7s %7s %7s %7s %-s\n";
my $line_fmt = "%14s %6s %6s %8s %7s %7s %7s %-s\n";
my @headers = qw(TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE);

sub print_checksum_results {
@@ -6983,13 +6983,27 @@ Usage: pt-table-checksum [OPTION...] [DSN]

pt-table-checksum performs an online replication consistency check by executing
checksum queries on the master. The checksum queries replicate and re-execute
on replicas, where they will produce different results if the replicas have
different data from the master. The C<DSN>, if specified, must be the master
on replicas, where they produce different results if the replicas have
different data from the master. The C<DSN>, if specified, must be the master
host. The tool exits non-zero if any differences are found, or if any warnings
or errors occur. To execute the tool:
or errors occur.

Connect to master on localhost, checksum every table, and check every replica:

pt-table-checksum

Connect to master on host1 and checksum only tables in the C<widgets> database:

pt-table-checksum h=host1 --databases widgets

Do not checksum, just check all replicas for differences:

pt-table-checksum --no-recheck

Only checksum, do not check replicas for differences:

pt-table-checksum --no-replicate-check

=head1 RISKS

The following section is included to inform users about the potential risks,
@@ -7013,53 +7027,73 @@ See also L<"BUGS"> for more information on filing bugs and getting help.

=head1 DESCRIPTION

pt-table-checksum generates table checksums for MySQL tables, typically
useful for verifying your slaves are in sync with the master. The checksums
are generated by a query on the server, and there is very little network
traffic as a result.
pt-table-checksum verifies that data on replicas is identical to data on the
master. The tool executes checksum queries on the master which replicate to
and re-execute on replicas. Then a simple query is executed on each replica
to compare its checksum results to the master's. Different results indicate
different data.

Checksums typically take about twice as long as COUNT(*) on very large InnoDB
tables. For smaller tables, COUNT(*) is a good bit faster than the checksums.
Finding data differences efficiently and safely is the singular function
of pt-table-checksum. If differences are found, the easiest way to isolate
or sync them is with pt-table-sync.

pt-table-checksum is designed to be simple and automatically do the right
thing in almost every case. You should familiarize yourself with its
L<"OPTIONS"> to see how its internal processes can be altered if necessary.
When in doubt, use L<"--explain"> to see how the tool will checksum a table.

=head1 OUTPUT

The tool prints tabular output to indicate the results as it goes, such as
The tool prints tabular results, a header and one table per line, such as:

            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
10-13T16:41:32      0      0       0       1       0   0.475 mysql.columns_priv
10-13T16:41:33      0      0       2       1       0   0.389 mysql.db
10-13T16:41:33      0      0       0       1       0   0.318 mysql.event
10-13T16:41:33      0      0       0       1       0   0.197 mysql.func
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-20T08:36:50      0      0      200       1       0   0.005 sakila.actor
10-20T08:36:50      0      0      603       7       0   0.035 sakila.address
10-20T08:36:50      0      0       16       1       0   0.003 sakila.category
10-20T08:36:50      0      0      600       6       0   0.024 sakila.city

Each table's results are printed when the tool finishes checksumming the table.
The columns are as follows:

=over

=item TS

The timestamp at which the line was printed.
The timestamp (without the year) when the tool finished checksumming the table.

=item ERRORS

The number of errors encountered during checksumming the table.
The number of errors and warnings which occurred while checksumming the table.
Errors and warnings are printed to C<STDERR> before the table results to which
the errors and warnings refer.

=item DIFFS

The number of chunks in the table that are different on one or more replicas
than they are on the master.
The number of chunks in the table that differ on one or more replicas
from the master. If C<--no-replicate-check> is specified, this column
will always have zero values. If C<--no-recheck> is specified, then
only tables with diffs are printed.

=item ROWS

The number of rows selected and checksummed from the table. This value
should equal the real total number of rows in the table unless L<"--where">
is specified.

=item CHUNKS

The number of chunks into which the table was divided.
The number of chunks into which the table was divided. This value may
vary between runs due to L<"--chunk-time">.

=item SKIPPED

The number of chunks that were skipped due to an error or warning, or because
The number of chunks that were skipped due to errors or warnings, or because
they were oversized.

=item TIME

The number of seconds elapsed to checksum the table.
The number of seconds elapsed to checksum the table, including all time
waiting for L<"--max-lag">, L<"--max-load">, and L<"--[no]replicate-check">.

=item TABLE

@@ -7067,11 +7101,26 @@ The database and table that was checksummed.

=back

Errors, warnings, and L<"--progress"> reports are printed to C<STDERR>.

See also L<"--quiet">.

=head1 EXIT STATUS

A non-zero exit status indicates one or more errors, warnings, or checksum
differences.

=head1 FINDING REPLICAS

pt-table-checksum automatically finds all replicas connected to the master
(depth 1), and all replicas of those replicas (depth 2), etc. L<"--recurse">
and L<"--recursion-method"> control finding replicas because sometimes
automatic discovery fails.

Beware that specifying L<"--defaults-file"> or C<F> (see L<"DSN OPTIONS">)
for the master DSN which defines a MySQL socket will probably break automatic
replica discovery.

=head1 QUERIES

If you are using innotop (see L<http://code.google.com/p/innotop>),
@@ -7111,10 +7160,11 @@ Sleep time between checks for L<"--max-lag">.

default: yes; group: Safety

Do not L<"--replicate"> if any replication filters are set. When
--replicate is specified, pt-table-checksum tries to detect slaves and look
for options that filter replication, such as binlog_ignore_db and
replicate_do_db. If it finds any such filters, it aborts with an error.
Do not checksum if any replication filters are set on any replicas.
pt-table-checksum looks for options that filter replication, such as
binlog_ignore_db and replicate_do_db. If it finds any such filters,
it aborts with an error.

Replication filtering makes it impossible to be sure that the checksum
queries won't break replication or simply fail to replicate. If you are sure
that it's OK to run the checksum queries, you can negate this option to
@@ -7142,28 +7192,19 @@ use when you are checksumming only a single table, not an entire server.

type: size; default: 1000

Approximate number of rows or size of data to checksum at a time. Allowable
suffixes are k, M, G.
Number of rows to select for each checksum query. Allowable suffixes are
k, M, G.

If you specify a chunk size, pt-table-checksum will try to find an index that
will let it split the table into ranges of approximately L<"--chunk-size">
rows, based on the table's index statistics. Currently only numeric and date
types can be chunked.
The chunk size is automatically adjusted to satisfy L<"--chunk-time"> when
that option is not zero (and it's not by default).

If the table is chunkable, pt-table-checksum will checksum each range separately
with parameters in the checksum query's WHERE clause. If pt-table-checksum
cannot find a suitable index, it will do the entire table in one chunk as though
you had not specified L<"--chunk-size"> at all. Each table is handled
individually, so some tables may be chunked and others not.
In general, the chunk size limits how many rows the tool selects for
each checksum query. If a table's rows are large, this prevents overloading
MySQL with trying to checksum too much data.

The chunks will be approximately sized, and depending on the distribution of
values in the indexed column, some chunks may be larger than the value you
specify.

If you specify a suffix (one of k, M or G), the parameter is treated as a data
size rather than a number of rows. The output of SHOW TABLE STATUS is then used
to estimate the amount of data the table contains, and convert that to a number
of rows.
If a table does not have any unique indexes, the chunk size may be
inaccurate, in which case L<"--chunk-size-limit"> prevents overloading
MySQL.

=item --chunk-size-limit

@@ -7171,17 +7212,12 @@ type: float; default: 2.0; group: Safety

Do not checksum chunks with this many times more rows than L<"--chunk-size">.

When L<"--chunk-size"> is given it specifies an ideal size for each chunk
of a chunkable table (in rows; size values are converted to rows). Before
checksumming each chunk, pt-table-checksum checks how many rows are in the
chunk with EXPLAIN. If the number of rows reported by EXPLAIN is this many
times greater than L<"--chunk-size">, then the chunk is skipped and C<OVERSIZE>
is printed for the C<COUNT> column of the L<"OUTPUT">.

For example, if you specify L<"--chunk-size"> 100 and a chunk has 150 rows,
then it is checksummed with the default L<"--chunk-size-limit"> value 2.0
because 150 is less than 100 * 2.0. But if the chunk has 205 rows, then it
is not checksummed because 205 is greater than 100 * 2.0.
When a table has no unique indexes, chunking may result in inaccurate
chunk sizes. This option specifies an upper limit to the inaccuracy.
C<EXPLAIN> is used to get an estimate of how many rows are in the chunk.
If that estimate exceeds the limit, the chunk is skipped. Since
L<"--chunk-size"> is adjusted automatically (unless L<"--chunk-time"> is zero),
the limit varies.
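
The oversize rule described above can be sketched roughly as follows. This is an illustration only, not code from pt-table-checksum: the helper name C<chunk_is_oversize> and its arguments are hypothetical, the row estimate is assumed to come from C<EXPLAIN>, and the numbers are the 100/150/205 example from the older wording.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the tool's source): should this chunk be
# skipped as oversized?
#   $est_rows   - row estimate for the chunk, e.g. taken from EXPLAIN
#   $chunk_size - the current --chunk-size
#   $limit      - the --chunk-size-limit factor; 0 disables the check
sub chunk_is_oversize {
    my ($est_rows, $chunk_size, $limit) = @_;
    return 0 if !$limit;
    return $est_rows > $chunk_size * $limit ? 1 : 0;
}

# The example from the older text: --chunk-size 100, limit 2.0.
for my $rows (150, 205) {
    my $verdict = chunk_is_oversize($rows, 100, 2.0) ? 'skipped' : 'checksummed';
    print "$rows rows: $verdict\n";
}
```

Because the chunk size itself moves with L<"--chunk-time">, the product C<$chunk_size * $limit> in this sketch would change from one query to the next, which is what "the limit varies" refers to.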

The minimum value for this option is 1 which means that no chunk can be any
larger than L<"--chunk-size">. You probably don't want to specify 1 because
@@ -7195,7 +7231,15 @@ You can disable oversize chunk checking by specifying L<"--chunk-size-limit"> 0.

type: float; default: 0.5

Target time for each chunk. Set to 0 to disable.
Adjust L<"--chunk-size"> so each checksum query takes this long to execute.

The tool tracks the checksum rate (rows/second) for all tables and each
table individually. These rates are used to adjust L<"--chunk-size">
after each checksum query so that the next checksum query takes this amount
of time (in seconds) to execute.

If this option is set to zero, L<"--chunk-size"> is not adjusted automatically,
so query checksum times will vary, but query checksum sizes will not.
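
As a rough illustration of the adjustment described above, the next chunk size can be derived from the observed rate and the target time. This sketch is an assumption about the arithmetic, not the tool's implementation: C<next_chunk_size> is a hypothetical name, and it uses only the most recent query, whereas the text says the tool tracks rates across all tables and per table.

```perl
use strict;
use warnings;

# Hypothetical sketch: choose the next chunk size so that a query at the
# observed rate would take about $chunk_time seconds.
#   $rows, $seconds - size and duration of the last checksum query
#   $chunk_time     - the --chunk-time target; 0 means "do not adjust"
#   $current_size   - the chunk size used for the last query
sub next_chunk_size {
    my ($rows, $seconds, $chunk_time, $current_size) = @_;
    return $current_size if !$chunk_time;
    return $current_size if $rows <= 0 || $seconds <= 0;  # nothing measured
    my $rate = $rows / $seconds;                          # rows per second
    my $size = int($rate * $chunk_time);
    return $size < 1 ? 1 : $size;
}

# 1000 rows took 0.25s; with the default target of 0.5s the next chunk doubles.
print next_chunk_size(1000, 0.25, 0.5, 1000), "\n";
```

With C<--chunk-time 0> the sketch returns the current size unchanged, matching the statement that chunk sizes stay fixed while query times vary.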

=item --columns

@@ -7214,17 +7258,9 @@ first option on the command line.

default: yes

Create the replicate table given by L<"--replicate"> if it does not exist.

Normally, if the replicate table given by L<"--replicate"> does not exist,
C<pt-table-checksum> will die. With this option, however, C<pt-table-checksum>
will create the replicate table for you, using the database.table name given to
L<"--replicate">.

Create the L<"--replicate"> database and table if they do not exist.
The structure of the replicate table is the same as the suggested table
mentioned in L<"--replicate">. Note that since ENGINE is not specified, the
replicate table will use the server's default storage engine. If you want to
use a different engine, you need to create the table yourself.
mentioned in L<"--replicate">.

=item --databases

@@ -7347,6 +7383,8 @@ Ignore this comma-separated list of tables.

Table names may be qualified with the database name.

The L<"--replicate"> table is always automatically ignored.

=item --ignore-tables-regex

type: string; group: Filter
@@ -7439,7 +7477,7 @@ Re-checksum chunks that L<"--[no]replicate-check"> found to be different.

type: int

Number of levels to recurse in the hierarchy when discovering slaves.
Number of levels to recurse in the hierarchy when discovering replicas.
Default is infinite.

See L<"--recursion-method">.
@@ -7448,7 +7486,7 @@ See L<"--recursion-method">.

type: string

Preferred recursion method for discovering slaves.
Preferred recursion method for discovering replicas.

Possible methods are:

@@ -7458,10 +7496,14 @@ Possible methods are:
hosts        SHOW SLAVE HOSTS
dsn=DSN      DSNs from a table

The C<processlist> method is preferred because SHOW SLAVE HOSTS is not reliable.
However, the C<hosts> method is required if the server uses a non-standard
port (not 3306). Usually the tool does the right thing and finds all slaves,
but you may give a preferred method and it will be used first.
The C<processlist> method is preferred because C<SHOW SLAVE HOSTS> is not
reliable. However, the C<hosts> method is required if the server uses a
non-standard port (not 3306). Usually the tool does the right thing and
finds all replicas, but you may give a preferred method and it will be used
first.

The C<hosts> method requires replicas to be configured with C<report-host>,
C<report-port>, etc.

The C<dsn> method is special: it specifies a DSN from which other DSN strings
are read. The specified DSN must specify a D and t, or a database-qualified
@@ -7481,28 +7523,8 @@ ordered by C<id>, but C<id> and C<parent_id> are otherwise ignored.

type: string; default: percona.checksums

Replicate checksums to slaves.

This option enables a completely different checksum strategy for a consistent,
lock-free checksum across a master and its slaves. Instead of running the
checksum queries on each server, you run them only on the master. You specify a
table, fully qualified in db.table format, to insert the results into. The
checksum queries will insert directly into the table, so they will be replicated
through the binlog to the slaves.

When the queries are finished replicating, you can run a simple query on each
slave to see which tables have differences from the master. With the
L<"--[no]replicate-check"> option, pt-table-checksum can run the query for
you to make it even easier.

If you find tables that have differences, you can use the chunk boundaries in a
WHERE clause with L<pt-table-sync> to help repair them more efficiently. See
L<pt-table-sync> for details.

The table must have at least these columns: db, tbl, chunk, boundaries,
this_crc, master_crc, this_cnt, master_cnt. The table may be named anything you
wish. Here is a suggested table structure, which is automatically used for
L<"--create-replicate-table"> (MAGIC_create_replicate):
Write checksum results to this table. The replicate table must have this
structure (MAGIC_create_replicate):

CREATE TABLE checksums (
db char(64) NOT NULL,
@@ -7521,35 +7543,21 @@ L<"--create-replicate-table"> (MAGIC_create_replicate):
INDEX (ts)
) ENGINE=InnoDB;

Be sure to choose an appropriate storage engine for the checksum table. If you
are checksumming InnoDB tables, for instance, a deadlock will break replication
if the checksum table is non-transactional, because the transaction will still
be written to the binlog. It will then replay without a deadlock on the
slave and break replication with "different error on master and slave." This
is not a problem with pt-table-checksum, it's a problem with MySQL
replication, and you can read more about it in the MySQL manual.
By default, L<"--[no]create-replicate-table"> is true, so the database and
the table specified by this option are created automatically if they do not
exist.

This works only with statement-based replication (pt-table-checksum will switch
the binlog format to STATEMENT for the duration of the session if your server
uses row-based replication).
Be sure to choose an appropriate storage engine for the replicate table.
If you are checksumming InnoDB tables, for instance, a deadlock will break
replication if the replicate table is non-transactional because the transaction
will still be written to the binlog. It will then replay without a deadlock
on the replicas and break replication with "different error on master and
slave." This is not a problem with pt-table-checksum; it's a problem with
MySQL replication, and you can read more about it in the MySQL manual.

In contrast to running the tool against multiple servers at once, using this
option eliminates the complexities of synchronizing checksum queries across
multiple servers, which normally requires locking and unlocking, waiting for
master binlog positions, and so on.

The checksum queries actually do a REPLACE into this table, so existing rows
need not be removed before running. However, you may wish to do this anyway to
remove rows related to tables that don't exist anymore. The
L<"--[no]empty-replicate-table"> option does this for you.

Since the table must be qualified with a database (e.g. C<db.checksums>),
pt-table-checksum will only USE this database. This may be important if any
replication options are set because it could affect whether or not changes
to the table are replicated.

If the slaves have any --replicate-do-X or --replicate-ignore-X options, you
should be careful not to checksum any databases or tables that exist on the
If the slaves have any C<--replicate-do-X> or C<--replicate-ignore-X> options,
you should be careful not to checksum any databases or tables that exist on the
master and not the slaves. Changes to such tables may not normally be executed
on the slaves because of the --replicate options, but the checksum queries
modify the contents of the table that stores the checksums, not the tables whose
@@ -7558,44 +7566,20 @@ slave, and if the table or database you're checksumming does not exist, the
queries will cause replication to fail. For more information on replication
rules, see L<http://dev.mysql.com/doc/en/replication-rules.html>.

The table specified by L<"--replicate"> will never be checksummed itself.
The replicate table is never checksummed itself.

=item --[no]replicate-check

default: yes

Check results in L<"--replicate"> table, to the specified depth. You must use
this after you run the tool normally; it skips the checksum step and only checks
results.
Check replicas for data differences.

It recursively finds differences recorded in the table given by
L<"--replicate">. It recurses to the depth you specify: 0 is no recursion
(check only the server you specify), 1 is check the server and its slaves, 2 is
check the slaves of its slaves, and so on.
Differences are found by recursing to replicas, to the depth specified
by L<"--recurse">, and executing a simple C<SELECT> statement to compare
the replica's checksum results to the master's checksum results. Any
differences are reported in the C<DIFFS> column of the L<"OUTPUT">.

It finds differences by running the query shown in L<"CONSISTENT CHECKSUMS">,
and prints results, then exits after printing. This is just a convenient way of
running the query so you don't have to do it manually.

The output is one informational line per slave host, followed by the results
of the query, if any. If
there are no differences between the master and any slave, there is no output.
If any slave has chunks that differ from the master, pt-table-checksum's
exit status is 1; otherwise it is 0.

This option makes C<pt-table-checksum> look for slaves by running C<SHOW
PROCESSLIST>. If it finds connections that appear to be from slaves, it derives
connection information for each slave with the same default-and-override method
described in L<"SPECIFYING HOSTS">.

If C<SHOW PROCESSLIST> doesn't return any rows, C<pt-table-checksum> looks at
C<SHOW SLAVE HOSTS> instead. The host and port, and user and password if
available, from C<SHOW SLAVE HOSTS> are combined into a DSN and used as the
argument. This requires slaves to be configured with C<report-host>,
C<report-port> and so on.

This requires the @@SERVER_ID system variable, so it works only on MySQL
3.23.26 or newer.
See also L<"FINDING REPLICAS">.

=item --replicate-database

@@ -7622,6 +7606,12 @@ be resumed from the last successful checksum by specifying this option.
This option disables L<"--[no]empty-replicate-table"> because if previous
checksums are deleted then there is nothing to resume.

=item --retries

type: int; default: 2

Number of checksum query retries for non-fatal failures and warnings.
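
To illustrate what a bounded retry of this kind looks like, here is a minimal, self-contained sketch. It is not the toolkit's retry code: C<retry_sketch> is a hypothetical stand-in that only mirrors the C<tries>, C<wait>, and C<try> argument names visible in the C<exec_nibble> hunk of this commit, and it assumes C<tries> counts total attempts.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a retry helper. Calls the "try" callback up to
# "tries" times, returns the first result that does not die, and re-throws
# the last error if every attempt fails.
sub retry_sketch {
    my (%args) = @_;
    my $last_error;
    for my $attempt (1 .. $args{tries}) {
        my $result;
        my $ok = eval { $result = $args{try}->($attempt); 1 };
        return $result if $ok;
        $last_error = $@;
        $args{wait}->() if $args{wait};   # the commit passes a no-op wait
    }
    die $last_error;
}

my $calls = 0;
my $value = retry_sketch(
    tries => 2,                            # the default given for --retries
    wait  => sub { return; },
    try   => sub {
        $calls++;
        die "lock wait timeout\n" if $calls == 1;   # simulated non-fatal failure
        return 'checksummed';
    },
);
print "$value after $calls attempts\n";
```

In this sketch a first attempt that dies is followed by one more attempt, so the simulated query succeeds on the second call.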

=item --separator

type: string; default: #
@@ -7861,6 +7851,6 @@ Place, Suite 330, Boston, MA 02111-1307 USA.

=head1 VERSION

pt-table-checksum 1.0.1
pt-table-checksum 2.0

=cut