diff --git a/bin/pt-table-checksum b/bin/pt-table-checksum index 1b406c65..75d956c4 100755 --- a/bin/pt-table-checksum +++ b/bin/pt-table-checksum @@ -9020,17 +9020,21 @@ won't break replication (or simply fail to replicate). If you are sure that it's OK to run the checksum queries, you can negate this option to disable the checks. See also L<"--replicate-database">. +See also L<"REPLICA CHECKS">. + =item --check-slave-lag type: string; group: Throttle Pause checksumming until this replica's lag is less than L<"--max-lag">. The value is a DSN that inherits properties from the master host and the connection -options (L<"--port">, L<"--user">, etc.). This option overrides the normal -behavior of finding and continually monitoring replication lag on ALL connected -replicas. If you don't want to monitor ALL replicas, but you want more than -just one replica to be monitored, then use the DSN option to the -L<"--recursion-method"> option instead of this option. +options (L<"--port">, L<"--user">, etc.). By default, pt-table-checksum +monitors lag on all connected replicas, but this option limits lag monitoring +to the specified replica. This is useful if certain replicas are intentionally +lagged (with L for example), in which case you can specify +a normal replica to monitor. + +See also L<"REPLICA CHECKS">. =item --chunk-index @@ -9301,8 +9305,7 @@ all replicas to which it connects, using Seconds_Behind_Master. If any replica is lagging more than the value of this option, then pt-table-checksum will sleep for L<"--check-interval"> seconds, then check all replicas again. If you specify L<"--check-slave-lag">, then the tool only examines that server for -lag, not all servers. If you want to control exactly which servers the tool -monitors, use the DSN value to L<"--recursion-method">. +lag, not all servers. The tool waits forever for replicas to stop lagging. If any replica is stopped, the tool waits forever until the replica is started. 
Checksumming @@ -9312,6 +9315,8 @@ The tool prints progress reports while waiting. If a replica is stopped, it prints a progress report immediately, then again at every progress report interval. +See also L<"REPLICA CHECKS">. + =item --max-load type: Array; default: Threads_running=25; group: Throttle @@ -9393,13 +9398,15 @@ or checksum differences. type: int Number of levels to recurse in the hierarchy when discovering replicas. -Default is infinite. See also L<"--recursion-method">. +Default is infinite. See also L<"--recursion-method"> and L<"REPLICA CHECKS">. =item --recursion-method type: array; default: processlist,hosts -Preferred recursion method for discovering replicas. Possible methods are: +Preferred recursion method for discovering replicas. pt-table-checksum +performs several L<"REPLICA CHECKS"> before and while running. +Possible methods are: METHOD USES =========== ================== @@ -9408,18 +9415,21 @@ Preferred recursion method for discovering replicas. Possible methods are: dsn=DSN DSNs from a table none Do not find slaves -The processlist method is the default, because SHOW SLAVE HOSTS is not -reliable. However, the hosts method can work better if the server uses a -non-standard port (not 3306). The tool usually does the right thing and -finds all replicas, but you may give a preferred method and it will be used -first. +The C<processlist> method is the default, because C<SHOW SLAVE HOSTS> is not +reliable. However, if the server uses a non-standard port (not 3306), then +the C<hosts> method becomes the default because it works better in this case. -The hosts method requires replicas to be configured with report_host, -report_port, etc. +The C<hosts> method requires replicas to be configured with C<report_host>, +C<report_port>, etc. -The dsn method is special: it specifies a table from which other DSN strings -are read. The specified DSN must specify a D and t, or a database-qualified -t. 
The DSN table should have the following structure: +The C<dsn> method is special: rather than automatically discovering replicas, +this method specifies a table with replica DSNs. The tool will only connect +to these replicas. This method works best when replicas do not use the same +MySQL username or password as the master, or when you want to prevent the tool +from connecting to certain replicas. The C<dsn> method is specified like: +C<--recursion-method dsn=h=host,D=percona,t=dsns>. The specified DSN must +have D and t parts, or just a database-qualified t part, which specify the +DSN table. The DSN table must have the following structure: CREATE TABLE `dsns` ( `id` int(11) NOT NULL AUTO_INCREMENT, @@ -9428,10 +9438,13 @@ t. The DSN table should have the following structure: PRIMARY KEY (`id`) ); -To make the tool monitor only the hosts 10.10.1.16 and 10.10.1.17 for -replication lag and checksum differences, insert the values C and -C into the table. Currently, the DSNs are ordered by id, but id -and parent_id are otherwise ignored. +DSNs are ordered by C<id>, but C<id> and C<parent_id> are otherwise ignored. +The C<dsn> column contains a replica DSN like it would be given on the command +line, for example: C<"h=replica_host,u=repl_user,p=repl_pass">. + +The C<none> method prevents the tool from connecting to any replicas. +This effectively disables all the L<"REPLICA CHECKS"> because there will +not be any replicas to check. Therefore, this method is not recommended. =item --replicate @@ -9587,6 +9600,60 @@ keyword. You might need to quote the value. Here is an example: =back +=head1 REPLICA CHECKS + +By default, pt-table-checksum attempts to find and connect to all replicas +connected to the master host. This automated process is called +"slave recursion" and is controlled by the L<"--recursion-method"> and +L<"--recurse"> options. The tool performs these checks on all replicas: + +=over + +=item 1. 
L<"--[no]check-replication-filters"> + +pt-table-checksum checks for replication filters on all replicas because +they can complicate or break the checksum process. By default, the tool +will exit if any replication filters are found, but this check can be +disabled by specifying C<--no-check-replication-filters>. + +=item 2. L<"--replicate"> table + +pt-table-checksum checks that the L<"--replicate"> table exists on all +replicas, else checksumming can break replication when updates to the table +on the master replicate to a replica that doesn't have the table. This +check cannot be disabled, and the tool waits forever until the table +exists on all replicas, printing L<"--progress"> messages while it waits. + +=item 3. Single chunk size + +If a table can be checksummed in a single chunk on the master, +pt-table-checksum will check that the table size on all replicas is +approximately the same. This prevents a rare problem where the table +on the master is empty or small, but on a replica it is much larger. +In this case, the single chunk checksum on the master would overload +the replica. This check cannot be disabled. + +=item 4. Lag + +After each chunk, pt-table-checksum checks the lag on all replicas, or only +the replica specified by L<"--check-slave-lag">. This helps the tool +not to overload the replicas with checksum data. There is no way to +disable this check, but you can specify a single replica to check with +L<"--check-slave-lag">, and if that replica is the fastest, it will help +prevent the tool from waiting too long for replica lag to abate. + +=item 5. Checksum chunks + +When pt-table-checksum finishes checksumming a table, it waits for the last +checksum chunk to replicate to all replicas so it can perform the +L<"--[no]replicate-check">. 
Disabling that option by specifying +C<--no-replicate-check> disables this check, but it also disables +immediate reporting of checksum differences, thereby requiring a second run +of the tool with L<"--replicate-check-only"> to find and print checksum +differences. + +=back + =head1 DSN OPTIONS These DSN options are used to create a DSN. Each option is given like